Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices
On 2/13/2018 11:30 AM, Or Gerlitz wrote: On Tue, Feb 6, 2018 at 11:45 AM, Tal Gilboa wrote: On 2/6/2018 11:34 AM, Sagi Grimberg wrote: Hi Tal, I think Tal has idea/s on how the existing library can be changed to support more modes/models What I was thinking is allowing DIM algorithm to disregard data which is 0. Currently if bytes == 0 we return "SAME" immediately. We can change it to simply move to the packets check (which may be renamed to "completions"). This way you could use DIM while only optimizing to (P1) high packet rate and (P2) low interrupt rate. That was exactly where I started from. But unfortunately it did not work well :( From my experiments, the moderation was all over the place failing to converge. At least the workloads that I've tested with, it was more successful to have a stricter step policy and pulling towards latency if we are consistently catching single completion per event. I'm not an expert here at all, but at this point, based on my attempts so far, I'm not convinced the current net_dim scheme could work. I do believe we can make it work. I see your addition of the cpe part to stats compare. Might not be a bad idea for networking devices. Overall, it seems to me like this would be a private case of the general DIM optimization, since it doesn't need to account for aggregation, for instance, which breaks the "more packets == more data" ratio. Did U2 came to agreement/lead on how to re-use the upstream library for the matter Sagi is pushing for? I don't think so (yet). Sagi, I would like to avoid having 2 "net DIM"s if possible. You mentioned you tried implementing over net DIM lib and it wasn't working well. Can you share this code with me?
Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices
On Tue, Feb 6, 2018 at 11:45 AM, Tal Gilboa wrote: > On 2/6/2018 11:34 AM, Sagi Grimberg wrote: >> >> Hi Tal, >> I think Tal has idea/s on how the existing library can be changed to support more modes/models >>> What I was thinking is allowing DIM algorithm to disregard data which is >>> 0. Currently if bytes == 0 we return "SAME" immediately. We can change it to >>> simply move to the packets check (which may be renamed to "completions"). >>> This way you could use DIM while only optimizing to (P1) high packet rate >>> and (P2) low interrupt rate. >> >> >> That was exactly where I started from. But unfortunately it did not work >> well :( >> >> From my experiments, the moderation was all over the place failing to >> converge. At least the workloads that I've tested with, it was more >> successful to have a stricter step policy and pulling towards latency >> if we are consistently catching single completion per event. >> >> I'm not an expert here at all, but at this point, based on my attempts >> so far, I'm not convinced the current net_dim scheme could work. > > I do believe we can make it work. I see your addition of the cpe part to > stats compare. Might not be a bad idea for networking devices. Overall, it > seems to me like this would be a private case of the general DIM > optimization, since it doesn't need to account for aggregation, for > instance, which breaks the "more packets == more data" ratio. Did U2 came to agreement/lead on how to re-use the upstream library for the matter Sagi is pushing for?
Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices
On Tue, Feb 6, 2018 at 11:25 AM, Sagi Grimberg wrote: > Hey Or, > >> not any more, Andy and Tal made the mlx5 AM code a kernel library >> which is called DIM >> >> f4e5f0e MAINTAINERS: add entry for Dynamic Interrupt Moderation >> 6a8788f bnxt_en: add support for software dynamic interrupt moderation >> 8115b75 net/dim: use struct net_dim_sample as arg to net_dim >> 4c4dbb4 net/mlx5e: Move dynamic interrupt coalescing code to include/linux >> 9a31742 net/mlx5e: Change Mellanox references in DIM code >> b9c872f net/mlx5e: Move generic functions to new file >> f5e7f67 net/mlx5e: Move AM logic enums >> 138968e net/mlx5e: Remove rq references in mlx5e_rx_am >> f58ee09 net/mlx5e: Move interrupt moderation forward declarations >> 98dd1ed net/mlx5e: Move interrupt moderation structs to new file >> >> can you make use of that? cc-ing them in case you have questions/comments > > > Thanks a lot for the pointer, I'm not following net-next regularly. > This seems to be a copy of the mlx5 code. Just to make it clear, the code was moved to a library and not copied... so there's single instance of the code upstream and lets try to find a way for your use case to take advantage of it possibly under some modifications.
Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices
On 2/6/2018 11:34 AM, Sagi Grimberg wrote: Hi Tal, I think Tal has idea/s on how the existing library can be changed to support more modes/models What I was thinking is allowing DIM algorithm to disregard data which is 0. Currently if bytes == 0 we return "SAME" immediately. We can change it to simply move to the packets check (which may be renamed to "completions"). This way you could use DIM while only optimizing to (P1) high packet rate and (P2) low interrupt rate. That was exactly where I started from. But unfortunately it did not work well :( From my experiments, the moderation was all over the place failing to converge. At least the workloads that I've tested with, it was more successful to have a stricter step policy and pulling towards latency if we are consistently catching single completion per event. I'm not an expert here at all, but at this point, based on my attempts so far, I'm not convinced the current net_dim scheme could work. I do believe we can make it work. I see your addition of the cpe part to stats compare. Might not be a bad idea for networking devices. Overall, it seems to me like this would be a private case of the general DIM optimization, since it doesn't need to account for aggregation, for instance, which breaks the "more packets == more data" ratio.
Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices
Hi Tal, I think Tal has idea/s on how the existing library can be changed to support more modes/models What I was thinking is allowing DIM algorithm to disregard data which is 0. Currently if bytes == 0 we return "SAME" immediately. We can change it to simply move to the packets check (which may be renamed to "completions"). This way you could use DIM while only optimizing to (P1) high packet rate and (P2) low interrupt rate. That was exactly where I started from. But unfortunately it did not work well :( From my experiments, the moderation was all over the place failing to converge. At least the workloads that I've tested with, it was more successful to have a stricter step policy and pulling towards latency if we are consistently catching single completion per event. I'm not an expert here at all, but at this point, based on my attempts so far, I'm not convinced the current net_dim scheme could work.
Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices
Hey Or, not any more, Andy and Tal made the mlx5 AM code a kernel library which is called DIM f4e5f0e MAINTAINERS: add entry for Dynamic Interrupt Moderation 6a8788f bnxt_en: add support for software dynamic interrupt moderation 8115b75 net/dim: use struct net_dim_sample as arg to net_dim 4c4dbb4 net/mlx5e: Move dynamic interrupt coalescing code to include/linux 9a31742 net/mlx5e: Change Mellanox references in DIM code b9c872f net/mlx5e: Move generic functions to new file f5e7f67 net/mlx5e: Move AM logic enums 138968e net/mlx5e: Remove rq references in mlx5e_rx_am f58ee09 net/mlx5e: Move interrupt moderation forward declarations 98dd1ed net/mlx5e: Move interrupt moderation structs to new file can you make use of that? cc-ing them in case you have questions/comments Thanks a lot for the pointer, I'm not following net-next regularly. This seems to be a copy of the mlx5 code. So as mentioned in the cover letter, the implementation was inspired from the mlx5 driver as it had some of the abstraction properties I was aiming to achieve. What I didn't mention, was that I started from using the mlx5 implementation as is (slightly modified). Unfortunately in the workloads that I've tested, I was not able to get the am level to converge. So I went in a slightly different direction which gave me better results (although still not perfect and lightly tested). This led me to believe that mlx5 offered strategy might not suite storage I/O workloads (although I do acknowledge that I could be proven wrong here). For example, a design choice I took was that the moderation scheme would be more aggressive towards latency on the expense of throughput optimization since high end storage devices are often expected to meet strict latency requirements. Does that makes sense for network devices? I don't know. Another difference is that unlike net devices, storage I/O transactions usually look like request/response where the data transfer is offloaded by the controller/hba (net devices moderate on raw RX datagrams). Moreover, it might not be trivial to track the bytes/sec from a completion at all (might be embedded somewhere in the consumer context). So overall, at this point I'm a bit skeptical that it will makes sense to commonise the two implementations. I'd like to hear some more input if this is something that the community is interested in first. If this is something people are interested in, we can look if it makes sense to reuse net_dim.h or not.
Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices
On 2/6/2018 10:54 AM, Or Gerlitz wrote: On Tue, Feb 6, 2018 at 12:03 AM, Sagi Grimberg wrote: The main reason why this implementation is different then the common networking devices implementation (and kept separate) is that in my mind at least, network devices are different animals than other I/O devices in the sense that: Oh, I see now that you do refer to the netdev library, I was confused by "In the networking stack, each device driver implements adaptive IRQ moderation on its own" comment. (a) network devices rely heavily on byte count of raw ethernet frames for adaptive moderation while in storage or I/O, the byte count is often a result of a submission/completion transaction and sometimes even known only to the application on top of the infrastructure (like in the rdma case). (b) Performance characteristics and expectations in representative workloads. (c) network devices collect all sort of stats for different functionalities (where adaptive moderation is only one use-case) and I'm not sure at all that a subset of stats could easily migrate to a different context. I think Tal has idea/s on how the existing library can be changed to support more modes/models What I was thinking is allowing DIM algorithm to disregard data which is 0. Currently if bytes == 0 we return "SAME" immediately. We can change it to simply move to the packets check (which may be renamed to "completions"). This way you could use DIM while only optimizing to (P1) high packet rate and (P2) low interrupt rate.
Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices
On Tue, Feb 6, 2018 at 12:03 AM, Sagi Grimberg wrote: > The main reason why this implementation is different then the common > networking devices > implementation (and kept separate) is that in my mind at least, network > devices are different > animals than other I/O devices in the sense that: Oh, I see now that you do refer to the netdev library, I was confused by "In the networking stack, each device driver implements adaptive IRQ moderation on its own" comment. > (a) network devices rely heavily on byte count of raw ethernet frames for > adaptive moderation > while in storage or I/O, the byte count is often a result of a > submission/completion transaction > and sometimes even known only to the application on top of the > infrastructure (like in the > rdma case). > (b) Performance characteristics and expectations in representative workloads. > (c) network devices collect all sort of stats for different functionalities > (where adaptive moderation > is only one use-case) and I'm not sure at all that a subset of stats > could easily migrate to a different > context. I think Tal has idea/s on how the existing library can be changed to support more modes/models
Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices
On Tue, Feb 6, 2018 at 12:03 AM, Sagi Grimberg wrote: [...] > In the networking stack, each device driver implements adaptive IRQ moderation > on its own. The approach here is a bit different, it tries to take the common > denominator, > which is per-queue statistics gathering and workload change identification > (basically decides if the moderation scheme needs to change). not any more, Andy and Tal made the mlx5 AM code a kernel library which is called DIM f4e5f0e MAINTAINERS: add entry for Dynamic Interrupt Moderation 6a8788f bnxt_en: add support for software dynamic interrupt moderation 8115b75 net/dim: use struct net_dim_sample as arg to net_dim 4c4dbb4 net/mlx5e: Move dynamic interrupt coalescing code to include/linux 9a31742 net/mlx5e: Change Mellanox references in DIM code b9c872f net/mlx5e: Move generic functions to new file f5e7f67 net/mlx5e: Move AM logic enums 138968e net/mlx5e: Remove rq references in mlx5e_rx_am f58ee09 net/mlx5e: Move interrupt moderation forward declarations 98dd1ed net/mlx5e: Move interrupt moderation structs to new file can you make use of that? cc-ing them in case you have questions/comments > The library is targeted to multi-queue devices, but should work on single > queue > devices as well, however I'm not sure that these devices will need something > like interrupt moderation.