Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices

2018-02-13 Thread Tal Gilboa

On 2/13/2018 11:30 AM, Or Gerlitz wrote:

On Tue, Feb 6, 2018 at 11:45 AM, Tal Gilboa  wrote:

On 2/6/2018 11:34 AM, Sagi Grimberg wrote:


Hi Tal,


I think Tal has idea/s on how the existing library can be changed to
support more modes/models


What I was thinking is allowing DIM algorithm to disregard data which is
0. Currently if bytes == 0 we return "SAME" immediately. We can change it to
simply move to the packets check (which may be renamed to "completions").
This way you could use DIM while only optimizing to (P1) high packet rate
and (P2) low interrupt rate.



That was exactly where I started from. But unfortunately it did not work
well :(

  From my experiments, the moderation was all over the place failing to
converge. At least the workloads that I've tested with, it was more
successful to have a stricter step policy and pulling towards latency
if we are consistently catching single completion per event.

I'm not an expert here at all, but at this point, based on my attempts
so far, I'm not convinced the current net_dim scheme could work.


I do believe we can make it work. I see your addition of the cpe part to
stats compare. Might not be a bad idea for networking devices. Overall, it
seems to me like this would be a private case of the general DIM
optimization, since it doesn't need to account for aggregation, for
instance, which breaks the "more packets == more data" ratio.


Did U2 came to agreement/lead on how to re-use the upstream library
for the matter Sagi is pushing for?


I don't think so (yet).
Sagi, I would like to avoid having 2 "net DIM"s if possible. You 
mentioned you tried implementing over net DIM lib and it wasn't working 
well. Can you share this code with me?


Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices

2018-02-13 Thread Or Gerlitz
On Tue, Feb 6, 2018 at 11:45 AM, Tal Gilboa  wrote:
> On 2/6/2018 11:34 AM, Sagi Grimberg wrote:
>>
>> Hi Tal,
>>
 I think Tal has idea/s on how the existing library can be changed to
 support more modes/models

>>> What I was thinking is allowing DIM algorithm to disregard data which is
>>> 0. Currently if bytes == 0 we return "SAME" immediately. We can change it to
>>> simply move to the packets check (which may be renamed to "completions").
>>> This way you could use DIM while only optimizing to (P1) high packet rate
>>> and (P2) low interrupt rate.
>>
>>
>> That was exactly where I started from. But unfortunately it did not work
>> well :(
>>
>>  From my experiments, the moderation was all over the place failing to
>> converge. At least the workloads that I've tested with, it was more
>> successful to have a stricter step policy and pulling towards latency
>> if we are consistently catching single completion per event.
>>
>> I'm not an expert here at all, but at this point, based on my attempts
>> so far, I'm not convinced the current net_dim scheme could work.
>
> I do believe we can make it work. I see your addition of the cpe part to
> stats compare. Might not be a bad idea for networking devices. Overall, it
> seems to me like this would be a private case of the general DIM
> optimization, since it doesn't need to account for aggregation, for
> instance, which breaks the "more packets == more data" ratio.

Did U2 came to agreement/lead on how to re-use the upstream library
for the matter Sagi is pushing for?


Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices

2018-02-06 Thread Or Gerlitz
On Tue, Feb 6, 2018 at 11:25 AM, Sagi Grimberg  wrote:
> Hey Or,
>
>> not any more, Andy and Tal made the mlx5 AM code a kernel library
>> which is called DIM
>>
>> f4e5f0e MAINTAINERS: add entry for Dynamic Interrupt Moderation
>> 6a8788f bnxt_en: add support for software dynamic interrupt moderation
>> 8115b75 net/dim: use struct net_dim_sample as arg to net_dim
>> 4c4dbb4 net/mlx5e: Move dynamic interrupt coalescing code to include/linux
>> 9a31742 net/mlx5e: Change Mellanox references in DIM code
>> b9c872f net/mlx5e: Move generic functions to new file
>> f5e7f67 net/mlx5e: Move AM logic enums
>> 138968e net/mlx5e: Remove rq references in mlx5e_rx_am
>> f58ee09 net/mlx5e: Move interrupt moderation forward declarations
>> 98dd1ed net/mlx5e: Move interrupt moderation structs to new file
>>
>> can you make use of that? cc-ing them in case you have questions/comments
>
>
> Thanks a lot for the pointer, I'm not following net-next regularly.
> This seems to be a copy of the mlx5 code.

Just to make it clear, the code was moved to a library and not copied... so
there's single instance of the code upstream and lets try to find a way for
your use case to take advantage of it possibly under some modifications.


Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices

2018-02-06 Thread Tal Gilboa

On 2/6/2018 11:34 AM, Sagi Grimberg wrote:

Hi Tal,


I think Tal has idea/s on how the existing library can be changed to
support more modes/models

What I was thinking is allowing DIM algorithm to disregard data which 
is 0. Currently if bytes == 0 we return "SAME" immediately. We can 
change it to simply move to the packets check (which may be renamed to 
"completions"). This way you could use DIM while only optimizing to 
(P1) high packet rate and (P2) low interrupt rate.


That was exactly where I started from. But unfortunately it did not work
well :(

 From my experiments, the moderation was all over the place failing to
converge. At least the workloads that I've tested with, it was more
successful to have a stricter step policy and pulling towards latency
if we are consistently catching single completion per event.

I'm not an expert here at all, but at this point, based on my attempts
so far, I'm not convinced the current net_dim scheme could work.
I do believe we can make it work. I see your addition of the cpe part to 
stats compare. Might not be a bad idea for networking devices. Overall, 
it seems to me like this would be a private case of the general DIM 
optimization, since it doesn't need to account for aggregation, for 
instance, which breaks the "more packets == more data" ratio.


Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices

2018-02-06 Thread Sagi Grimberg

Hi Tal,


I think Tal has idea/s on how the existing library can be changed to
support more modes/models

What I was thinking is allowing DIM algorithm to disregard data which is 
0. Currently if bytes == 0 we return "SAME" immediately. We can change 
it to simply move to the packets check (which may be renamed to 
"completions"). This way you could use DIM while only optimizing to (P1) 
high packet rate and (P2) low interrupt rate.


That was exactly where I started from. But unfortunately it did not work
well :(

From my experiments, the moderation was all over the place failing to
converge. At least the workloads that I've tested with, it was more
successful to have a stricter step policy and pulling towards latency
if we are consistently catching single completion per event.

I'm not an expert here at all, but at this point, based on my attempts
so far, I'm not convinced the current net_dim scheme could work.


Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices

2018-02-06 Thread Sagi Grimberg

Hey Or,


not any more, Andy and Tal made the mlx5 AM code a kernel library
which is called DIM

f4e5f0e MAINTAINERS: add entry for Dynamic Interrupt Moderation
6a8788f bnxt_en: add support for software dynamic interrupt moderation
8115b75 net/dim: use struct net_dim_sample as arg to net_dim
4c4dbb4 net/mlx5e: Move dynamic interrupt coalescing code to include/linux
9a31742 net/mlx5e: Change Mellanox references in DIM code
b9c872f net/mlx5e: Move generic functions to new file
f5e7f67 net/mlx5e: Move AM logic enums
138968e net/mlx5e: Remove rq references in mlx5e_rx_am
f58ee09 net/mlx5e: Move interrupt moderation forward declarations
98dd1ed net/mlx5e: Move interrupt moderation structs to new file

can you make use of that? cc-ing them in case you have questions/comments


Thanks a lot for the pointer, I'm not following net-next regularly.
This seems to be a copy of the mlx5 code.

So as mentioned in the cover letter, the implementation was inspired
from the mlx5 driver as it had some of the abstraction properties I was
aiming to achieve.

What I didn't mention, was that I started from using the mlx5
implementation as is (slightly modified). Unfortunately in the
workloads that I've tested, I was not able to get the am level to
converge. So I went in a slightly different direction which gave me
better results (although still not perfect and lightly tested).

This led me to believe that mlx5 offered strategy might not suite
storage I/O workloads (although I do acknowledge that I could be proven
wrong here).

For example, a design choice I took was that the moderation scheme would
be more aggressive towards latency on the expense of throughput
optimization since high end storage devices are often expected to meet
strict latency requirements. Does that makes sense for network devices?
I don't know.

Another difference is that unlike net devices, storage I/O transactions
usually look like request/response where the data transfer is offloaded
by the controller/hba (net devices moderate on raw RX datagrams).
Moreover, it might not be trivial to track the bytes/sec from a
completion at all (might be embedded somewhere in the consumer context).

So overall, at this point I'm a bit skeptical that it will makes sense
to commonise the two implementations. I'd like to hear some more input
if this is something that the community is interested in first. If this
is something people are interested in, we can look if it makes sense to
reuse net_dim.h or not.


Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices

2018-02-06 Thread Tal Gilboa

On 2/6/2018 10:54 AM, Or Gerlitz wrote:

On Tue, Feb 6, 2018 at 12:03 AM, Sagi Grimberg  wrote:


The main reason why this implementation is different then the common networking 
devices
implementation (and kept separate) is that in my mind at least, network devices 
are different
animals than other I/O devices in the sense that:


Oh, I see now  that you do refer to the netdev library, I was confused
by "In the networking stack, each device driver
implements adaptive IRQ moderation on its own" comment.


(a) network devices rely heavily on byte count of raw ethernet frames for 
adaptive moderation
 while in storage or I/O, the byte count is often a result of a 
submission/completion transaction
 and sometimes even known only to the application on top of the 
infrastructure (like in the
 rdma case).



(b) Performance characteristics and expectations in representative workloads.
(c) network devices collect all sort of stats for different functionalities 
(where adaptive moderation
 is only one use-case) and I'm not sure at all that a subset of stats could 
easily migrate to a different
 context.


I think Tal has idea/s on how the existing library can be changed to
support more modes/models

What I was thinking is allowing DIM algorithm to disregard data which is 
0. Currently if bytes == 0 we return "SAME" immediately. We can change 
it to simply move to the packets check (which may be renamed to 
"completions"). This way you could use DIM while only optimizing to (P1) 
high packet rate and (P2) low interrupt rate.


Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices

2018-02-06 Thread Or Gerlitz
On Tue, Feb 6, 2018 at 12:03 AM, Sagi Grimberg  wrote:

> The main reason why this implementation is different then the common 
> networking devices
> implementation (and kept separate) is that in my mind at least, network 
> devices are different
> animals than other I/O devices in the sense that:

Oh, I see now  that you do refer to the netdev library, I was confused
by "In the networking stack, each device driver
implements adaptive IRQ moderation on its own" comment.

> (a) network devices rely heavily on byte count of raw ethernet frames for 
> adaptive moderation
> while in storage or I/O, the byte count is often a result of a 
> submission/completion transaction
> and sometimes even known only to the application on top of the 
> infrastructure (like in the
> rdma case).

> (b) Performance characteristics and expectations in representative workloads.
> (c) network devices collect all sort of stats for different functionalities 
> (where adaptive moderation
> is only one use-case) and I'm not sure at all that a subset of stats 
> could easily migrate to a different
> context.

I think Tal has idea/s on how the existing library can be changed to
support more modes/models


Re: [PATCH rfc 0/5] generic adaptive IRQ moderation library for I/O devices

2018-02-05 Thread Or Gerlitz
On Tue, Feb 6, 2018 at 12:03 AM, Sagi Grimberg  wrote:
[...]
> In the networking stack, each device driver implements adaptive IRQ moderation
> on its own. The approach here is a bit different, it tries to take the common 
> denominator,
> which is per-queue statistics gathering and workload change identification
> (basically decides if the moderation scheme needs to change).

not any more, Andy and Tal made the mlx5 AM code a kernel library
which is called DIM

f4e5f0e MAINTAINERS: add entry for Dynamic Interrupt Moderation
6a8788f bnxt_en: add support for software dynamic interrupt moderation
8115b75 net/dim: use struct net_dim_sample as arg to net_dim
4c4dbb4 net/mlx5e: Move dynamic interrupt coalescing code to include/linux
9a31742 net/mlx5e: Change Mellanox references in DIM code
b9c872f net/mlx5e: Move generic functions to new file
f5e7f67 net/mlx5e: Move AM logic enums
138968e net/mlx5e: Remove rq references in mlx5e_rx_am
f58ee09 net/mlx5e: Move interrupt moderation forward declarations
98dd1ed net/mlx5e: Move interrupt moderation structs to new file

can you make use of that? cc-ing them in case you have questions/comments


> The library is targeted to multi-queue devices, but should work on single 
> queue
> devices as well, however I'm not sure that these devices will need something
> like interrupt moderation.