Re: mlx5 broken affinity

2017-11-09 Thread Jes Sorensen
On 11/08/2017 12:33 PM, Thomas Gleixner wrote: > On Wed, 8 Nov 2017, Jes Sorensen wrote: >> On 11/07/2017 10:07 AM, Thomas Gleixner wrote: >>> Depending on the machine and the number of queues this might even result in >>> completely losing the ability to suspend/hibernate because the number of

Re: mlx5 broken affinity

2017-11-09 Thread Jens Axboe
On 11/09/2017 02:23 PM, Thomas Gleixner wrote: > On Thu, 9 Nov 2017, Jens Axboe wrote: >> On 11/09/2017 10:03 AM, Thomas Gleixner wrote: >>> On Thu, 9 Nov 2017, Jens Axboe wrote: On 11/09/2017 07:19 AM, Thomas Gleixner wrote: If that's the attitude at your end, then I do suggest we just

Re: mlx5 broken affinity

2017-11-09 Thread Thomas Gleixner
On Thu, 9 Nov 2017, Jens Axboe wrote: > On 11/09/2017 10:07 AM, Thomas Gleixner wrote: > > I say it one last time: It can be done and I'm willing to help. > > It didn't sound like it earlier, but that's good news. Well, I'm equally frustrated by this whole thing, but I certainly never said that

Re: mlx5 broken affinity

2017-11-09 Thread Thomas Gleixner
On Thu, 9 Nov 2017, Jens Axboe wrote: > On 11/09/2017 10:03 AM, Thomas Gleixner wrote: > > On Thu, 9 Nov 2017, Jens Axboe wrote: > >> On 11/09/2017 07:19 AM, Thomas Gleixner wrote: > >> If that's the attitude at your end, then I do suggest we just revert the > >> driver changes. Clearly this isn't

Re: mlx5 broken affinity

2017-11-09 Thread Jens Axboe
On 11/09/2017 10:07 AM, Thomas Gleixner wrote: > On Thu, 9 Nov 2017, Jens Axboe wrote: > >> On 11/09/2017 09:01 AM, Sagi Grimberg wrote: Now you try to blame the people who implemented the managed affinity stuff for the wreckage, which was created by people who changed drivers to use

Re: mlx5 broken affinity

2017-11-09 Thread Jens Axboe
On 11/09/2017 10:03 AM, Thomas Gleixner wrote: > On Thu, 9 Nov 2017, Jens Axboe wrote: >> On 11/09/2017 07:19 AM, Thomas Gleixner wrote: >> If that's the attitude at your end, then I do suggest we just revert the >> driver changes. Clearly this isn't going to be productive going forward. >> >> The

Re: mlx5 broken affinity

2017-11-09 Thread Thomas Gleixner
On Thu, 9 Nov 2017, Jens Axboe wrote: > On 11/09/2017 09:01 AM, Sagi Grimberg wrote: > >> Now you try to blame the people who implemented the managed affinity stuff > >> for the wreckage, which was created by people who changed drivers to use > >> it. Nice try. > > > > I'm not trying to blame

Re: mlx5 broken affinity

2017-11-09 Thread Thomas Gleixner
On Thu, 9 Nov 2017, Jens Axboe wrote: > On 11/09/2017 07:19 AM, Thomas Gleixner wrote: > If that's the attitude at your end, then I do suggest we just revert the > driver changes. Clearly this isn't going to be productive going forward. > > The better solution was to make the managed setup more

Re: mlx5 broken affinity

2017-11-09 Thread Jens Axboe
On 11/09/2017 09:01 AM, Sagi Grimberg wrote: >> Now you try to blame the people who implemented the managed affinity stuff >> for the wreckage, which was created by people who changed drivers to use >> it. Nice try. > > I'm not trying to blame anyone, really. I was just trying to understand > how

Re: mlx5 broken affinity

2017-11-09 Thread Sagi Grimberg
The early discussion of the managed facility came to the conclusion that it will manage this stuff completely to allow fixed association of 'queue / interrupt / corresponding memory' to a single CPU or a set of CPUs. That removes a lot of 'affinity' handling magic from the driver and utilizes
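The fixed 'queue / interrupt / corresponding memory to CPU set' association described above can be illustrated with a simplified sketch. This is not the kernel's actual `irq_create_affinity_masks()` algorithm (which also accounts for NUMA topology and pre/post vectors); it is only a hypothetical round-robin illustration of the spreading idea:

```python
def spread_queues(num_queues, cpus):
    """Assign each queue index to a CPU round-robin, sketching the
    fixed queue-to-CPU association the managed facility establishes.
    Simplified illustration only; the real kernel code spreads across
    NUMA nodes first and handles CPU hotplug."""
    return {q: cpus[q % len(cpus)] for q in range(num_queues)}

# Four queues spread over two CPUs: queues 0 and 2 land on CPU 0,
# queues 1 and 3 on CPU 1.
print(spread_queues(4, [0, 1]))
```

With a managed setup, this mapping is computed once by the core IRQ code and is not meant to be rewritten afterwards, which is exactly the property the thread is debating.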

Re: mlx5 broken affinity

2017-11-09 Thread Sagi Grimberg
Again, I think Jes or others can provide more information. Sagi, I believe Jes is not trying to argue about what initial affinity values you give to the driver. We have a very critical regression that is afflicting live systems today and common tools that already exist in various distros,

Re: mlx5 broken affinity

2017-11-09 Thread Jens Axboe
On 11/09/2017 07:19 AM, Thomas Gleixner wrote: > On Thu, 9 Nov 2017, Sagi Grimberg wrote: >> Thomas, >> Because the user sometimes knows better based on statically assigned loads, or the user wants consistency across kernels. It's great that the system is better at allocating this

Re: mlx5 broken affinity

2017-11-09 Thread Jens Axboe
On 11/09/2017 03:50 AM, Sagi Grimberg wrote: > Thomas, > >>> Because the user sometimes knows better based on statically assigned >>> loads, or the user wants consistency across kernels. It's great that the >>> system is better at allocating this now, but we also need to allow for a >>> user to

Re: mlx5 broken affinity

2017-11-09 Thread Jens Axboe
On 11/09/2017 03:09 AM, Christoph Hellwig wrote: > On Wed, Nov 08, 2017 at 09:13:59AM -0700, Jens Axboe wrote: >> There are numerous valid reasons to be able to set the affinity, for >> both nics and block drivers. It's great that the kernel has a predefined >> layout that works well, but users do

Re: mlx5 broken affinity

2017-11-09 Thread Saeed Mahameed
On Wed, 2017-11-08 at 09:27 +0200, Sagi Grimberg wrote: > > Depending on the machine and the number of queues this might even > > result in > > completely losing the ability to suspend/hibernate because the > > number of > > available vectors on CPU0 is not sufficient to accommodate all queue > >

Re: mlx5 broken affinity

2017-11-09 Thread Thomas Gleixner
On Thu, 9 Nov 2017, Sagi Grimberg wrote: > Thomas, > > > > Because the user sometimes knows better based on statically assigned > > > loads, or the user wants consistency across kernels. It's great that the > > > system is better at allocating this now, but we also need to allow for a > > > user

Re: mlx5 broken affinity

2017-11-09 Thread Sagi Grimberg
Thomas, Because the user sometimes knows better based on statically assigned loads, or the user wants consistency across kernels. It's great that the system is better at allocating this now, but we also need to allow for a user to change it. Like anything on Linux, a user wanting to blow off

Re: mlx5 broken affinity

2017-11-09 Thread Christoph Hellwig
On Wed, Nov 08, 2017 at 09:13:59AM -0700, Jens Axboe wrote: > There are numerous valid reasons to be able to set the affinity, for > both nics and block drivers. It's great that the kernel has a predefined > layout that works well, but users do need the flexibility to be able to > reconfigure

Re: mlx5 broken affinity

2017-11-08 Thread Thomas Gleixner
On Wed, 8 Nov 2017, Jes Sorensen wrote: > On 11/07/2017 10:07 AM, Thomas Gleixner wrote: > > On Sun, 5 Nov 2017, Sagi Grimberg wrote: > >> I do agree that the user would lose better cpu online/offline behavior, > >> but it seems that users want to still have some control over the IRQ > >> affinity

Re: mlx5 broken affinity

2017-11-08 Thread Jes Sorensen
On 11/07/2017 10:07 AM, Thomas Gleixner wrote: > On Sun, 5 Nov 2017, Sagi Grimberg wrote: >> I do agree that the user would lose better cpu online/offline behavior, >> but it seems that users want to still have some control over the IRQ >> affinity assignments even if they lose this functionality.

Re: mlx5 broken affinity

2017-11-08 Thread Jens Axboe
On 11/08/2017 05:21 AM, David Laight wrote: > From: Sagi Grimberg >> Sent: 08 November 2017 07:28 > ... >>> Why would you give the user a knob to destroy what you carefully optimized? >> >> Well, looks like someone relies on this knob, the question is if he is >> doing something better for his

RE: mlx5 broken affinity

2017-11-08 Thread David Laight
From: Sagi Grimberg > Sent: 08 November 2017 07:28 ... > > Why would you give the user a knob to destroy what you carefully optimized? > > Well, looks like someone relies on this knob, the question is if he is > doing something better for his workload. I don't know, it's really up to > the user to

Re: mlx5 broken affinity

2017-11-07 Thread Sagi Grimberg
Depending on the machine and the number of queues this might even result in completely losing the ability to suspend/hibernate because the number of available vectors on CPU0 is not sufficient to accommodate all queue interrupts. Would it be possible to keep the managed facility until a user

Re: mlx5 broken affinity

2017-11-07 Thread Thomas Gleixner
On Sun, 5 Nov 2017, Sagi Grimberg wrote: > > > > This wasn't to start a debate about which allocation method is the > > > > perfect solution. I am perfectly happy with the new default, the part > > > > that is broken is to take away the user's option to reassign the > > > > affinity. That is a bug

Re: mlx5 broken affinity

2017-11-05 Thread Sagi Grimberg
This wasn't to start a debate about which allocation method is the perfect solution. I am perfectly happy with the new default, the part that is broken is to take away the user's option to reassign the affinity. That is a bug and it needs to be fixed! Well, I would really want to wait for

Re: mlx5 broken affinity

2017-11-02 Thread Thomas Gleixner
On Thu, 2 Nov 2017, Sagi Grimberg wrote: > > > This wasn't to start a debate about which allocation method is the > > perfect solution. I am perfectly happy with the new default, the part > > that is broken is to take away the user's option to reassign the > > affinity. That is a bug and it

Re: mlx5 broken affinity

2017-11-02 Thread Jes Sorensen
On 11/02/2017 12:14 PM, Sagi Grimberg wrote: > >> This wasn't to start a debate about which allocation method is the >> perfect solution. I am perfectly happy with the new default, the part >> that is broken is to take away the user's option to reassign the >> affinity. That is a bug and it needs

Re: mlx5 broken affinity

2017-11-02 Thread Sagi Grimberg
This wasn't to start a debate about which allocation method is the perfect solution. I am perfectly happy with the new default, the part that is broken is to take away the user's option to reassign the affinity. That is a bug and it needs to be fixed! Well, I would really want to wait for

Re: mlx5 broken affinity

2017-11-02 Thread Jes Sorensen
On 11/02/2017 06:08 AM, Sagi Grimberg wrote: > I vaguely remember NAKing Sagi's patch as we knew it would break mlx5e netdev affinity assumptions. >> I remember that argument. Still the series found its way in. > > Of course it made its way in, it was acked by three different >

Re: mlx5 broken affinity

2017-11-02 Thread Andrew Lunn
> >This means that if your NIC is on NUMA #1, and you reduce the number of > >channels, you might end up working only with the cores on the far NUMA. > >Not good! > We deliberated on this before, and concluded that application affinity > and device affinity are equally important. If you have a

Re: mlx5 broken affinity

2017-11-02 Thread Sagi Grimberg
I vaguely remember NAKing Sagi's patch as we knew it would break mlx5e netdev affinity assumptions. I remember that argument. Still the series found its way in. Of course it made its way in, it was acked by three different maintainers, and I addressed all of Saeed's comments. That series

Re: mlx5 broken affinity

2017-11-02 Thread Tariq Toukan
On 02/11/2017 1:02 AM, Jes Sorensen wrote: On 11/01/2017 06:41 PM, Saeed Mahameed wrote: On Wed, Nov 1, 2017 at 11:20 AM, Jes Sorensen wrote: On 11/01/2017 01:21 PM, Sagi Grimberg wrote: I am all in favor of making the automatic setup better, but assuming an automatic

Re: mlx5 broken affinity

2017-11-02 Thread Sagi Grimberg
Jes, I am all in favor of making the automatic setup better, but assuming an automatic setup is always right seems problematic. There could be workloads where you may want to assign affinity explicitly. Adding Thomas to the thread. My understanding is that the thought is to prevent user-space

Re: mlx5 broken affinity

2017-11-01 Thread Jes Sorensen
On 11/01/2017 06:41 PM, Saeed Mahameed wrote: > On Wed, Nov 1, 2017 at 11:20 AM, Jes Sorensen wrote: >> On 11/01/2017 01:21 PM, Sagi Grimberg wrote: >> I am all in favor of making the automatic setup better, but assuming an >> automatic setup is always right seems problematic.

Re: mlx5 broken affinity

2017-11-01 Thread Saeed Mahameed
On Wed, Nov 1, 2017 at 11:20 AM, Jes Sorensen wrote: > On 11/01/2017 01:21 PM, Sagi Grimberg wrote: >>> Hi, >> >> Hi Jes, >> >>> The below patch seems to have broken PCI IRQ affinity assignments for >>> mlx5. >> >> I wouldn't call it breaking IRQ affinity assignments. It just

Re: mlx5 broken affinity

2017-11-01 Thread Jes Sorensen
On 11/01/2017 01:21 PM, Sagi Grimberg wrote: >> Hi, > > Hi Jes, > >> The below patch seems to have broken PCI IRQ affinity assignments for >> mlx5. > > I wouldn't call it breaking IRQ affinity assignments. It just makes > them automatic. Well it explicitly breaks the option for an admin to

Re: mlx5 broken affinity

2017-11-01 Thread Sagi Grimberg
Hi, Hi Jes, The below patch seems to have broken PCI IRQ affinity assignments for mlx5. I wouldn't call it breaking IRQ affinity assignments. It just makes them automatic. Prior to this patch I could echo a value to /proc/irq//smp_affinity and it would get assigned. With this patch
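For context on the interface under discussion: `smp_affinity` accepts a hexadecimal CPU bitmask, where bit n set means the IRQ may be delivered to CPU n. A small helper (hypothetical, for illustration only) to build such a mask from a list of CPU numbers might look like:

```python
def cpus_to_mask(cpus):
    """Build the hexadecimal bitmask string that the smp_affinity
    procfs file expects, from an iterable of CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # set the bit for this CPU
    return format(mask, "x")

# CPUs 0 and 2 -> binary 101 -> hex "5"
print(cpus_to_mask([0, 2]))
```

Writing the resulting string to the IRQ's `smp_affinity` file (as root) is the mechanism the thread refers to; with managed affinity, such writes are rejected by the kernel.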