Hi Mel,
we have tried to revert the following 2 commits:
305c1fac3225
2d4056fafa196e1ab
We had to revert 10864a9e222048a862da2c21efa28929a4dfed15 as well.
The performance of the kernel was better than when only
2d4056fafa196e1ab was reverted, but still worse than that of the
4.18 kernel.
> Maybe 305c1fac3225dfa7eeb89bfe91b7335a6edd5172. That introduces a weird
> condition in terms of idle CPU handling that has been problematic.
We will try that, thanks!
> I would suggest contacting Srikar directly.
I will do that right away. Whom should I put on Cc? Just you and
On Thu, Sep 06, 2018 at 10:16:28AM +0200, Jirka Hladky wrote:
> Hi Mel,
>
> we have results with 2d4056fafa196e1ab4e7161bae4df76f9602d56d reverted.
>
> * Compared to 4.18, there is still performance regression -
> especially with NAS (sp_C_x subtest) and SPECjvm2008. On 4 NUMA
> systems,
Hi Mel,
we have results with 2d4056fafa196e1ab4e7161bae4df76f9602d56d reverted.
* Compared to 4.18, there is still performance regression -
especially with NAS (sp_C_x subtest) and SPECjvm2008. On 4 NUMA
systems, regression is around 10-15%
* Compared to 4.19rc1 there is a clear gain across
Hi Mel,
thanks for sharing the background information! We will check if
2d4056fafa196e1ab4e7161bae4df76f9602d56d is causing the current
regression in 4.19 rc1 and let you know the outcome.
Jirka
On Tue, Sep 4, 2018 at 11:00 AM, Mel Gorman wrote:
> On Mon, Sep 03, 2018 at 05:07:15PM +0200,
On Mon, Sep 03, 2018 at 05:07:15PM +0200, Jirka Hladky wrote:
> Resending in the plain text mode.
>
> > My own testing completed and the results are within expectations and I
> > saw no red flags. Unfortunately, I consider it unlikely they'll be merged
> > for 4.18. Srikar Dronamraju's series is
Resending in the plain text mode.
> My own testing completed and the results are within expectations and I
> saw no red flags. Unfortunately, I consider it unlikely they'll be merged
> for 4.18. Srikar Dronamraju's series is likely to need another update
> and I would need to rebase my patches on
On Tue, Jul 17, 2018 at 10:45:51AM +0200, Jirka Hladky wrote:
> Hi Mel,
>
> we have compared 4.18 + git://
> git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git
> sched-numa-fast-crossnode-v1r12 against 4.16 kernel and performance results
> look very good!
>
Excellent, thanks to both Kamil
On Wed, Jun 27, 2018 at 12:18:37AM +0200, Jirka Hladky wrote:
> Hi Mel,
>
> we have results for the "Fixes for sched/numa_balancing" series and overall
> it looks very promising.
>
> We see improvements in the range 15-20% for the stream benchmark and
> up to 60% for the OpenMP NAS benchmark.
On Wed, Jun 20, 2018 at 07:25:19PM +0200, Jirka Hladky wrote:
> Hi Mel and others,
>
> I would like to let you know that I have tested the following patch
>
Understood. FWIW, there is a lot in flight at the moment, but the first
likely patch is removing rate limiting entirely and seeing what falls out.
On Tue, Jun 19, 2018 at 03:36:53PM +0200, Jirka Hladky wrote:
> Hi Mel,
>
> we have tested the following variants:
>
> var1: 4.16 + 2c83362734dad8e48ccc0710b5cd2436a0323893
> fix1: var1+ ratelimit_pages __read_mostly increased by factor 4x
> -static unsigned int ratelimit_pages __read_mostly = 128
On Fri, Jun 15, 2018 at 02:23:17PM +0200, Jirka Hladky wrote:
> > I added configurations that used half of the CPUs. However, that would
> > mean it fits too nicely within sockets. I've added another set for one
> > third of the CPUs and scheduled the tests. Unfortunately, they will not
> > complete
On Fri, Jun 15, 2018 at 01:07:32AM +0200, Jirka Hladky wrote:
> >
> > In terms of the speed of migration, it may be worth checking how often the
> > mm_numa_migrate_ratelimit tracepoint is triggered with bonus points for
> > using
> > the nr_pages to calculate how many pages get throttled from
On Mon, Jun 11, 2018 at 06:07:58PM +0200, Jirka Hladky wrote:
> >
> > Fixing any part of it for STREAM will end up regressing something else.
>
>
> I fully understand that. We run a set of benchmarks and we always look at
> the results as the ensemble. Looking only at one benchmark would be
>
On Mon, Jun 11, 2018 at 12:04:34PM +0200, Jirka Hladky wrote:
> Hi Mel,
>
> your suggestion about the commit which has caused the regression was right
> - it's indeed this commit:
>
> 2c83362734dad8e48ccc0710b5cd2436a0323893
>
> The question now is what can be done to improve the results. I
On Fri, Jun 08, 2018 at 01:02:54PM +0200, Jirka Hladky wrote:
> >
> > Unknown and unknowable. It depends entirely on the reference pattern of
> > the different threads. If they are fully parallelised with private buffers
> > that are page-aligned then I expect it to be quick (to pass the
On Fri, Jun 08, 2018 at 10:49:03AM +0200, Jirka Hladky wrote:
> Hi Mel,
>
> > automatic NUMA balancing doesn't run long enough to migrate all the
> > memory. That would definitely be the case for STREAM.
>
> This could explain the behavior we observe. stream is running ~20 seconds
> at the moment.
On Fri, Jun 08, 2018 at 07:49:37AM +0200, Jirka Hladky wrote:
> Hi Mel,
>
> we will do the bisection today and report the results back.
>
The most likely outcome is 2c83362734dad8e48ccc0710b5cd2436a0323893
which is a patch that restricts newly forked processes from selecting a
remote node when
On Wed, Jun 06, 2018 at 02:27:32PM +0200, Jakub Racek wrote:
> There is a huge performance regression on the 2 and 4 NUMA node systems on
> stream benchmark with 4.17 kernel compared to 4.16 kernel. Stream, Linpack
> and NAS parallel benchmarks show up to 50% performance drop.
>
I have not
Hi,
On 06/07/2018 01:07 PM, Michal Hocko wrote:
[CCing Mel and MM mailing list]
On Wed 06-06-18 14:27:32, Jakub Racek wrote:
Hi,
There is a huge performance regression on the 2 and 4 NUMA node systems on
stream benchmark with 4.17 kernel compared to 4.16 kernel. Stream, Linpack
and NAS
[CCing Mel and MM mailing list]
On Wed 06-06-18 14:27:32, Jakub Racek wrote:
> Hi,
>
> There is a huge performance regression on the 2 and 4 NUMA node systems on
> stream benchmark with 4.17 kernel compared to 4.16 kernel. Stream, Linpack
> and NAS parallel benchmarks show up to 50% performance
On Wed, Jun 6, 2018 at 2:34 PM, Rafael J. Wysocki wrote:
> On Wed, Jun 6, 2018 at 2:27 PM, Jakub Racek wrote:
>> Hi,
>>
>> There is a huge performance regression on the 2 and 4 NUMA node systems on
>> stream benchmark with 4.17 kernel compared to 4.16 kernel. Stream, Linpack
>> and NAS parallel
On Wed, Jun 6, 2018 at 2:27 PM, Jakub Racek wrote:
> Hi,
>
> There is a huge performance regression on the 2 and 4 NUMA node systems on
> stream benchmark with 4.17 kernel compared to 4.16 kernel. Stream, Linpack
> and NAS parallel benchmarks show up to 50% performance drop.
>
> When running for
Hi,
There is a huge performance regression on the 2 and 4 NUMA node systems on stream
benchmark with 4.17 kernel compared to 4.16 kernel.
Stream, Linpack and NAS parallel benchmarks show up to 50% performance drop.
When running for example 20 stream processes in parallel, we see the following
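For reference, the parallel run described above might be launched along these lines. This is only a sketch: `./stream` stands for a locally built STREAM binary, replaced here by a stub (the "Triad:" output line is an assumption about its report format) so the example is self-contained:

```shell
# Stub standing in for a locally built STREAM benchmark binary.
cat > stream <<'EOF'
#!/bin/sh
echo "Triad:      10000.0 MB/s"
EOF
chmod +x stream

# Launch 20 STREAM processes in parallel, one log per process.
for i in $(seq 1 20); do
    ./stream > "stream_$i.log" &
done
wait

# One Triad line per process confirms all 20 runs completed.
grep -h Triad stream_*.log | wc -l
```

In a real run, comparing the per-process Triad bandwidth across kernels is what exposes the regression.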