Re: [PATCH v5 00/10] workingset protection/detection on the anonymous LRU list

2020-06-02 Thread Suren Baghdasaryan
On Wed, Apr 8, 2020 at 5:50 PM Joonsoo Kim  wrote:
>
> On Thu, Apr 9, 2020 at 1:55 AM, Vlastimil Babka wrote:
> >
> > On 4/3/20 7:40 AM, js1...@gmail.com wrote:
> > > From: Joonsoo Kim 
> > >
> > > Hello,
> > >
> > > This patchset implements workingset protection and detection on
> > > the anonymous LRU list.
> >
> > Hi!
>
> Hi!
>
> > > I did another test to show the performance effect of this patchset.
> > >
> > > - ebizzy (with modified random function)
> > > ebizzy is a test program in which the main thread allocates a lot of
> > > memory and child threads access it randomly for a given time. Swap-in/out
> > > will happen if the allocated memory is larger than the system memory.
> > >
> > > A random function that follows the Zipf distribution is used to create
> > > hot/cold memory. The hot/cold ratio is controlled by a parameter. If
> > > the parameter is high, hot memory is accessed much more often than cold
> > > memory. If the parameter is low, the number of accesses to each memory
> > > region would be similar. I use various parameters in order to show the
> > > effect of the patchset on workloads with various hot/cold ratios.
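
[For illustration, a minimal userspace sketch of such a Zipf-driven workload
follows. This is not ebizzy or its actual modified random function (the
thread does not show it); the buffer size, thread count, access count, and
generator are illustrative assumptions.]

    /*
     * Illustrative sketch only -- not ebizzy's modified random function.
     * The main thread allocates a large buffer; worker threads touch it
     * at Zipf-distributed page offsets.  "s" is the hot/cold parameter
     * (0.1 .. 1.3 in the tables below).
     * Build: cc -O2 zipf.c -lm -lpthread
     */
    #include <math.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>

    #define PAGE_SZ  4096UL
    #define NPAGES   ((512UL << 20) / PAGE_SZ)  /* 512MB worth of pages */
    #define NTHREADS 8
    #define NACCESS  1000000UL

    static char *mem;
    static double *cdf;     /* cdf[r] = P(page rank <= r), built once */

    /* Draw a page index with P(rank) ~ 1/rank^s: binary-search the CDF. */
    static unsigned long zipf_index(unsigned int *seed)
    {
        double u = (double)rand_r(seed) / RAND_MAX;
        unsigned long lo = 0, hi = NPAGES - 1, mid;

        while (lo < hi) {
            mid = lo + (hi - lo) / 2;
            if (cdf[mid] < u)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }

    static void *worker(void *arg)
    {
        unsigned int seed = (unsigned int)(unsigned long)arg;
        unsigned long i;

        for (i = 0; i < NACCESS; i++)
            mem[zipf_index(&seed) * PAGE_SZ]++;  /* touch one page */
        return NULL;
    }

    int main(int argc, char **argv)
    {
        double s = argc > 1 ? atof(argv[1]) : 0.9;  /* hot/cold knob */
        double sum = 0.0;
        unsigned long r, i;
        pthread_t th[NTHREADS];

        /* Precompute the normalized Zipf CDF over all page ranks. */
        cdf = malloc(NPAGES * sizeof(*cdf));
        for (r = 0; r < NPAGES; r++) {
            sum += 1.0 / pow((double)(r + 1), s);
            cdf[r] = sum;
        }
        for (r = 0; r < NPAGES; r++)
            cdf[r] /= sum;

        mem = malloc(NPAGES * PAGE_SZ);
        memset(mem, 0, NPAGES * PAGE_SZ);  /* fault all pages in */

        for (i = 0; i < NTHREADS; i++)
            pthread_create(&th[i], NULL, worker, (void *)(i + 1));
        for (i = 0; i < NTHREADS; i++)
            pthread_join(th[i], NULL);
        return 0;
    }

[With page rank mapped directly to page offset, the low ranks form the hot
set; raising s concentrates accesses on them, matching the hot/cold behavior
described above.]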
> > >
> > > My test setup is a virtual machine with 8 CPUs and 1024MB of memory.
> > >
> > > The result format is as follows.
> > >
> > > Parameter 0.1 ... 1.3
> > > Allocated memory size
> > > Throughput for base (larger is better)
> > > Throughput for patchset (larger is better)
> > > Improvement (larger is better)
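
[The Improvement row is presumably the relative delta, (patchset - base) /
base. For example, in the single-thread <2048M> row at parameter 0.1 below,
(91.0 - 77.0) / 77.0 ~= 0.18, which matches the table.]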
> > >
> > >
> > > * single thread
> > >
> > >       0.1      0.3      0.5      0.7      0.9      1.1      1.3
> > > <512M>
> > >    7009.0   7372.0   7774.0   8523.0   9569.0  10724.0  11936.0
> > >    6973.0   7342.0   7745.0   8576.0   9441.0  10730.0  12033.0
> > >     -0.01     -0.0     -0.0     0.01    -0.01      0.0     0.01
> > > <768M>
> > >     915.0   1039.0   1275.0   1687.0   2328.0   3486.0   5445.0
> > >     920.0   1037.0   1238.0   1689.0   2384.0   3638.0   5381.0
> > >      0.01     -0.0    -0.03      0.0     0.02     0.04    -0.01
> > > <1024M>
> > >     425.0    471.0    539.0    753.0   1183.0   2130.0   3839.0
> > >     414.0    468.0    553.0    770.0   1242.0   2187.0   3932.0
> > >     -0.03    -0.01     0.03     0.02     0.05     0.03     0.02
> > > <1280M>
> > >     320.0    346.0    410.0    556.0    871.0   1654.0   3298.0
> > >     316.0    346.0    411.0    550.0    892.0   1652.0   3293.0
> > >     -0.01      0.0      0.0    -0.01     0.02     -0.0     -0.0
> > > <1536M>
> > >     273.0    290.0    341.0    458.0    733.0   1381.0   2925.0
> > >     271.0    293.0    344.0    462.0    740.0   1398.0   2969.0
> > >     -0.01     0.01     0.01     0.01     0.01     0.01     0.02
> > > <2048M>
> > >      77.0     79.0     95.0    147.0    276.0    690.0   1816.0
> > >      91.0     94.0    115.0    170.0    321.0    770.0   2018.0
> > >      0.18     0.19     0.21     0.16     0.16     0.12     0.11
> > >
> > >
> > > * multi thread (8)
> > >
> > >       0.1      0.3      0.5      0.7      0.9      1.1      1.3
> > > <512M>
> > >   29083.0  29648.0  30145.0  31668.0  33964.0  38414.0  43707.0
> > >   29238.0  29701.0  30301.0  31328.0  33809.0  37991.0  43667.0
> > >      0.01      0.0     0.01    -0.01     -0.0    -0.01     -0.0
> > > <768M>
> > >    3332.0   3699.0   4673.0   5830.0   8307.0  12969.0  17665.0
> > >    3579.0   3992.0   4432.0   6111.0   8699.0  12604.0  18061.0
> > >      0.07     0.08    -0.05     0.05     0.05    -0.03     0.02
> > > <1024M>
> > >    1921.0   2141.0   2484.0   3296.0   5391.0   8227.0  14574.0
> > >    1989.0   2155.0   2609.0   3565.0   5463.0   8170.0  15642.0
> > >      0.04     0.01     0.05     0.08     0.01    -0.01     0.07
> > > <1280M>
> > >    1524.0   1625.0   1931.0   2581.0   4155.0   6959.0  12443.0
> > >    1560.0   1707.0   2016.0   2714.0   4262.0   7518.0  13910.0
> > >      0.02     0.05     0.04     0.05     0.03     0.08     0.12
> > > <1536M>
> > >    1303.0   1399.0   1550.0   2137.0   3469.0   6712.0  12944.0
> > >    1356.0   1465.0   1701.0   2237.0   3583.0   6830.0  13580.0
> > >      0.04     0.05      0.1     0.05     0.03     0.02     0.05
> > > <2048M>
> > >     172.0    184.0    215.0    289.0    514.0   1318.0   4153.0
> > >     175.0    190.0    225.0    329.0    606.0   1585.0   5170.0
> > >      0.02     0.03     0.05     0.14     0.18      0.2     0.24
> > >
> > > As we can see, as the allocated memory grows, the patched kernel gets
> > > better results. The maximum improvement is 21% for the single-thread
> > > test and 24% for the multi-thread test.
> >
> > So, these results seem to be identical since v1. After the various
> > changes up to v5, should the benchmark be redone? And was that with a
> > full patchset or patches 1+2?
>
> It was done with the full patchset. I think these results would not
> change even on v5, since the improvement comes from the concept of this
> patchset and the implementation details don't matter much. However, I
> will redo the benchmark.
>
> > > * EXPERIMENT
> > > I made a test program that imitates the above scenario and