Re: [PATCH v5 00/10] workingset protection/detection on the anonymous LRU list
On Wed, Jun 3, 2020 at 12:57 PM, Suren Baghdasaryan wrote:
>
> On Wed, Apr 8, 2020 at 5:50 PM Joonsoo Kim wrote:
> >
> > On Thu, Apr 9, 2020 at 1:55 AM, Vlastimil Babka wrote:
> > >
> > > On 4/3/20 7:40 AM, js1...@gmail.com wrote:
> > > > From: Joonsoo Kim
> > > >
> > > > Hello,
> > > >
> > > > This patchset implements workingset protection and detection on
> > > > the anonymous LRU list.
> > >
> > > Hi!
> >
> > Hi!
> >
> > > > I did another test to show the performance effect of this patchset.
> > > >
> > > > - ebizzy (with modified random function)
> > > > ebizzy is a test program whose main thread allocates lots of memory
> > > > and whose child threads access it randomly during the given time.
> > > > Swap-in/out will happen if the allocated memory is larger than the
> > > > system memory.
> > > >
> > > > A random function that follows the zipf distribution is used to
> > > > create hot/cold memory. The hot/cold ratio is controlled by a
> > > > parameter. If the parameter is high, hot memory is accessed much
> > > > more often than cold memory. If the parameter is low, the number of
> > > > accesses to each memory region is similar. I use various parameters
> > > > in order to show the effect of the patchset on workloads with
> > > > various hot/cold ratios.
> > > >
> > > > My test setup is a virtual machine with 8 cpus and 1024MB memory.
> > > >
> > > > Results are formatted as follows.
> > > >
> > > > Parameter                0.1 ... 1.3
> > > > <Allocated memory size>
> > > > Throughput for base (larger is better)
> > > > Throughput for patchset (larger is better)
> > > > Improvement (larger is better)
> > > >
> > > >
> > > > * single thread
> > > >
> > > >        0.1     0.3     0.5     0.7     0.9     1.1     1.3
> > > > <512M>
> > > >     7009.0  7372.0  7774.0  8523.0  9569.0 10724.0 11936.0
> > > >     6973.0  7342.0  7745.0  8576.0  9441.0 10730.0 12033.0
> > > >      -0.01    -0.0    -0.0    0.01   -0.01     0.0    0.01
> > > > <768M>
> > > >      915.0  1039.0  1275.0  1687.0  2328.0  3486.0  5445.0
> > > >      920.0  1037.0  1238.0  1689.0  2384.0  3638.0  5381.0
> > > >       0.01    -0.0   -0.03     0.0    0.02    0.04   -0.01
> > > > <1024M>
> > > >      425.0   471.0   539.0   753.0  1183.0  2130.0  3839.0
> > > >      414.0   468.0   553.0   770.0  1242.0  2187.0  3932.0
> > > >      -0.03   -0.01    0.03    0.02    0.05    0.03    0.02
> > > > <1280M>
> > > >      320.0   346.0   410.0   556.0   871.0  1654.0  3298.0
> > > >      316.0   346.0   411.0   550.0   892.0  1652.0  3293.0
> > > >      -0.01     0.0     0.0   -0.01    0.02    -0.0    -0.0
> > > > <1536M>
> > > >      273.0   290.0   341.0   458.0   733.0  1381.0  2925.0
> > > >      271.0   293.0   344.0   462.0   740.0  1398.0  2969.0
> > > >      -0.01    0.01    0.01    0.01    0.01    0.01    0.02
> > > > <2048M>
> > > >       77.0    79.0    95.0   147.0   276.0   690.0  1816.0
> > > >       91.0    94.0   115.0   170.0   321.0   770.0  2018.0
> > > >       0.18    0.19    0.21    0.16    0.16    0.12    0.11
> > > >
> > > >
> > > > * multi thread (8)
> > > >
> > > >        0.1     0.3     0.5     0.7     0.9     1.1     1.3
> > > > <512M>
> > > >    29083.0 29648.0 30145.0 31668.0 33964.0 38414.0 43707.0
> > > >    29238.0 29701.0 30301.0 31328.0 33809.0 37991.0 43667.0
> > > >       0.01     0.0    0.01   -0.01    -0.0   -0.01    -0.0
> > > > <768M>
> > > >     3332.0  3699.0  4673.0  5830.0  8307.0 12969.0 17665.0
> > > >     3579.0  3992.0  4432.0  6111.0  8699.0 12604.0 18061.0
> > > >       0.07    0.08   -0.05    0.05    0.05   -0.03    0.02
> > > > <1024M>
> > > >     1921.0  2141.0  2484.0  3296.0  5391.0  8227.0 14574.0
> > > >     1989.0  2155.0  2609.0  3565.0  5463.0  8170.0 15642.0
> > > >       0.04    0.01    0.05    0.08    0.01   -0.01    0.07
> > > > <1280M>
> > > >     1524.0  1625.0  1931.0  2581.0  4155.0  6959.0 12443.0
> > > >     1560.0  1707.0  2016.0  2714.0  4262.0  7518.0 13910.0
> > > >       0.02    0.05    0.04    0.05    0.03    0.08    0.12
> > > > <1536M>
> > > >     1303.0  1399.0  1550.0  2137.0  3469.0  6712.0 12944.0
> > > >     1356.0  1465.0  1701.0  2237.0  3583.0  6830.0 13580.0
> > > >       0.04    0.05     0.1    0.05    0.03    0.02    0.05
> > > > <2048M>
> > > >      172.0   184.0   215.0   289.0   514.0  1318.0  4153.0
> > > >      175.0   190.0   225.0   329.0   606.0  1585.0  5170.0
> > > >       0.02    0.03    0.05    0.14    0.18     0.2    0.24
> > > >
> > > > As we can see, as the allocated memory grows, the patched kernel
> > > > gets better results. The maximum improvement is 21% for the
> > > > single-thread test and 24% for the multi-thread test.
> > >
> > > So, these results seem to be identical since v1. After the various
> > > changes up to v5, should the benchmark be redone? And was that with a
> > > full patchset or patches 1+2?
> >
> > It was done with a full patchset. I think that these results would not
> > change even on v5, since the improvement comes from the concept of this
> > patchset and the implementation details don't matter much. However, I
> > will redo it.
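The zipf-parameterized hot/cold access pattern described above can be sketched as follows. This is a hypothetical illustration in Python (the actual modified ebizzy random function is not shown in the thread, and `make_zipf_sampler` is an invented helper name). A bounded, finite-support zipf form is used because unbounded zipf samplers such as `numpy.random.zipf` require an exponent greater than 1, while the benchmark sweeps parameters from 0.1 to 1.3:

```python
import bisect
import random

def make_zipf_sampler(n, s, seed=42):
    """Return a sampler of ranks in [0, n) with P(k) proportional to 1/(k+1)**s.

    Higher s concentrates accesses on low ranks ("hot" memory); s near 0
    approaches a uniform access pattern.
    """
    rng = random.Random(seed)
    # Cumulative weights for inverse-transform sampling.
    cum = []
    total = 0.0
    for k in range(n):
        total += 1.0 / (k + 1) ** s
        cum.append(total)

    def sample():
        # First rank whose cumulative weight reaches a uniform draw.
        return bisect.bisect_left(cum, rng.random() * total)

    return sample

# With parameter 1.3 a few hot pages dominate; with 0.1 accesses
# are spread almost evenly across all pages.
hot_cold = make_zipf_sampler(1024, 1.3)
accesses = [hot_cold() for _ in range(10000)]
```

Mapping each sampled rank to a page of the allocated region would reproduce the benchmark's skewed access pattern.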
Re: [PATCH v5 00/10] workingset protection/detection on the anonymous LRU list
On Wed, Apr 8, 2020 at 5:50 PM Joonsoo Kim wrote:
>
> On Thu, Apr 9, 2020 at 1:55 AM, Vlastimil Babka wrote:
> >
> > On 4/3/20 7:40 AM, js1...@gmail.com wrote:
> > > From: Joonsoo Kim
> > >
> > > Hello,
> > >
> > > This patchset implements workingset protection and detection on
> > > the anonymous LRU list.
> >
> > Hi!
>
> Hi!
>
> > > I did another test to show the performance effect of this patchset.
> > >
> > > - ebizzy (with modified random function)
> > > ebizzy is a test program whose main thread allocates lots of memory
> > > and whose child threads access it randomly during the given time.
> > > Swap-in/out will happen if the allocated memory is larger than the
> > > system memory.
> > >
> > > A random function that follows the zipf distribution is used to
> > > create hot/cold memory. The hot/cold ratio is controlled by a
> > > parameter. If the parameter is high, hot memory is accessed much
> > > more often than cold memory. If the parameter is low, the number of
> > > accesses to each memory region is similar. I use various parameters
> > > in order to show the effect of the patchset on workloads with
> > > various hot/cold ratios.
> > >
> > > My test setup is a virtual machine with 8 cpus and 1024MB memory.
> > >
> > > Results are formatted as follows.
> > >
> > > Parameter                0.1 ... 1.3
> > > <Allocated memory size>
> > > Throughput for base (larger is better)
> > > Throughput for patchset (larger is better)
> > > Improvement (larger is better)
> > >
> > >
> > > * single thread
> > >
> > >        0.1     0.3     0.5     0.7     0.9     1.1     1.3
> > > <512M>
> > >     7009.0  7372.0  7774.0  8523.0  9569.0 10724.0 11936.0
> > >     6973.0  7342.0  7745.0  8576.0  9441.0 10730.0 12033.0
> > >      -0.01    -0.0    -0.0    0.01   -0.01     0.0    0.01
> > > <768M>
> > >      915.0  1039.0  1275.0  1687.0  2328.0  3486.0  5445.0
> > >      920.0  1037.0  1238.0  1689.0  2384.0  3638.0  5381.0
> > >       0.01    -0.0   -0.03     0.0    0.02    0.04   -0.01
> > > <1024M>
> > >      425.0   471.0   539.0   753.0  1183.0  2130.0  3839.0
> > >      414.0   468.0   553.0   770.0  1242.0  2187.0  3932.0
> > >      -0.03   -0.01    0.03    0.02    0.05    0.03    0.02
> > > <1280M>
> > >      320.0   346.0   410.0   556.0   871.0  1654.0  3298.0
> > >      316.0   346.0   411.0   550.0   892.0  1652.0  3293.0
> > >      -0.01     0.0     0.0   -0.01    0.02    -0.0    -0.0
> > > <1536M>
> > >      273.0   290.0   341.0   458.0   733.0  1381.0  2925.0
> > >      271.0   293.0   344.0   462.0   740.0  1398.0  2969.0
> > >      -0.01    0.01    0.01    0.01    0.01    0.01    0.02
> > > <2048M>
> > >       77.0    79.0    95.0   147.0   276.0   690.0  1816.0
> > >       91.0    94.0   115.0   170.0   321.0   770.0  2018.0
> > >       0.18    0.19    0.21    0.16    0.16    0.12    0.11
> > >
> > >
> > > * multi thread (8)
> > >
> > >        0.1     0.3     0.5     0.7     0.9     1.1     1.3
> > > <512M>
> > >    29083.0 29648.0 30145.0 31668.0 33964.0 38414.0 43707.0
> > >    29238.0 29701.0 30301.0 31328.0 33809.0 37991.0 43667.0
> > >       0.01     0.0    0.01   -0.01    -0.0   -0.01    -0.0
> > > <768M>
> > >     3332.0  3699.0  4673.0  5830.0  8307.0 12969.0 17665.0
> > >     3579.0  3992.0  4432.0  6111.0  8699.0 12604.0 18061.0
> > >       0.07    0.08   -0.05    0.05    0.05   -0.03    0.02
> > > <1024M>
> > >     1921.0  2141.0  2484.0  3296.0  5391.0  8227.0 14574.0
> > >     1989.0  2155.0  2609.0  3565.0  5463.0  8170.0 15642.0
> > >       0.04    0.01    0.05    0.08    0.01   -0.01    0.07
> > > <1280M>
> > >     1524.0  1625.0  1931.0  2581.0  4155.0  6959.0 12443.0
> > >     1560.0  1707.0  2016.0  2714.0  4262.0  7518.0 13910.0
> > >       0.02    0.05    0.04    0.05    0.03    0.08    0.12
> > > <1536M>
> > >     1303.0  1399.0  1550.0  2137.0  3469.0  6712.0 12944.0
> > >     1356.0  1465.0  1701.0  2237.0  3583.0  6830.0 13580.0
> > >       0.04    0.05     0.1    0.05    0.03    0.02    0.05
> > > <2048M>
> > >      172.0   184.0   215.0   289.0   514.0  1318.0  4153.0
> > >      175.0   190.0   225.0   329.0   606.0  1585.0  5170.0
> > >       0.02    0.03    0.05    0.14    0.18     0.2    0.24
> > >
> > > As we can see, as the allocated memory grows, the patched kernel
> > > gets better results. The maximum improvement is 21% for the
> > > single-thread test and 24% for the multi-thread test.
> >
> > So, these results seem to be identical since v1. After the various
> > changes up to v5, should the benchmark be redone? And was that with a
> > full patchset or patches 1+2?
>
> It was done with a full patchset. I think that these results would not
> change even on v5, since the improvement comes from the concept of this
> patchset and the implementation details don't matter much. However, I
> will redo it.
>
> > > * EXPERIMENT
> > > I made a test program to imitate the above scenario and
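For reference, the Improvement rows in the quoted tables appear to be the relative throughput gain, patched/base - 1, rounded to two decimals. A minimal sketch (the helper name is mine, not from the thread), spot-checked against the <2048M> rows:

```python
def improvement(base, patched):
    """Relative throughput gain of the patched kernel over the base kernel."""
    return round(patched / base - 1, 2)

# <2048M>, single thread, parameter 0.5: the maximum single-thread gain
print(improvement(95.0, 115.0))     # 0.21
# <2048M>, 8 threads, parameter 1.3: the maximum multi-thread gain
print(improvement(4153.0, 5170.0))  # 0.24
```

This matches the 21% and 24% maximum improvements cited in the summary.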