Re: Vector vs std::vector
On 7/02/2014 10:56 p.m., Eliezer Croitoru wrote:
> On 02/07/2014 08:20 AM, Kinkie wrote:
>> Thank you for running this test.
>> I guess that with these results, it can make sense to go forward with
>> the project of replacing Vector with std::vector, does everyone agree?
>>
>> Thanks!
> Seems pretty reasonable to me.
> I had only another question.
> Will using std::vector help any portability of squid?

As part of the STL it should.

Amos
Re: Vector vs std::vector
On 02/07/2014 08:20 AM, Kinkie wrote:
> Thank you for running this test.
> I guess that with these results, it can make sense to go forward with
> the project of replacing Vector with std::vector, does everyone agree?
>
> Thanks!

Seems pretty reasonable to me.
I had only another question.
Will using std::vector help any portability of squid?

Thanks,
Eliezer
Re: Vector vs std::vector
On 7/02/2014 7:20 p.m., Kinkie wrote:
> Thank you for running this test.
> I guess that with these results, it can make sense to go forward with
> the project of replacing Vector with std::vector, does everyone agree?

Yes.

Amos
Re: Vector vs std::vector
Thank you for running this test.
I guess that with these results, it can make sense to go forward with
the project of replacing Vector with std::vector, does everyone agree?

Thanks!

On Fri, Feb 7, 2014 at 6:18 AM, Alex Rousskov wrote:
> On 01/30/2014 01:37 PM, Alex Rousskov wrote:
>> On 01/30/2014 12:14 PM, Kinkie wrote:
>>> Ok, here's some numbers (same testing methodology as before)
>>>
>>> Trunk: mean RPS (CPU time)
>>> 10029.11 (996.661872)
>>> 9786.60 (1021.695007)
>>> 10116.93 (988.395665)
>>> 9958.71 (1004.039956)
>>>
>>> stdvector: mean RPS (CPU time)
>>> 9732.57 (1027.426563)
>>> 10388.38 (962.418333)
>>> 10332.17 (967.824790)
>>
>> OK, so we are probably just looking at noise: The variation within each
>> data set (1.4% or 3.6%) is about the same or more than the difference
>> between the set averages (1.8%).
>>
>> We will test this in a Polygraph lab, but I would not be surprised to
>> see no significant difference there as well.
>
> Polygraph results are in. No significant difference is visible.
>
> FWIW, here are a few basic response stats extracted from four tests
> ("rep" is reply and "rptm" is response time):
>
> * trunk vs. branch, take1:
> rep.rate: 0% 2199.97 = 2199.97
> rep.rptm.count: 0% 11880004.00 = 11880004.00
> rep.rptm.mean: 0% 501.12 < 501.16
> rep.rptm.std_dev: 0% 485.77 = 485.77
> rep.size.count: 0% 11880004.00 = 11880004.00
> rep.size.mean: 0% 8738.48 < 8739.91
> rep.size.std_dev: 0% 11446.90 > 11446.12
>
> * trunk vs. branch, take2:
> rep.rate: 0% 2199.97 = 2199.97
> rep.rptm.count: 0% 11880008.00 = 11880002.00
> rep.rptm.mean: 0% 501.14 < 501.16
> rep.rptm.std_dev: 0% 485.77 = 485.77
> rep.size.count: 0% 11880008.00 = 11880002.00
> rep.size.mean: 0% 8738.49 < 8739.14
> rep.size.std_dev: 0% 11446.42 < 11448.64
>
> The results appear to be pretty stable, but we have not run
> more/different tests to really verify that:
>
> * take1 vs. take2, trunk:
> rep.rate: 0% 2199.97 = 2199.97
> rep.rptm.count: 0% 11880004.00 = 11880008.00
> rep.rptm.mean: 0% 501.12 < 501.14
> rep.rptm.std_dev: 0% 485.77 = 485.77
> rep.size.count: 0% 11880004.00 = 11880008.00
> rep.size.mean: 0% 8738.48 < 8738.49
> rep.size.std_dev: 0% 11446.90 > 11446.42
>
> * take1 vs. take2, branch:
> rep.rate: 0% 2199.97 = 2199.97
> rep.rptm.count: 0% 11880004.00 = 11880002.00
> rep.rptm.mean: 0% 501.16 = 501.16
> rep.rptm.std_dev: 0% 485.77 = 485.77
> rep.size.count: 0% 11880004.00 = 11880002.00
> rep.size.mean: 0% 8739.91 > 8739.14
> rep.size.std_dev: 0% 11446.12 < 11448.64
>
> Note that the comparison script uses a 1.0e-6 epsilon to declare
> equality, so some slightly different numbers are printed with an equal
> sign.
>
> I hope I did not screw up while pasting these numbers :-).
>
> HTH,
>
> Alex.

-- 
Francesco
Re: Vector vs std::vector
On 01/30/2014 01:37 PM, Alex Rousskov wrote:
> On 01/30/2014 12:14 PM, Kinkie wrote:
>> Ok, here's some numbers (same testing methodology as before)
>>
>> Trunk: mean RPS (CPU time)
>> 10029.11 (996.661872)
>> 9786.60 (1021.695007)
>> 10116.93 (988.395665)
>> 9958.71 (1004.039956)
>>
>> stdvector: mean RPS (CPU time)
>> 9732.57 (1027.426563)
>> 10388.38 (962.418333)
>> 10332.17 (967.824790)
>
> OK, so we are probably just looking at noise: The variation within each
> data set (1.4% or 3.6%) is about the same or more than the difference
> between the set averages (1.8%).
>
> We will test this in a Polygraph lab, but I would not be surprised to
> see no significant difference there as well.

Polygraph results are in. No significant difference is visible.

FWIW, here are a few basic response stats extracted from four tests
("rep" is reply and "rptm" is response time):

* trunk vs. branch, take1:
rep.rate: 0% 2199.97 = 2199.97
rep.rptm.count: 0% 11880004.00 = 11880004.00
rep.rptm.mean: 0% 501.12 < 501.16
rep.rptm.std_dev: 0% 485.77 = 485.77
rep.size.count: 0% 11880004.00 = 11880004.00
rep.size.mean: 0% 8738.48 < 8739.91
rep.size.std_dev: 0% 11446.90 > 11446.12

* trunk vs. branch, take2:
rep.rate: 0% 2199.97 = 2199.97
rep.rptm.count: 0% 11880008.00 = 11880002.00
rep.rptm.mean: 0% 501.14 < 501.16
rep.rptm.std_dev: 0% 485.77 = 485.77
rep.size.count: 0% 11880008.00 = 11880002.00
rep.size.mean: 0% 8738.49 < 8739.14
rep.size.std_dev: 0% 11446.42 < 11448.64

The results appear to be pretty stable, but we have not run
more/different tests to really verify that:

* take1 vs. take2, trunk:
rep.rate: 0% 2199.97 = 2199.97
rep.rptm.count: 0% 11880004.00 = 11880008.00
rep.rptm.mean: 0% 501.12 < 501.14
rep.rptm.std_dev: 0% 485.77 = 485.77
rep.size.count: 0% 11880004.00 = 11880008.00
rep.size.mean: 0% 8738.48 < 8738.49
rep.size.std_dev: 0% 11446.90 > 11446.42

* take1 vs. take2, branch:
rep.rate: 0% 2199.97 = 2199.97
rep.rptm.count: 0% 11880004.00 = 11880002.00
rep.rptm.mean: 0% 501.16 = 501.16
rep.rptm.std_dev: 0% 485.77 = 485.77
rep.size.count: 0% 11880004.00 = 11880002.00
rep.size.mean: 0% 8739.91 > 8739.14
rep.size.std_dev: 0% 11446.12 < 11448.64

Note that the comparison script uses a 1.0e-6 epsilon to declare
equality, so some slightly different numbers are printed with an equal
sign.

I hope I did not screw up while pasting these numbers :-).

HTH,

Alex.
Re: Vector vs std::vector
On 01/30/2014 12:14 PM, Kinkie wrote:
> Ok, here's some numbers (same testing methodology as before)
>
> Trunk: mean RPS (CPU time)
> 10029.11 (996.661872)
> 9786.60 (1021.695007)
> 10116.93 (988.395665)
> 9958.71 (1004.039956)
>
> stdvector: mean RPS (CPU time)
> 9732.57 (1027.426563)
> 10388.38 (962.418333)
> 10332.17 (967.824790)

OK, so we are probably just looking at noise: The variation within each
data set (1.4% or 3.6%) is about the same or more than the difference
between the set averages (1.8%).

We will test this in a Polygraph lab, but I would not be surprised to
see no significant difference there as well.

> Some other insights I got by varying parameters:
> By raw RPS, it seems that performance varies with the number of
> parallel clients in this way (best to worst)
> 100 > 10 > 500 > 1

Sure, any best-effort test will have such a dependency: Too few
best-effort clients do not create enough concurrent requests to keep the
proxy busy (it has time to sleep) while too many best-effort robots
overwhelm the proxy with too many concurrent requests (the per-request
overheads grow, decreasing the total throughput).

Cheers,

Alex.
Re: Vector vs std::vector
On Thu, Jan 30, 2014 at 12:39 AM, Alex Rousskov wrote:
> On 01/29/2014 02:32 PM, Kinkie wrote:
>> On Wed, Jan 29, 2014 at 7:52 PM, Alex Rousskov wrote:
>>> On 01/29/2014 07:08 AM, Kinkie wrote:
>
>> (in a trunk checkout)
>> bzr diff -r lp:/squid/squid/vector-to-stdvector
>>
>> The resulting diff is reversed, but that should be easy enough to manage.
>
> Thanks! Not sure whether reversing a diff in my head is easier than
> downloading the branch :-(.

Maybe using qdiff (from the qt-bzr plugin) might help by showing things
in a graphical format? Apart from that, I don't know.

>>> Can you give any specific examples of the code change that you would
>>> attribute to a loss of performance when using std::vector? I did not
>>> notice any obvious cases, but I did not look closely.
>
>> I suspect that it's all those lines doing vector.items[accessor] and
>> thus using C-style unchecked accesses.
>
> std::vector's [] element access operator does not check anything either,
> as we discussed earlier. IIRC, you [correctly] used a few
> std::vector::at() calls as well, but I did not see any in a critical
> performance path.
>
>>>> test conditions:
>>>> - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
>>>> - testing with ab. 1m requests @10 parallelism with keepalive,
>>>> stressing the TCP_MEM_HIT code path on a cold cache
>>>> - test on a multicore VM; default out-of-the-box configuration, ab
>>>> running on same hardware over the loopback interface.
>>>> - immediately after ab exits, collect counters (mgr:counters)
>>>>
>>>> numbers (for trunk / stdvector)
>>>> - mean response time: 1.032/1.060ms
>>>> - cpu_time: 102.878167/106.013725
>
>> If you can suggest a more thorough set of commands using the rough
>> tools I have, I'll gladly run them.
>
> How about repeating the same pair of tests a few times, in random order?
> Do you get consistent results?

Ok, here's some numbers (same testing methodology as before)

Trunk: mean RPS (CPU time)
10029.11 (996.661872)
9786.60 (1021.695007)
10116.93 (988.395665)
9958.71 (1004.039956)

stdvector: mean RPS (CPU time)
9732.57 (1027.426563)
10388.38 (962.418333)
10332.17 (967.824790)

Some other insights I got by varying parameters:
By raw RPS, it seems that performance varies with the number of
parallel clients in this way (best to worst)
100 > 10 > 500 > 1

I also tried for fun to strip non strictly needed configuration (e.g.
logging, access control) and use 3 workers, leaving the fourth core for
ab: the best result was 39900.52 RPS. Useless, but impressive :)

-- 
/kinkie
Re: Vector vs std::vector
On 01/29/2014 02:32 PM, Kinkie wrote:
> On Wed, Jan 29, 2014 at 7:52 PM, Alex Rousskov wrote:
>> On 01/29/2014 07:08 AM, Kinkie wrote:
>
> (in a trunk checkout)
> bzr diff -r lp:/squid/squid/vector-to-stdvector
>
> The resulting diff is reversed, but that should be easy enough to manage.

Thanks! Not sure whether reversing a diff in my head is easier than
downloading the branch :-(.

>> Can you give any specific examples of the code change that you would
>> attribute to a loss of performance when using std::vector? I did not
>> notice any obvious cases, but I did not look closely.

> I suspect that it's all those lines doing vector.items[accessor] and
> thus using C-style unchecked accesses.

std::vector's [] element access operator does not check anything either,
as we discussed earlier. IIRC, you [correctly] used a few
std::vector::at() calls as well, but I did not see any in a critical
performance path.

>>> test conditions:
>>> - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
>>> - testing with ab. 1m requests @10 parallelism with keepalive,
>>> stressing the TCP_MEM_HIT code path on a cold cache
>>> - test on a multicore VM; default out-of-the-box configuration, ab
>>> running on same hardware over the loopback interface.
>>> - immediately after ab exits, collect counters (mgr:counters)
>>>
>>> numbers (for trunk / stdvector)
>>> - mean response time: 1.032/1.060ms
>>> - cpu_time: 102.878167/106.013725

> If you can suggest a more thorough set of commands using the rough
> tools I have, I'll gladly run them.

How about repeating the same pair of tests a few times, in random order?
Do you get consistent results?

Thank you,

Alex.
Re: Vector vs std::vector
On Wed, Jan 29, 2014 at 7:52 PM, Alex Rousskov wrote:
> On 01/29/2014 07:08 AM, Kinkie wrote:
>
>> Amos has asked me over IRC to investigate any performance
>> differences between Vector and std::vector. To do that, I've
>> implemented a std::vector-based implementation of Vector
>> (feature-branch: lp:~squid/squid/vector-to-stdvector).
>
> Does Launchpad offer a way of generating a merge patch/diff on the site?
> Currently, I have to checkout the branch and do "bzr send" to get the
> right diff. Is there a better way?

Sure:

(in a trunk checkout)
bzr diff -r lp:/squid/squid/vector-to-stdvector

The resulting diff is reversed, but that should be easy enough to manage.

>> I've then done the performance testing using ab. The results are in: a
>> Vector-based squid is about 3% speedier than a std::vector based
>> squid.
>
>> This may also be due to some egregious layering by users of Vector. I
>> have seen things which I would like to correct, also with the
>> objective of having Vector implement the same exact API as std::vector
>> to make future porting easier.
>
> Can you give any specific examples of the code change that you would
> attribute to a loss of performance when using std::vector? I did not
> notice any obvious cases, but I did not look closely.

I suspect that it's all those lines doing vector.items[accessor] and
thus using C-style unchecked accesses.

>> test conditions:
>> - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
>> - testing with ab. 1m requests @10 parallelism with keepalive,
>> stressing the TCP_MEM_HIT code path on a cold cache
>> - test on a multicore VM; default out-of-the-box configuration, ab
>> running on same hardware over the loopback interface.
>> - immediately after ab exits, collect counters (mgr:counters)
>>
>> numbers (for trunk / stdvector)
>> - mean response time: 1.032/1.060ms
>> - req/sec: 9685/9430
>> - cpu_time: 102.878167/106.013725
>
> I hate to be the one asking this, but with so many red flags in the
> testing methodology, are you reasonably sure that the 0.28 millisecond
> difference does not include 0.29+ milliseconds of noise? At the very
> least, do you consistently get the same set of numbers when repeating
> the two tests in random order?

I know that the testing methodology is very rough, and I am not offended
by you pointing that out. In fact, that is one of the reasons why I
tried being thorough in describing the method. I hope that you or maybe
Pawel can obtain more meaningful measures without investing too much
effort in it.

> BTW, your req/sec line is redundant. In your best-effort test, the
> proxy response time determines the offered request rate:
>
> 9685/9430 = 1.027 (your "3% speedier")
> 1.060/1.032 = 1.027 (your "3% speedier")
>
> 10*(1000/1.032) = 9690 (10-robot request rate from response time)
> 10*(1000/1.060) = 9434 (10-robot request rate from response time)

Yes. In fact, I consider that the interesting value is not either of
those, but the ~3 seconds of extra CPU time needed.

If you can suggest a more thorough set of commands using the rough tools
I have, I'll gladly run them. As another option, I hope Pawel can take
the time to run the standard polygraph on that branch. (can you, Pawel?)

Thanks!

-- 
/kinkie
Re: Vector vs std::vector
On 2014-01-30 07:52, Alex Rousskov wrote:
> On 01/29/2014 07:08 AM, Kinkie wrote:
>> Amos has asked me over IRC to investigate any performance
>> differences between Vector and std::vector. To do that, I've
>> implemented a std::vector-based implementation of Vector
>> (feature-branch: lp:~squid/squid/vector-to-stdvector).
>
> Does Launchpad offer a way of generating a merge patch/diff on the site?
> Currently, I have to checkout the branch and do "bzr send" to get the
> right diff. Is there a better way?
>
>> I've then done the performance testing using ab. The results are in: a
>> Vector-based squid is about 3% speedier than a std::vector based
>> squid.
>>
>> This may also be due to some egregious layering by users of Vector. I
>> have seen things which I would like to correct, also with the
>> objective of having Vector implement the same exact API as std::vector
>> to make future porting easier.
>
> Can you give any specific examples of the code change that you would
> attribute to a loss of performance when using std::vector? I did not
> notice any obvious cases, but I did not look closely.

One of the things to check is memory management. Squid's Vector<> uses
xmalloc/xfree. Does the performance even out when those are detached?
(we can implement a custom allocator for std::vector later if useful).

Amos
Re: Vector vs std::vector
On 01/29/2014 07:08 AM, Kinkie wrote:
> Amos has asked me over IRC to investigate any performance
> differences between Vector and std::vector. To do that, I've
> implemented a std::vector-based implementation of Vector
> (feature-branch: lp:~squid/squid/vector-to-stdvector).

Does Launchpad offer a way of generating a merge patch/diff on the site?
Currently, I have to checkout the branch and do "bzr send" to get the
right diff. Is there a better way?

> I've then done the performance testing using ab. The results are in: a
> Vector-based squid is about 3% speedier than a std::vector based
> squid.

> This may also be due to some egregious layering by users of Vector. I
> have seen things which I would like to correct, also with the
> objective of having Vector implement the same exact API as std::vector
> to make future porting easier.

Can you give any specific examples of the code change that you would
attribute to a loss of performance when using std::vector? I did not
notice any obvious cases, but I did not look closely.

> test conditions:
> - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
> - testing with ab. 1m requests @10 parallelism with keepalive,
> stressing the TCP_MEM_HIT code path on a cold cache
> - test on a multicore VM; default out-of-the-box configuration, ab
> running on same hardware over the loopback interface.
> - immediately after ab exits, collect counters (mgr:counters)
>
> numbers (for trunk / stdvector)
> - mean response time: 1.032/1.060ms
> - req/sec: 9685/9430
> - cpu_time: 102.878167/106.013725

I hate to be the one asking this, but with so many red flags in the
testing methodology, are you reasonably sure that the 0.28 millisecond
difference does not include 0.29+ milliseconds of noise? At the very
least, do you consistently get the same set of numbers when repeating
the two tests in random order?

BTW, your req/sec line is redundant. In your best-effort test, the proxy
response time determines the offered request rate:

9685/9430 = 1.027 (your "3% speedier")
1.060/1.032 = 1.027 (your "3% speedier")

10*(1000/1.032) = 9690 (10-robot request rate from response time)
10*(1000/1.060) = 9434 (10-robot request rate from response time)

Cheers,

Alex.
Vector vs std::vector
Hi,
Amos has asked me over IRC to investigate any performance differences
between Vector and std::vector. To do that, I've implemented a
std::vector-based implementation of Vector (feature-branch:
lp:~squid/squid/vector-to-stdvector).

I've then done the performance testing using ab. The results are in: a
Vector-based squid is about 3% speedier than a std::vector-based squid.
This may also be due to some egregious layering by users of Vector. I
have seen things which I would like to correct, also with the objective
of having Vector implement the same exact API as std::vector to make
future porting easier.

test conditions:
- done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
- testing with ab. 1m requests @10 parallelism with keepalive,
stressing the TCP_MEM_HIT code path on a cold cache
- test on a multicore VM; default out-of-the-box configuration, ab
running on same hardware over the loopback interface.
- immediately after ab exits, collect counters (mgr:counters)

numbers (for trunk / stdvector)
- mean response time: 1.032/1.060ms
- req/sec: 9685/9430
- cpu_time: 102.878167/106.013725

-- 
/kinkie