Re: Vector vs std::vector

2014-02-07 Thread Amos Jeffries
On 7/02/2014 7:20 p.m., Kinkie wrote:
 Thank you for running this test.
 I guess that with these results, it can make sense to go forward with
 the project of replacing Vector with std::vector. Does everyone agree?

Yes.

Amos


Re: Vector vs std::vector

2014-02-07 Thread Eliezer Croitoru

On 02/07/2014 08:20 AM, Kinkie wrote:

Thank you for running this test.
I guess that with these results, it can make sense to go forward with
the project of replacing Vector with std::vector. Does everyone agree?

Thanks!

Seems pretty reasonable to me.
I have just one more question.
Will using std::vector help Squid's portability in any way?

Thanks,
Eliezer


Re: Vector vs std::vector

2014-02-07 Thread Amos Jeffries
On 7/02/2014 10:56 p.m., Eliezer Croitoru wrote:
 On 02/07/2014 08:20 AM, Kinkie wrote:
 Thank you for running this test.
 I guess that with these results, it can make sense to go forward with
 the project of replacing Vector with std::vector. Does everyone agree?

 Thanks!
 Seems pretty reasonable to me.
 I have just one more question.
 Will using std::vector help Squid's portability in any way?

As part of the STL it should.

Amos



Re: Vector vs std::vector

2014-02-06 Thread Alex Rousskov
On 01/30/2014 01:37 PM, Alex Rousskov wrote:
 On 01/30/2014 12:14 PM, Kinkie wrote:
 Ok, here's some numbers (same testing methodology as before)

 Trunk: mean RPS (CPU time)
 10029.11 (996.661872)
 9786.60 (1021.695007)
 10116.93 (988.395665)
 9958.71 (1004.039956)

 stdvector: mean RPS (CPU time)
 9732.57 (1027.426563)
 10388.38 (962.418333)
 10332.17 (967.824790)
 
 OK, so we are probably just looking at noise: The variation within each
 data set (1.4% or 3.6%) is about the same or more than the difference
 between the set averages (1.8%).
 
 We will test this in a Polygraph lab, but I would not be surprised to
 see no significant difference there as well.


Polygraph results are in. No significant difference is visible.


FWIW, here are a few basic response stats extracted from four tests
(rep is reply and rptm is response time):

* trunk vs. branch, take1:
rep.rate:   0% 2199.97 = 2199.97
rep.rptm.count: 0% 11880004.00 = 11880004.00
rep.rptm.mean:  0% 501.12 < 501.16
rep.rptm.std_dev:   0% 485.77 = 485.77
rep.size.count: 0% 11880004.00 = 11880004.00
rep.size.mean:  0% 8738.48 < 8739.91
rep.size.std_dev:   0% 11446.90 > 11446.12

* trunk vs. branch, take2:
rep.rate:   0% 2199.97 = 2199.97
rep.rptm.count: 0% 11880008.00 = 11880002.00
rep.rptm.mean:  0% 501.14 < 501.16
rep.rptm.std_dev:   0% 485.77 = 485.77
rep.size.count: 0% 11880008.00 = 11880002.00
rep.size.mean:  0% 8738.49 < 8739.14
rep.size.std_dev:   0% 11446.42 < 11448.64


The results appear to be pretty stable, but we have not run
more/different tests to really verify that:

* take1 vs. take2, trunk:
rep.rate:   0% 2199.97 = 2199.97
rep.rptm.count: 0% 11880004.00 = 11880008.00
rep.rptm.mean:  0% 501.12 < 501.14
rep.rptm.std_dev:   0% 485.77 = 485.77
rep.size.count: 0% 11880004.00 = 11880008.00
rep.size.mean:  0% 8738.48 < 8738.49
rep.size.std_dev:   0% 11446.90 > 11446.42

* take1 vs. take2, branch:
rep.rate:   0% 2199.97 = 2199.97
rep.rptm.count: 0% 11880004.00 = 11880002.00
rep.rptm.mean:  0% 501.16 = 501.16
rep.rptm.std_dev:   0% 485.77 = 485.77
rep.size.count: 0% 11880004.00 = 11880002.00
rep.size.mean:  0% 8739.91 > 8739.14
rep.size.std_dev:   0% 11446.12 < 11448.64


Note that the comparison script uses a 1.0e-6 epsilon to declare equality,
so some slightly different numbers are printed with an equal sign.
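
In other words, two values are treated as equal whenever their relative
difference stays below that epsilon. A rough sketch of the idea (not the
actual comparison script):

#include <algorithm>
#include <cmath>

// Sketch only: values print as "equal" when their relative difference
// is smaller than epsilon (1.0e-6 in the script's case).
bool aboutEqual(double a, double b, double epsilon = 1.0e-6)
{
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return scale == 0.0 || diff < epsilon * scale;
}

// Example: 11880008 and 11880002 differ by 6, a relative difference of
// roughly 5e-7 < 1e-6, which is why such counts show up with an "=" above.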

I hope I did not screw up while pasting these numbers :-).


HTH,

Alex.



Re: Vector vs std::vector

2014-02-06 Thread Kinkie
Thank you for running this test.
I guess that with these results, it can make sense to go forward with
the project of replacing Vector with std::vector. Does everyone agree?

Thanks!

On Fri, Feb 7, 2014 at 6:18 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 01/30/2014 01:37 PM, Alex Rousskov wrote:
 On 01/30/2014 12:14 PM, Kinkie wrote:
 Ok, here's some numbers (same testing methodology as before)

 Trunk: mean RPS (CPU time)
 10029.11 (996.661872)
 9786.60 (1021.695007)
 10116.93 (988.395665)
 9958.71 (1004.039956)

 stdvector: mean RPS (CPU time)
 9732.57 (1027.426563)
 10388.38 (962.418333)
 10332.17 (967.824790)

 OK, so we are probably just looking at noise: The variation within each
 data set (1.4% or 3.6%) is about the same or more than the difference
 between the set averages (1.8%).

 We will test this in a Polygraph lab, but I would not be surprised to
 see no significant difference there as well.


 Polygraph results are in. No significant difference is visible.


 FWIW, here are a few basic response stats extracted from four tests
 (rep is reply and rptm is response time):

 * trunk vs. branch, take1:
 rep.rate:   0% 2199.97 = 2199.97
 rep.rptm.count: 0% 11880004.00 = 11880004.00
 rep.rptm.mean:  0% 501.12 < 501.16
 rep.rptm.std_dev:   0% 485.77 = 485.77
 rep.size.count: 0% 11880004.00 = 11880004.00
 rep.size.mean:  0% 8738.48 < 8739.91
 rep.size.std_dev:   0% 11446.90 > 11446.12

 * trunk vs. branch, take2:
 rep.rate:   0% 2199.97 = 2199.97
 rep.rptm.count: 0% 11880008.00 = 11880002.00
 rep.rptm.mean:  0% 501.14 < 501.16
 rep.rptm.std_dev:   0% 485.77 = 485.77
 rep.size.count: 0% 11880008.00 = 11880002.00
 rep.size.mean:  0% 8738.49 < 8739.14
 rep.size.std_dev:   0% 11446.42 < 11448.64


 The results appear to be pretty stable, but we have not run
 more/different tests to really verify that:

 * take1 vs. take2, trunk:
 rep.rate:   0% 2199.97 = 2199.97
 rep.rptm.count: 0% 11880004.00 = 11880008.00
 rep.rptm.mean:  0% 501.12 < 501.14
 rep.rptm.std_dev:   0% 485.77 = 485.77
 rep.size.count: 0% 11880004.00 = 11880008.00
 rep.size.mean:  0% 8738.48 < 8738.49
 rep.size.std_dev:   0% 11446.90 > 11446.42

 * take1 vs. take2, branch:
 rep.rate:   0% 2199.97 = 2199.97
 rep.rptm.count: 0% 11880004.00 = 11880002.00
 rep.rptm.mean:  0% 501.16 = 501.16
 rep.rptm.std_dev:   0% 485.77 = 485.77
 rep.size.count: 0% 11880004.00 = 11880002.00
 rep.size.mean:  0% 8739.91 > 8739.14
 rep.size.std_dev:   0% 11446.12 < 11448.64


 Note that the comparison script uses a 1.0e-6 epsilon to declare equality,
 so some slightly different numbers are printed with an equal sign.

 I hope I did not screw up while pasting these numbers :-).


 HTH,

 Alex.




-- 
Francesco


Re: Vector vs std::vector

2014-01-30 Thread Kinkie
On Thu, Jan 30, 2014 at 12:39 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 01/29/2014 02:32 PM, Kinkie wrote:
 On Wed, Jan 29, 2014 at 7:52 PM, Alex Rousskov wrote:
 On 01/29/2014 07:08 AM, Kinkie wrote:

 (in a trunk checkout)
 bzr diff -r lp:/squid/squid/vector-to-stdvector

 The resulting diff is reversed, but that should be easy enough to manage.

 Thanks! Not sure whether reversing a diff in my head is easier than
 downloading the branch :-(.

Maybe qdiff (from the qt-bzr plugin) could help by presenting the diff
in a graphical format?
Apart from that, I don't know.

 Can you give any specific examples of the code change that you would
 attribute to a loss of performance when using std::vector? I did not
 notice any obvious cases, but I did not look closely.


 I suspect that it's all those lines doing vector.items[accessor] and
 thus using C-style unchecked accesses.

 std::vector[] element access operator does not check anything either, as
 we discussed earlier. IIRC, you [correctly] used a few std::vector::at()
 calls as well, but I did not see any in a critical performance path.
 test conditions:
 - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
 - testing with ab. 1m requests @10 parallelism with keepalive,
 stressing the TCP_MEM_HIT code path on a cold cache
 - test on a multicore VM; default out-of-the-box configuration, ab
 running on same hardware over the loopback interface.
 - immediately after ab exits, collect counters (mgr:counters)

 numbers (for trunk / stdvector)
 - mean response time: 1.032/1.060ms
 - cpu_time: 102.878167/106.013725


 If you can suggest a more thorough set of commands using the rough
 tools I have, I'll gladly run them.

 How about repeating the same pair of tests a few times, in random order?
 Do you get consistent results?

Ok, here's some numbers (same testing methodology as before)

Trunk: mean RPS (CPU time)
10029.11 (996.661872)
9786.60 (1021.695007)
10116.93 (988.395665)
9958.71 (1004.039956)

stdvector: mean RPS (CPU time)
9732.57 (1027.426563)
10388.38 (962.418333)
10332.17 (967.824790)

Some other insights I got by varying parameters:
By raw RPS, it seems that performance varies with the number of
parallel clients in this way (best to worst):
100 > 10 > 500 > 1

I also tried, just for fun, stripping configuration that is not strictly
needed (e.g. logging, access control) and using 3 workers, leaving the
fourth core for ab: the best result was 39900.52 RPS. Useless, but
impressive :)

-- 
/kinkie


Re: Vector vs std::vector

2014-01-30 Thread Alex Rousskov
On 01/30/2014 12:14 PM, Kinkie wrote:
 Ok, here's some numbers (same testing methodology as before)
 
 Trunk: mean RPS (CPU time)
 10029.11 (996.661872)
 9786.60 (1021.695007)
 10116.93 (988.395665)
 9958.71 (1004.039956)
 
 stdvector: mean RPS (CPU time)
 9732.57 (1027.426563)
 10388.38 (962.418333)
 10332.17 (967.824790)

OK, so we are probably just looking at noise: The variation within each
data set (1.4% or 3.6%) is about the same or more than the difference
between the set averages (1.8%).
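
For the record, those percentages come out if you take each set's sample
standard deviation relative to its mean; a quick throwaway check (not Squid
code, just reproducing the arithmetic):

#include <cmath>
#include <cstdio>
#include <vector>

// Coefficient of variation (sample std dev / mean) of the RPS numbers
// above, plus the relative difference between the two set means.
static double mean(const std::vector<double> &v)
{
    double sum = 0.0;
    for (std::vector<double>::size_type i = 0; i < v.size(); ++i)
        sum += v[i];
    return sum / v.size();
}

static double cv(const std::vector<double> &v)
{
    const double m = mean(v);
    double ss = 0.0;
    for (std::vector<double>::size_type i = 0; i < v.size(); ++i)
        ss += (v[i] - m) * (v[i] - m);
    return std::sqrt(ss / (v.size() - 1)) / m; // sample std dev over mean
}

int main()
{
    const double t[] = { 10029.11, 9786.60, 10116.93, 9958.71 };
    const double s[] = { 9732.57, 10388.38, 10332.17 };
    const std::vector<double> trunk(t, t + 4);
    const std::vector<double> branch(s, s + 3);

    std::printf("trunk variation:     %.1f%%\n", 100.0 * cv(trunk));   // ~1.4%
    std::printf("stdvector variation: %.1f%%\n", 100.0 * cv(branch));  // ~3.6%
    std::printf("difference of means: %.1f%%\n",
                100.0 * (mean(branch) - mean(trunk)) / mean(trunk));   // ~1.8%
    return 0;
}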

We will test this in a Polygraph lab, but I would not be surprised to
see no significant difference there as well.


 Some other insights I got by varying parameters:
 By raw RPS, it seems that performance varies with the number of
 parallel clients in this way (best to worst):
 100 > 10 > 500 > 1

Sure, any best-effort test will have such a dependency: Too few
best-effort clients do not create enough concurrent requests to keep the
proxy busy (it has time to sleep) while too many best-effort robots
overwhelm the proxy with too many concurrent requests (the per-request
overheads grow, decreasing the total throughput).


Cheers,

Alex.



Re: Vector vs std::vector

2014-01-29 Thread Alex Rousskov
On 01/29/2014 07:08 AM, Kinkie wrote:

Amos has asked me over IRC to investigate any performance
 differences between Vector and std::vector. To do that, I've
 implemented a std::vector-based implementation of Vector
 (feature-branch: lp:~squid/squid/vector-to-stdvector).

Does Launchpad offer a way of generating a merge patch/diff on the site?
Currently, I have to checkout the branch and do bzr send to get the
right diff. Is there a better way?


 I've then done the performance testing using ab. The results are in: a
 Vector-based squid is about 3% speedier than a std::vector-based
 squid.


 This may also be due to some egregious layering by users of Vector. I
 have seen things which I would like to correct, also with the
 objective of having Vector implement the same exact API as std::vector
 to make future porting easier.

Can you give any specific examples of the code change that you would
attribute to a loss of performance when using std::vector? I did not
notice any obvious cases, but I did not look closely.


 test conditions:
 - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
 - testing with ab. 1m requests @10 parallelism with keepalive,
 stressing the TCP_MEM_HIT code path on a cold cache
 - test on a multicore VM; default out-of-the-box configuration, ab
 running on same hardware over the loopback interface.
 - immediately after ab exits, collect counters (mgr:counters)
 
 numbers (for trunk / stdvector)
 - mean response time: 1.032/1.060ms
 - req/sec: 9685/9430
 - cpu_time: 102.878167/106.013725


I hate to be the one asking this, but with so many red flags in the
testing methodology, are you reasonably sure that the 0.028 millisecond
difference does not include 0.029+ milliseconds of noise? At the very
least, do you consistently get the same set of numbers when repeating
the two tests in random order?


BTW, your req/sec line is redundant. In your best-effort test, the proxy
response time determines the offered request rate:

   9685/9430  = 1.027  (your 3% speedier)
  1.060/1.032 = 1.027  (your 3% speedier)

  10*(1000/1.032) = 9690 (10-robot request rate from response time)
  10*(1000/1.060) = 9434 (10-robot request rate from response time)


Cheers,

Alex.



Re: Vector vs std::vector

2014-01-29 Thread Amos Jeffries

On 2014-01-30 07:52, Alex Rousskov wrote:

On 01/29/2014 07:08 AM, Kinkie wrote:


   Amos has asked me over IRC to investigate any performance
differences between Vector and std::vector. To do that, I've
implemented a std::vector-based implementation of Vector
(feature-branch: lp:~squid/squid/vector-to-stdvector).


Does Launchpad offer a way of generating a merge patch/diff on the site?
Currently, I have to checkout the branch and do bzr send to get the
right diff. Is there a better way?



I've then done the performance testing using ab. The results are in: a
Vector-based squid is about 3% speedier than a std::vector-based
squid.




This may also be due to some egregious layering by users of Vector. I
have seen things which I would like to correct, also with the
objective of having Vector implement the same exact API as std::vector
to make future porting easier.


Can you give any specific examples of the code change that you would
attribute to a loss of performance when using std::vector? I did not
notice any obvious cases, but I did not look closely.


One of the things to check is memory management. Squid::Vector uses
xmalloc/xfree. Does the performance even out at all when those are
detached? (We can implement a custom allocator for std::vector later if
useful.)
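
Something along these lines would probably be enough (untested sketch,
assuming xmalloc/xfree keep their malloc/free-style signatures and a C++11
standard library; names are illustrative, not a proposed patch):

#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Stand-ins so the sketch compiles on its own; inside Squid these would be
// the real xmalloc/xfree from the compat layer.
static void *xmalloc(std::size_t size) { return std::malloc(size); }
static void xfree(void *ptr) { std::free(ptr); }

// Minimal allocator forwarding to xmalloc/xfree. Relies on C++11
// allocator_traits to fill in the rest; a pre-C++11 library would need the
// full allocator boilerplate (pointer, reference, rebind, ...).
template <class T>
struct XAllocator {
    typedef T value_type;

    XAllocator() {}
    template <class U> XAllocator(const XAllocator<U> &) {}

    T *allocate(std::size_t n) {
        if (void *p = xmalloc(n * sizeof(T)))
            return static_cast<T *>(p);
        throw std::bad_alloc();
    }

    void deallocate(T *p, std::size_t) { xfree(p); }
};

template <class T, class U>
bool operator==(const XAllocator<T> &, const XAllocator<U> &) { return true; }
template <class T, class U>
bool operator!=(const XAllocator<T> &, const XAllocator<U> &) { return false; }

// Usage: a vector whose storage goes through xmalloc/xfree.
typedef std::vector<int, XAllocator<int> > IntXVector;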


Amos


Re: Vector vs std::vector

2014-01-29 Thread Kinkie
On Wed, Jan 29, 2014 at 7:52 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 01/29/2014 07:08 AM, Kinkie wrote:

Amos has asked me over IRC to investigate any performance
 differences between Vector and std::vector. To do that, I've
 implemented a std::vector-based implementation of Vector
 (feature-branch: lp:~squid/squid/vector-to-stdvector).

 Does Launchpad offer a way of generating a merge patch/diff on the site?
 Currently, I have to checkout the branch and do bzr send to get the
 right diff. Is there a better way?

Sure:
(in a trunk checkout)
bzr diff -r lp:/squid/squid/vector-to-stdvector

The resulting diff is reversed, but that should be easy enough to manage.

 I've then done the performance testing using ab. The results are in: a
 Vector-based squid is about 3% speedier than a std::vector-based
 squid.


 This may also be due to some egregious layering by users of Vector. I
 have seen things which I would like to correct, also with the
 objective of having Vector implement the same exact API as std::vector
 to make future porting easier.

 Can you give any specific examples of the code change that you would
 attribute to a loss of performance when using std::vector? I did not
 notice any obvious cases, but I did not look closely.

I suspect that it's all those lines doing vector.items[accessor] and
thus using C-style unchecked accesses.

 test conditions:
 - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
 - testing with ab. 1m requests @10 parallelism with keepalive,
 stressing the TCP_MEM_HIT code path on a cold cache
 - test on a multicore VM; default out-of-the-box configuration, ab
 running on same hardware over the loopback interface.
 - immediately after ab exits, collect counters (mgr:counters)

 numbers (for trunk / stdvector)
 - mean response time: 1.032/1.060ms
 - req/sec: 9685/9430
 - cpu_time: 102.878167/106.013725


 I hate to be the one asking this, but with so many red flags in the
 testing methodology, are you reasonably sure that the 0.028 millisecond
 difference does not include 0.029+ milliseconds of noise? At the very
 least, do you consistently get the same set of numbers when repeating
 the two tests in random order?

I know that the testing methodology is very rough, and I am not
offended by you pointing that out. In fact, that is one of the reasons
why I tried being thorough in describing the method. I hope that you
or maybe Pawel can obtain more meaningful measures without investing
too much effort in it.

 BTW, your req/sec line is redundant. In your best-effort test, the proxy
 response time determines the offered request rate:

9685/9430  = 1.027  (your 3% speedier)
   1.060/1.032 = 1.027  (your 3% speedier)

   10*(1000/1.032) = 9690 (10-robot request rate from response time)
   10*(1000/1.060) = 9434 (10-robot request rate from response time)

Yes.
In fact, I consider that the interesting value is not either of those,
but the ~3 seconds of extra CPU time needed.

If you can suggest a more thorough set of commands using the rough
tools I have, I'll gladly run them. As another option, I hope Pawel
can take the time to run the standard polygraph on that branch. (can
you, Pawel?)

Thanks!

-- 
/kinkie


Re: Vector vs std::vector

2014-01-29 Thread Alex Rousskov
On 01/29/2014 02:32 PM, Kinkie wrote:
 On Wed, Jan 29, 2014 at 7:52 PM, Alex Rousskov wrote:
 On 01/29/2014 07:08 AM, Kinkie wrote:

 (in a trunk checkout)
 bzr diff -r lp:/squid/squid/vector-to-stdvector
 
 The resulting diff is reversed, but that should be easy enough to manage.

Thanks! Not sure whether reversing a diff in my head is easier than
downloading the branch :-(.


 Can you give any specific examples of the code change that you would
 attribute to a loss of performance when using std::vector? I did not
 notice any obvious cases, but I did not look closely.


 I suspect that it's all those lines doing vector.items[accessor] and
 thus using C-style unchecked accesses.

std::vector[] element access operator does not check anything either, as
we discussed earlier. IIRC, you [correctly] used a few std::vector::at()
calls as well, but I did not see any in a critical performance path.
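
To make the distinction concrete (illustrative snippet only; Item and the
function are made up, not actual Squid code):

#include <cstddef>
#include <vector>

// Illustrative only: "Item" stands in for whatever the real element type is.
struct Item { int value; };

int demo(const std::vector<Item> &items, std::size_t i)
{
    // Old code did the equivalent of items.items[i]: a raw, unchecked
    // array access into the container's backing store.

    // std::vector's operator[] is just as unchecked: an out-of-range i is
    // undefined behaviour, and no bounds test is paid on the fast path.
    const int a = items[i].value;

    // std::vector::at() is the checked variant: an out-of-range i throws
    // std::out_of_range, at the cost of a bounds test on every access.
    const int b = items.at(i).value;

    return a + b;
}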


 test conditions:
 - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
 - testing with ab. 1m requests @10 parallelism with keepalive,
 stressing the TCP_MEM_HIT code path on a cold cache
 - test on a multicore VM; default out-of-the-box configuration, ab
 running on same hardware over the loopback interface.
 - immediately after ab exits, collect counters (mgr:counters)

 numbers (for trunk / stdvector)
 - mean response time: 1.032/1.060ms
 - cpu_time: 102.878167/106.013725


 If you can suggest a more thorough set of commands using the rough
 tools I have, I'll gladly run them.

How about repeating the same pair of tests a few times, in random order?
Do you get consistent results?


Thank you,

Alex.