Re: Vector vs std::vector

2014-02-07 Thread Amos Jeffries
On 7/02/2014 10:56 p.m., Eliezer Croitoru wrote:
> On 02/07/2014 08:20 AM, Kinkie wrote:
>> Thank you for running this test.
>> I guess that with these results, it can make sense to go forward with
>> the project of replacing Vector with std::vector. Does everyone agree?
>>
>> Thanks!
> Seems pretty reasonable to me.
> I have just one more question:
> Will using std::vector help squid's portability at all?

As part of the STL, it should.

Amos



Re: Vector vs std::vector

2014-02-07 Thread Eliezer Croitoru

On 02/07/2014 08:20 AM, Kinkie wrote:
> Thank you for running this test.
> I guess that with these results, it can make sense to go forward with
> the project of replacing Vector with std::vector. Does everyone agree?
>
> Thanks!

Seems pretty reasonable to me.
I have just one more question:
Will using std::vector help squid's portability at all?

Thanks,
Eliezer


Re: Vector vs std::vector

2014-02-07 Thread Amos Jeffries
On 7/02/2014 7:20 p.m., Kinkie wrote:
> Thank you for running this test.
> I guess that with these results, it can make sense to go forward with
> the project of replacing Vector with std::vector. Does everyone agree?

Yes.

Amos


Re: Vector vs std::vector

2014-02-06 Thread Kinkie
Thank you for running this test.
I guess that with these results, it can make sense to go forward with
the project of replacing Vector with std::vector. Does everyone agree?

Thanks!

On Fri, Feb 7, 2014 at 6:18 AM, Alex Rousskov
 wrote:
> On 01/30/2014 01:37 PM, Alex Rousskov wrote:
>> On 01/30/2014 12:14 PM, Kinkie wrote:
>>> OK, here are some numbers (same testing methodology as before):
>>>
>>> Trunk: mean RPS (CPU time)
>>> 10029.11 (996.661872)
>>> 9786.60 (1021.695007)
>>> 10116.93 (988.395665)
>>> 9958.71 (1004.039956)
>>>
>>> stdvector: mean RPS (CPUtime)
>>> 9732.57 (1027.426563)
>>> 10388.38 (962.418333)
>>> 10332.17 (967.824790)
>>
>> OK, so we are probably just looking at noise: The variation within each
>> data set (1.4% or 3.6%) is about the same as or greater than the
>> difference between the set averages (1.8%).
>>
>> We will test this in a Polygraph lab, but I would not be surprised to
>> see no significant difference there as well.
>
>
> Polygraph results are in. No significant difference is visible.
>
>
> FWIW, here are a few basic response stats extracted from four tests
> ("rep" is reply and "rptm" is response time):
>
> * trunk vs. branch, take1:
> rep.rate:   0% 2199.97 = 2199.97
> rep.rptm.count: 0% 11880004.00 = 11880004.00
> rep.rptm.mean:  0% 501.12 < 501.16
> rep.rptm.std_dev:   0% 485.77 = 485.77
> rep.size.count: 0% 11880004.00 = 11880004.00
> rep.size.mean:  0% 8738.48 < 8739.91
> rep.size.std_dev:   0% 11446.90 > 11446.12
>
> * trunk vs. branch, take2:
> rep.rate:   0% 2199.97 = 2199.97
> rep.rptm.count: 0% 11880008.00 = 11880002.00
> rep.rptm.mean:  0% 501.14 < 501.16
> rep.rptm.std_dev:   0% 485.77 = 485.77
> rep.size.count: 0% 11880008.00 = 11880002.00
> rep.size.mean:  0% 8738.49 < 8739.14
> rep.size.std_dev:   0% 11446.42 < 11448.64
>
>
> The results appear to be pretty stable, but we have not run
> more/different tests to really verify that:
>
> * take1 vs. take2, trunk:
> rep.rate:   0% 2199.97 = 2199.97
> rep.rptm.count: 0% 11880004.00 = 11880008.00
> rep.rptm.mean:  0% 501.12 < 501.14
> rep.rptm.std_dev:   0% 485.77 = 485.77
> rep.size.count: 0% 11880004.00 = 11880008.00
> rep.size.mean:  0% 8738.48 < 8738.49
> rep.size.std_dev:   0% 11446.90 > 11446.42
>
> * take1 vs. take2, branch:
> rep.rate:   0% 2199.97 = 2199.97
> rep.rptm.count: 0% 11880004.00 = 11880002.00
> rep.rptm.mean:  0% 501.16 = 501.16
> rep.rptm.std_dev:   0% 485.77 = 485.77
> rep.size.count: 0% 11880004.00 = 11880002.00
> rep.size.mean:  0% 8739.91 > 8739.14
> rep.size.std_dev:   0% 11446.12 < 11448.64
>
>
> Note that the comparison script uses a 1.0e-6 epsilon to declare equality,
> so some slightly different numbers are printed with an equals sign.
>
> I hope I did not screw up while pasting these numbers :-).
>
>
> HTH,
>
> Alex.
>



-- 
Francesco


Re: Vector vs std::vector

2014-02-06 Thread Alex Rousskov
On 01/30/2014 01:37 PM, Alex Rousskov wrote:
> On 01/30/2014 12:14 PM, Kinkie wrote:
>> OK, here are some numbers (same testing methodology as before):
>>
>> Trunk: mean RPS (CPU time)
>> 10029.11 (996.661872)
>> 9786.60 (1021.695007)
>> 10116.93 (988.395665)
>> 9958.71 (1004.039956)
>>
>> stdvector: mean RPS (CPUtime)
>> 9732.57 (1027.426563)
>> 10388.38 (962.418333)
>> 10332.17 (967.824790)
> 
> OK, so we are probably just looking at noise: The variation within each
> data set (1.4% or 3.6%) is about the same as or greater than the
> difference between the set averages (1.8%).
> 
> We will test this in a Polygraph lab, but I would not be surprised to
> see no significant difference there as well.


Polygraph results are in. No significant difference is visible.


FWIW, here are a few basic response stats extracted from four tests
("rep" is reply and "rptm" is response time):

* trunk vs. branch, take1:
rep.rate:   0% 2199.97 = 2199.97
rep.rptm.count: 0% 11880004.00 = 11880004.00
rep.rptm.mean:  0% 501.12 < 501.16
rep.rptm.std_dev:   0% 485.77 = 485.77
rep.size.count: 0% 11880004.00 = 11880004.00
rep.size.mean:  0% 8738.48 < 8739.91
rep.size.std_dev:   0% 11446.90 > 11446.12

* trunk vs. branch, take2:
rep.rate:   0% 2199.97 = 2199.97
rep.rptm.count: 0% 11880008.00 = 11880002.00
rep.rptm.mean:  0% 501.14 < 501.16
rep.rptm.std_dev:   0% 485.77 = 485.77
rep.size.count: 0% 11880008.00 = 11880002.00
rep.size.mean:  0% 8738.49 < 8739.14
rep.size.std_dev:   0% 11446.42 < 11448.64


The results appear to be pretty stable, but we have not run
more/different tests to really verify that:

* take1 vs. take2, trunk:
rep.rate:   0% 2199.97 = 2199.97
rep.rptm.count: 0% 11880004.00 = 11880008.00
rep.rptm.mean:  0% 501.12 < 501.14
rep.rptm.std_dev:   0% 485.77 = 485.77
rep.size.count: 0% 11880004.00 = 11880008.00
rep.size.mean:  0% 8738.48 < 8738.49
rep.size.std_dev:   0% 11446.90 > 11446.42

* take1 vs. take2, branch:
rep.rate:   0% 2199.97 = 2199.97
rep.rptm.count: 0% 11880004.00 = 11880002.00
rep.rptm.mean:  0% 501.16 = 501.16
rep.rptm.std_dev:   0% 485.77 = 485.77
rep.size.count: 0% 11880004.00 = 11880002.00
rep.size.mean:  0% 8739.91 > 8739.14
rep.size.std_dev:   0% 11446.12 < 11448.64


Note that the comparison script uses a 1.0e-6 epsilon to declare equality,
so some slightly different numbers are printed with an equals sign.
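For illustration, the rule this implies looks roughly as follows (a
sketch; the actual comparison script is not shown, and the epsilon is
assumed to be relative, which matches the counts above that differ by a
few parts in ten million yet print as equal):

    #include <algorithm>
    #include <cmath>

    // Sketch of the implied equality rule; the real comparison script is
    // not shown, and a relative 1.0e-6 epsilon is assumed.
    bool printedAsEqual(double a, double b, double eps = 1.0e-6) {
        return std::fabs(a - b) <= eps * std::max(std::fabs(a), std::fabs(b));
    }

    // e.g. printedAsEqual(11880008.0, 11880002.0) is true: the relative
    // difference is about 5e-7, below the 1e-6 epsilon.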

I hope I did not screw up while pasting these numbers :-).


HTH,

Alex.



Re: Vector vs std::vector

2014-01-30 Thread Alex Rousskov
On 01/30/2014 12:14 PM, Kinkie wrote:
> OK, here are some numbers (same testing methodology as before):
> 
> Trunk: mean RPS (CPU time)
> 10029.11 (996.661872)
> 9786.60 (1021.695007)
> 10116.93 (988.395665)
> 9958.71 (1004.039956)
> 
> stdvector: mean RPS (CPUtime)
> 9732.57 (1027.426563)
> 10388.38 (962.418333)
> 10332.17 (967.824790)

OK, so we are probably just looking at noise: The variation within each
data set (1.4% or 3.6%) is about the same as or greater than the
difference between the set averages (1.8%).
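Those figures are consistent with reading each spread as the sample
standard deviation of a set relative to its mean; a minimal C++11 sketch
that reproduces the arithmetic:

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Reproduces the noise figures above: sample (n-1) standard deviation
    // of each RPS set as a percentage of its mean, plus the gap between
    // the two set means.
    static double mean(const std::vector<double> &v) {
        double sum = 0;
        for (double x : v) sum += x;
        return sum / v.size();
    }

    static double sampleStdDev(const std::vector<double> &v) {
        const double m = mean(v);
        double ss = 0;
        for (double x : v) ss += (x - m) * (x - m);
        return std::sqrt(ss / (v.size() - 1));
    }

    int main() {
        const std::vector<double> trunk  = {10029.11, 9786.60, 10116.93, 9958.71};
        const std::vector<double> branch = {9732.57, 10388.38, 10332.17};
        std::printf("trunk spread:  %.1f%%\n", 100 * sampleStdDev(trunk) / mean(trunk));   // ~1.4%
        std::printf("branch spread: %.1f%%\n", 100 * sampleStdDev(branch) / mean(branch)); // ~3.6%
        std::printf("mean gap:      %.1f%%\n", 100 * (mean(branch) / mean(trunk) - 1));    // ~1.8%
    }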

We will test this in a Polygraph lab, but I would not be surprised to
see no significant difference there as well.


> Some other insights I got by varying parameters:
> By raw RPS, it seems that performance varies with the number of
> parallel clients as follows (best to worst):
> 100 > 10 > 500 > 1

Sure, any best-effort test will have such a dependency: too few
best-effort clients do not create enough concurrent requests to keep the
proxy busy (it has time to sleep), while too many best-effort robots
overwhelm the proxy with too many concurrent requests (the per-request
overheads grow, decreasing the total throughput).


Cheers,

Alex.



Re: Vector vs std::vector

2014-01-30 Thread Kinkie
On Thu, Jan 30, 2014 at 12:39 AM, Alex Rousskov
 wrote:
> On 01/29/2014 02:32 PM, Kinkie wrote:
>> On Wed, Jan 29, 2014 at 7:52 PM, Alex Rousskov wrote:
>>> On 01/29/2014 07:08 AM, Kinkie wrote:
>
>> (in a trunk checkout)
>> bzr diff -r lp:/squid/squid/vector-to-stdvector
>>
>> The resulting diff is reversed, but that should be easy enough to manage.
>
> Thanks! Not sure whether reversing a diff in my head is easier than
> downloading the branch :-(.

Maybe qdiff (from the qt-bzr plugin) could help by presenting things
in a graphical format?
Apart from that, I don't know.

>>> Can you give any specific examples of the code change that you would
>>> attribute to a loss of performance when using std::vector? I did not
>>> notice any obvious cases, but I did not look closely.
>
>
>> I suspect that it's all those lines doing vector.items[accessor] and
>> thus using C-style unchecked accesses.
>
> std::vector's [] element-access operator does not check anything either, as
> we discussed earlier. IIRC, you [correctly] used a few std::vector::at()
> calls as well, but I did not see any in a critical performance path.
>>>> test conditions:
>>>> - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
>>>> - testing with ab: 1M requests at parallelism 10 with keepalive,
>>>> stressing the TCP_MEM_HIT code path on a cold cache
>>>> - test on a multicore VM; default out-of-the-box configuration, ab
>>>> running on the same hardware over the loopback interface.
>>>> - immediately after ab exits, collect counters (mgr:counters)
>>>>
>>>> numbers (for trunk / stdvector)
>>>> - mean response time: 1.032/1.060 ms
>>>> - cpu_time: 102.878167/106.013725
>
>
>> If you can suggest a more thorough set of commands using the rough
>> tools I have, I'll gladly run them.
>
> How about repeating the same pair of tests a few times, in random order?
> Do you get consistent results?

OK, here are some numbers (same testing methodology as before):

Trunk: mean RPS (CPU time)
10029.11 (996.661872)
9786.60 (1021.695007)
10116.93 (988.395665)
9958.71 (1004.039956)

stdvector: mean RPS (CPUtime)
9732.57 (1027.426563)
10388.38 (962.418333)
10332.17 (967.824790)

Some other insights I got by varying parameters:
By raw RPS, it seems that performance varies with the number of
parallel clients as follows (best to worst):
100 > 10 > 500 > 1

For fun, I also tried stripping configuration that is not strictly
needed (e.g. logging, access control) and using 3 workers, leaving the
fourth core to ab: the best result was 39900.52 RPS. Useless, but
impressive :)

-- 
/kinkie


Re: Vector vs std::vector

2014-01-29 Thread Alex Rousskov
On 01/29/2014 02:32 PM, Kinkie wrote:
> On Wed, Jan 29, 2014 at 7:52 PM, Alex Rousskov wrote:
>> On 01/29/2014 07:08 AM, Kinkie wrote:

> (in a trunk checkout)
> bzr diff -r lp:/squid/squid/vector-to-stdvector
> 
> The resulting diff is reversed, but that should be easy enough to manage.

Thanks! Not sure whether reversing a diff in my head is easier than
downloading the branch :-(.


>> Can you give any specific examples of the code change that you would
>> attribute to a loss of performance when using std::vector? I did not
>> notice any obvious cases, but I did not look closely.


> I suspect that it's all those lines doing vector.items[accessor] and
> thus using C-style unchecked accesses.

std::vector's [] element-access operator does not check anything either, as
we discussed earlier. IIRC, you [correctly] used a few std::vector::at()
calls as well, but I did not see any in a critical performance path.
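For illustration, the two access styles side by side (a hypothetical
snippet, not Squid code):

    #include <cstdio>
    #include <stdexcept>
    #include <vector>

    // Hypothetical snippet, not Squid code: std::vector offers both an
    // unchecked and a bounds-checked element accessor.
    int main() {
        std::vector<int> v(3, 0);

        int a = v[2];    // operator[]: unchecked; v[3] would be undefined behavior
        int b = v.at(2); // at(): bounds-checked

        try {
            v.at(3);     // out of range: throws
        } catch (const std::out_of_range &) {
            std::printf("at(3) threw std::out_of_range\n");
        }
        return a + b;    // keep the reads observable
    }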


>>> test conditions:
>>> - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
>>> - testing with ab: 1M requests at parallelism 10 with keepalive,
>>> stressing the TCP_MEM_HIT code path on a cold cache
>>> - test on a multicore VM; default out-of-the-box configuration, ab
>>> running on the same hardware over the loopback interface.
>>> - immediately after ab exits, collect counters (mgr:counters)
>>>
>>> numbers (for trunk / stdvector)
>>> - mean response time: 1.032/1.060 ms
>>> - cpu_time: 102.878167/106.013725


> If you can suggest a more thorough set of commands using the rough
> tools I have, I'll gladly run them.

How about repeating the same pair of tests a few times, in random order?
Do you get consistent results?


Thank you,

Alex.



Re: Vector vs std::vector

2014-01-29 Thread Kinkie
On Wed, Jan 29, 2014 at 7:52 PM, Alex Rousskov
 wrote:
> On 01/29/2014 07:08 AM, Kinkie wrote:
>
>> Amos has asked me over IRC to investigate any performance
>> differences between Vector and std::vector. To do that, I've
>> implemented a std::vector-based implementation of Vector
>> (feature-branch: lp:~squid/squid/vector-to-stdvector).
>
> Does Launchpad offer a way of generating a merge patch/diff on the site?
> Currently, I have to check out the branch and do "bzr send" to get the
> right diff. Is there a better way?

Sure:
(in a trunk checkout)
bzr diff -r lp:/squid/squid/vector-to-stdvector

The resulting diff is reversed, but that should be easy enough to manage.

>> I've then done the performance testing using ab. The results are in: a
>> Vector-based squid is about 3% speedier than a std::vector-based
>> squid.
>
>
>> This may also be due to some egregious layering violations by users of
>> Vector. I have seen things I would like to correct, also with the
>> objective of having Vector implement exactly the same API as std::vector
>> to make future porting easier.
>
> Can you give any specific examples of the code change that you would
> attribute to a loss of performance when using std::vector? I did not
> notice any obvious cases, but I did not look closely.

I suspect that it's all those lines doing vector.items[accessor] and
thus using C-style unchecked accesses.
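The pattern in question, as a simplified sketch (an illustration, not
the actual Vector declaration):

    #include <cstddef>

    // Simplified illustration of the layering problem: Vector exposes its
    // backing array publicly, so callers index it directly and no bounds
    // check can be interposed behind the interface.
    template <class E>
    class Vector {
    public:
        std::size_t count = 0;
        E *items = nullptr;   // raw array, reachable by every caller
    };

    int readDirect(const Vector<int> &v, std::size_t i) {
        return v.items[i];    // the "vector.items[accessor]" style: unchecked
    }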

>> test conditions:
>> - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
>> - testing with ab: 1M requests at parallelism 10 with keepalive,
>> stressing the TCP_MEM_HIT code path on a cold cache
>> - test on a multicore VM; default out-of-the-box configuration, ab
>> running on the same hardware over the loopback interface.
>> - immediately after ab exits, collect counters (mgr:counters)
>>
>> numbers (for trunk / stdvector)
>> - mean response time: 1.032/1.060 ms
>> - req/sec: 9685/9430
>> - cpu_time: 102.878167/106.013725
>
>
> I hate to be the one asking this, but with so many red flags in the
> testing methodology, are you reasonably sure that the 0.28 millisecond
> difference does not include 0.29+ milliseconds of noise? At the very
> least, do you consistently get the same set of numbers when repeating
> the two tests in random order?

I know that the testing methodology is very rough, and I am not
offended by you pointing that out. In fact, that is one of the reasons
why I tried to be thorough in describing the method. I hope that you
or maybe Pawel can obtain more meaningful measurements without
investing too much effort in it.

> BTW, your req/sec line is redundant. In your best-effort test, the proxy
> response time determines the offered request rate:
>
>   9685/9430   = 1.027  (your "3% speedier")
>   1.060/1.032 = 1.027  (your "3% speedier")
>
>   10*(1000/1.032) = 9690 (10-robot request rate from response time)
>   10*(1000/1.060) = 9434 (10-robot request rate from response time)

Yes.
In fact, I consider the interesting value to be neither of those, but
the ~3 seconds of extra CPU time needed.

If you can suggest a more thorough set of commands using the rough
tools I have, I'll gladly run them. As another option, I hope Pawel can
take the time to run the standard Polygraph tests on that branch (can
you, Pawel?).

Thanks!

-- 
/kinkie


Re: Vector vs std::vector

2014-01-29 Thread Amos Jeffries

On 2014-01-30 07:52, Alex Rousskov wrote:
> On 01/29/2014 07:08 AM, Kinkie wrote:
>
>> Amos has asked me over IRC to investigate any performance
>> differences between Vector and std::vector. To do that, I've
>> implemented a std::vector-based implementation of Vector
>> (feature-branch: lp:~squid/squid/vector-to-stdvector).
>
> Does Launchpad offer a way of generating a merge patch/diff on the site?
> Currently, I have to check out the branch and do "bzr send" to get the
> right diff. Is there a better way?
>
>> I've then done the performance testing using ab. The results are in: a
>> Vector-based squid is about 3% speedier than a std::vector-based
>> squid.
>
>> This may also be due to some egregious layering violations by users of
>> Vector. I have seen things I would like to correct, also with the
>> objective of having Vector implement exactly the same API as std::vector
>> to make future porting easier.
>
> Can you give any specific examples of the code change that you would
> attribute to a loss of performance when using std::vector? I did not
> notice any obvious cases, but I did not look closely.

One of the things to check is memory management. Squid::Vector<> uses
xmalloc/xfree. Does the performance even up at all when those are
detached? (We can implement a custom allocator for std::vector later
if useful.)
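A minimal sketch of what such an allocator could look like (hypothetical,
as it is only proposed above; std::malloc/std::free stand in for
xmalloc/xfree to keep the sketch self-contained):

    #include <cstdlib>
    #include <vector>

    // Hypothetical C++11 minimal allocator routing std::vector storage
    // through Squid-style allocation calls; the stubs below stand in for
    // the real xmalloc/xfree.
    static void *xmalloc_stub(std::size_t sz) { return std::malloc(sz); }
    static void xfree_stub(void *p) { std::free(p); }

    template <class T>
    struct XAllocator {
        typedef T value_type;

        XAllocator() {}
        template <class U> XAllocator(const XAllocator<U> &) {}

        T *allocate(std::size_t n) {
            return static_cast<T *>(xmalloc_stub(n * sizeof(T)));
        }
        void deallocate(T *p, std::size_t) { xfree_stub(p); }
    };

    template <class T, class U>
    bool operator==(const XAllocator<T> &, const XAllocator<U> &) { return true; }
    template <class T, class U>
    bool operator!=(const XAllocator<T> &, const XAllocator<U> &) { return false; }

    int main() {
        std::vector<int, XAllocator<int>> v;  // storage now goes through the stubs
        v.push_back(42);
        return v[0] == 42 ? 0 : 1;
    }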


Amos


Re: Vector vs std::vector

2014-01-29 Thread Alex Rousskov
On 01/29/2014 07:08 AM, Kinkie wrote:

> Amos has asked me over IRC to investigate any performance
> differences between Vector and std::vector. To do that, I've
> implemented a std::vector-based implementation of Vector
> (feature-branch: lp:~squid/squid/vector-to-stdvector).

Does Launchpad offer a way of generating a merge patch/diff on the site?
Currently, I have to check out the branch and do "bzr send" to get the
right diff. Is there a better way?


> I've then done the performance testing using ab. The results are in: a
> Vector-based squid is about 3% speedier than a std::vector-based
> squid.


> This may also be due to some egregious layering violations by users of
> Vector. I have seen things I would like to correct, also with the
> objective of having Vector implement exactly the same API as std::vector
> to make future porting easier.

Can you give any specific examples of the code change that you would
attribute to a loss of performance when using std::vector? I did not
notice any obvious cases, but I did not look closely.


> test conditions:
> - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
> - testing with ab: 1M requests at parallelism 10 with keepalive,
> stressing the TCP_MEM_HIT code path on a cold cache
> - test on a multicore VM; default out-of-the-box configuration, ab
> running on the same hardware over the loopback interface.
> - immediately after ab exits, collect counters (mgr:counters)
>
> numbers (for trunk / stdvector)
> - mean response time: 1.032/1.060 ms
> - req/sec: 9685/9430
> - cpu_time: 102.878167/106.013725


I hate to be the one asking this, but with so many red flags in the
testing methodology, are you reasonably sure that the 0.28 millisecond
difference does not include 0.29+ milliseconds of noise? At the very
least, do you consistently get the same set of numbers when repeating
the two tests in random order?


BTW, your req/sec line is redundant. In your best-effort test, the proxy
response time determines the offered request rate:

  9685/9430   = 1.027  (your "3% speedier")
  1.060/1.032 = 1.027  (your "3% speedier")

  10*(1000/1.032) = 9690 (10-robot request rate from response time)
  10*(1000/1.060) = 9434 (10-robot request rate from response time)


Cheers,

Alex.



Vector vs std::vector

2014-01-29 Thread Kinkie
Hi,
   Amos has asked me over IRC to investigate any performance
differences between Vector and std::vector. To do that, I've
implemented a std::vector-based implementation of Vector
(feature-branch: lp:~squid/squid/vector-to-stdvector).

I've then done the performance testing using ab. The results are in: a
Vector-based squid is about 3% speedier than a std::vector-based
squid.
This may also be due to some egregious layering violations by users of
Vector. I have seen things I would like to correct, also with the
objective of having Vector implement exactly the same API as std::vector
to make future porting easier.

test conditions:
- done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
- testing with ab: 1M requests at parallelism 10 with keepalive,
stressing the TCP_MEM_HIT code path on a cold cache
- test on a multicore VM; default out-of-the-box configuration, ab
running on the same hardware over the loopback interface.
- immediately after ab exits, collect counters (mgr:counters)
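For reference, an ab invocation of roughly that shape (the exact URL and
proxy port were not recorded; Squid's default port and a memory-cached
object are assumed):

    ab -k -n 1000000 -c 10 -X 127.0.0.1:3128 http://example.com/cached-object

Here -k enables keepalive, -n sets the total request count, -c sets the
parallelism, and -X points ab at the proxy.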

numbers (for trunk / stdvector)
- mean response time: 1.032/1.060 ms
- req/sec: 9685/9430
- cpu_time: 102.878167/106.013725


-- 
/kinkie