On Thu, Jan 30, 2014 at 12:39 AM, Alex Rousskov
<rouss...@measurement-factory.com> wrote:
> On 01/29/2014 02:32 PM, Kinkie wrote:
>> On Wed, Jan 29, 2014 at 7:52 PM, Alex Rousskov wrote:
>>> On 01/29/2014 07:08 AM, Kinkie wrote:
>
>> (in a trunk checkout)
>> bzr diff -r lp:/squid/squid/vector-to-stdvector
>>
>> The resulting diff is reversed, but that should be easy enough to manage.
>
> Thanks! Not sure whether reversing a diff in my head is easier than
> downloading the branch :-(.

Maybe qdiff (from the qbzr plugin) would help by presenting things in a
graphical format? Apart from that, I don't know.

>>> Can you give any specific examples of the code change to which you
>>> would attribute a loss of performance when using std::vector? I did
>>> not notice any obvious cases, but I did not look closely.
>
>> I suspect that it's all those lines doing vector.items[accessor] and
>> thus using C-style unchecked accesses.
>
> std::vector's [] element access operator does not check anything
> either, as we discussed earlier. IIRC, you [correctly] used a few
> std::vector::at() calls as well, but I did not see any in a critical
> performance path.
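For the record, the checked/unchecked difference only shows up out of
range. A minimal standalone sketch (not Squid code; the vector contents
are made up) of what the two accessors do:

    #include <iostream>
    #include <stdexcept>
    #include <vector>

    int main()
    {
        std::vector<int> v{10, 20, 30};

        // operator[] does no bounds checking: v[3] here would be
        // undefined behavior, exactly like indexing a raw C array.
        std::cout << "v[1] = " << v[1] << "\n";

        // at() validates the index and throws on failure, at the
        // cost of a bounds check on every access.
        try {
            std::cout << "v.at(3) = " << v.at(3) << "\n";
        } catch (const std::out_of_range &e) {
            std::cout << "v.at(3) threw: " << e.what() << "\n";
        }
    }

So replacing Vector's unchecked accesses with operator[] should cost
nothing extra; only the few at() calls add a per-access check.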
>>>> test conditions:
>>>> - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
>>>> - testing with ab: 1M requests @ 10 parallelism with keepalive,
>>>>   stressing the TCP_MEM_HIT code path on a cold cache
>>>> - test on a multicore VM; default out-of-the-box configuration, ab
>>>>   running on the same hardware over the loopback interface
>>>> - immediately after ab exits, collect counters (mgr:counters)
>>>>
>>>> numbers (for trunk / stdvector):
>>>> - mean response time: 1.032 / 1.060 ms
>>>> - cpu_time: 102.878167 / 106.013725
>
>> If you can suggest a more thorough set of commands using the rough
>> tools I have, I'll gladly run them.
>
> How about repeating the same pair of tests a few times, in random
> order? Do you get consistent results?

OK, here are some numbers (same testing methodology as before).

Trunk: mean RPS (CPU time)
10029.11 (996.661872)
9786.60 (1021.695007)
10116.93 (988.395665)
9958.71 (1004.039956)

stdvector: mean RPS (CPU time)
9732.57 (1027.426563)
10388.38 (962.418333)
10332.17 (967.824790)

Some other insights I got by varying parameters:

By raw RPS, performance seems to vary with the number of parallel
clients in this order (best to worst): 100 > 10 > 500 > 1.

I also tried, for fun, stripping the configuration that is not strictly
needed (e.g. logging, access controls) and using 3 workers, leaving the
fourth core to ab: the best result was 39900.52 RPS. Useless, but
impressive :)

--
/kinkie