Thanks for the detailed reply. In my results, returning string values can be significantly slower than returning scalar values if the string is big enough: a C++ function returning an 8 byte string can be called 7.1M times per second, but change the returned string to a 4KB string and that 7.1M drops to only 1.4M times per second. The same 4KB string returned via String::NewExternal(), however, manages 5.4M calls per second. So you might consider adding String::NewExternal() to EMS if it is not used already :-)
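For concreteness, here is a minimal sketch of the kind of external string I benchmarked. It assumes the v8 3.2x-era API that current node builds against; the class, buffer, and function names are only illustrative, not code from EMS. (Newer v8 renames the resource type to ExternalOneByteStringResource and the factory to String::NewExternalOneByte().) The point is that v8 wraps a caller-owned buffer instead of copying all 4KB into its heap on every call:

    #include <v8.h>
    #include <cstddef>

    // A resource that lets v8 wrap a caller-owned ASCII buffer instead of
    // copying it into the v8 heap. When the external string is garbage
    // collected, v8 calls Dispose(), whose default implementation deletes
    // the resource object (but not the buffer it points to).
    class StaticAsciiResource : public v8::String::ExternalAsciiStringResource {
     public:
      StaticAsciiResource(const char* data, size_t length)
          : data_(data), length_(length) {}
      const char* data() const { return data_; }
      size_t length() const { return length_; }

     private:
      const char* data_;  // must be ASCII and outlive every string wrapping it
      size_t length_;
    };

    static char big_buffer[4096];  // filled once, reused across calls

    // Hypothetical addon method: returns the 4KB buffer as an external
    // string, so each call allocates only a small resource object rather
    // than copying 4KB into the v8 heap.
    void GetBigString(const v8::FunctionCallbackInfo<v8::Value>& args) {
      args.GetReturnValue().Set(v8::String::NewExternal(
          args.GetIsolate(),
          new StaticAsciiResource(big_buffer, sizeof(big_buffer))));
    }

On the JS side nothing changes; the addon call still returns an ordinary string value. The thing to watch is lifetime: the buffer must stay valid, with stable contents, until v8 has collected every string wrapping it. Raw numbers from my runs: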
* estimate 7142827 calls per second; object: unwrapped, input: 3 ints, output: 8 byte str
* estimate 1428573 calls per second; object: unwrapped, input: 3 ints, output: 4KB str
* estimate 5405379 calls per second; object: unwrapped, input: 3 ints, output: 4KB str, external

Also, in case you haven't already seen it, whitedb [1] reminds me a bit of EMS. It seems somebody has also attempted a node port of whitedb [2].

--
Simon

[1] http://whitedb.org/
[2] https://github.com/brettlangdon/node-wgdb

On Wed, Apr 23, 2014 at 3:37 PM, <[email protected]> wrote:

> Simon,
>
> One difference is that the second set (replicating the experiment in
> https://kkaefer.com/node-cpp-modules/#benchmark-thread-pool) uses a
> synthetic workload which the compiler can get rid of entirely, so the
> benchmark isn't timing work; it's the same as executing a no-op. A second
> difference is that the timings include some combination of optimized and
> unoptimized execution, which isn't a number you can use to make
> performance predictions based on iteration counts.
>
> The problem is that the work function is invariant and its results are
> not used, so the compiler is free to hoist the loop body out or just get
> rid of it:
>
>     function() { return Math.floor(133.7 / Math.PI); }
>
> My test loop calls sin(), which the compiler cannot analyze, so it must
> assume there are side effects and call the function on every iteration.
> Additionally, the return values are summed, so the compiler can't reduce
> the loop to only its last iteration; all of them must be executed:
>
>     for (var i = 0; i < nOps; i++) {
>       sum += Math.sin(i);
>     }
>
> Crankshaft performs many additional optimizations (dead code elimination,
> hoisting, native compilation, etc.), but v8 can't recompile the interface
> to a native addon, so all the copy-in/out scaffolding remains; the pure-JS
> Math.sin, by contrast, allows that overhead to be optimized away. For
> practical purposes, the copy-in/out overhead is the only difference
> between the two sin() experiments.
>
> Regardless of how it's compiled, as the trip counts increase the
> performance asymptotically approaches some maximum for the architecture.
> If anything, native code gets in the way of Crankshaft optimizations,
> which is why the benefit is smaller for the native addon experiments than
> for JS code alone.
>
> The overhead of copy-in/out is significant but unavoidable. For EMS, the
> benefit bought by that overhead is access to all the cores, and that
> performance multiplier easily overcomes the overhead. FWIW, the
> additional overhead for handling strings is relatively small, so you
> shouldn't consider use limited to scalar values.
>
> -J
>
> On Wednesday, April 23, 2014 3:00:43 PM UTC-7, SimonHF wrote:
>
>> Thanks for the info, but hmmm... I'm a bit confused now. In the first
>> example sent, 'addon sin sum' hardly changes at all after recompilation
>> and 'homes in' on 5.9M ops/sec. In the second example sent, there's a
>> massive jump for both after recompilation. Why the difference in
>> behaviour? Under which circumstances can addons benefit from the
>> recompilation? Thanks, Simon
>>
>> On Wed, Apr 23, 2014 at 2:52 PM, <[email protected]> wrote:
>>
>>> I should point out this experiment came from when I was trying to
>>> replicate these results:
>>> https://kkaefer.com/node-cpp-modules/#benchmark-thread-pool
>>>
>>> In his case, the entire work function is optimized away by Crankshaft
>>> in a very obvious way. The experiment compares his loop body to a
>>> no-op.
>>>
>>> -J
>>>
>>> Work Function: Math.floor(133.7 / Math.PI)
>>>
>>> 1 workfun operations performed at 7092 ops/sec
>>> 1 no-ops performed at 333333 ops/sec
>>> 2 workfun operations performed at 142857 ops/sec
>>> 2 no-ops performed at Infinity ops/sec
>>> 4 workfun operations performed at Infinity ops/sec
>>> 4 no-ops performed at Infinity ops/sec
>>> 8 workfun operations performed at 8000000 ops/sec
>>> 8 no-ops performed at Infinity ops/sec
>>> 16 workfun operations performed at 16000000 ops/sec
>>> 16 no-ops performed at Infinity ops/sec
>>> 32 workfun operations performed at 16000000 ops/sec
>>> 32 no-ops performed at Infinity ops/sec
>>> 64 workfun operations performed at 9142857 ops/sec
>>> 64 no-ops performed at Infinity ops/sec
>>> 128 workfun operations performed at 405063 ops/sec
>>> 128 no-ops performed at Infinity ops/sec
>>> 256 workfun operations performed at 1855072 ops/sec
>>> 256 no-ops performed at 256000000 ops/sec
>>> 512 workfun operations performed at 64000000 ops/sec
>>> 512 no-ops performed at 256000000 ops/sec
>>> 1024 workfun operations performed at 68266666 ops/sec
>>> 1024 no-ops performed at 256000000 ops/sec
>>> 2048 workfun operations performed at 60235294 ops/sec
>>> 2048 no-ops performed at 292571428 ops/sec
>>> 4096 workfun operations performed at 52512820 ops/sec
>>> 4096 no-ops performed at 273066666 ops/sec
>>> 8192 workfun operations performed at 66064516 ops/sec
>>> 8192 no-ops performed at 282482758 ops/sec
>>> 16384 workfun operations performed at 59148014 ops/sec
>>> 16384 no-ops performed at 163840000 ops/sec
>>>
>>> *Suddenly a re-compilation with additional optimization occurs:*
>>>
>>> 32768 workfun operations performed at 910222222 ops/sec
>>> 32768 no-ops performed at 910222222 ops/sec
>>> 65536 workfun operations performed at 923042253 ops/sec
>>> 65536 no-ops performed at 923042253 ops/sec
>>> 131072 workfun operations performed at 929588652 ops/sec
>>> 131072 no-ops performed at 929588652 ops/sec
>>> 262144 workfun operations performed at 929588652 ops/sec
>>> 262144 no-ops performed at 929588652 ops/sec
>>> 524288 workfun operations performed at 931239786 ops/sec
>>> 524288 no-ops performed at 931239786 ops/sec
