Thanks for the references, I will take a look.

But regarding performance: a GC may be faster for a while, but when the collector kicks in we see CPU usage spike for a bit, followed by a noticeable slowdown in some JavaScript animations, especially on an embedded box with only 1 or 2 processors.

With block_ptr<>, the animation would run smoothly at a constant speed.

But if, as you say, that would involve a 2x slowdown on the UI thread regardless, then I am surprised.

Anyway, I am not sure I can create a patch within a short period of time, but if I put together an interesting JavaScript benchmark I will post it to this mailing list.


Regards,
-Phil

On 03/06/2016 12:30 AM, Filip Pizlo wrote:
Phil,

I would expect our GC to be much faster than shared_ptr.  This shouldn’t
really be surprising; it’s the expected behavior according to the GC
literature.  High-level languages avoid the kind of eager reference
counting that shared_ptr does because it’s too expensive.  I would
expect a 2x-5x slow-down if we switched to anything that did reference
counting.
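To illustrate what that eager bookkeeping costs, here is a simplified sketch of my own (hypothetical names, not WebKit or JSC code): every copy of a shared_ptr does atomic reference-count traffic, whereas a GC-managed reference can be passed around as a plain machine word and liveness is discovered later by tracing.

#include <memory>

struct Node { int value = 0; };

// With shared_ptr, every copy performs an atomic increment and every
// destruction an atomic decrement, even if the callee only reads the object.
int readValueCounted(std::shared_ptr<Node> node)  // copy: atomic ref++ / ref--
{
    return node->value;
}

// A traced (GC-managed) reference is just a pointer in a register; the hot
// path does no per-assignment bookkeeping at all.
int readValueTraced(Node* node)
{
    return node->value;
}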

You should take a look at our GC, and maybe read some of the major
papers about GC.  It’s awesome stuff.  Here are a few papers that I
consider good reading:

Some great ideas about high-throughput GC:
http://www.cs.utexas.edu/users/mckinley/papers/mmtk-icse-2004.pdf
Some great ideas about low-latency GC:
http://www.filpizlo.com/papers/pizlo-pldi2010-schism.pdf
Some great ideas about GC roots:
http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-88-2.pdf
A good exploration of the limits of reference counting performance:
http://research.microsoft.com/pubs/70475/tr-2007-104.pdf

Anyway, you can’t ask us to change our code to use your memory manager.
You can, however, try to get your memory manager to work in WebKit,
and post a patch if you get it working.  If that patch is an improvement
- in the sense that both you and the reviewers can apply the patch and
confirm that it is in fact a progression and doesn’t break anything -
then this would be the kind of thing we would accept.

Having looked at your code a bit, I think that you’ll encounter the
following problems:
- Your code uses std::mutex for synchronization.  std::mutex is quite
slow.  You should look at WTF::Lock; it’s much better (as in, orders of
magnitude better).
- Your code implements lifecycle management that is limited to reference
counting.  This is not adequate to support JS, DOM, and JIT semantics,
which are based on solving arbitrary data flow equations over the
reachability set.
- It’s not clear that your allocator results in fast path code that is
competitive against either of the JSC GC’s allocators.  Both of those
require ~5 instructions in the common case.  That instruction count
includes the oversize object safety checks (see the sketch after this list).
- It’s not clear that your allocator is compatible with JITing and
standard JavaScript performance optimizations, which assume that values
can be passed around as bits without calling into the runtime.  A
reference counter needs to do some kinds of memory operations on
variable assignments.  This is likely to be about a 2x-5x slow-down.  I
would expect a 2x slow-down if you did non-thread-safe reference
counting, and 5x if you made it thread-safe.
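To make the allocator point concrete, here is a rough sketch of a bump-pointer fast path with the oversize check folded into the bounds test.  This is my own simplification for illustration, not JSC's actual allocator:

#include <cstddef>
#include <cstdlib>

// Hypothetical bump allocator, simplified for illustration only.
struct BumpAllocator {
    char* cursor { nullptr };  // next free byte in the current block
    char* end { nullptr };     // one past the end of the current block

    // Slow path stub: a real allocator would refill from its block lists or
    // take a dedicated oversize-object path here.
    void* allocateSlow(size_t bytes) { return std::malloc(bytes); }

    // Fast path: roughly a subtract, a compare, a branch, and a cursor bump.
    // The single bounds check also catches oversize requests, since those
    // never fit in the current block.
    void* allocate(size_t bytes)
    {
        if (static_cast<size_t>(end - cursor) < bytes)
            return allocateSlow(bytes);  // rare, out of line
        char* result = cursor;
        cursor += bytes;
        return result;
    }
};

A fast path this small is also easy for a JIT to inline directly into generated code, which is part of why the per-allocation cost stays so low.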

-Filip


_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev
