On 11.10.2016 at 13:33, Steve Piner wrote:
On Tue, 04 Oct 2016 04:23:54 +1300, Joachim Durchholz via RT
<> wrote:

On 03.10.2016 at 06:34, Zoffix Znet via RT wrote:
Seems the issue has more to do with running an empty loop, rather than
performing a real computation.

This is a run on a 4-core box. Attempting to parallelize an empty loop makes the execution 1 second slower:


But running real-life code makes it almost 4 times faster, as you would expect on a 4-core box:

(Disclaimer: I have no idea about the internals, but I know a bit about

This might be four cores competing for write access to the loop counter.
Core-to-core synchronization of a memory cell that is updated at high frequency is an extremely expensive operation: it costs dozens or hundreds of wait states to request exclusive access to the cache line and to move the current value of the variable from one CPU's cache to the next.
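To make the cost argument above a bit more concrete, here is a toy model in Python. It is a deliberate simplification, not a description of how Rakudo or any real CPU coherence protocol works: cores take turns round-robin, each doing one increment per turn, and we count how often a cache line has to be handed to a different core. The function name `count_line_transfers` and the strict round-robin interleaving are assumptions made for illustration.

```python
def count_line_transfers(cores: int, increments_per_core: int, shared: bool) -> int:
    """Toy model: count how often a cache line must move to another core.

    Cores take turns round-robin, one increment per turn (an assumption).
    shared=True:  every core writes the same line (one shared counter).
    shared=False: each core writes a line of its own (per-core counters).
    A write to a line the core does not currently own counts as one
    transfer; the very first write to a line counts too (initial fetch).
    """
    owner = {}  # line id -> core currently holding the line exclusively
    transfers = 0
    for _ in range(increments_per_core):
        for core in range(cores):
            line = 0 if shared else core
            if owner.get(line) != core:
                transfers += 1  # request exclusive access, move the line
                owner[line] = core
    return transfers
```

With 4 cores doing 1,000 increments each, `count_line_transfers(4, 1000, True)` returns 4000 (every single increment moves the line), while `count_line_transfers(4, 1000, False)` returns 4 (each core fetches its own line once and keeps it).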

For what it's worth, I've tried this with 4 separate functions, one per thread. To my mind, that should avoid blocking on cache-line access.
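The general pattern being discussed here, giving each worker its own state so that nothing shared is written inside the hot loop, can be sketched like this in Python. This is a generic illustration, not what Rakudo does internally; the helper names `chunk_bounds`, `local_sum` and `parallel_sum` are hypothetical. Note that in CPython the global interpreter lock keeps these threads from running the arithmetic truly in parallel, so the sketch shows the structure (per-worker accumulators, one combine step at the end) rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n: int, workers: int) -> list[tuple[int, int]]:
    """Split range(n) into `workers` contiguous [start, stop) chunks."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for i in range(workers):
        # The first `extra` chunks get one additional item.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def local_sum(start: int, stop: int) -> int:
    # The accumulator is local to this call, so no shared memory
    # cell is updated on every iteration.
    total = 0
    for i in range(start, stop):
        total += i
    return total


def parallel_sum(n: int, workers: int = 4) -> int:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda b: local_sum(*b), chunk_bounds(n, workers))
        # The only shared step: combine one partial result per worker.
        return sum(partials)
```

For example, `chunk_bounds(10, 4)` gives `[(0, 3), (3, 6), (6, 8), (8, 10)]`, and `parallel_sum(2_000_000)` equals the serial sum of `range(2_000_000)`.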

I thought that the 4 processes would be trying to update the for ^2_000_000 loop counter. On closer inspection, I guess that was a gross misunderstanding; my apologies.

Four different functions should not make a difference, unless Perl is updating a usage counter on the functions themselves or something like that.

I don't know enough about Perl to decide what the single-threaded code is actually doing, so I'll leave that to people with more knowledge.

Sorry I can't help more, I'm just learning :-)

