Hi Andrew,

On Wed, Aug 17, 2011 at 1:50 PM, Andrew P. Black <[email protected]> wrote:

> I was looking at some old slides of Joe Armstrong on Concurrency-oriented
> programming.  He set the following challenge:
>
> Put N processes in a ring:
> Send a simple message round the ring M times.
> Increase N until the system crashes.
> How long did it take to start the ring?
> How long did it take to send a message?
> When did it crash?
>
> He gave graphs comparing Erlang, Java and C#.  I decided to compare Pharo.
>  Here is what I got; the creation times are PER PROCESS and the messaging
> times are PER MESSAGE.
>
> first run
>
> procs    creation/µs    msg/µs
>
>  200            0       7.0
>  500            0       9.7
> 1000            2       15.4
> 2000            1       21.6
> 5000            13      31.5
> 10000           19.9    40.7
> 20000           46.5    55.4
> 50000           130.9   98.0
>
> second run
>
> procs    creation/µs    msg/µs
>
>  200            0.0     7.0
>  500            0.0     10.12
> 1000            0.0     16.53
> 2000            1.5     24.26
> 5000            12.8    32.15
> 10000           28.1    39.497
> 20000           58.15   52.0295
> 50000           75.1    95.581
>
> third run
>
> procs    creation/µs    msg/µs
>
>  200            0.0     7.0
>  500            0.0     8.6
> 1000            2.0     11.0
> 2000            1.0     16.55
> 5000            10.2    21.76
> 10000           12.0    49.57
> 20000           52.35   65.035
> 50000           91.76   117.1
>
> Each process is a Pharo object (an instance of ERringElement) that holds
> a counter, a reference to the next ERringElement, and an "ErlangProcess":
> a Process that holds a reference to a SharedQueue instance (its "mailbox").
>
> The good news is that up to 50k processes, it didn't crash.  But it did run
> with increasing sloth.
>
> I can imagine that the increasing process-creation time is due to beating
> on the memory manager.  But why the increasing message-sending time as the
> number of processes increases?  (Recall that exactly one process is runnable
> at any given time).  I'm wondering if the scheduler is somehow getting
> overwhelmed by all of the non-runnable processes that are blocked on
> Semaphores in SharedQueue.  Any ideas?
>
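
For concreteness, here is a minimal sketch of the structure described above,
in plain Pharo (class names, selectors, and the shutdown protocol are
illustrative, not the actual ERringElement code):

    Object subclass: #RingElement
        instanceVariableNames: 'next mailbox process'
        classVariableNames: ''
        category: 'RingSketch'

    RingElement >> initialize
        mailbox := SharedQueue new

    RingElement >> next: anElement
        next := anElement

    RingElement >> send: anInteger
        "Deliver a hop count to this element's mailbox, waking its process."
        mailbox nextPut: anInteger

    RingElement >> startSignaling: doneSemaphore
        "Fork a process that forwards the decremented hop count until 0 arrives."
        process := [
            | msg |
            [ (msg := mailbox next) > 0 ]        "next blocks on the queue's Semaphore"
                whileTrue: [ next send: msg - 1 ].
            next send: 0.                        "propagate shutdown once around the ring"
            doneSemaphore signal ] fork

A driver along these lines builds the ring and times M laps:

    | n m elements done start |
    n := 1000.  m := 100.
    done := Semaphore new.
    elements := (1 to: n) collect: [ :i | RingElement new ].
    elements doWithIndex: [ :each :i | each next: (elements at: i \\ n + 1) ].
    elements do: [ :each | each startSignaling: done ].
    start := Time millisecondClockValue.
    elements first send: n * m.                  "M laps of N hops"
    n timesRepeat: [ done wait ].
    Transcript show: 'ms: ', (Time millisecondClockValue - start) printString; cr

Note that while this runs exactly one of the N processes is runnable at a
time; the other N-1 sit blocked in SharedQueue>>next, which is precisely the
situation asked about above.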

If you're using Cog then one reason performance falls off with the number of
processes is context-to-stack mapping; see "Under Cover Contexts and the Big
Frame-Up"
(http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/).
Once there are more processes than stack pages, every process switch faults
at least one frame out to a heap context and faults a heap context back in
as a frame.  You can experiment by changing the number of stack pages (see
vmParameterAt:put:), but you can't have thousands of stack pages; that would
use too much C stack memory.  I think the default is 64 pages and Teleplace
uses ~112.  Each stack page can hold roughly 50 activations.
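
For example (a sketch from memory; on Cog I believe VM parameter 42 reads the
current number of stack pages and parameter 43 sets the desired count, which
takes effect at the next startup; check the vmParameterAt: method comment in
your image):

    Smalltalk vmParameterAt: 42.             "how many stack pages the VM has"
    Smalltalk vmParameterAt: 43 put: 128.    "ask for 128 pages at next startup"

(In older images the receiver is SmalltalkImage current rather than Smalltalk.)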

But to be sure of what is causing the slowdown, one could use my
VMProfiler.  Has anyone ported it to Pharo yet?


> (My code is on SqueakSource in project Erlang.  But be warned that there is
> a simulation of the Erlang "universal server" in there too.  To run this
> code, look for class ErlangRingTest.)
>

-- 
best,
Eliot
