On 18 August 2011 00:20, Igor Stasenko <[email protected]> wrote:
> On 18 August 2011 00:02, Eliot Miranda <[email protected]> wrote:
>> Hi Andrew,
>>
>> On Wed, Aug 17, 2011 at 1:50 PM, Andrew P. Black <[email protected]> wrote:
>>>
>>> I was looking at some old slides of Joe Armstrong on concurrency-oriented
>>> programming. He set the following challenge:
>>>
>>> Put N processes in a ring:
>>> Send a simple message round the ring M times.
>>> Increase N until the system crashes.
>>> How long did it take to start the ring?
>>> How long did it take to send a message?
>>> When did it crash?
>>>
>>> He gave graphs comparing Erlang, Java and C#. I decided to compare Pharo.
>>> Here is what I got; the creation times are PER PROCESS and the messaging
>>> times are PER MESSAGE.
>>>
>>> first run
>>>
>>> procs    creation/µs    msg/µs
>>>
>>>   200       0             7.0
>>>   500       0             9.7
>>>  1000       2            15.4
>>>  2000       1            21.6
>>>  5000      13            31.5
>>> 10000      19.9          40.7
>>> 20000      46.5          55.4
>>> 50000     130.9          98.0
>>>
>>> second run
>>>
>>> procs    creation/µs    msg/µs
>>>
>>>   200       0.0           7.0
>>>   500       0.0          10.12
>>>  1000       0.0          16.53
>>>  2000       1.5          24.26
>>>  5000      12.8          32.15
>>> 10000      28.1          39.497
>>> 20000      58.15         52.0295
>>> 50000      75.1          95.581
>>>
>>> third run
>>>
>>> procs    creation/µs    msg/µs
>>>
>>>   200       0.0           7.0
>>>   500       0.0           8.6
>>>  1000       2.0          11.0
>>>  2000       1.0          16.55
>>>  5000      10.2          21.76
>>> 10000      12.0          49.57
>>> 20000      52.35         65.035
>>> 50000      91.76        117.1
>>>
>>> Each process is a Pharo object (an instance of ERringElement) that
>>> contains a counter, a reference to the next ERringElement, and an
>>> "ErlangProcess" that is a Process that contains a reference to an instance
>>> of SharedQueue (its "mailbox").
>>>
>>> The good news is that up to 50k processes, it didn't crash. But it did
>>> run with increasing sloth.
>>>
>>> I can imagine that the increasing process-creation time is due to beating
>>> on the memory manager. But why does the message-sending time increase with
>>> the number of processes? (Recall that exactly one process is runnable
>>> at any given time.) I'm wondering if the scheduler is somehow getting
>>> overwhelmed by all of the non-runnable processes that are blocked on
>>> Semaphores in their SharedQueues. Any ideas?
>>
>> If you're using Cog, then one reason performance falls off with the number
>> of processes is context-to-stack mapping; see "08 Under Cover Contexts and
>> the Big Frame-Up". Once there are more processes than stack pages, every
>> process switch faults out a (at least one) frame to a heap context and
>> faults in a heap context to a frame. You can experiment by changing the
>> number of stack pages (see vmAttributeAt:put:), but you can't have
>> thousands of stack pages; that uses too much C stack memory. I think the
>> default is 64 pages, and Teleplace uses ~112. Each stack page can hold up
>> to approximately 50 activations.
>>
>> But to be sure what the cause of the slowdown is, one could use my
>> VMProfiler. Has anyone ported this to Pharo yet?
>
> Hmm, to me that doesn't explain why messages/second degrades
> linearly with the number of processes.
> After hitting a certain limit (all stack pages full), the
> per-message time should not degrade any further.
> Given your explanation, I would expect something like the following:
>
> 10000 - 49.57
> ... somewhere here we hit the stack-page limit ...
> 20000 - 65.035
> 30000 - 65.035
> 40000 - 65.035
> 50000 - 65.035
>
Yes.. except one thing: since contexts are allocated on the heap, more
processes mean more work for the GC, and that would explain the linear
degradation. Andrew, can you play with the GC parameters, e.g. increase
the number of allocations between incremental GCs?

>>>
>>> (My code is on Squeaksource in project Erlang. But be warned that there
>>> is a simulation of the Erlang "universal server" in there too. To run this
>>> code, look for class ErlangRingTest.)
>>
>>
>> --
>> best,
>> Eliot
>>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.

--
Best regards,
Igor Stasenko AKA sig.
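
[Editor's note] For readers who want to poke at the same effect without
fetching the Erlang project from SqueakSource, here is a minimal sketch of
the ring in plain Pharo/Squeak Smalltalk. It is not Andrew's ERringElement /
ErlangProcess code: it uses one SharedQueue per element as the mailbox and a
countdown token, and the variable names and the shutdown-by-zero convention
are the editor's, not his.

    | n m queues done start |
    n := 1000.                              "number of processes in the ring"
    m := 10.                                "times the token goes round"
    done := Semaphore new.
    queues := (1 to: n) collect: [:i | SharedQueue new].
    1 to: n do: [:i |
        [ | next msg |
          next := queues at: i \\ n + 1.    "mailbox of the next ring element"
          [ msg := (queues at: i) next.     "block until a message arrives"
            msg = 0
                ifTrue: [next nextPut: 0. done signal. false]  "propagate shutdown, stop"
                ifFalse: [next nextPut: msg - 1. true] ] whileTrue ] fork ].
    start := Time millisecondClockValue.
    (queues at: 1) nextPut: n * m.          "token makes roughly n*m hops"
    done wait.
    Transcript
        show: 'per-message time: ',
            (((Time millisecondClockValue - start) * 1000 / (n * m)) asFloat printString),
            ' µs';
        cr.

With n*m in the tens of thousands the millisecond clock is coarse but
adequate; treat this only as a rough stand-in for Andrew's benchmark, not a
reproduction of his numbers.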
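
[Editor's note] Both tuning suggestions above (Eliot's stack-page count and
Igor's incremental-GC allocation count) are reachable from the image through
SmalltalkImage>>vmParameterAt: / vmParameterAt:put:, which is, I believe, the
selector Eliot is referring to. The parameter indices below (42/43 for stack
pages, 5 for allocations between incremental GCs) follow the numbering I
believe the 2011-era Squeak/Cog VMs use, but they are an assumption; check
the method comment of vmParameterAt: on your own VM before relying on them.

    | img |
    img := SmalltalkImage current.

    "Stack pages (Cog). 42 = current count, 43 = desired count stored in the
     image header -- assumed indices; 43 takes effect after save and restart."
    Transcript show: 'stack pages: ', (img vmParameterAt: 42) printString; cr.
    "img vmParameterAt: 43 put: 160."

    "Incremental GC. 5 = allocations between incremental GCs -- assumed index;
     it may be nil or ignored on some VMs."
    (img vmParameterAt: 5) ifNotNil: [:allocs |
        img vmParameterAt: 5 put: allocs * 4].

    Smalltalk garbageCollect.   "start each timed run from a freshly collected heap"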
