Even in the Erlang model, the afore-mentioned issues of bus contention put a cap on the number of threads you can run in any given application assuming there's any amount of cross-thread synchronization. I wrote a blog post on this subject with respect to my experience in tuning RabbitMQ on NUMA architectures.
http://blog.agoragames.com/blog/2011/06/24/of-penguins-rabbits-and-buses/ It should be noted that Erlang processes are not the same as OS processes. They are more akin to green threads, scheduled on N number of legit OS threads which are in turn run on C number of cores. The end effect is the same though, as the data is effectively shared across NUMA nodes, which runs into basic physical constraints. I used to think the GIL was a major bottleneck, and though I'm not fond of it, my recent experience has highlighted that *any* application which uses shared memory will have significant bus contention when scaling across all cores. The best course of action is shared-nothing MPI style, but in 64bit land, that can mean significant wasted address space. <http://blog.agoragames.com/blog/2011/06/24/of-penguins-rabbits-and-buses/> -Aaron On Fri, Aug 12, 2011 at 2:59 PM, Sturla Molden <stu...@molden.no> wrote: > Den 12.08.2011 18:51, skrev Xavier Morel: > >> * Erlang uses "erlang processes", which are very cheap preempted >> *processes* (no shared memory). There have always been tens to thousands to >> millions of erlang processes per interpreter source contention within the >> interpreter going back to pre-SMP by setting the number of schedulers per >> node to 1 can yield increased overall performances) >> > > Technically, one can make threads behave like processes if they don't share > memory pages (though they will still share address space). Erlangs use of > 'process' instead of 'thread' does not mean an Erlang process has to be > implemented as an OS process. With one interpreter per thread, and a malloc > that does not let threads share memory pages (one heap per thread), Python > could do the same. > > On Windows, there is an API function called HeapAlloc, which lets us > allocate memory form a dedicated heap. The common use case is to prevent > threads from sharing memory, thus behaving like light-weight processes > (except address space is shared). On Unix, is is more common to use fork() > to create new processes instead, as processes are more light-weight than on > Windows. > > Sturla > >
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com