A short followup:

I spent the last several days wondering if I'd exaggerated the various risks of committing early to a 1:1 model for tasks (modeling tasks strictly as threads). I don't like over-engineering any more than the next person, and the possibility that threads are "sufficiently" fast and small on all platforms continued to nag at me. If we're injecting yield-points in all cases to make code interruptible (which we are), then I think *all* the arguments I had roughly boil down to "potential cost problems".

So I did a little research on costs. It might be worth doing more substantial research (measurements, gathering hard scalability numbers ourselves, making benchmarks), but I'm a *bit* more convinced that I wasn't just speaking nonsense the other day. The following turned up in my search:

- Limits of Windows: the kernel stack for a thread is at minimum 12k resident on win32 and 24k on win64, and expands to 20k and 48k respectively if the thread has touched GDI. It looks like you can push a process into the 10,000s of threads, but probably not the 100,000s.

- Limits of OSX: weird kernel restrictions. Non-server 10.6 has an arbitrary clamp at 2,500 threads per system. Server 10.6 and both versions of 10.7 clamp at 12,500 threads per 8gb of installed memory, with only 20% available to a given process (i.e. 2,500 threads per process, per 8gb). This still sounds like an arbitrary non-adjustable limit though, unless they happen to be dedicating 640k of kernel memory to each thread or something.

- Limits of iOS: iPhones clamp to 1024 threads.

- Limits of Solaris 10: kernel stacks are 8k on x86 (later bumped to 12k) and 20k on x64. But they live in a pinned segment of 512mb (x86) or 24gb (x64), so on x86 this will clamp to around 45,000 threads.

- Linux has 8k or 4k kernel stacks. Much smaller, much better; but still a fair bit larger than seems *necessary* for our task segment granularity.

- Even then, Intel claims (at least in '07) that its TBB tasks are ~18x faster than a linux thread setup/teardown.

- Erlang processes are 300 bytes (?) whereas Haskell tasks get 1k stack segments by default.
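The arithmetic behind a few of the figures above can be checked with a short sketch. It uses only the per-thread kernel stack sizes quoted in the list, and it makes two assumptions that are not in the sources: "k", "mb" and "gb" are read as binary units for the stack and segment sizes, and the 8gb in the OSX item is read as decimal gigabytes (which is the reading that yields 640k per thread). The function and table names are made up for illustration, and the numbers cover kernel stacks only, not user stacks or other per-thread state.

```python
KIB = 1024
MIB = 1024 * KIB

# Per-thread kernel stack sizes as quoted in the list above, in bytes.
# Reading "k" as KiB is an assumption.
KERNEL_STACK = {
    "win32": 12 * KIB,
    "win64": 24 * KIB,
    "solaris x86": 12 * KIB,
    "solaris x64": 20 * KIB,
    "linux (8k)": 8 * KIB,
    "linux (4k)": 4 * KIB,
}


def resident_kernel_stack(platform, threads):
    """Total bytes of kernel stack needed for `threads` threads."""
    return KERNEL_STACK[platform] * threads


def max_threads_in_segment(segment_bytes, stack_bytes):
    """How many fixed-size kernel stacks fit in a pinned segment."""
    return segment_bytes // stack_bytes


def bytes_per_thread(total_bytes, thread_limit):
    """Memory per thread implied by a thread limit over a memory size."""
    return total_bytes // thread_limit


if __name__ == "__main__":
    for platform in KERNEL_STACK:
        mib = resident_kernel_stack(platform, 100_000) / MIB
        print(f"{platform:12s} {mib:8.1f} MiB of kernel stack for 100,000 threads")

    # Solaris x86: 512mb pinned segment, 12k stacks.
    print(max_threads_in_segment(512 * MIB, 12 * KIB))

    # OSX: 12,500 threads per 8gb (decimal gigabytes assumed).
    print(bytes_per_thread(8 * 10**9, 12_500))
```

Under these assumptions the Solaris x86 segment holds 43,690 stacks of 12k, which matches the "around 45,000" estimate, and 100,000 threads on win32 would need a bit over a gigabyte of resident kernel stack.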


Some light reading for the interested:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

http://blogs.technet.com/b/markrussinovich/archive/2009/07/08/3261309.aspx

http://support.apple.com/kb/HT3854

http://www.codeguru.com/cpp/misc/misc/threadsprocesses/article.php/c13533

http://gurkulindia.com/2011/05/05/solaris-reference-understanding-solaris-kernel-stack-overflows/

In support of threads! http://www.mailinator.com/tymaPaulMultithreaded.pdf is an interesting counterpoint, where threads with blocking IO are shown to have pulled back ahead of the NIO interface (at least in Java).

http://www.theserverside.com/discussions/thread.tss?thread_id=26700

-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
