On Thu, Mar 14, 2013 at 05:21:09AM -0700, Christian Heimes wrote:
> On 14.03.2013 03:05, Trent Nelson wrote:
> > Just posted the slides for those that didn't have the benefit of
> > attending the language summit today:
> >
> >     https://speakerdeck.com/trent/parallelizing-the-python-interpreter-an-alternate-approach-to-async
>
> Wow, neat! Your idea with Py_PXCTX is ingenious.

Yeah, it's funny how the viability and performance of the whole
approach come down to a quirky little trick for quickly detecting
whether we're in a parallel thread ;-)  I was very chuffed when it all
fell into place.  (And I hope the quirkiness of it doesn't detract
from the overall approach.)

> As far as I remember the FS and GS segment registers are used by most
> modern operating systems on x86 and x86_64 platforms nowadays to
> distinguish threads. TLS is implemented with FS and GS registers. I
> guess the __read[gf]sdword() intrinsics do exactly the same.

Yup, in fact, if I hadn't come up with the __read[gf]sdword() trick,
my only other option would have been TLS (or the
GetCurrentThreadId()/pthread_self() approach in the presentation).
TLS is fantastic, and it's definitely an intrinsic part of the
solution (the "Y" part of "if we're a parallel thread, do Y"), but
it's definitely more costly than a simple FS/GS register read.

> Reading
> registers is super fast and should have a negligible effect on code.

Yeah, the actual instruction is practically free; the main thing you
pay for is the extra branch.  However, most of the code looks like
this:

    if (Py_PXCTX)
        something_small_and_inlineable();
    else
        Py_INCREF(op); /* also small and inlineable */

In the majority of cases, the code for both branches is going to be
in the same cache line, so a mispredicted branch is only going to
result in a pipeline stall, which is better than a cache miss.

> ARM CPUs don't have segment registers because they have a simpler
> addressing model. The register CP15 came up after a couple of Google
> searches.

Noted, thanks!

> IMHO you should target x86, x86_64, ARMv6 and ARMv7. ARMv7 is going to
> be more important than x86 in the future. We are going to see more ARM
> based servers.

Yeah, that's my general sentiment too.  I'm definitely curious to see
if other ISAs offer similar facilities (SPARC, IA64, POWER etc.), but
the hierarchy will be x86/x64 > ARM > * for the foreseeable future.
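
As a rough illustration of the check discussed above, here is a minimal sketch of the slower pthread_self() fallback rather than the FS/GS register read. It is not the actual implementation: PX_CTX, px_init, px_incref, px_demo and object_t are hypothetical names made up for this example, and treating the parallel branch as a no-op is an assumption chosen only to keep the sketch short.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; none of these names
 * are taken from the real code base. */
typedef struct { long refcnt; } object_t;

static pthread_t main_thread_id;   /* recorded once by px_init() */

/* Remember which thread counts as the "main" thread. */
void px_init(void) { main_thread_id = pthread_self(); }

/* Nonzero when the caller is any thread other than the recorded main
 * thread.  This is the portable fallback; a segment-register read
 * would replace the pthread_self() call on x86/x64. */
#define PX_CTX() (!pthread_equal(pthread_self(), main_thread_id))

/* Same shape as the branch in the mail: one small path per context. */
void px_incref(object_t *op)
{
    if (PX_CTX())
        return;        /* assumed: parallel threads skip refcounting */
    op->refcnt++;      /* main thread: ordinary increment */
}

void *px_worker(void *arg)
{
    px_incref((object_t *)arg);
    return NULL;
}

/* One incref from the main thread, one from a second thread; returns
 * the resulting reference count. */
long px_demo(void)
{
    object_t obj = { 1 };
    pthread_t t;
    int rc;

    px_init();
    px_incref(&obj);                       /* main thread: counted */
    rc = pthread_create(&t, NULL, px_worker, &obj);
    assert(rc == 0);
    pthread_join(t, NULL);                 /* worker: not counted */
    return obj.refcnt;
}
```

Only the main-thread call changes the count here, so px_demo() ends with a reference count one higher than it started with. The cost difference mentioned above comes from pthread_self() being a function call, whereas the register read is a single instruction.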

Porting the Py_PXCTX part is trivial compared to the work that is
going to be required to get this stuff working on POSIX, where none
of the sublime Windows concurrency, synchronisation and async I/O
primitives exist.

> Christian

    Trent.