Re: Parallelism and Concurrency was Re: Ideas for a Object-Belongs-to-Thread threading model (nntp: message 20 of 20 -last one!-)
On Fri, 14 May 2010 17:35:20 +0100, B. Estrade - estr...@gmail.com +nntp+browseruk+c4c81fb0fa.estrabd#gmail@spamgourmet.com wrote:

The future is indeed multicore - or, rather, *many*-core. What this means is that however the hardware jockeys have to strap them together on a single node, we'll be looking at the ability to invoke hundreds (or thousands) of threads on a single SMP machine.

There are very few algorithms that actually benefit from using even low hundreds of threads, let alone thousands. The ability of Erlang (and Go, and Io, and many others) to spawn 100,000 threads makes an impressive demo for the uninitiated, but finding practical uses of such abilities is very hard.

One example cited is that of gaming software that runs each sprite in a separate thread. The claim is that this simplifies code because each sprite only has to respond to situations directly applicable to it, rather than some common sprite handler having to select which sprite to operate upon. But all it does is move the goal posts. You either have to select which sprite to send a message to; or send a message to the sprite handler and have it select the sprite to operate upon. A third technique is to send the message to all the sprites and have them decide whether it is applicable to them. But that still requires a loop, and you then have the communications overhead × 100,000 plus the context switch costs × 100,000. The numbers do not add up.

Then, inevitably, *someone* will want to strap these together into a cluster, thus making message passing an attractive way to glue related threads together over a network. Getting back to the availability of many threads on a single SMP box, issues of data locality and affinity and thread binding will become of critical importance.

Perhaps surprisingly, these are not the issues they once were.
Whilst cache misses are horribly expensive, the multi-layered caching in modern CPUs combines with deep pipelines, branch prediction, register renaming and other features in ways that are beyond the ability of the human mind to reason about. For a whirlwind introduction to the complexities, see the short video here: http://www.infoq.com/presentations/click-crash-course-modern-hardware

The only way to test the effects is to profile, and most of the research into the effects of cache locality tends to be done in isolation from real-world application mixes. Very few machines, even servers of various types, run a single application these days. This is even truer as server virtualisation becomes ubiquitous. Mix in a soupçon of virtual server load-balancing, and trying to code for cache locality becomes almost impossible.

These issues are closely related to the operating system's capabilities and paging policies, but eventually (hopefully) current, provably beneficial strategies will be available on most platforms.

Brett
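Returning to the sprite example: the "moved goal posts" argument can be made concrete with a small sketch. Python is used here purely for illustration (the thread discusses Perl 6, and every name below is invented): give each sprite its own thread and mailbox, and the sender still has to select which mailbox to put the message in, exactly as a central handler would have had to select which sprite to call.

```python
import queue
import threading

class Sprite(threading.Thread):
    """A sprite that reacts only to messages sent to its own mailbox."""
    def __init__(self, name):
        super().__init__(daemon=True)
        self.name = name
        self.mailbox = queue.Queue()
        self.handled = []

    def run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:           # shutdown sentinel
                break
            self.handled.append(msg)  # sprite-specific reaction goes here

sprites = {n: Sprite(n) for n in ("player", "enemy")}
for s in sprites.values():
    s.start()

# The selection step has merely moved: the *sender* now picks the mailbox,
# where previously a central handler would have picked the sprite.
sprites["enemy"].mailbox.put("explode")

for s in sprites.values():
    s.mailbox.put(None)
    s.join()

print(sprites["enemy"].handled)   # only the selected sprite reacted
```

Broadcasting to all 100,000 mailboxes instead just replaces the selection with a loop plus per-message queue and context-switch overhead, which is Buk's arithmetic above.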
Re: Ideas for a Object-Belongs-to-Thread threading model
Jason Switzer wrote: On Thu, May 13, 2010 at 3:59 AM, nigelsande...@btconnect.com wrote:

And at the core of that, is the need for preemptive (kernel) threading and shared memory. These can (and should!) be hidden from the application programmer, through the use of language and/or library level abstractions, of which there are many promising [sic] candidates. But fundamentally, they all require that:

1) Preemptive scheduling be utilised.
2) The core interpreter be fully reentrant.
3) The core runtime libraries be fully reentrant.
4) That the language distinguishes between, and handles appropriately:
   - process-global entities: IO handles; environment; pwd etc.
   - runtime stack-based(*) (lexical) entities: locals (my vars in perl's terms).

I agree with this more than anything Daniel proposed. I prefer Perl 6 providing a thin interface to a kernel thread (i.e. NPTL), a means of creating shared memory objects between processes and threads, maintaining reentrancy as per Buk's summary, and leaving the rest for non-core modules. This allows for different threading, event, and shared memory models to emerge. You could then have different models, such as one that emulates Java's abandoned green thread model, something similar to POE, or something that emulates Erlang's process management. If you keep Buk's bullet points and give me a minimalistic interface to threads/shared memory, then it would allow me to create whatever wacky threading/shared memory model I can imagine. I think that's better than doing something that sounds dangerously similar to Java's RMI.

The support of threading should be completely optional. The threading support should not be active by default. See also http://www.ibm.com/developerworks/linux/library/l-posix1.html and fathom why. "Threads are fun" reads to me like how a drug dealer lures you to at least try it once. Rather fork-join! (Do Perl_6 hyper-operators need pthreads?)

-- Ruud
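The "thin interface" Jason describes can be sketched minimally. This is illustrative Python with invented names, not any proposed Perl 6 API: bare kernel threads plus an explicitly shared, lock-protected value. Actor models, POE-style event loops, or Erlang-style process management would all be library code layered on top of primitives like these.

```python
import threading

class Shared:
    """A value that may only be touched while holding its lock.

    This is the whole 'shared memory object' primitive: everything
    fancier is a module built on top of it."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def update(self, fn):
        # Apply fn to the value atomically and return the result.
        with self._lock:
            self._value = fn(self._value)
            return self._value

counter = Shared(0)

def worker():
    for _ in range(1000):
        counter.update(lambda v: v + 1)

# Bare kernel threads (on Linux these map to NPTL threads).
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.update(lambda v: v))   # 4 workers x 1000 increments
```

The design point is that the core provides only the thread handle and the shared object; scheduling policy, message passing, and ownership rules stay out of the core, as Jason argues.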
Re: Ideas for a Object-Belongs-to-Thread threading model
Ruud wrote: (Do Perl_6 hyper-operators need pthreads?)

No. The ability to thread over list elements in a hyper operator is more of a possibility than a requirement, if I understand things correctly.

// Carl
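In other words, a hyper operator such as (1, 2, 3) »+« (10, 20, 30) only promises that the elementwise operation *may* be parallelised. A thread-pool-backed map, sketched below in illustrative Python (the function name is invented), is one conforming strategy; a plain sequential loop would be an equally conforming implementation, with no pthreads required.

```python
from concurrent.futures import ThreadPoolExecutor

def hyper_add(xs, ys):
    """One possible implementation of an elementwise »+«:
    farm the per-element additions out to a thread pool."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda pair: pair[0] + pair[1], zip(xs, ys)))

print(hyper_add((1, 2, 3), (10, 20, 30)))   # elementwise sums
```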
Re: Ideas for a Object-Belongs-to-Thread threading model (nntp: message 9 of 20)
On Fri, 14 May 2010 10:01:41 +0100, Ruud H.G. van Tol - rv...@isolution.nl +nntp+browseruk+014f2ed3f9.rvtol#isolution...@spamgourmet.com wrote:

The support of threading should be completely optional. The threading support should not be active by default.

I'd like to understand why you say that? Two reasons I can think of:

1: Performance. The perception that adding support for threading will impact the performance of non-threaded applications. If you don't use threads, the presence of the ability to use them if you need to will not affect you at all. The presence of Unicode support will have a far more measurable effect upon performance. And it will be unavoidable.

2: Complexity. The perception that the presence of threading support will complicate non-threaded apps. Again, the presence of Unicode support adds far more complexity to the mix than that for threading. But with either, if you choose not to use it, you shouldn't even be aware of its presence. Do you believe that Unicode support should be dropped?

See also http://www.ibm.com/developerworks/linux/library/l-posix1.html and fathom why. "Threads are fun" reads to me like how a drug dealer lures you to at least try it once.

To me, that reads far more like some of the advocacy I've seen for Giving Blood. If you're squeamish, get a friend to distract you, or listen to some good music whilst they put the needle in.

Rather fork-join!

For platforms where fork is native, it doesn't go away just because threads support is present.

(Do Perl_6 hyper-operators need pthreads?)

Buk.
Parallelism and Concurrency was Re: Ideas for a Object-Belongs-to-Thread threading model
After reading this thread and S17, I have lots of questions and some remarks.

Parallelism and Concurrency could be considered to be two different things. The hyperoperators and junctions imply, but do not require, parallelism. It is left for the implementors to resolve whether a single or multiple processor(s) is/are used. Hence, parallelism could be considered to be something under the hood of perl6 and not directly specified.

Given that:
- concurrency is a topic of ongoing research
- several models of concurrency have been tried, including two in perl5
- there are a variety of contexts (internet, clouds, multiple cores, etc)
- different operating systems provide different resources

then: How much needs to be specified and implemented in perl6 so that different concurrency models can be implemented in modules to take into account the above diversity? The less, or rather the more abstract, the specification in perl6, the less likely perl6 will 'age'.

On 05/12/2010 09:12 PM, Dave Whipp wrote: Daniel Ruoso wrote:

Hi, The threading model topic still needs lots of thinking, so I decided to try out some ideas. Every concurrency model has its advantages and drawbacks. I've been wondering about these ideas for a while now and I think I finally have a sketch. My primary concerns were:

1 - It can't require locking: locking is just not scalable;
2 - It should perform better with lots of cores even if it suffers when you have only a few;
3 - It shouldn't require complicated memory management techniques that will make it difficult to bind native libraries (yes, STM is damn hard);
4 - It should support implicit threading and implicit event-based programming (i.e. the feed operator);
5 - It must be easier to use than Perl 5 shared variables;
6 - It can't use a Global Interpreter Lock (that's already said in 1, but, as this is a widely accepted idea in some other environments, I thought it would be better to make it explicit).
The idea I started with was that every object has an owner thread, and only that thread should talk to it, and I ended up with the following. Comments are appreciated: comments? ideas?

Before discussing the implementation, I think it's worthwhile stating what it is that you are attempting to abstract. For example, is the abstraction intended for a mapping down to a GPU (e.g. OpenCL) with a hierarchical address space, or is it intended for a multicore CPU with linear address space, or is it intended to abstract a LAN, with communication via sockets (reliable TCP? unreliable UDP?), or is it intended to abstract the internet/cloud? Are you thinking in terms of streaming computation where throughput is dominant, or interacting agents where latency is the critical metric? I'm not sure that it makes sense to talk of a single abstraction that supports all of those environments. However, there may be a bunch of abstractions that can be combined in different ways.

"Object belongs to thread" can have two interpretations: one is that the object-thread binding lasts for the life of the object; the other is that a client that wishes to use an object must request ownership, and wait to be granted it (in some scenarios, the granting of ownership would require the thread to migrate to the physical processor that owns the state).

In many cases, we might find that specific object-state must live in specific places, but not all of the state that is encapsulated by an object lives in the same place. Often, an object will encapsulate state that is, itself, accessed via objects. If a model requires delegated access to owned state to be passed through an intermediate object, then this may imply significant overhead. A better way to think about such scenarios may be that a client would request access to a subset of methods -- and thus we have "role belongs to thread", not "object belongs to thread".
One could imagine that a FIFO object might have a "put" role and a "get" role that producer/consumer clients would (temporarily) own while using it (note that granting of ownership may imply arbitration, and later forced revocation if the resource ownership is not released/extended before some timeout expires). It may be wrong to conflate "role" as a unit of reuse with "role" as an owned window onto a subset of an object's methods.

Perl6 has a set of language primitives to support various aspects of concurrency. It is indeed interesting to consider how these map to vastly different computation platforms: OpenCL vs OpenMP vs Cloud. It seems a little premature to be defining roles (e.g. RemoteInvocation) without defining the mapping of the core operators to these various models of computation.

Dave.
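The first interpretation of "object belongs to thread" -- the binding lasts for the object's life -- can be pictured as an owner thread serving a mailbox of method requests. Below is a hypothetical sketch in Python (all names invented; this is not any proposed Perl 6 design): clients never touch the object's state directly, they enqueue a request and wait for the reply, so all state access happens in exactly one thread.

```python
import queue
import threading

class Owned:
    """Wrap an object so that only its owner thread ever touches it.

    Other threads interact solely by message: (method, args, reply-queue)."""
    def __init__(self, obj):
        self._obj = obj
        self._requests = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        # The owner thread: the only place self._obj is ever used.
        while True:
            method, args, reply = self._requests.get()
            reply.put(getattr(self._obj, method)(*args))

    def call(self, method, *args):
        # Any client thread: send a request, block for the answer.
        reply = queue.Queue(maxsize=1)
        self._requests.put((method, args, reply))
        return reply.get()

owned = Owned([])              # a list whose state lives in its owner thread
owned.call("append", 42)
print(owned.call("__len__"))   # length observed via message, not direct access
```

Dave's delegation concern shows up directly here: every access to owned state pays a round trip through the mailbox, which is why he suggests granting clients a subset of methods (a role) rather than funnelling everything through one intermediary.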
Re: Parallelism and Concurrency was Re: Ideas for a Object-Belongs-to-Thread threading model
On Fri, May 14, 2010 at 03:48:10PM +0400, Richard Hainsworth wrote:
: After reading this thread and S17, I have lots of questions and some
: remarks.
:
: Parallelism and Concurrency could be considered to be two different things.
:
: The hyperoperators and junctions imply, but do not require,
: parallelism. It is left for the implementors to resolve whether a
: single or multiple processor(s) is/are used. Hence, parallelism
: could be considered to be something under the hood of perl6 and not
: directly specified.

Certainly we've put in a number of abstract constructs that, if used by the programmer, make promises of parallelizability, even if the implementation makes no promises about actual parallelization.

: Given that:
: - concurrency is a topic of ongoing research
: - several models of concurrency have been tried, including two in perl5
: - there are a variety of contexts (internet, clouds, multiple cores, etc)
: - different operating systems provide different resources
:
: then:
: How much needs to be specified and implemented in perl6 so that
: different concurrency models can be implemented in modules to take
: into account the above diversity?
:
: The less, or rather the more abstract, the specification in perl6,
: the less likely perl6 will 'age'.

Yes, but... We need to understand the possibilities sufficiently well to provide a default implementation that most modules can code to, or we'll simply end up with a mess of incompatible modules. This goes doubly for the standard library; if it is written in a way that violates the constraints of a thread model, it will not be usable under that model. (See the Perl 5 ecosystem for what happens if you bolt on threading after the fact.)
So what we're primarily trying to understand and predict is how the various threading models will constrain us in our normal coding patterns, and how we can make those constraints as invisible as possible without either violating them by accident or inducing unnecessary overhead. Requiring the programmer to make a few minor accommodations to achieve this may buy us a lot of non-grief in the future.

But as you say, this is not a simple problem to solve; our response should not be to punt this to future generations, but to solve it as best as we can, and hope we can make some of the hard decisions right enough to allow future evolution. I am very glad to see several passionate but mostly-rational people thrashing this out here; the future is many-core, and none of us understand the implications of that well enough yet to write off the topic as a bikeshed, or to wish for the good old days of single core. Sure, it should be possible to write a Perl program in a single-threaded mindset, but while certain popular languages cling to the past and try to make single-threadedness a feature, Perl is more about embracing the future and about freeing the programmer from arbitrary restrictions. And as Perl 5 OO demonstrated, sometimes not picking a good default can be just about as damaging as picking the wrong one.

Note also that the fundamental difficulty with doing threading in Perl 5 is not the exact model chosen, but rather that the fundamental underpinnings of locality were (for various historical reasons) poorly designed/evolved in the first place, so we ended up with far too much information having to be managed outside of its proper scope, for many different definitions of scope. This has been one of the secret sauces of the Perl 6 redesign: to hang every piece of information on the peg where it belongs, and not somewhere else. And that is why threading of *any* kind will work much better in Perl 6.

Larry
Re: Ideas for a Object-Belongs-to-Thread threading model (nntp: message 9 of 20)
On Fri, 14 May 2010 15:05:44 +0100, B. Estrade estr...@gmail.com wrote: On Fri, May 14, 2010 at 12:27:18PM +0100, nigelsande...@btconnect.com wrote: On Fri, 14 May 2010 10:01:41 +0100, Ruud H.G. van Tol - rv...@isolution.nl +nntp+browseruk+014f2ed3f9.rvtol#isolution...@spamgourmet.com wrote:

The support of threading should be completely optional. The threading support should not be active by default.

I'd like to understand why you say that? Two reasons I can think of: 1: Performance. The perception that adding support for threading will impact the performance of non-threaded applications.

I think that perhaps he's thinking of overhead associated with spawning and managing threads - even just one... so, if only 1 thread bound to a single core is desired, then I think this is a reasonable and natural thing to want. Maybe the core binding on an SMP box would be the more challenging issue to tackle. Then again, this is the role of the OS and libnuma (on Linux, anyway)...

Hm. Every process gets one thread by default. There is no overhead there. And spawning 1000 (do nothing but sleep) threads takes 0.171 seconds?

Buk.
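Buk's 0.171-second figure is easy to reproduce approximately. A rough re-measurement in Python follows (purely illustrative; absolute numbers vary widely by machine, OS, and language runtime, so no particular value should be read into the output):

```python
import threading
import time

def snooze():
    # The threads do nothing but sleep, as in Buk's measurement.
    time.sleep(0.1)

start = time.perf_counter()
threads = [threading.Thread(target=snooze) for _ in range(1000)]
for t in threads:
    t.start()
spawn_time = time.perf_counter() - start   # time to create and start all 1000

for t in threads:
    t.join()

print(f"spawned 1000 sleeping threads in {spawn_time:.3f}s")
```

The point is only that per-thread spawn cost is small in absolute terms; whether it matters depends entirely on how often an application spawns, which is where Brett's concern about per-1000-threads latency comes from.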
Re: Parallelism and Concurrency was Re: Ideas for a Object-Belongs-to-Thread threading model
On Fri, May 14, 2010 at 03:48:10PM +0400, Richard Hainsworth wrote:

After reading this thread and S17, I have lots of questions and some remarks. Parallelism and Concurrency could be considered to be two different things. The hyperoperators and junctions imply, but do not require, parallelism. It is left for the implementors to resolve whether a single or multiple processor(s) is/are used. Hence, parallelism could be considered to be something under the hood of perl6 and not directly specified. Given that: - concurrency is a topic of ongoing research - several models of concurrency have been tried, including two in perl5 - there are a variety of contexts (internet, clouds, multiple cores, etc) - different operating systems provide different resources then: How much needs to be specified and implemented in perl6 so that different concurrency models can be implemented in modules to take into account the above diversity? The less, or rather the more abstract, the specification in perl6, the less likely perl6 will 'age'.

I might be oversimplifying this, but you're going to be building on essentially two different underlying approaches - one is message passing. The second is threading - I mean *real* threading. If these basic facilities are provided in the base, or even via a couple of robust and tightly integrated modules, then I believe any other scheme one wishes to implement (even a hybrid scheme) could be done - thus solidifying Perl 6's ability to implement the parallel scheme du jour.

Brett

On 05/12/2010 09:12 PM, Dave Whipp wrote: Daniel Ruoso wrote: Hi, The threading model topic still needs lots of thinking, so I decided to try out some ideas. Every concurrency model has its advantages and drawbacks. I've been wondering about these ideas for a while now and I think I finally have a sketch.
My primary concerns were:

1 - It can't require locking: locking is just not scalable;
2 - It should perform better with lots of cores even if it suffers when you have only a few;
3 - It shouldn't require complicated memory management techniques that will make it difficult to bind native libraries (yes, STM is damn hard);
4 - It should support implicit threading and implicit event-based programming (i.e. the feed operator);
5 - It must be easier to use than Perl 5 shared variables;
6 - It can't use a Global Interpreter Lock (that's already said in 1, but, as this is a widely accepted idea in some other environments, I thought it would be better to make it explicit).

The idea I started with was that every object has an owner thread, and only that thread should talk to it, and I ended up with the following. Comments are appreciated: comments? ideas?

Before discussing the implementation, I think it's worthwhile stating what it is that you are attempting to abstract. For example, is the abstraction intended for a mapping down to a GPU (e.g. OpenCL) with a hierarchical address space, or is it intended for a multicore CPU with linear address space, or is it intended to abstract a LAN, with communication via sockets (reliable TCP? unreliable UDP?), or is it intended to abstract the internet/cloud? Are you thinking in terms of streaming computation where throughput is dominant, or interacting agents where latency is the critical metric? I'm not sure that it makes sense to talk of a single abstraction that supports all of those environments. However, there may be a bunch of abstractions that can be combined in different ways.

"Object belongs to thread" can have two interpretations: one is that the object-thread binding lasts for the life of the object; the other is that a client that wishes to use an object must request ownership, and wait to be granted it (in some scenarios, the granting of ownership would require the thread to migrate to the physical processor that owns the state).
In many cases, we might find that specific object-state must live in specific places, but not all of the state that is encapsulated by an object lives in the same place. Often, an object will encapsulate state that is, itself, accessed via objects. If a model requires delegated access to owned state to be passed through an intermediate object then this may imply significant overhead. A better way to think about such scenarios may be that a client would request access to a subset of methods -- and thus we have role belongs to thread, not object belongs to thread. One could imagine that a FIFO object might have a put role and a get role that producer/consumer clients would (temporarily) own while using (note that granting of ownership may imply arbitration, and later forced-revocation if the resource-ownership is not released/extended before some timeout expires). It may be wrong to conflate role as a unit of reuse with role as an owned window onto a subset of an object's
Re: Parallelism and Concurrency was Re: Ideas for a Object-Belongs-to-Thread threading model
On Fri, May 14, 2010 at 09:50:21AM -0700, Larry Wall wrote: On Fri, May 14, 2010 at 03:48:10PM +0400, Richard Hainsworth wrote: ...snip But as you say, this is not a simple problem to solve; our response should not be to punt this to future generations, but to solve it as best as we can, and hope we can make some of the hard decisions right enough to allow future evolution. I am very glad to see several passionate but mostly-rational people thrashing this out here; the future is many-core, and none of us understand the implications of that well enough yet to write off the topic as a bikeshed, or to wish for the good old days of single core. Sure, it should be possible to write a Perl program in a single-threaded mindset, but while certain popular languages cling to the past and try to make single-threadedness a feature, Perl is more about embracing the future and about freeing the programmer from arbitrary restrictions. And as Perl 5 OO demonstrated, sometimes not picking a good default can be just about as damaging as picking the wrong one. The future is indeed multicore - or, rather, *many-core. What this means is that however the hardware jockeys have to strap them together on a single node, we'll be looking at the ability to invoke hundreds (or thousands) of threads on a single SMP machine. Then, inevitably, *someone will want to strap these together into a cluster, thus making message passing an attractive way to glue related threads together over a network. Getting back to the availability of many threads on a single SMP box, issues of data locality and affinity and thread binding will become of critical importance. These issues are closely related to the operating system's capabilities and paging policies, but eventually (hopefully) current, provably beneficial strategies will be available on most platforms. 
Brett Note also that the fundamental difficulty with doing threading in Perl 5 is not the exact model chosen, but rather that the fundamental underpinnings of locality were (for various historical reasons) poorly designed/evolved in the first place, so we ended up with far too much information having to be managed outside of its proper scope, for many different definitions of scope. This has been one of the secret sauces of the Perl 6 redesign, to hang every piece of information on the peg where it belongs, and not somewhere else. And that is why threading of *any* kind will work much better in Perl 6. Larry -- B. Estrade estr...@gmail.com
Re: Ideas for a Object-Belongs-to-Thread threading model (nntp: message 9 of 20)
On Fri, May 14, 2010 at 06:03:46PM +0100, nigelsande...@btconnect.com wrote: On Fri, 14 May 2010 15:05:44 +0100, B. Estrade estr...@gmail.com wrote: On Fri, May 14, 2010 at 12:27:18PM +0100, nigelsande...@btconnect.com wrote: On Fri, 14 May 2010 10:01:41 +0100, Ruud H.G. van Tol - rv...@isolution.nl +nntp+browseruk+014f2ed3f9.rvtol#isolution...@spamgourmet.com wrote:

The support of threading should be completely optional. The threading support should not be active by default.

I'd like to understand why you say that? Two reasons I can think of: 1: Performance. The perception that adding support for threading will impact the performance of non-threaded applications.

I think that perhaps he's thinking of overhead associated with spawning and managing threads - even just one... so, if only 1 thread bound to a single core is desired, then I think this is a reasonable and natural thing to want. Maybe the core binding on an SMP box would be the more challenging issue to tackle. Then again, this is the role of the OS and libnuma (on Linux, anyway)...

Hm. Every process gets one thread by default. There is no overhead there.

I am not sure I understand the context under which one process gets 1 thread.

And spawning 1000 (do nothing but sleep) threads takes 0.171 seconds?

Assuming this is a latency cost per 1000 threads, this could substantially impact an application. The goal is always to minimize overhead, so this is where I am coming from. And there's overhead not just from spawning, but from anything that requires some number of threads be synchronized - barriers, critical sections, atomic updates to memory, etc. And depending on the consistency model one enforces, there could also be implicit calls for each thread to flush its cache in order to ensure the most up-to-date version of a shared object is seen by all. So if one is running a single process only, I think it is reasonable to be concerned that it not be subject to this overhead unnecessarily.
An additional concern, related to not just overhead, would be data locality issues - you don't want your *single* process migrating to different cores where its data would have to follow. The OS needs to know it is a distinct process and not to be regarded also as a thread (I would think...)

Brett

Buk.

-- B. Estrade estr...@gmail.com
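The synchronisation overhead Brett describes is measurable even without contention: every critical-section entry pays for a lock acquire and release. An illustrative Python sketch (timings are entirely platform- and runtime-dependent; the comparison, not the numbers, is the point):

```python
import threading
import timeit

lock = threading.Lock()
counter = 0

def bare():
    # Unsynchronised increment: no critical-section cost.
    global counter
    counter += 1

def locked():
    # The same increment inside a critical section: pays for
    # acquire/release on every call, even with zero contention.
    global counter
    with lock:
        counter += 1

t_bare = timeit.timeit(bare, number=100_000)
t_locked = timeit.timeit(locked, number=100_000)
print(f"bare: {t_bare:.3f}s  locked: {t_locked:.3f}s")
```

Under real contention the gap widens further, since blocked threads add scheduler round trips and cache-line bouncing on top of the fixed acquire/release cost.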
Re: Ideas for a Object-Belongs-to-Thread threading model (nntp: message 9 of 20)
On Fri, May 14, 2010 at 12:27:18PM +0100, nigelsande...@btconnect.com wrote: On Fri, 14 May 2010 10:01:41 +0100, Ruud H.G. van Tol - rv...@isolution.nl +nntp+browseruk+014f2ed3f9.rvtol#isolution...@spamgourmet.com wrote:

The support of threading should be completely optional. The threading support should not be active by default.

I'd like to understand why you say that? Two reasons I can think of: 1: Performance. The perception that adding support for threading will impact the performance of non-threaded applications.

I think that perhaps he's thinking of overhead associated with spawning and managing threads - even just one... so, if only 1 thread bound to a single core is desired, then I think this is a reasonable and natural thing to want. Maybe the core binding on an SMP box would be the more challenging issue to tackle. Then again, this is the role of the OS and libnuma (on Linux, anyway)...

If you don't use threads, the presence of the ability to use them if you need to will not affect you at all. The presence of Unicode support will have a far more measurable effect upon performance. And it will be unavoidable.

2: Complexity. The perception that the presence of threading support will complicate non-threaded apps. Again, the presence of Unicode support adds far more complexity to the mix than that for threading. But with either, if you choose not to use it, you shouldn't even be aware of its presence. Do you believe that Unicode support should be dropped?

See also http://www.ibm.com/developerworks/linux/library/l-posix1.html and fathom why. "Threads are fun" reads to me like how a drug dealer lures you to at least try it once.

To me, that reads far more like some of the advocacy I've seen for Giving Blood. If you're squeamish, get a friend to distract you, or listen to some good music whilst they put the needle in.

Rather fork-join!

For platforms where fork is native, it doesn't go away just because threads support is present.
(Do Perl_6 hyper-operators need pthreads?) Buk. -- B. Estrade estr...@gmail.com
Re: Ideas for a Object-Belongs-to-Thread threading model (nntp: message 5 of 20)
This should be a reply to Daniel Ruoso's post above, but I cannot persuade my nntp reader to reply to a post made before I subscribed here. Sorry.

On Wed, 12 May 2010 14:16:35 +0100, Daniel Ruoso dan...@ruoso.com wrote:

I have 3 main problems with your thinking.

1: You are conflating two fundamentally different views of the problem. a) The Perl 6 programmer's semantic view. b) The P6 compiler (writer's) implementation view. These two views need to be kept cleanly separated in order that the reference implementation does not define the *only possible* implementation. But it is important that when designing the semantic view, it is done with considerable regard for what /can/ be implemented.

2: You appear to be taking your references at face value. For example, you've cited Erlang as one of your reference points. And the Erlang docs describe the units of concurrency as processes, with the parallelism provided by Erlang and not the host operating system. But if I run one of the Erlang examples, http://www.erlang.org/examples/small_examples/tetris.erl it uses two processes: one with 13 OS threads, and the other with 5 OS threads; even if I only run tetris:start(1). Until recently, Erlang did not use OS threads, relying instead upon an internal, user-space scheduler--green threading--though you may find some denials of that by Erlangers because of the unfavorable comparison with Java's green threading in Java versions 1 through 4. But recent versions have implemented multiple OS threads, each running a coroutine scheduler. They had to do this in order to achieve SMP scaling. Here is a little salient information:

The Erlang VM without SMP support has 1 scheduler which runs in the main process thread. The scheduler picks runnable Erlang processes and IO-jobs from the run-queue and there is no need to lock data structures since there is only one thread accessing them. The Erlang VM with SMP support can have 1 to many schedulers which are run in 1 thread each.
The schedulers pick runnable Erlang processes and IO-jobs from one common run-queue. In the SMP VM all shared data structures are protected with locks; the run-queue is one example of a data structure protected with locks.

Lock-free at the semantic level is a nice-to-have. But whenever you have kernel threads talking to each other through shared memory--and you have to, if you are going to achieve SMP scalability--then there will be some form of locking required. All talk of message passing protocols is simply disguising the realities of the implementation. That is not a bad thing from the applications programmer's point of view--nor even the language designer's POV--but it still leaves the problem to be dealt with by the language system implementers.

Whilst lock-free queues are possible--there are implementations of these available for Java 5 (which, of necessity, and to great effect, has now moved away from green threads and gone the kernel threading route)--they are very, very hardware dependent, relying as they do upon CAS, which not all processor architectures support and not all languages give adequate access to. For a very interesting, if rather long, insight into some of this, see Cliff Click's video about fast wait-free hashtables: http://www.youtube.com/watch?v=WYXgtXWejRM&feature=player_embedded#! One thing to note if you watch it all the way through is that your claim (in an earlier revision?) that shared memory doesn't scale is incorrect in the light of this video, where 786 SMP processors are using a hash for caching at very high speed.

3: By conflating the POVs of the semantic design and implementation, you are in danger of reinventing several bad wheels, badly. a) A green threading scheduler: The Java guys spent a long time trying to get theirs right before abandoning it. The Erlang guys have taken a long time tuning theirs, but due to Moore's Law running out, they have had to bow to the inevitability of kernel threading.
And they are now having to go through the pain of understanding and addressing how multiple event-driven and cooperative schedulers running under the control of (various) preemptive scheduler(s) interact. Even Haskell has to use kernel threading:

In GHC, threads created by forkIO are lightweight threads, and are managed entirely by the GHC runtime. Typically Haskell threads are an order of magnitude or two more efficient (in terms of both time and space) than operating system threads. The downside of having lightweight threads is that only one can run at a time, so if one thread blocks in a foreign call, for example, the other threads cannot continue. The GHC runtime works around this by making use of full OS threads where necessary. When the program is built with the -threaded option (to link against the
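The argument above--that message passing between kernel threads merely disguises the locking underneath--can be illustrated with a minimal, hedged Python sketch (not from the thread): CPython's queue.Queue, the obvious "message passing" primitive, is itself implemented with a mutex plus condition variables, so the locks have only been moved below the abstraction.

```python
import threading
import queue

# Kernel threads exchanging "messages" through a bounded queue.
# queue.Queue is built from a lock and condition variables internally,
# so message passing here rests on locking underneath.
inbox = queue.Queue(maxsize=16)    # bounded buffer: senders block when full
results = queue.Queue()

def worker():
    total = 0
    while True:
        msg = inbox.get()          # blocks; internally a lock + condition wait
        if msg is None:            # sentinel: shut down
            break
        total += msg
    results.put(total)

t = threading.Thread(target=worker)
t.start()
for i in range(1, 101):
    inbox.put(i)                   # blocks whenever the buffer is full
inbox.put(None)
t.join()

received = results.get()
print(received)                    # → 5050
```

From the application programmer's side this looks like lock-free message passing; the locking is simply someone else's problem, which is exactly the point being made.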
Re: Ideas for a Object-Belongs-to-Thread threading model (nntp: message 5 of 20)
On Thu, May 13, 2010 at 3:59 AM, nigelsande...@btconnect.com wrote:

This should be a reply to Daniel Ruoso's post above, but I cannot persuade my nntp reader to reply to a post made before I subscribed here. Sorry. And at the core of that is the need for preemptive (kernel) threading and shared memory. These can (and should!) be hidden from the application programmer, through the use of language and/or library level abstractions, of which there are many promising [sic] candidates. But fundamentally, they all require that: 1) Preemptive scheduling be utilised. 2) The core interpreter be fully reentrant. 3) The core runtime libraries be fully reentrant. 4) The language distinguishes between, and handles appropriately: - process-global entities: IO handles; environment; pwd etc. - runtime stack-based(*) (lexical) entities: locals (my vars in Perl's terms).

I agree with this more than anything Daniel proposed. I prefer Perl 6 providing a thin interface to a kernel thread (i.e. NPTL), a means of creating shared memory objects between processes and threads, maintaining reentrancy as per Buk's summary, and leaving the rest for non-core modules. This allows different threading, event, and shared memory models to emerge. You could then have different models, such as one that emulates Java's abandoned green thread model, something similar to POE, or something that emulates Erlang's process management. If you keep Buk's bullet points and give me a minimalistic interface to threads/shared memory, it would allow me to create whatever wacky threading/shared memory model I can imagine. I think that's better than doing something that sounds dangerously similar to Java's RMI.

-Jason s1n Switzer
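The "thin interface" being argued for--bare kernel threads plus explicitly shared objects, with higher-level models layered on top by modules--can be sketched in Python as a rough illustration (all names here are invented for the example, not a proposed API):

```python
import threading

class SharedCounter:
    """A shared-memory object carrying its own lock: the kind of minimal
    primitive a non-core threading model could be built on top of."""
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def add(self, n):
        with self._lock:           # without this, += on shared state races
            self.value += n

counter = SharedCounter()
threads = [threading.Thread(target=lambda: [counter.add(1) for _ in range(1000)])
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)               # → 8000
```

The point of keeping the core this small is that the lock discipline lives in the shared object (or a module), not in the language runtime, leaving room for green-thread, POE-style, or Erlang-style models to be built above it.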
Re: Ideas for a Object-Belongs-to-Thread threading model
BrowserUK wrote: "there are the interpreter processes." Inventing (overloaded) terminology will just create confusion. Very unhelpful in a context that suffers more than its fair share already.

Okay, I should probably call them Actors to use a more precise terminology, since this is highly inspired by two Actor Model languages.

"- The interpreter implements a scheduler, just like POE." POE does *NOT* implement a scheduler.

Okay, mentioning POE was just a side comment; it doesn't interfere directly with the model.

"3 - The scheduler, unlike POE, should be able to schedule in several OS threads, such that any OS thread may raise any waiting process." And how are you going to implement that?

That was the part I took directly from the inspiring languages; just take a look at how Erlang and the Io language schedule their actors.

"The only way would be for there to be multiple concurrent (kernel threaded) instances of the state-machine running, sharing (as in shared state concurrency) their controlling state." But maybe each actor is tied to a particular OS thread, which would simplify things a bit... Also, it is possible to suspend an actor in order to implement a time-sharing scheduler as well...

daniel
Second Version of Ideas for a Object-Belongs-to-Thread threading model
Em Ter, 2010-05-11 às 21:45 -0300, Daniel Ruoso escreveu: The threading model topic still needs lots of thinking, so I decided to try out some ideas.

After BrowserUK's feedback and some more reading (including http://www.c2.com/cgi/wiki?MessagePassingConcurrency and links from there on), I decided to rewrite those ideas in a somewhat different model, but still with the same spirit.

0 - The idea is inspired by Erlang and the Io Language. In addition to OS threads there are the Coroutine Groups.
1 - No memory is shared between Coroutine Groups, so no locking is necessary.
2 - A value and a coroutine always belong to a Coroutine Group, which should be assigned to a single OS thread, thus naturally implementing synchronized access to data.
3 - The interpreter implements a scheduler, which will pick one of the waiting coroutines that belong to the groups assigned to the current thread. The scheduler may also suspend a coroutine in order to implement time-sharing. The scheduler should support blocking states in the coroutines.
4 - When comparing to Perl 5, each coroutine is an ithread, but memory is shared between all the coroutines in the same group, given that they will always run in the same OS thread.
5 - When a coroutine group is created, it is assigned to one OS thread; the interpreter might decide to create new OS threads as necessary, and might optionally implement one OS thread per coroutine group.
6 - In order to implement inter-coroutine-group communication, there are:
6.1 - A MessageQueue, which works just like a Unix pipe; it looks like a slurpy array. It has a configurable buffer size and coroutines might block when trying to read and/or write to it.
6.2 - A RemoteInvocation, an object that has an identifier, a capture (which might, optionally, point to a MessageQueue as input) and another MessageQueue to be used as output. New coroutines are created in the target group to execute that invocation.
6.3 - An InvocationQueue, a special type of MessageQueue that accepts only RemoteInvocation objects.
6.4 - A RemoteValue, an object that proxies requests to another coroutine group through a RemoteInvocation.
7 - The coroutine group boundary is drawn by language constructs such as async, the feed operator, junctions, and hyper operators.
8 - A value might have its ownership transferred to another group if it can be detected that the value is in use only for that invocation or return value, in order to reduce the number of RemoteInvocations.
9 - A value might do a special ThreadSafe role if it is thread-safe (such as implementing bindings to thread-safe native libraries), in which case it is sent as-is to a different group.
10 - A value might do a special ThreadCloneable role if it should be cloned instead of being proxied through a RemoteValue when sent in a RemoteInvocation.
11 - The MessageQueue notifies the scheduler whenever new data is available in that queue, so the target coroutine might be raised.
12 - Exception handling gets a bit hairy, since exceptions might only be raised at the calling scope when the value is consumed.
13 - List assignment and Sink context might result in synchronized behavior.

comments are appreciated... daniel
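Points 6.1-6.3 of the proposal can be sketched concretely. The following is a hedged Python mock-up (the class and field names mirror the proposal; every implementation detail is an illustrative assumption, not the proposed design), with a coroutine group modelled as a single OS thread consuming an InvocationQueue:

```python
import threading
import queue
from dataclasses import dataclass, field

@dataclass
class RemoteInvocation:
    """6.2: an identifier, a capture (arguments), and an output MessageQueue."""
    identifier: str
    capture: tuple
    output: queue.Queue = field(default_factory=queue.Queue)

def group_main(invocation_queue, routines):
    """The target group's loop: run a 'coroutine' per incoming invocation
    and reply on that invocation's own output MessageQueue."""
    while True:
        inv = invocation_queue.get()
        if inv is None:                    # shutdown sentinel
            break
        result = routines[inv.identifier](*inv.capture)
        inv.output.put(result)

invocations = queue.Queue()                # 6.3: the group's InvocationQueue
group = threading.Thread(target=group_main,
                         args=(invocations, {"double": lambda x: 2 * x}))
group.start()

inv = RemoteInvocation("double", (21,))
invocations.put(inv)
answer = inv.output.get()                  # block until the target group replies
print(answer)                              # → 42
invocations.put(None)
group.join()
```

Because only the group thread ever executes the routines, the data those routines touch needs no locks--which is the whole premise of point 1.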
Third and simplified version of Ideas for a Object-Belongs-to-Thread threading model
Em Ter, 2010-05-11 às 21:45 -0300, Daniel Ruoso escreveu: The threading model topic still needs lots of thinking, so I decided to try out some ideas.

After I sent the second version, I realized I could make it simpler by just assuming one OS thread per Coroutine Group... so here goes the new version.

0 - No memory is shared between threads, so no locking is necessary.
1 - A value and a coroutine always belong to a thread, thus naturally implementing synchronized access to data.
2 - Coroutines are, conceptually, the equivalent of green threads, running in the same OS thread. Coroutines waiting for a return value are blocked.
3 - The interpreter implements a scheduler, which will pick one of the waiting coroutines; it may also suspend a coroutine in order to implement time-sharing.
4 - In order to implement inter-thread communication, there are:
4.1 - A MessageQueue, which works just like a Unix pipe; it looks like a slurpy array. It has a configurable buffer size and coroutines might block when trying to read and/or write to it.
4.2 - A RemoteInvocation, an object that has an identifier, a capture (which might, optionally, point to a MessageQueue as input) and another MessageQueue to be used as output. New coroutines are created in the target thread to execute that invocation.
4.3 - An InvocationQueue, a special type of MessageQueue that accepts only RemoteInvocation objects.
4.4 - A RemoteValue, an object that proxies requests to another thread through a RemoteInvocation.
5 - The thread boundary is drawn by language constructs such as async, the feed operator, junctions, and hyper operators.
6 - A value might have its ownership transferred to another thread if it can be detected that the value is in use only for that invocation or return value, in order to reduce the number of RemoteInvocations.
7 - A value might do a special ThreadSafe role if it is thread-safe (such as implementing bindings to thread-safe native libraries), in which case it is sent as-is to a different thread.
8 - A value might do a special ThreadCloneable role if it should be cloned instead of being proxied through a RemoteValue when sent in a RemoteInvocation.
9 - Exception handling gets a bit hairy, since exceptions might only be raised at the calling scope when the value is consumed.
10 - List assignment and Sink context might result in synchronized behavior.

daniel
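Points 2 and 3 above--coroutines as green threads multiplexed by an interpreter scheduler inside one OS thread--can be modelled as a toy in Python, with generators standing in for coroutines and yield as the cooperative scheduling point. This is purely illustrative (the real proposal is language-level), but it also shows the classic caveat from elsewhere in the thread: one blocking call in any task would stall the whole scheduler.

```python
from collections import deque

def task(name, steps, trace):
    """A 'coroutine': runs one step, then yields control to the scheduler."""
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield                      # cooperative scheduling point

trace = []
run_queue = deque([task("a", 2, trace), task("b", 2, trace)])
while run_queue:                   # a minimal round-robin scheduler
    t = run_queue.popleft()
    try:
        next(t)                    # resume the coroutine until its next yield
        run_queue.append(t)        # still runnable: put it back on the queue
    except StopIteration:
        pass                       # coroutine finished

print(trace)                       # → ['a0', 'b0', 'a1', 'b1']
```

Everything runs in one OS thread, so the interleaving is deterministic and no locking is needed--which is exactly the synchronized-access-for-free property point 1 claims.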
Re: Second Version of Ideas for a Object-Belongs-to-Thread threading model
I might have some more to say about any threading model later, but for now I wanted to make everyone aware of a scripting language that is truly multi-threaded -- you may want to check it out. Some of its syntax is Perlish, whereas some is not; the point is that it is supposed to scale on SMP machines. It's called Qore - http://www.qore.org. I maintain the FreeBSD port for it and have played with it quite a bit. It's a nice interface -- though traditional. And it does seem to scale pretty well.

If the debate is shared memory threads vs message passing (a la Erlang), then I would suggest that they are not mutually exclusive (pun intended) and could actually provide some complementary benefits if deployed on a large-scale distributed memory machine composed of SMP nodes. In other words, a mixed-mode style of distributed programming where the SMP threads run on each node and the MP is used to connect these disjoint processes over the network. I know that the SMP threads are best implemented with a low-level runtime (maybe even using a Qore backend?), but I have no idea how one might facilitate Erlang-style remote processes -- still, I believe offering both styles would be totally awesome :^).

Cheers, Brett

On Wed, May 12, 2010 at 09:50:19AM -0300, Daniel Ruoso wrote: Em Ter, 2010-05-11 às 21:45 -0300, Daniel Ruoso escreveu: The threading model topic still needs lots of thinking, so I decided to try out some ideas.

After BrowserUK's feedback and some more reading (including http://www.c2.com/cgi/wiki?MessagePassingConcurrency and links from there on), I decided to rewrite those ideas in a somewhat different model, but still with the same spirit.

0 - The idea is inspired by Erlang and the Io Language. In addition to OS threads there are the Coroutine Groups.
1 - No memory is shared between Coroutine Groups, so no locking is necessary.
2 - A value and a coroutine always belong to a Coroutine Group, which should be assigned to a single OS thread, thus naturally implementing synchronized access to data.
3 - The interpreter implements a scheduler, which will pick one of the waiting coroutines that belong to the groups assigned to the current thread. The scheduler may also suspend a coroutine in order to implement time-sharing. The scheduler should support blocking states in the coroutines.
4 - When comparing to Perl 5, each coroutine is an ithread, but memory is shared between all the coroutines in the same group, given that they will always run in the same OS thread.
5 - When a coroutine group is created, it is assigned to one OS thread; the interpreter might decide to create new OS threads as necessary, and might optionally implement one OS thread per coroutine group.
6 - In order to implement inter-coroutine-group communication, there are:
6.1 - A MessageQueue, which works just like a Unix pipe; it looks like a slurpy array. It has a configurable buffer size and coroutines might block when trying to read and/or write to it.
6.2 - A RemoteInvocation, an object that has an identifier, a capture (which might, optionally, point to a MessageQueue as input) and another MessageQueue to be used as output. New coroutines are created in the target group to execute that invocation.
6.3 - An InvocationQueue, a special type of MessageQueue that accepts only RemoteInvocation objects.
6.4 - A RemoteValue, an object that proxies requests to another coroutine group through a RemoteInvocation.
7 - The coroutine group boundary is drawn by language constructs such as async, the feed operator, junctions, and hyper operators.
8 - A value might have its ownership transferred to another group if it can be detected that the value is in use only for that invocation or return value, in order to reduce the number of RemoteInvocations.
9 - A value might do a special ThreadSafe role if it is thread-safe (such as implementing bindings to thread-safe native libraries), in which case it is sent as-is to a different group.
10 - A value might do a special ThreadCloneable role if it should be cloned instead of being proxied through a RemoteValue when sent in a RemoteInvocation.
11 - The MessageQueue notifies the scheduler whenever new data is available in that queue, so the target coroutine might be raised.
12 - Exception handling gets a bit hairy, since exceptions might only be raised at the calling scope when the value is consumed.
13 - List assignment and Sink context might result in synchronized behavior.

comments are appreciated... daniel

--
B. Estrade estr...@gmail.com
Re: Ideas for a Object-Belongs-to-Thread threading model
Daniel Ruoso wrote: Hi, The threading model topic still needs lots of thinking, so I decided to try out some ideas. Every concurrency model has its advantages and drawbacks; I've been wondering about these ideas for a while now and I think I finally have a sketch. My primary concerns were:

1 - It can't require locking: locking is just not scalable;
2 - It should perform better with lots of cores even if it suffers when you have only a few;
3 - It shouldn't require complicated memory management techniques that will make it difficult to bind native libraries (yes, STM is damn hard);
4 - It should support implicit threading and implicit event-based programming (i.e. the feed operator);
5 - It must be easier to use than Perl 5 shared variables;
6 - It can't use a Global Interpreter Lock (that's already said in 1, but, as this is a widely accepted idea in some other environments, I thought it would be better to make it explicit).

The idea I started with was that every object has an owner thread, and only that thread should talk to it, and I ended up with the following; comments are appreciated. Comments? ideas?

Before discussing the implementation, I think it's worthwhile stating what it is that you are attempting to abstract. For example, is the abstraction intended for mapping down to a GPU (e.g. OpenCL) with a hierarchical address space, or for a multicore CPU with a linear address space, or to abstract a LAN, with communication via sockets (reliable TCP? unreliable UDP?), or to abstract the internet/cloud? Are you thinking in terms of streaming computation where throughput is dominant, or interacting agents where latency is the critical metric? I'm not sure that it makes sense to talk of a single abstraction that supports all of those environments. However, there may be a bunch of abstractions that can be combined in different ways.
"Object belongs to thread" can have two interpretations: one is that the object-thread binding lasts for the life of the object; the other is that a client that wishes to use an object must request ownership, and wait to be granted it (in some scenarios, the granting of ownership would require the thread to migrate to the physical processor that owns the state). In many cases, we might find that specific object-state must live in specific places, but not all of the state that is encapsulated by an object lives in the same place.

Often, an object will encapsulate state that is, itself, accessed via objects. If a model requires delegated access to owned state to be passed through an intermediate object, then this may imply significant overhead. A better way to think about such scenarios may be that a client would request access to a subset of methods -- and thus we have "role belongs to thread", not "object belongs to thread". One could imagine that a FIFO object might have a put role and a get role that producer/consumer clients would (temporarily) own while using it (note that granting of ownership may imply arbitration, and later forced revocation if the resource ownership is not released/extended before some timeout expires). It may be wrong to conflate role as a unit of reuse with role as an owned window onto a subset of an object's methods.

Perl6 has a set of language primitives to support various aspects of concurrency. It is indeed interesting to consider how these map to vastly different computation platforms: OpenCL vs OpenMP vs Cloud. It seems a little premature to be defining roles (e.g. RemoteInvocation) without defining the mapping of the core operators to these various models of computation.

Dave.
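The FIFO example above--separate put and get roles that clients temporarily own--can be sketched as follows. This is an illustrative Python mock-up (Fifo, acquire, and the role names are invented here); arbitration and timeout-based revocation are omitted for brevity:

```python
import threading
import queue

class Fifo:
    """A FIFO whose put and get sides are separately ownable capabilities:
    'role belongs to thread' rather than 'object belongs to thread'."""
    def __init__(self):
        self._q = queue.Queue()
        self._owners = {"put": None, "get": None}
        self._lock = threading.Lock()

    def acquire(self, role):
        with self._lock:
            if self._owners[role] is not None:
                raise RuntimeError(f"{role} role already owned")
            self._owners[role] = threading.get_ident()

    def release(self, role):
        with self._lock:
            self._owners[role] = None

    def _check(self, role):
        if self._owners[role] != threading.get_ident():
            raise PermissionError(f"caller does not own the {role} role")

    def put(self, x):
        self._check("put")         # only the put-role owner may produce
        self._q.put(x)

    def get(self):
        self._check("get")         # only the get-role owner may consume
        return self._q.get()

fifo = Fifo()
fifo.acquire("put")
fifo.acquire("get")                # this thread happens to own both roles
fifo.put("hello")
item = fifo.get()
print(item)                        # → hello

denied = False
fifo.release("put")
try:
    fifo.put("again")              # no longer owns the put role
except PermissionError:
    denied = True
print(denied)                      # → True
```

Note how neither role grants access to the other side of the queue: ownership is a window onto a subset of the object's methods, which is precisely the distinction Dave draws.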
Re: Ideas for a Object-Belongs-to-Thread threading model
Em Qua, 2010-05-12 às 10:12 -0700, Dave Whipp escreveu: Before discussing the implementation, I think it's worthwhile stating what it is that you are attempting to abstract. For example, is the abstraction intended for mapping down to a GPU (e.g. OpenCL) with a hierarchical address space, or for a multicore CPU with a linear address space, or to abstract a LAN, with communication via sockets (reliable TCP? unreliable UDP?), or to abstract the internet/cloud?

Initially I'd consider regular OS threads and queues implemented in the process address space. I'd consider other abstractions to be possible, but it's probably better to implement them as separate modules...

daniel
Fwd: Ideas for a Object-Belongs-to-Thread threading model
Forgot to send this to the list.

-- Forwarded message --
From: Alex Elsayed eternal...@gmail.com
Date: Wed, May 12, 2010 at 8:55 PM
Subject: Re: Ideas for a Object-Belongs-to-Thread threading model
To: Daniel Ruoso dan...@ruoso.com

You may find interesting a paper that was (at one point) listed in the /topic of #perl6. The paper is: Combining Events And Threads For Scalable Network Services http://www.cis.upenn.edu/~stevez/papers/LZ07.ps

Steve Zdancewic and Peng Li, who wrote it, implemented their proof of concept in Haskell, and I think it would mesh rather well with the 'hybrid threads' GSoC project that Parrot is undertaking. What's more, the proof of concept demonstrated that it performed very well -- well enough that the threading/event abstractions were never a bottleneck even up to 10M threads (for memory usage, this came out to 48 bytes per thread of overhead), and with 100 threads it outperformed NPTL (pthreads) + AIO on IO. It's also CPS-based, which fits pretty well.
Re: Ideas for a Object-Belongs-to-Thread threading model
On Wed, May 12, 2010 at 8:57 PM, Alex Elsayed eternal...@gmail.com wrote: Forgot to send this to the list. -- Forwarded message -- From: Alex Elsayed eternal...@gmail.com ... It's also CPS-based, which fits pretty well.

Here's another, one that might fit more readily with perlesque/CLR: Actors that Unify Threads and Events
pdf: http://lamp.epfl.ch/~phaller/doc/haller07actorsunify.pdf
slides: http://lamp.epfl.ch/~phaller/doc/ActorsUnify.pdf

In this paper we present an abstraction of actors that combines the benefits of thread-based and event-based concurrency. Threads support blocking operations such as system I/O, and can be executed on multiple processor cores in parallel. Event-based computation, on the other hand, is more lightweight and scales to large numbers of actors. We also present a set of combinators that allows a flexible composition of these actors. Scala actors are implemented on the JVM, but our techniques can be applied to all multi-threaded VMs with a similar architecture, such as the CLR.
Ideas for a Object-Belongs-to-Thread threading model
Hi, The threading model topic still needs lots of thinking, so I decided to try out some ideas. Every concurrency model has its advantages and drawbacks; I've been wondering about these ideas for a while now and I think I finally have a sketch. My primary concerns were:

1 - It can't require locking: locking is just not scalable;
2 - It should perform better with lots of cores even if it suffers when you have only a few;
3 - It shouldn't require complicated memory management techniques that will make it difficult to bind native libraries (yes, STM is damn hard);
4 - It should support implicit threading and implicit event-based programming (i.e. the feed operator);
5 - It must be easier to use than Perl 5 shared variables;
6 - It can't use a Global Interpreter Lock (that's already said in 1, but, as this is a widely accepted idea in some other environments, I thought it would be better to make it explicit).

The idea I started with was that every object has an owner thread, and only that thread should talk to it, and I ended up with the following; comments are appreciated:

0 - The idea is similar to Erlang and the Io Language. In addition to OS threads there are the interpreter processes.
1 - No memory is shared between processes, so no locking is necessary.
2 - The interpreter implements a scheduler, just like POE.
3 - The scheduler, unlike POE, should be able to schedule in several OS threads, such that any OS thread may raise any waiting process.
4 - Each process is run in only one OS thread at a time; it's like a Global Interpreter Lock, but related only to one specific process.
5 - A process may block, and the scheduler must become aware of that blocking. That is implemented through Control Exceptions.
6 - In order to implement inter-process communication, there are:
6.1 - A MessageQueue, which works just like a Unix pipe; it looks like a slurpy array. It has a configurable buffer size and processes might block when trying to read and/or write to it.
6.2 - A RemoteInvocation, an object that has an identifier, a capture (which might, optionally, point to a MessageQueue as input) and another MessageQueue to be used as output.
6.3 - An InvocationQueue, a special type of MessageQueue that accepts RemoteInvocation objects.
6.4 - A RemoteValue, an object that proxies requests to another process through a RemoteInvocation.
7 - The process boundary is drawn at each closure; every closure belongs to a process, and every value initialized inside a closure belongs to that closure. You might read "coroutine" instead of "closure" if you like.
8 - A value might have its ownership transferred to another closure if it can be detected that the value is in use only for that invocation or return value, in order to reduce the number of RemoteInvocations.
9 - A value might do a special ThreadSafe role if it is thread-safe (such as implementing bindings to thread-safe native libraries), in which case it is sent as-is to a different thread.
10 - A value might do a special ThreadCloneable role if it should be cloned instead of being proxied through a RemoteValue when sent to a different process.
11 - The MessageQueue notifies the scheduler through a Control Exception whenever new data is available in that queue, so the target process might be raised.
12 - Exception handling gets a bit hairy, since exceptions might only be raised at the calling scope when the value is consumed.
13 - List assignment and Sink context might result in synchronized behavior.

comments? ideas?

daniel
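Point 6.4's RemoteValue can be pictured as a proxy that marshals each method call into an invocation message and blocks on a reply queue, so that only the owning thread ever touches the real object. A hypothetical Python mock-up (the message format and all names besides RemoteValue are invented for illustration):

```python
import threading
import queue

class RemoteValue:
    """Proxies method calls to the thread that owns the real object."""
    def __init__(self, invocation_queue):
        self._invocations = invocation_queue

    def __getattr__(self, method):
        def call(*args):
            reply = queue.Queue()                 # per-call output MessageQueue
            self._invocations.put((method, args, reply))
            return reply.get()                    # block until the owner replies
        return call

def owner_thread(invocations, obj):
    """Only this thread ever touches obj, so obj itself needs no locks."""
    while True:
        msg = invocations.get()
        if msg is None:                           # shutdown sentinel
            break
        method, args, reply = msg
        reply.put(getattr(obj, method)(*args))

invocations = queue.Queue()
owner = threading.Thread(target=owner_thread, args=(invocations, [10, 20]))
owner.start()

proxy = RemoteValue(invocations)
proxy.append(30)                                  # runs in the owner thread
popped = proxy.pop()
print(popped)                                     # → 30
invocations.put(None)
owner.join()
```

The hairy exception-handling of point 12 is visible even here: if `getattr(obj, method)(*args)` raised, the error would surface in the owner thread, not at the caller's `reply.get()`, unless it were explicitly marshalled back.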
Re: Ideas for a Object-Belongs-to-Thread threading model
Since I don't think BrowserUK subscribes here, I'll paste in the remarks he attached to your earlier paste, just to help get the discussion going, and on the assumption this will not be regarded as antisocial. :) Larry

BrowserUK wrote:

"there are the interpreter processes." Inventing (overloaded) terminology will just create confusion. Very unhelpful in a context that suffers more than its fair share already.

"- The interpreter implements a scheduler, just like POE." POE does *NOT* implement a scheduler. It implements a state machine--with the controlling state both global and shared. It provides no concurrency. Not even the illusion of concurrency. It does a little of this; and a little of that; and a little of something else; but only one thing at a time, regardless of how many cores are available. And if one of those little bits of something hangs, the entire edifice hangs.

"3 - The scheduler, unlike POE, should be able to schedule in several OS threads, such that any OS thread may raise any waiting process." And how are you going to implement that? The only way would be for there to be multiple concurrent (kernel threaded) instances of the state-machine running, sharing (as in shared state concurrency) their controlling state. This re-creates all the very worst problems of: 5.005 threads; green threads; and Windows 3 cooperative scheduling. Besides that it would be a nightmare to implement; it would be an even worse nightmare to program,
Re: Ideas for a Object-Belongs-to-Thread threading model
On Tue, 11 May 2010, Daniel Ruoso wrote: "2 - The interpreter implements a scheduler, just like POE." I don't have a clue about threading, but I saw POE, and since I know that's an event loop mechanism, I thought I'd comment that I want to be able to do GTK programming, which I think requires using the GTK event loop. There may be some way to hook all this together, but I just thought I'd put this thought into the collective mind again. HTH,

-
| Name: Tim Nelson              | Because the Creator is, |
| E-mail: wayl...@wayland.id.au | I am                    |
-
BEGIN GEEK CODE BLOCK Version 3.12
GCS d+++ s+: a- C++$ U+++$ P+++$ L+++ E- W+ N+ w--- V- PE(+) Y+++ PGP-+++ R(+) !tv b++ DI D G+ e++ h! y-
END GEEK CODE BLOCK