Re: Ideas for a "Object-Belongs-to-Thread" threading model (nntp: message 5 of 20)

nigelsandever Thu, 13 May 2010 02:00:31 -0700

This should be a reply to Daniel Ruoso's post above, but I cannot persuademy nntp reader

to reply to a post made before I subscribed here. Sorry


On Wed, 12 May 2010 14:16:35 +0100, Daniel Ruoso <dan...@ruoso.com> wrote:

I have 3 main problems with your thinking.

1: You are conflating two fundamentally different views of the problem.
  a) The Perl 6 programmers semantic view.
  b) The P6 compiler (writers) implementation view.

These two views need to be kept cleanly separated in order that reference
implementation does not define the *only possible* implementation.

But, it is important that when designing the semantic view, that it is done
with a considerable regard for what /can/ be implemented.

2: You appear to be taking your references at face value.

For example, you've cited Erlang as one of your reference points.
And the Erlang docs describe the units of concurrency as "processes";

with "the parallelism is provided by Erlang and not the host operatingsystem."


But, if I run one of the Erlang examples,
    http://www.erlang.org/examples/small_examples/tetris.erl

it uses two procesess: one with 13 OS threads, and the other with 5 OSthreads;

even if I only run tetris:start(1).

Whilst until recently, Erlang did not use OS threads, relying instead upon

and internal, user-space scheduler--green threading, though you may findsome denialsof that by Erlangers because of the unfavorable comparison with Java greenthreading

in Java version 1 thru 4.

But recent versions have implemented multiple OS trheads each running acoroutine scheduler.The had to do this in order to achieve SMP scaling. Here is a littlesalient information:


        "The Erlang VM without SMP support has 1 scheduler which runs in the
        main process thread. The scheduler picks runnable Erlang processes and

IO-jobs from the run-queue and there is no need to lock data structuressince

        there is only one thread accessing them.

        The Erlang VM with SMP support can have 1 to many schedulers which are
        run in 1 thread each. The schedulers pick runnable Erlang processes
        and IO-jobs from one common run-queue. In the SMP VM all shared data
        structures are protected with locks, the run-queue is one example of a
        data structure protected with locks."

Lock-free at the semantic level is a nice-to-have. But, whenever you havekernel threadstalking to each other through shared memory--and you have to have if youare going to achieve

SMP scalability--then there will be some form of locking required.

All talk of "message passing protocols" is simply disguising the realitiesof the implementation.That is not a bad thing from the applications programmer's point ofview--nor even the languagedesigner's POV--but it still leaves the problem to be dealt with by thelanguage & system implementers.

Whilst lock-free queues are possible--there are implementations of theseavailable for Java 5(which, of necessity, and to great effect, has now moved away from greenthreads and gone theKernel threading route.)--they are very, very hardware dependant. Relyingas they do upon CAS, whichnot all processor architectures support and not all languages giveadequate access to.

For a very interesting, if rather long, insight into some of this, seeCliff Click's video about

"Fast Wait-free Hashtables":

        http://www.youtube.com/watch?v=WYXgtXWejRM&feature=player_embedded#!

One thing to note if you watch it all the way through is that your claim(in an earlier revision?)that "shared memory doesn't scale" is incorrect in the light of this videowhere 786 SMP processors

are using a hash for caching at very high speed.

3: By conflating the POVs of the sematic design and implementation, youare in danger of reinventing

several bad wheels, badly.

a) A green threading scheduler:

The Java guys spent a long time trying to get their's right beforeabandoning it.

The Erlang guys have taken a long time tuning their's, but due to Moore'sLaw running out,they have had to bow to the inevitability of kernel threading. And arenow having to gothrough the pain of understanding and addressing how multiple eventdriven and cooperativeschedulers running under the control of (various) preemptive scheduler(s)interact.


        Even Haskell has to use kernel threading:

"In GHC, threads created by forkIO are lightweight threads, and aremanaged entirely by theGHC runtime. Typically Haskell threads are an order of magnitude or twomore efficient

        (in terms of both time and space) than operating system threads.

The downside of having lightweight threads is that only one can run at atime, so if one threadblocks in a foreign call, for example, the other threads cannot continue.The GHC runtime worksaround this by making use of full OS threads where necessary. When theprogram is built with the-threaded option (to link against the multithreaded version of theruntime), a thread making asafe foreign call will not block the other threads in the system; anotherOS thread will takeover running Haskell threads until the original call returns. The runtimemaintains a pool of theseworker threads so that multiple Haskell threads can be involved inexternal calls simultaneously."


b) Cooperative scheduling:
        From memories of Windows 3/95/98/ME; Java version 1.1, 1.2, 1.3, 1.4;
        co-operative (pseudo) multi-tasking     is fraught with problems.

Besides the requirement for "good behavior", it leads to an extremelyunnatural and difficult wayof programming. Unless the programming environment takes care ofmaintaining application program statebetween yields, this falls to the application programmer. And theyusually get it wrong.

CPS can alleviate this, but to my knowledge, as yet, only Scheme hassuccessfully adopted the CPS.

        You have to ask yourself: why is that?

c) Event loops:

Event loops only work for processing where events are generatednaturally. For GUIs and Web servers,they work well--until the response to the event takes longer to producethan the target maximum response

        time for the loop.

        The image takes longer to render than the GUIs 1/10 second response 
time.

The genome takes longer to process than the Web users 3-10 secondpatience.

        
----------

Currently, the most promising idea for resolving Perl 6's ambitions, withthe realities of the ongoing move tocompensate for the inability to keeping ramping up the processor clocks bymoving to medium and large-scale

multi-core processors, is:

        http://lamp.epfl.ch/~phaller/doc/haller07coord.pdf

(Note: I'm been trying to digest and assimilate that paper (and several ofits precursors) for a while now.I'm not yet entirely comfortable that it is a silver bullet for all ofPerl 6's concurrency issues.But to date, it is the most promising, and it does at least offer someexperimental results. )

And at the core of that, is the need for preemptive (kernel) threading andshared memory.

These can (and should!) be hidden from the application programmer, throughthe use of language and/or library

level abstractions, of which there are many promising [sic] candidates.

But fundamentally, they all require that:

        1) Preemptive scheduling be utilised.
        2) The core interpreter be fully reentrant.
        3) The core runtime libraries be fully reentrant.
        4) That the language distinguishes between, and handles appropriately,
                - process-global entites: IO handles; environment; pwd etc.

- runtime stack-based(*) (lexical) entities: locals (my vars in perl'sterms).


(*)Regardless of whether these actually live on the thread-owned(**),
   processor stack or in a heap-based logical 'stack'.

(**)Even in a single threaded process, the unit of (kernel) dispatch isthe thread,

    and the thread owns the processor stack.

Please note. Despite these criticisms of what you have so far, I applaudyou for picking this hotpotato up and running with it. It desperately needed to be brought outinto the light and addressed.

Thank you for being the one :)

In this reply I have not detailed any alternatives to your proposals--I'mworking on it furiously,but like you, I keep conflating the semantics and implementation. I hopeto have something for youand others to summarily dismiss within a few days. Subject to Jensonwinning in Monaco :)


Buk.



Em Ter, 2010-05-11 às 21:45 -0300, Daniel Ruoso escreveu:
he threading model topic still needs lots of thinking, so I decided to
try out some ideas.

After I sent the second version, I just realized I could make it simpler
by just assuming "one OS thread per Coroutine Group"... so here goes the
new version.

0 - No memory is shared between threads, so no locking is necessary.

At one level, Perl 6 pretty much gives you this for free.
As perl 6 does all it can to


1 - A value and a coroutine always belong to a thread, thus naturally
    implementing synchronized access to data.

2 - Coroutines are, conceptually, the equivalent to green threads,
    running in the same OS thread. Coroutines waiting for a return
    value are "blocked".

3 - The interpreter implements a scheduler, which will pick one of the
    "waiting" coroutines, it may also suspend a coroutine in order to
    implement time-sharing.

4 - In order to implement inter-thread communication, there are:

    4.1 - A "MessageQueue" works just like an Unix Pipe, it looks like
          a slurpy array. It has a configurable buffer size and
          coroutines might block when trying to read and/or write to
          it.

    4.2 - A "RemoteInvocation" is an object that has a identifier, a
          capture (which might, optionally, point to a "MessageQueue"
          as input) and another "MessageQueue" to be used as output.
          New coroutines are created in the target thread to execute
          that invocation.

    4.3 - An "InvocationQueue" is a special type of "MessageQueue"
          that accepts only "RemoteInvocation" objects.
   4.4 - A "RemoteValue" is an object that proxies requests to
          another coroutine group through a "RemoteInvocation".

5 - The thread group boundary is drawn by language constructs such
    as async, the feed operator, junctions, hyper operators.

6 - A value might have its ownership transferred to another thread if
    it can be detected that this value is in use only for that
    invocation or return value, in order to reduce the amount of
    "RemoteInvocation"s.

7 - A value might do a special "ThreadSafe" role if it is thread-safe
    (such as implementing bindings to thread-safe native libraries) In
    which case it is sent as-is to a different thread.

8 - A value might do a special "ThreadCloneable" role if it should be
     cloned instead of being proxied through a "RemoteValue" when sent
     in a "RemoteInvocation".

9 - Exception handling gets a bit hairy, since exceptions might only
     be raised at the calling scope when the value is consumed.

10 - List assignment and Sink context might result in synchronized
     behavior.


daniel



--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

Re: Ideas for a "Object-Belongs-to-Thread" threading model (nntp: message 5 of 20)

Reply via email to