RE: hybrid threads questions

Matthew Wilson Sat, 08 Dec 2012 13:28:03 -0800

Thanks for the replies; my responses are inline.

> Hi Matthew,
>
> I finally found some time to answer your questions. I hope you don't mind my
> CCing parrot-dev. Your questions are very good and very important for showing
> the limitations of the current implementation.
>
> On Monday 03 December 2012 18:29:26 Matthew Wilson wrote:
>
> >    1. On page 17 you say "But since all objects which could be accessed
> >    from other threads have to be pushed onto the task object representing
> >    these threads, the objects can still be referenced from the task object."
> > Are you saying thread A's objects cannot hold references to thread B's
> > objects without a Task referencing it being enqueued on thread B?
> > Otherwise, what task object are you talking about?  If thread B's GC runs
> > and doesn't find a reference to the object in its own live object graph or
> > its own Task queue, how does it know not to collect the object?  Or are
> > threads not allowed to hold references to other threads' objects apart from
> > having an open read or write operation involving it?
>
> Each thread is represented by its own interpreter and has a completely
> independent GC domain. So yes, thread B would not see any references from
> thread A. Even more: thread A is not allowed to reference thread B's objects
> directly. Cross thread references must go through proxy objects, so the GC
> knows not to follow those references.


What do you mean by "the GC knows not to follow those references"?  I was
asking about the GC in thread B.  Follow what references?  How does thread B
know not to collect proxy objects it created for its own objects?

Here is an example that I believe shows the unsafety of the proxy system.
Thread A (main) allocates ResizablePMCArray Z and passes [a proxy of it] to
thread B, sleeping until it's done (because it's sleeping until a bunch of
other tasks are also done).  Thread B allocates ResizablePMCArrays Y and X,
then pushes X to Y, then pushes a new Integer 43 to X.  Then B wants to push
Y to the proxy of Z.  However that's accomplished, it seems from what you're
saying that a proxy of Y is created and sent to A, and A pushes the proxy of
Y to Z.  Stop me right now if a child thread can't have the main thread stash
an object in a proxied collection from the main thread, be it a Hash, Array,
or attribute (if not, <tears>).

Anyway, Y is now [indirectly] in Z[0].  Two problems:

(1) The main thread now has an unproxied reference to X and its Integer 43,
via Y in Z[0].

(2) But let's say it doesn't care about that, and then thread B is finished
with that task and another task (that main queued up) comes along that gets
scheduled to the same OS thread that ran thread B, and it news a billion Ints
in a loop.  How does B's GC know that A has references (via proxies or not!)
to X, Y, and 43?  If it knows about the references, what path does it
traverse from its known live roots to find them?

On #parrot, benabik suggested this means Tasks can never be collected.
However, even this doesn't help, since A can get Y[0] from Y (and then null
Y[0]) and still have a reference to X.  That is, even if the data attribute
of the child's push-to-Z task is somehow enforced to still be Y, how does B
know that X (and its Integer 43) are still live, since it has nothing to
refer to them?

rurban suggested that Y's proxy would return a B-proxied X from Y[0] (is this
implemented?), but this doesn't help because B cannot know that X's proxy is
live since it's not referenced by any Task, since Y's proxy's Task could have
now been collected (or does this again mean Tasks can never be collected?).

So I guess the general question is: how do you handle proxies that persist
(via stashing in parent objects) longer than their Task?  If you don't, does
this mean no proxy can ever be collected and must be linked as live from
owner threads?  Or does it mean "doctor, it segfaults when I pir this" "so
don't pir that"?
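
To make the scenario concrete, here is a toy model in Python (not PIR, and
not Parrot's actual GC).  It assumes only what was described above: each
interpreter marks from its own roots and never follows a proxy into another
interpreter's heap.  The names Interp, Obj, Proxy and live() are made up for
illustration.

```python
# Toy model only: each "interpreter" owns its objects and marks solely from
# its own roots. None of these names correspond to real Parrot APIs.

class Obj:
    def __init__(self, name, owner):
        self.name = name
        self.owner = owner      # interpreter that allocated the object
        self.items = []         # outgoing references (array slots)
        owner.heap.append(self)

class Proxy(Obj):
    def __init__(self, target, owner):
        super().__init__("proxy(" + target.name + ")", owner)
        self.target = target    # cross-thread reference; the GC skips it

class Interp:
    def __init__(self, name):
        self.name = name
        self.heap = []
        self.roots = []         # registers, variables, pending Tasks

    def live(self):
        """Names of objects reachable from this interpreter's own roots."""
        seen, stack = set(), list(self.roots)
        while stack:
            o = stack.pop()
            if id(o) in seen or o.owner is not self:
                continue
            seen.add(id(o))
            stack.extend(o.items)   # Proxy.target is deliberately not followed
        return {o.name for o in self.heap if id(o) in seen}

a, b = Interp("A"), Interp("B")
z = Obj("Z", a)
a.roots.append(z)

y, x, n = Obj("Y", b), Obj("X", b), Obj("43", b)
y.items.append(x)
x.items.append(n)
task = Obj("task", b)           # B's running task holds Y
task.items.append(y)
b.roots.append(task)

z.items.append(Proxy(y, a))     # A stashes a proxy of Y in Z[0]
before = b.live()
b.roots.remove(task)            # B's task finishes and is dropped
after = b.live()
```

In this model, once the task is gone B has no path from its roots to Y, X or
43, even though A can still reach them through Z[0].  Whether the real
implementation has some extra root that the model lacks is exactly the
question.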

>
> >    2. If threads cannot hold references to foreign objects between Task
> >    requests to their threads, how is this useful?
>
> The programmer (or rather the high level language compiler) has to make sure
> that the interpreter knows when some of its objects are (indirectly)
> referenced by other threads. This is done by pushing these objects onto the
> Task that is going to run on a different thread. When the Task is scheduled on
> another thread, a copy of the Task object is created on the target thread's
> interpreter. For all objects pushed onto the original Task, proxies are
> created on the target interpreter. These two versions of the Task object
> remain within their own interpreter's GC domain and are linked through the
> partner pointer.
>
> As long as the Task exists on the other thread, the local copy remains alive
> in the interpreter's foreign_tasks list. This way, all shared objects are
> still referenced from this local copy of the Task. When the Task on the other
> thread is finished and thus freed, the original copy is removed from the
> foreign_tasks list. If it's not referenced anymore from some variable or
> register, it's freed and thus the shared objects may be freed as well.

"If it's not referenced anymore from some variable" - this is exactly my
example above.  X is no longer referenced (indirectly) by the Task, both
because it has been popped from Y and because the Task in which Y was
stashed is no longer active and has been freed, so how does B know it's alive?
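
As I read the description quoted above, the lifecycle is roughly the
following.  This is a Python sketch of my understanding, not the real
implementation: only foreign_tasks and the partner pointer come from the
description, while schedule(), finish() and the class layout are invented
for illustration.

```python
# Sketch of the Task / partner / foreign_tasks lifecycle as described in the
# quoted reply. Hypothetical names throughout, except foreign_tasks/partner.

class Interp:
    def __init__(self, name):
        self.name = name
        self.foreign_tasks = []   # local Tasks whose copies run elsewhere

class Proxy:
    def __init__(self, target):
        self.target = target      # object owned by another interpreter

class Task:
    def __init__(self, interp, shared=()):
        self.interp = interp
        self.shared = list(shared)   # objects pushed onto the Task
        self.partner = None

def schedule(task, target):
    """Copy the Task to the target interpreter, proxying its shared objects."""
    copy = Task(target, [Proxy(o) for o in task.shared])
    copy.partner, task.partner = task, copy
    task.interp.foreign_tasks.append(task)   # keeps task.shared reachable
    return copy

def finish(copy):
    """The remote copy is done: unpin the original on its own interpreter."""
    original = copy.partner
    original.interp.foreign_tasks.remove(original)
    original.partner = copy.partner = None

main, worker = Interp("main"), Interp("worker")
data = ["shared array"]
t = Task(main, [data])
c = schedule(t, worker)
pinned_while_running = t in main.foreign_tasks
finish(c)
pinned_after_finish = t in main.foreign_tasks
```

The pin lasts exactly as long as the remote copy of the Task.  Nothing in
this sketch covers a proxy that was stashed somewhere and outlives that Task.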

>
> This implementation is one reason for the limitation mentioned in 8.1.2 that
> worker threads may only schedule Tasks on the main thread. If the main thread
> starts a Task on thread A and shares some objects with it and this Task would
> then want to start a task on thread B and share the same objects, this would
> have to be communicated back to the main thread, so it knows about another
> thread using those objects.
>
> This is not a limitation of the model itself, but of the current
> implementation.
>
> >    4. Same question about a situation where the only live reference to an
> >    object is stored in an attribute of the data PMC of a Task in some
> > *other* arbitrary thread (and the only reference to that data PMC is in the
> > Task). Does the GC scan all other threads' Task queues (and therefore wait
> > for locks on *all* of them to do so, since it would need to lock them all
> > at once to ensure it didn't miss any cross-writes)?
>
> No, GC domains are strictly separated. As described, there exists a local copy
> of the Task object on the interpreter which originally created it.

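But when the Task is done, that local copy is dropped from foreign_tasks, and
then nothing pins the object (see my example above).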

>

Another issue (assuming proxied collections/containers always return new
proxies as needed from getters; benabik says on #parrot that's the only sane
way he can imagine it, and I agree): how will nqp/rakudo's 6model objects
know to return proxies from their getters?  Does this mean all nqp/rakudo
6model getters need to know when they are being invoked from proxy-invoked
code, and therefore need to create proxies?  Also, how will arbitrary
nqp/rakudo library code know to create write tasks in owners when about to
call a mutating operation on some parameter, since nqp/rakudo generally do
not use parrot's vtables?  benabik suggested a 6model repr of Proxy that knows
which repr ops need to return proxies and which ones need to create write
Tasks...
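
Roughly what I picture for such a Proxy repr, sketched in Python rather than
6model terms.  Everything here is hypothetical (Owner, run_pending, push and
the IMMEDIATE tuple are names I made up): reads hand back either a plain
value or a fresh proxy, and mutating operations are queued as write Tasks on
the owner instead of being applied directly.

```python
from collections import deque

class Owner:
    """Stand-in for the interpreter that owns the real objects."""
    def __init__(self):
        self.queue = deque()          # pending write Tasks

    def run_pending(self):
        while self.queue:
            self.queue.popleft()()    # each Task is a zero-argument callable

class Proxy:
    # Values treated as plain data and returned without a proxy (assumption).
    IMMEDIATE = (int, float, str, bool, type(None))

    def __init__(self, target, owner):
        self._target, self._owner = target, owner

    def _wrap(self, value):
        if isinstance(value, self.IMMEDIATE):
            return value
        return Proxy(value, self._owner)   # never leak the raw object

    def __getitem__(self, key):
        # "Getter" op: must return a proxy for anything that is not plain data.
        return self._wrap(self._target[key])

    def push(self, value):
        # "Mutating" op: queue a write Task on the owner, do not write here.
        target = self._target
        self._owner.queue.append(lambda: target.append(value))

owner = Owner()
y = [[43]]                 # Y holds X, X holds 43
p = Proxy(y, owner)
inner = p[0]               # proxy of X, not X itself
p.push("new")              # queued only
length_before = len(y)
owner.run_pending()        # owner thread applies the write
length_after = len(y)
```

This leaves out the hard parts raised above: proxying values that are pushed
in from the non-owner side, and keeping the owner's GC aware of which proxies
are still live.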

-Matthew
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev
