Re: [whatwg] MessagePorts in Web Workers: implementation feedback

Maciej Stachowiak Fri, 08 May 2009 05:52:24 -0700


On May 7, 2009, at 5:40 PM, Drew Wilson wrote:

Agreed that removing this requirement:
User agents must act as if MessagePort objects have a strong reference to their entangled MessagePort object.
would make MessagePort implementation much easier, as it would remove the need to track reachability across multiple threads. This requirement can get tricky especially as both sides can be cloned, in-flight to a new owner, etc.
My only concern is that removing this requirement introduces non-deterministic behavior - if I have an entangled MessagePort and I register an onmessage() handler with it, then drop my reference to it, after which someone calls postMessage() on the entangled port, there's no way to tell if my onmessage() handler will be invoked; it entirely depends on whether a GC happens first or not. That seems bad.


That's a fair concern. I would mention a few counterpoints:

1) Nondeterministic behavior is inevitable with an API designed for concurrency. There are surely already possible cases of nondeterminism in the MessagePort API. Consider sending a message to two different workers and waiting for the reply. The replies may arrive in either order; indeed, the workers may receive the messages in either order, so if they are in communication with each other you cannot rely on one getting the message and performing its action first.

2) The nondeterministic behavior in this case is easily avoided by what are in any case good coding practices: (a) don't drop all references to a MessagePort you are still using, and (b) call close() on the MessagePort when you are done with it and don't want more messages.

3) The alternatives on the table to removing this requirement are either removing the ability to use MessagePorts to communicate with Workers, or leaving the spec as-is with its attendant high implementation cost.
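
To make the two practices in counterpoint 2 concrete, here is a minimal sketch. PortLike and PortSession are hypothetical names invented for illustration; PortLike only mirrors the onmessage and close() members of a MessagePort and is not the real DOM type.

```typescript
// Hypothetical minimal shape: just the parts of a MessagePort used below.
interface PortLike {
  onmessage: ((event: { data: unknown }) => void) | null;
  close(): void;
}

// Owns a port for as long as messages are wanted.
class PortSession {
  // (a) Keep a strong reference while the port is in use, so delivery
  //     never depends on when a garbage collection happens.
  private port: PortLike | null;
  readonly received: unknown[] = [];

  constructor(port: PortLike) {
    this.port = port;
    port.onmessage = (event) => {
      this.received.push(event.data);
    };
  }

  // (b) Call close() explicitly when no more messages are wanted.
  // Calling finish() a second time is a no-op.
  finish(): void {
    if (this.port === null) return;
    this.port.onmessage = null;
    this.port.close();
    this.port = null;
  }

  isOpen(): boolean {
    return this.port !== null;
  }
}
```

As long as the PortSession object itself stays reachable, so does the port; the end of the conversation is marked by finish() rather than by letting the last reference drop.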

Given all these factors, I think avoiding nondeterminism in the one particular case you describe, when authors can already avoid it in a reasonable way, is not worth the order of magnitude increase in implementation complexity imposed by the entanglement keepalive requirement. I also think accepting this small amount of potential nondeterminism is preferable to excluding Workers from using MessagePorts.

Thus, on the whole, I think the best option is to remove the keepalive requirement.


Regards,
Maciej

-atw
On Thu, May 7, 2009 at 3:28 PM, Maciej Stachowiak <[email protected]> wrote:
I agree with Drew's assessment that MessagePorts in combination with Workers are extremely complicated to implement correctly, as currently specified. In fact, the design seems to push towards having lockable shared state, even though one potential advantage of the message passing design is to avoid locking and shared state.
Besides removing MessagePorts as a way to communicate with workers, another possibility is simplifying the life cycle requirements. For example, getting rid of the keepalive rule, whereby both MessagePorts remain live so long as either is otherwise live, would remove the majority of the complexity. I don't think the slight convenience of that rule is worth the extra implementation cost.
On May 7, 2009, at 1:39 PM, Drew Wilson wrote:
Hi all,
I've been hashing through a bunch of the design issues around using MessagePorts within Workers with IanH and the Chrome/WebKit teams and I wanted to follow up with the list with my progress.
The problems we've encountered are all solvable, but I've been surprised at the amount of work involved in implementing worker MessagePorts (and the resulting implications that MessagePorts have on worker lifecycles/reachability). My concern is that the amount of work to implement MessagePorts within Worker context may be so high that it will prevent vendors from implementing the SharedWorker API. Have other implementers started working on this part of the spec yet?
Let me quickly run down some of the implementation issues I've run into - some of these may be WebKit/Chrome specific, but other browsers may run into some of them as well:
1) MessagePort reachability is challenging in the context of separate Worker heaps
In WebKit, each worker has its own heap (in Chrome, they will have their own process as well). The spec reads:
User agents must act as if MessagePort objects have a strong reference to their entangled MessagePort object.
Thus, a message port can be received, given an event listener, and then forgotten, and so long as that event listener could receive a message, the channel will be maintained.
Of course, if this was to occur on both sides of the channel, then both ports would be garbage collected, since they would not be reachable from live code, despite having a strong reference to each other.
Furthermore, a MessagePort object must not be garbage collected while there exists a message in a task queue that is to be dispatched on that MessagePort object, or while the MessagePort object's port message queue is open and there exists a message event in that queue.
The end result of this is the need to track some common state across an entangled MessagePort pair such as: number of outstanding messages, open state of each end, and number of active references to each port (zero or non-zero). Turns out this last bit will require adding new hooks to the JavaScriptCore garbage collector to detect transitioning between 1 and 0 references without actually freeing the object - not that difficult, but possibly something that other implementers should keep in mind.
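
The shared bookkeeping described above might look roughly like the following sketch. EndState, ChannelState, endIsLive and channelMayBeCollected are hypothetical names made up for illustration, not WebKit or spec identifiers, and the cross-thread synchronization a real implementation needs is left out entirely.

```typescript
// Hypothetical per-port record, one for each end of an entangled pair.
interface EndState {
  referenced: boolean;  // reachable from live script (zero vs. non-zero references)
  open: boolean;        // is this port's port message queue open?
  inTaskQueue: number;  // messages in a task queue awaiting dispatch on this port
  inPortQueue: number;  // message events waiting in this port's port message queue
}

// Hypothetical state shared by both ends of the channel.
interface ChannelState {
  a: EndState;
  b: EndState;
}

// An end is live if script can reach it, if a task-queue message targets it,
// or if its port message queue is open and non-empty.
function endIsLive(end: EndState): boolean {
  return (
    end.referenced ||
    end.inTaskQueue > 0 ||
    (end.open && end.inPortQueue > 0)
  );
}

// Each port acts as a strong reference to the other, so the pair can only be
// collected together, once neither end is live on its own.
function channelMayBeCollected(state: ChannelState): boolean {
  return !endIsLive(state.a) && !endIsLive(state.b);
}
```

In this model the garbage-collector hook mentioned above would be what flips referenced between true and false, after which channelMayBeCollected is re-evaluated for the pair.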
2) MessagePorts dramatically change the worker lifecycle
Having MessagePorts in worker context means that Workers can outlive their parent window(s) - I can create a worker, pass off an entangled MessagePort to another window (say, to a different domain), then close the original window, and the worker should stay alive. In the case of WebKit, this causes some problems for things like worker-initiated network requests - if workers can continue to run even though there are no open windows for that origin, then it becomes problematic to perform network requests (part of this is due to the architecture of WebKit which requires proxying network requests to window context, but part of this is just a general problem of "how do you handle things like HTTP Auth when there are no open windows for that origin?")
Finally, the spec defines a fairly broad definition of what makes a worker reachable - here's an excerpt from my WebKit Shared Worker design doc, where I summarize the spec (possibly incorrectly - feel free to correct any misconceptions):
Permissible
The spec specifies that a worker is permissible based on whether it has a reachable MessagePort that has been entangled at some point in the past with an active window (or with a worker who is itself permissible). Basically, if a worker has ever been entangled with an active window, or if it's ever been entangled with a worker who is itself permissible (i.e. it's associated with an active window via a chain of workers that have been entangled at some point in the past) then it's permissible.
The reason why the "at some point in the past" language is present is to allow a page to create a fire-and-forget worker (for example, a worker that does a set of long network operations) without having to keep a reference to that worker around.
Once the referent windows close, the worker should also close, as being permissible is a necessary (but not sufficient) criterion for being runnable.
Active needed

A permissible worker is active needed if:
1) it has pending timers/network requests/DB activity, or
2) it is currently entangled with an active window, or another active needed worker.
The intent behind #1 is to enable fire-and-forget workers that don't exit until they are idle. The intent behind #2 is that an idle worker shouldn't exit as long as it's reachable by an active window (possibly chained through other workers).
The end result is that for each worker we need to keep track of a big list of every window it's ever been entangled with. As workers become entangled with other workers, they each inherit the list of entangled windows from the other worker. As windows become inactive, we then walk the lists of every worker to remove references to the window and properly shutdown the worker as appropriate. All of this with the appropriate cross-thread synchronization, of course :)
Likewise, determining when a worker is active needed requires tracking a graph of entangled message ports, and walking that graph to determine whether a given worker is reachable by any active window. Typically this is only needed when either a window closes, or when a worker goes idle.
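
As a rough, single-threaded sketch of that graph walk: the code below follows the two "active needed" conditions as summarized above and assumes every worker in the graph is already permissible. EntanglementGraph, activeNeededWorkers and the string ids are hypothetical names for illustration only, and none of the cross-thread synchronization is modelled.

```typescript
// Hypothetical snapshot of the current entanglement graph.
interface EntanglementGraph {
  workers: string[];                   // ids of all (permissible) workers
  activeWindows: string[];             // ids of windows that are still active
  busyWorkers: string[];               // workers with pending timers/network/DB activity
  entangled: Array<[string, string]>;  // currently entangled pairs, in either order
}

// Returns the sorted ids of workers that are "active needed".
function activeNeededWorkers(g: EntanglementGraph): string[] {
  const isWorker: Record<string, boolean> = {};
  for (const w of g.workers) isWorker[w] = true;

  // Entanglement is symmetric, so record each pair in both directions.
  const neighbours: Record<string, string[]> = {};
  const link = (from: string, to: string): void => {
    (neighbours[from] = neighbours[from] || []).push(to);
  };
  for (const [x, y] of g.entangled) {
    link(x, y);
    link(y, x);
  }

  // Seeds: active windows, plus workers kept alive by pending activity.
  const seen: Record<string, boolean> = {};
  const queue: string[] = [];
  for (const seed of g.activeWindows.concat(g.busyWorkers)) {
    if (!seen[seed]) {
      seen[seed] = true;
      queue.push(seed);
    }
  }

  // Breadth-first walk. Only workers propagate the property onward;
  // inactive windows are never seeded and never traversed.
  const needed: string[] = [];
  while (queue.length > 0) {
    const node = queue.shift() as string;
    if (isWorker[node]) needed.push(node);
    for (const next of neighbours[node] || []) {
      if (isWorker[next] && !seen[next]) {
        seen[next] = true;
        queue.push(next);
      }
    }
  }
  return needed.sort();
}
```

A worker missing from the result is a candidate for shutdown; per the text above, this would be recomputed when a window closes or a worker goes idle.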
Again, none of these issues individually are insurmountable, but in total they add up to a significant amount of work for what should be a fairly incremental improvement (going from dedicated workers to shared workers). Have other vendors started investigating what it takes to implement SharedWorkers (and therefore MessagePorts in workers)?
Another approach for SharedWorkers would be to give them an implicit MessagePort-esque API like dedicated Workers and not allow passing in MessagePorts to postMessage(). This would mean that references to workers can't really be passed around to other windows/workers, but rather are kept per-origin. Dedicated workers could work as they do now in Firefox/WebKit (with no MessagePorts). The SharedWorker lifecycle could be significantly simplified such that a SharedWorker is permissible as long as there's an active window under the same origin (no more walking some distributed cross-thread dependency graph).
The thing we'd give up is the capabilities-based API that MessagePorts provide, but I'd argue that the workaround is simple: the creating window can just act as a proxy for the worker. IMO, the implementation burden far outstrips the benefit of allowing direct foreign access to workers. Literally 90% of the work on my plate for SharedWorkers seems to derive from MessagePorts in one form or another, which seems completely wrong.
I'd like to hear your thoughts on this - are people open to removing MessagePort support from Workers?
-atw
