On May 7, 2009, at 5:40 PM, Drew Wilson wrote:
Agreed that removing this requirement:
User agents must act as if MessagePort objects have a strong
reference to their entangled MessagePort object.
would make MessagePort implementation much easier, as it would
remove the need to track reachability across multiple threads. This
requirement can get tricky especially as both sides can be cloned,
in-flight to a new owner, etc.
My only concern is that removing this requirement introduces
nondeterministic behavior - if I have an entangled MessagePort and I
register an onmessage() handler with it, then drop my reference to
it, after which someone calls postMessage() on the entangled port,
there's no way to tell if my onmessage() handler will be invoked;
it entirely depends on whether a GC happens first or not. That seems
bad.
That's a fair concern. I would mention a few counterpoints:
1) Nondeterministic behavior is inevitable with an API designed for
concurrency. There are surely already possible cases of nondeterminism
in the MessagePort API. Consider sending a message to two different
workers and waiting for the reply. The replies may arrive in either
order; indeed, the workers may receive the messages in either order,
so if they are in communication with each other you cannot rely on one
getting the message and performing its action first.
2) The nondeterministic behavior in this case is easily avoided by
what are in any case good coding practices: (a) don't drop all
references to a MessagePort you are still using, and (b) call close()
on the MessagePort when you are done with it and don't want more
messages.
3) The alternatives on the table to removing this requirement are
either removing the ability to use MessagePorts to communicate with
Workers, or leaving the spec as-is with its attendant high
implementation cost.
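
The two practices in point 2 can be sketched roughly as follows. This is only an illustration: FakePort is a hypothetical in-memory stand-in for one end of an entangled pair (it delivers synchronously, which real MessagePorts do not), and PortLike and Subscription are made-up names, not part of any spec or browser API.

```typescript
// Hypothetical structural type, loosely modelled on a small part of MessagePort.
interface PortLike {
  onmessage: ((data: unknown) => void) | null;
  postMessage(data: unknown): void;
  close(): void;
}

// Hypothetical stand-in for one end of an entangled pair.
// Delivery is synchronous here only to keep the sketch easy to follow.
class FakePort implements PortLike {
  onmessage: ((data: unknown) => void) | null = null;
  private peer: FakePort | null = null;
  private open = true;

  static pair(): [FakePort, FakePort] {
    const a = new FakePort();
    const b = new FakePort();
    a.peer = b;
    b.peer = a;
    return [a, b];
  }

  postMessage(data: unknown): void {
    const target = this.peer;
    if (this.open && target !== null && target.open && target.onmessage !== null) {
      target.onmessage(data);
    }
  }

  close(): void {
    this.open = false;
  }
}

// (a) Hold a reference to the port for as long as it is in use;
// (b) call close() once no more messages are wanted.
class Subscription {
  readonly received: unknown[] = [];
  private port: PortLike | null;

  constructor(port: PortLike) {
    this.port = port; // the owner keeps the port reachable
    port.onmessage = (data) => {
      this.received.push(data);
    };
  }

  dispose(): void {
    if (this.port !== null) {
      this.port.onmessage = null;
      this.port.close(); // explicit shutdown instead of relying on GC
      this.port = null;
    }
  }
}

const [localEnd, remoteEnd] = FakePort.pair();
const subscription = new Subscription(localEnd);
remoteEnd.postMessage("first");
subscription.dispose();
remoteEnd.postMessage("second"); // not delivered: the receiving end is closed
```

Because the owner holds the port until dispose() is called, whether a message is delivered never depends on when a collection happens to run.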
Given all these factors, I think avoiding nondeterminism in the one
particular case you describe, when authors can already avoid it in a
reasonable way, is not worth the order of magnitude increase in
implementation complexity imposed by the entanglement keepalive
requirement. I also think accepting this small amount of potential
nondeterminism is preferable to excluding Workers from using
MessagePorts.
Thus, on the whole, I think the best option is to remove the keepalive
requirement.
Regards,
Maciej
-atw
On Thu, May 7, 2009 at 3:28 PM, Maciej Stachowiak wrote:
I agree with Drew's assessment that MessagePorts in combination with
Workers are extremely complicated to implement correctly, as
currently specified. In fact, the design seems to push towards
having lockable shared state, even though one potential advantage of
the message passing design is to avoid locking and shared state.
Besides removing MessagePorts as a way to communicate with workers,
another possibility is simplifying the life cycle requirements. For
example, getting rid of the keepalive rule, whereby both
MessagePorts remain live so long as either is otherwise live, would
remove the majority of the complexity. I don't think the slight
convenience of that rule is worth the extra implementation cost.
On May 7, 2009, at 1:39 PM, Drew Wilson wrote:
Hi all,
I've been hashing through a bunch of the design issues around using
MessagePorts within Workers with IanH and the Chrome/WebKit teams
and I wanted to follow up with the list with my progress.
The problems we've encountered are all solvable, but I've been
surprised at the amount of work involved in implementing worker
MessagePorts (and the resulting implications that MessagePorts have
on worker lifecycles/reachability). My concern is that the amount
of work to implement MessagePorts within Worker context may be so
high that it will prevent vendors from implementing the
SharedWorker API. Have other implementers started working on this
part of the spec yet?
Let me quickly run down some of the implementation issues I've run
into - some of these may be WebKit/Chrome specific, but other
browsers may run into some of them as well:
1) MessagePort reachability is challenging in the context of
separate Worker heaps
In WebKit, each worker has its own heap (in Chrome, they will have
their own process as well). The spec reads:
User agents must act as if MessagePort objects have a strong
reference to their entangled MessagePort object.
Thus, a message port can be received, given an event listener, and
then forgotten, and so long as that event listener could receive a
message, the channel will be maintained.
Of course, if this was to occur on both sides of the channel, then
both ports would be garbage collected, since they would not be
reachable from live code, despite having a strong reference to each
other.
Furthermore, a MessagePort object must not be garbage collected
while there exists a message in a task queue that is to be
dispatched on that MessagePort object, or while the MessagePort
object's port message queue is open and there exists a message
event in that queue.
The end result of this is the need to track some common state
across an entangled MessagePort pair such as: number of outstanding
messages, open state of each end, and number of active references
to each port (zero or non-zero). Turns out this last bit will
require adding new hooks to the JavaScriptCore garbage collector to
detect transitioning between 1 and 0 references without actually
freeing the object - not that difficult, but possibly something
that other implementers should keep in mind.
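
As a rough illustration of the shared state described above, the sketch below models an entangled pair as two per-end records and derives whether the pair may be released. All type, field, and function names are hypothetical, and the cross-thread synchronization a real implementation needs is left out entirely.

```typescript
// Hypothetical per-end bookkeeping; names are illustrative only.
interface EndState {
  open: boolean;          // is this end's port message queue open?
  referenced: boolean;    // zero vs. non-zero references from live code
  pendingTasks: number;   // messages in a task queue awaiting dispatch on this end
  queuedMessages: number; // message events sitting in this end's port message queue
}

// State shared by both ends of an entangled pair.
interface ChannelState {
  ends: [EndState, EndState];
}

// An end is pinned if live code references it, if a task is waiting to be
// dispatched on it, or if its queue is open and holds a message event.
function isPinned(end: EndState): boolean {
  return end.referenced || end.pendingTasks > 0 || (end.open && end.queuedMessages > 0);
}

// Under the keepalive rule each end keeps the other alive, so the pair can
// only be released once neither end is pinned.
function canCollectPair(state: ChannelState): boolean {
  return !state.ends.some(isPinned);
}

const pairState: ChannelState = {
  ends: [
    { open: true, referenced: true, pendingTasks: 0, queuedMessages: 0 },
    { open: true, referenced: false, pendingTasks: 0, queuedMessages: 0 },
  ],
};

const collectableWhileReferenced = canCollectPair(pairState);
pairState.ends[0].referenced = false; // the 1 -> 0 reference transition
const collectableAfterDrop = canCollectPair(pairState);
pairState.ends[1].queuedMessages = 1; // a message arrives on an open queue
const collectableWithQueuedMessage = canCollectPair(pairState);
```

The `referenced` flag is the part that would need the garbage collector hook mentioned above, since something has to flip it when the last reference from live code goes away.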
2) MessagePorts dramatically change the worker lifecycle
Having MessagePorts in worker context means that Workers can
outlive their parent window(s) - I can create a worker, pass off an
entangled MessagePort to another window (say, to a different
domain), then close the original window, and the worker should stay
alive. In the case of WebKit, this causes some problems for things
like worker-initiated network requests - if workers can continue to
run even though there are no open windows for that origin, then it
becomes problematic to perform network requests (part of this is
due to the architecture of WebKit which requires proxying network
requests to window context, but part of this is just a general
problem of "how do you handle things like HTTP Auth when there are
no open windows for that origin?")
Finally, the spec gives a fairly broad definition of what makes a
worker reachable - here's an excerpt from my WebKit Shared Worker
design doc, where I summarize the spec (possibly incorrectly - feel
free to correct any misconceptions):
Permissible
The spec specifies that a worker is permissible based on whether it
has a reachable MessagePort that has been entangled at some point
in the past with an active window (or with a worker who is itself
permissible). Basically, if a worker has ever been entangled with
an active window, or if it's ever been entangled with a worker who
is itself permissible (i.e. it's associated with an active window
via a chain of workers that have been entangled at some point in
the past) then it's permissible.
The reason why the "at some point in the past" language is present
is to allow a page to create a fire-and-forget worker (for example,
a worker that does a set of long network operations) without having
to keep a reference to that worker around.
Once the referent windows close, the worker should also close, as
being permissible is a necessary (but not sufficient) criterion for
being runnable.
Active needed
A permissible worker is active needed if:
1) it has pending timers/network requests/DB activity, or
2) it is currently entangled with an active window, or another active
needed worker.
The intent behind #1 is to enable fire-and-forget workers that
don't exit until they are idle. The intent behind #2 is that an
idle worker shouldn't exit as long as it's reachable by an active
window (possibly chained through other workers).
The end result is that for each worker we need to keep track of a
big list of every window it's ever been entangled with. As workers
become entangled with other workers, they each inherit the list of
entangled windows from the other worker. As windows become
inactive, we then walk the lists of every worker to remove
references to the window and properly shut down the worker as
appropriate. All of this with the appropriate cross-thread
synchronization, of course :)
Likewise, determining when a worker is active needed requires
tracking a graph of entangled message ports, and walking that graph
to determine whether a given worker is reachable by any active
window. Typically this is only needed when either a window closes,
or when a worker goes idle.
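
The bookkeeping summarized above could be modelled along the following lines. This is a single-threaded sketch of the summary in this message, not of the spec text itself: WorkerLifecycle, WorkerNode, and all method names are hypothetical, window lists are inherited only at the moment two workers become entangled, and cross-thread synchronization is omitted.

```typescript
type Id = string;

// Hypothetical bookkeeping for one worker; names are illustrative only.
interface WorkerNode {
  pendingActivity: boolean;  // pending timers, network requests, or DB activity
  windowsEverSeen: Set<Id>;  // still-active windows ever entangled with, directly or inherited
  entangledWindows: Set<Id>; // windows currently entangled with this worker
  entangledWorkers: Set<Id>; // workers currently entangled with this worker
}

class WorkerLifecycle {
  private activeWindows = new Set<Id>();
  private workers = new Map<Id, WorkerNode>();

  openWindow(win: Id): void {
    this.activeWindows.add(win);
  }

  addWorker(id: Id, pendingActivity = false): void {
    this.workers.set(id, {
      pendingActivity,
      windowsEverSeen: new Set<Id>(),
      entangledWindows: new Set<Id>(),
      entangledWorkers: new Set<Id>(),
    });
  }

  private get(id: Id): WorkerNode {
    const node = this.workers.get(id);
    if (node === undefined) throw new Error(`unknown worker: ${id}`);
    return node;
  }

  entangleWithWindow(worker: Id, win: Id): void {
    if (!this.activeWindows.has(win)) return; // only active windows count
    const node = this.get(worker);
    node.entangledWindows.add(win);
    node.windowsEverSeen.add(win);
  }

  // On entanglement, each worker inherits the other's window list.
  entangleWorkers(a: Id, b: Id): void {
    const na = this.get(a);
    const nb = this.get(b);
    na.entangledWorkers.add(b);
    nb.entangledWorkers.add(a);
    na.windowsEverSeen.forEach((w) => nb.windowsEverSeen.add(w));
    nb.windowsEverSeen.forEach((w) => na.windowsEverSeen.add(w));
  }

  // When a window becomes inactive, walk every worker's lists and drop it.
  closeWindow(win: Id): void {
    this.activeWindows.delete(win);
    this.workers.forEach((node) => {
      node.windowsEverSeen.delete(win);
      node.entangledWindows.delete(win);
    });
  }

  isPermissible(worker: Id): boolean {
    return this.get(worker).windowsEverSeen.size > 0;
  }

  // Walk the entanglement graph: start from permissible workers that have
  // pending activity or a current window entanglement, then spread to the
  // permissible workers currently entangled with them.
  activeNeeded(): Set<Id> {
    const needed = new Set<Id>();
    const queue: Id[] = [];
    this.workers.forEach((node, id) => {
      const seeded = node.pendingActivity || node.entangledWindows.size > 0;
      if (seeded && node.windowsEverSeen.size > 0) {
        needed.add(id);
        queue.push(id);
      }
    });
    while (queue.length > 0) {
      const current = this.get(queue.pop() as Id);
      current.entangledWorkers.forEach((peer) => {
        if (!needed.has(peer) && this.isPermissible(peer)) {
          needed.add(peer);
          queue.push(peer);
        }
      });
    }
    return needed;
  }
}

const model = new WorkerLifecycle();
model.openWindow("page");
model.addWorker("shared");
model.addWorker("helper");
model.entangleWithWindow("shared", "page");
model.entangleWorkers("shared", "helper");
const neededBeforeClose = model.activeNeeded(); // both workers reachable from "page"
model.closeWindow("page");
const neededAfterClose = model.activeNeeded();  // nothing keeps either worker alive
```

Even in this toy form, closing a window touches every worker, and each liveness check is a graph walk, which is where the cost comes from once the state lives on several threads.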
Again, none of these issues individually are insurmountable, but in
total they add up to a significant amount of work for what should
be a fairly incremental improvement (going from dedicated workers
to shared workers). Have other vendors started investigating what
it takes to implement SharedWorkers (and therefore MessagePorts in
workers)?
Another approach for SharedWorkers would be to give them an
implicit MessagePort-esque API like dedicated Workers and not allow
passing in MessagePorts to postMessage(). This would mean that
references to workers can't really be passed around to other
windows/workers, but rather are kept per-origin. Dedicated workers
could work as they do now in Firefox/WebKit (with no MessagePorts).
The SharedWorker lifecycle could be significantly simplified such
that a SharedWorker is permissible as long as there's an active
window under the same origin (no more walking some distributed
cross-thread dependency graph).
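
For comparison, the simplified rule proposed here reduces to a single per-origin check. The sketch below is only an illustration; WindowInfo and isSharedWorkerPermissible are hypothetical names, and how a browser would actually enumerate its windows is not modelled.

```typescript
// Hypothetical description of a window as seen by the worker bookkeeping.
interface WindowInfo {
  origin: string;
  active: boolean;
}

// Under the simplified lifecycle, a SharedWorker is permissible as long as
// at least one active window shares its origin. No graph walk is needed.
function isSharedWorkerPermissible(workerOrigin: string, windows: WindowInfo[]): boolean {
  return windows.some((w) => w.active && w.origin === workerOrigin);
}

const openWindows: WindowInfo[] = [
  { origin: "https://a.example", active: true },
  { origin: "https://b.example", active: false },
];

const permissibleForA = isSharedWorkerPermissible("https://a.example", openWindows);
const permissibleForB = isSharedWorkerPermissible("https://b.example", openWindows);
```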
The thing we'd give up is the capabilities-based API that
MessagePorts provide, but I'd argue that the workaround is simple:
the creating window can just act as a proxy for the worker. IMO,
the implementation burden far outstrips the benefit of allowing
direct foreign access to workers. Literally 90% of the work on my
plate for SharedWorkers seems to derive from MessagePorts in one
form or another, which seems completely wrong.
I'd like to hear your thoughts on this - are people open to
removing MessagePort support from Workers?
-atw