Chandon's GSoC project is already highlighting some related, unresolved issues in Parrot. Perhaps the most important is how we prevent cross-thread data corruption. We used to have an STM (software transactional memory) system, though it was non-functional. We also recently removed the "_sync" member from the PMC structure, which ostensibly would have been used to perform fine-grained locking of shared PMCs. Both of those things were unused and non-functional when they were removed, but we are going to need to replace them with something eventually, especially if we ever want proper threads support. Throughout this email I'm going to use the term "threads" to mean OS-level threads, not the new "Green Threads" that Chandon is working on (with Green Threads, data corruption is a much smaller problem).
An obvious choice would be to create a new STM implementation. Done right, we wouldn't need to add new fields to the PMC structure and we could avoid almost all locking. Plus, there are several libraries out there that we could tap into to get STM "for free". I think there are some STM libraries affiliated with the LLVM project as well, so we might be able to pick those up at the same time we're adding an LLVM-based JIT backend. Implementing simple STM shouldn't be too big a project. However, doing it correctly and robustly, following the current research on optimization and related topics, is much harder. If we want to go the STM route, we should seriously evaluate some existing libraries.
With our shiny new immutable strings implementation, we already don't have to worry about locking strings: they can't be written to, and therefore can't be corrupted. We may need to make some changes to the implementation to make sure there are no exceptions to this rule, and that a reference to a STRING cannot escape into PIR land before it has been completely constructed and write-protected. We also obviously don't need to worry about locking INTVALs and FLOATVALs, since those aren't passed internally by reference.
So a better question than "How do we safely share PMCs?" might be "How do we stop sharing PMCs entirely?". If PMCs were not shared, or if we created clones whenever we passed a PMC from one thread to another, we wouldn't need to worry about locks or safe sharing. Thread-based copy-on-write (COW) for PMCs would do the same job. If PMCs could only be written from the thread they originated on, other threads could schedule method/vtable calls as "messages" on the originating thread whenever updates need to be made.
This approach either raises performance issues, because for every method or vtable call we send a message and then yield to let it finish processing, or it requires threads to be aware of the shared state of PMCs and to manually wait on some kind of flag until a batch of messages has been processed. We really need to consider whether we want PMCs to be transparently modifiable by reference across multiple threads. If they are, we need a system for managing either locks or atomic transactions, up to and possibly including some kind of global interpreter lock (GIL). If they are not, we need to design a messaging system. I don't think we need any such system in place for Chandon to continue his work, or even to bring it to a successful conclusion. However, without a mechanism for safe data sharing, any use of threads will need to either avoid sharing data entirely or risk crashing and burning.
--Andrew Whitworth
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev
