Hi, I have had an idea for some time to re-implement how the GIL works, but nothing radical. I am deliberately saying that I don't want to "remove" the GIL but to modify it, so that people who follow this list don't get too excited.
Let me describe what I want to accomplish and what kind of results I am looking for, and also hear about any design or implementation issues I/we may face. I want threads to work in parallel as they already do, with very little time spent waiting on other threads because of the GIL. Actually, I want to use the time reserved for the GIL to do something else. To be clear, I don't need a JIT; let's leave it out of the discussion for now.

First step:
-------------

My solution requires a wait-free, lock-free queue, so that many threads can append/push to it atomically. Every thread created, including the main thread, has to have (at least) one such queue, and let's say the queue has a pointer to the thread it belongs to. Not only a queue is required, but also a condition variable (such as pthread_cond_t) per thread. What I imagine is a high-level thread object that consists of a low-level thread, a low-level wait-free lock-free queue, and a low-level condition variable. Imagine that this high-level thread object is local to each OS thread created. From now on I will call the high-level thread object just a "thread".

Every mutable object created has to have a pointer to the thread in which it was created. This way we can find the origin thread of a mutable object by inspecting the object itself, and consequently also reach the low-level thread, low-level queue, and low-level condition variable associated with it. Having this special field is a naive approach; of course we could have allocation areas per thread and the GC could be smarter, but let's forget about that for now.

Second step:
-----------------

Every time the "ceval"-equivalent function observes a mutable object during bytecode interpretation, it checks whether the local thread is the same as the object's thread. If they are the same, it proceeds with interpretation of the local ceval operation.
If they are different, then:
- let's call the local thread just T
- let's call the operation just O
- take the low-level queue of the object's thread; let's call it Q
- group O and T into an item and push it onto Q
- make T wait on its condition variable

Third step:
-------------

Let's imagine that we have the old-style GIL that counts opcodes executed, and after N opcodes, instead of acquiring the GIL, it processes the queue. This processing is done by popping (O, T) items off the local thread's queue and serially executing the operations. After each executed operation, it signals the condition variable of the thread T that arrived in the same queue item as the operation. Once the signal is received, the waiting thread resumes execution, while the thread that was processing the queue continues executing the remaining queued operations.

Conclusion:
---------------

There is a ceval per thread, and all cevals run at the same time. In the best case they don't wait on each other, because threads do not touch objects created in other threads. In the worst case there is always only one thread executing an operation while all the others wait for it to finish, which is close to how the GIL already works (almost). Special care has to be taken for threads that are about to finish: what happens to their objects if those objects are referenced by objects belonging to other threads?

So I also have some questions:
- How flexible is it to modify the GIL in PyPy?
- Where should I look?
- Do I need to fully translate PyPy in order to test this?
- I need a wait-free, lock-free queue, so what kind of low-level atomic primitives does RPython offer?
- I need a special field on all mutable objects; how can I do that?

I do not expect any performance boost, just correctness of the approach. Perhaps I'm wrong. What do you think, is this a practical approach?
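To make the three steps above concrete, here is a minimal pure-Python model of the idea (all names here are mine, not PyPy's or RPython's). Each "thread" owns an inbox queue plus a signaling primitive; an operation on an object owned by another thread is pushed onto the owner's inbox, and the caller blocks until the owner signals completion. Real RPython code would need true wait-free atomic primitives and a shared pthread_cond_t per thread; here `queue.SimpleQueue` stands in for the wait-free queue and a per-request `threading.Event` stands in for the condition variable.

```python
import queue
import threading

class HLThread:
    """High-level thread: an OS thread plus an inbox queue.
    (Step 1: every thread owns a queue other threads may push to.)"""
    def __init__(self, name):
        self.name = name
        self.inbox = queue.SimpleQueue()  # stand-in for a wait-free lock-free queue

class Obj:
    """A mutable object tagged with its owning thread (the 'special field')."""
    def __init__(self, owner, value=0):
        self.owner = owner
        self.value = value

def execute(local, obj, op):
    """Step 2, the 'ceval' check: run the operation locally if we own the
    object, otherwise enqueue it on the owner's inbox and wait for a signal."""
    if obj.owner is local:
        return op(obj)
    done = threading.Event()  # models T waiting on its condition variable
    result = []
    obj.owner.inbox.put((op, obj, done, result))  # push (O, T) onto Q
    done.wait()                                   # caller blocks until signaled
    return result[0]

def process_inbox(local):
    """Step 3, the 'after N opcodes' hook: drain queued operations serially
    and signal each waiting thread as its operation completes."""
    while True:
        try:
            op, obj, done, result = local.inbox.get_nowait()
        except queue.Empty:
            return
        result.append(op(obj))  # execute O on behalf of the remote thread
        done.set()              # wake the thread that enqueued it
```

A quick exercise of the model: a worker thread mutates an object owned by the main thread, and the main thread's "interpreter loop" drains its inbox until the worker finishes.

```python
main = HLThread("main")
worker = HLThread("worker")
x = Obj(main, value=41)

def incr(o):
    o.value += 1
    return o.value

results = []
t = threading.Thread(target=lambda: results.append(execute(worker, x, incr)))
t.start()
while t.is_alive():      # main thread periodically processes its queue
    process_inbox(main)
t.join()
# results == [42], and x was only ever mutated by its owning thread
```

Note that using one `Event` per request sidesteps the missed-wakeup problem a single shared condition variable would have to handle; with pthread_cond_t you would need a predicate checked under a mutex.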
Regards,
Marko Tasic
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev