Re: [pypy-dev] Continuations and sandboxing
When I was last in the USA, I met Alessandro Warth, who now works for the Viewpoints Research Institute and whose PhD thesis, available from this page: http://www.tinlizzie.org/~awarth/ , is about experiments in computer languages to do the sorts of things that you are interested in. He had a neat implementation of something he called 'worlds' to show off, and gave a talk about it. It's chapter 4 of his thesis.

It was neat, but I got the distinct impression that the price he was paying for having unlimited cheap undo was that it was going to be quite difficult when users in different worlds, all of whom 'sprouted' from a common but very distant ancestor, and who didn't sprout from each other, wanted to share information. Many other people in the room had pressing needs for such things now, and some of them were interested in PyPy. They concluded that what they would really like is a programming language with multiple heaps -- in the sense of chunks of memory that you get when you malloc -- guaranteed to be separate from each other.

I couldn't tell them how hard this would be to implement in PyPy. Aside from needing to write a custom garbage collector, what else would be needed?

Laura

___ pypy-dev@codespeak.net http://codespeak.net/mailman/listinfo/pypy-dev
Re: [pypy-dev] Continuations and sandboxing
Fijal is right, fwiw. There are lots of wonderful things that *could* be implemented, but what we *have* is the stackless transformation and a low-level sandbox system. Discussions of Python architecture are always enjoyable, but they don't have a proper place, as far as I know; I often abuse #python for that.

On 11 January 2011 22:03, Laura Creighton l...@openend.se wrote:

When I was last in the USA, I met Alessandro Warth, who now works for the Viewpoints Research Institute and whose PhD thesis, available from this page: http://www.tinlizzie.org/~awarth/ , is about experiments in computer languages to do the sorts of things that you are interested in. He had a neat implementation of something he called 'worlds' to show off, and gave a talk about it. It's chapter 4 of his thesis.

I first read this paper as background for pymeta - it's one of those that you just can't put down. There is one thing I felt was missing from the worlds thesis, though, namely handling arrays. Treating them like any other ECMAScript object results in strange semantics - if two worlds push to the one array and both are merged, it is as if only one of the pushes occurred. Python lists have different usage patterns than JavaScript objects, so handling their diffs correctly and efficiently might require some experiments.

They concluded that what they would really like is a programming language with multiple heaps -- in the sense of chunks of memory that you get when you malloc -- guaranteed to be separate from each other. I couldn't tell them how hard this would be to implement in PyPy. Aside from needing to write a custom garbage collector, what else would be needed?

How would that be different to separate PyPy processes? Or is that more like regions in Cyclone, or like subinterpreters in Tcl?

-- William Leslie

PS: worlds are an ideal use case for first-class algebraic effects!
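The array-merge anomaly can be sketched in a few lines of Python. This is a toy model, not Warth's actual semantics: each world copies the whole list into its own delta on write, and a merge simply writes the child's value back into the parent, so the second merge clobbers the first.

```python
# Toy "worlds" where a list is treated like any other slot value:
# a write copies the whole list into the child world's delta, and a
# merge just overwrites the parent's slot with the child's value.

class World:
    """A copy-on-write layer over a parent world."""
    def __init__(self, parent=None):
        self.parent = parent
        self.delta = {}

    def read(self, key):
        world = self
        while world is not None:
            if key in world.delta:
                return world.delta[key]
            world = world.parent
        raise KeyError(key)

    def push(self, key, item):
        # the whole new list becomes this world's value for the slot
        self.delta[key] = self.read(key) + [item]

    def commit(self):
        # merge: child values simply overwrite the parent's
        self.parent.delta.update(self.delta)

root = World()
root.delta["log"] = []
a = World(root)
b = World(root)
a.push("log", "from-a")
b.push("log", "from-b")
a.commit()
b.commit()
print(root.read("log"))  # ['from-b'] -- world a's push has vanished
```

Treating lists (or JavaScript arrays) as opaque values is exactly what produces the "only one push survives" behaviour; a correct merge would need a list-aware diff.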
Re: [pypy-dev] Continuations and sandboxing
Hi Maciej, it's not clear to me which posts you are referring to. Certainly not the original post from Nathan? If you think it's a general issue then do a top-level posting, and please suggest how non-active (defined how?) PyPy developers are supposed to know if some topic is interesting without posting about it. And include some concrete rules about when side discussions are ok and when not. Probably all harder to do than improving your ignoring techniques?

cheers, holger

On Tue, Jan 11, 2011 at 09:48 +0200, Maciej Fijalkowski wrote:

Hello. This discussion has drifted considerably. In general this is the PyPy development mailing list, so personally I would like to keep discussions only to things that are either very interesting for people actively working on PyPy or have a remote possibility that someone will start working on them. Discussing STM without someone being interested in actually looking at it is off-topic.

PS. Feel free to correct me if anyone working actively on PyPy finds this discussion useful for PyPy development.

Cheers, fijal

On Tue, Jan 11, 2011 at 9:43 AM, William ML Leslie william.leslie@gmail.com wrote:

On 11 January 2011 15:25, Nathanael D. Jones nathanael.jo...@gmail.com wrote:

Regarding #1: Sandboxing is a major concern for me. Different code will need different sandboxing levels depending upon who created/approved the code. I can't have everything in one sandbox - I need isolated boxes on a per-request level.

You will probably want to implement a capability architecture, where the authority to perform some action is bundled together with the method that invokes it. The language E ( http://erights.org ) was designed for this sort of thing (I think it was written for a MUD originally), and would be worth looking at even if you didn't use the language itself. Capability theory is sadly not well known, considering just how well it solves ever more common problems.
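The core object-capability idea fits in a tiny Python sketch (names here are illustrative, not E's API): authority is an unforgeable object reference that you either hold or don't, rather than an ambient permission that code looks up.

```python
# Object-capability sketch: a closure over `door` is the capability.
# Code that holds `opener` can open exactly this door and do nothing
# else; code that was never handed the closure has no way to reach it.

class Door:
    def __init__(self):
        self.is_open = False

def make_opener(door):
    def open_door():
        door.is_open = True
    return open_door  # the capability itself

door = Door()
opener = make_opener(door)   # granting authority = handing out the closure

# untrusted code receives only `opener`: it cannot enumerate doors,
# lock this one, or forge a reference to any other door
opener()
print(door.is_open)  # True
```

This is why capability languages pair naturally with sandboxing: the sandbox only needs to control which references cross the boundary, not police every operation.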
The idea is that if functions return the next function to call instead of calling them, we have explicit tail-call elimination, and we have an explicit point at which we can rebuild the sandbox and upgrade the code.

Right, this is the driver-loop implementation of TCE. There are several implementations of TCE kicking around (including one I did for CPython some time ago: a decorator that would patch load ... call* ... return bytecodes with a call to a class that collects the function and arguments, and wraps the function in a driver loop that is bypassed when the target is marked as recursive, too). Unfortunately, using only implicit continuations, once you enter somebody's function, you have no way to get out of it - that person can except: and trap you there if they like. E has a way to protect against this, but you can't protect against the function busy-looping without something more (a way to kill threads, specifically, which probably means taking down the process).

Much data will be user scoped, and therefore lockable.

What does lockable mean in this case? Under what conditions does reloading happen while a user continuation is in play? How do you want continuations to interact with the locking?

On Mon, Jan 10, 2011 at 5:05 PM, William ML Leslie william.leslie@gmail.com wrote:

On 11 January 2011 07:18, Paolo Giarrusso p.giarru...@gmail.com wrote:

The one issue that interests me is where you implement the persistence boundary - do you go with orthogonal persistence and act as if everything is preserved, or assume all user code is run within some sort of (fast and loose) transaction that can be re-entered at will, providing an API for persistent data access? The second case makes the reloading question a bit more reasonable, because you can always throw away the current delta and replay the external effects, assuming the interface for the external events hasn't changed significantly.

The key question is: when would you start and commit such transactions?
Apart from that, your idea looks very similar to Software Transactional Memory (STM). STM restarts explicitly-marked transactions in a thread when some other thread modifies the affected data (which would be a data race) and commits its transaction. In your case, a transaction is restarted when some other thread modifies the involved code.

I don't think it would be useful at all to have regular STM transactions delimited at the room boundary - there are surely going to be other people interacting with the room at the same time as you, and sending everyone else back to the start of the room every time someone modifies the room is going to get old very quickly. The boundary introduced by modifying code is probably significantly more coarse. If you are storing (as the transaction log) only the external effects (user input, random.* output), however, you can build
Re: [pypy-dev] Continuations and sandboxing
@fijal: I agree that parts of this topic have drifted off-topic for the pypy-dev mailing list, so I've created a new mailing list specifically for discussions of collaboratively-edited software. I'm inviting everyone to join the new mailing list at collaborative-ga...@googlegroups.com

As far as the discussion regarding PyPy's capabilities with multiple-sandboxing, reloading, and continuation serialization goes, I think that is a relevant discussion for this forum. I'm still very interested in understanding (a) what PyPy can already do, and (b) what is involved in adding the features I need. I'm itching to work on PyPy :)

@Laura + William: Alessandro's paper concisely describes what I was planning on doing, with the exception that the sprout() method was taking place in Java. After reading the paper I see the utility of also allowing sprouting within JavaScript. I am a bit concerned about data size, however. Loading all world data is an expensive op, even if it creates a simple way to perform pseudo-transactions.

Nathanael http://nathanaeljones.com

On Tue, Jan 11, 2011 at 6:51 AM, William ML Leslie william.leslie@gmail.com wrote: [...]
Re: [pypy-dev] Continuations and sandboxing
On 01/11/2011 04:48 PM, Nathanael D. Jones wrote:

As far as the discussion regarding PyPy's capabilities with multiple-sandboxing, reloading, and continuation serialization goes, I think that is a relevant discussion for this forum. I'm still very interested in understanding (a) what PyPy can already do, and (b) what is involved in adding the features I need. I'm itching to work on PyPy :)

PyPy implements two things that are related somehow:

1) process sandboxing, which is just a way to control what one instance of an interpreter is allowed to do in terms of system calls, CPU and memory used. This cannot be controlled in a fine-grained way, every sandbox needs to be its own process, and communication is not implemented yet.

2) stackless, which is a way for interpreters to get some control over the C stack, including being able to artificially create one. This does not mean that you get serialization for free, it's just that we implemented it carefully for Python.

To go further, I really think you need careful language design to get the capability model and the versioning/updating right. IMO you won't be able to extend standard Python; its semantics are too complex to shoehorn something with (so far) vague goals on top of. This language design is partially orthogonal to what PyPy provides you with. After thinking about what your language should look like, you can consider using PyPy as an implementation platform for it, or not.

Cheers, Carl Friedrich
Re: [pypy-dev] Continuations and sandboxing
On 10 January 2011 15:24, Nathanael D. Jones nathanael.jo...@gmail.com wrote:

Hi folks,

Hi!

2) Serializable continuations. With gameplay being based on good plot and story flow, continuations are critical to allow straightforward implementation of 'workflows' that depend on user choice at every turn.

3) Tail-call elimination. By nature, players will accumulate a very large call stack. While this isn't terribly bad at first glance, the following issue combines with it to cause a very big problem: when code changes underneath a continuation, we need to determine how to resume flow. One option is annotating a checkpoint method in each code 'file'. However, if a user's call stack includes every file in the system, each change will cause them to restart. Tail-call elimination would help eliminate unneeded stack frames and minimize re-spawning.

I wonder if you don't want instead to manage the (user) dynamic context yourself - if players already need to remember to write all actions in a tail-recursive manner, it is probably worth making that API explicit, and if you do that, the underlying implementation doesn't need to do TCE at all. But I guess the existing stackless support should be good enough if you just want one-shot continuations.

4) Dynamic code loading. Users will be able to 'branch' their own version of the world and share it with others. There may be thousands of versions of a class, and they need to be able to execute in separate sandboxes at the same time. Source code will be pulled from a Git repository or some kind of versioning database.

Quite like this idea. You do have to deal with a bunch of (fairly well known) problems, which any specific implementation of dynamic code loading is going to need to solve (or not). PyPy doesn't currently implement any hot-schema-change magic, and reloading has always been error prone in the presence of state.
First-class mutable types make it particularly difficult (there is no sensible answer to what it means to reload a Python class).

The one issue that interests me is where you implement the persistence boundary - do you go with orthogonal persistence and act as if everything is preserved, or assume all user code is run within some sort of (fast and loose) transaction that can be re-entered at will, providing an API for persistent data access? The second case makes the reloading question a bit more reasonable, because you can always throw away the current delta and replay the external effects, assuming the interface for the external events hasn't changed significantly.

-- William Leslie
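The second case can be sketched concretely. Assume, for illustration, that the only nondeterminism in user code is external effects such as dice rolls (all names below are hypothetical): record each effect the first time through, then after a code reload discard in-memory state and replay the log to reach the same point.

```python
# Sketch of the "fast and loose transaction": only external effects are
# logged; re-entering the transaction replays them in order, so the same
# user code (or a compatible new version) reaches the same state.

import random

class EffectLog:
    def __init__(self):
        self.log = []
        self.replaying = False
        self.cursor = 0

    def external(self, produce):
        """Record a fresh external value, or replay the logged one."""
        if self.replaying:
            value = self.log[self.cursor]
            self.cursor += 1
        else:
            value = produce()
            self.log.append(value)
        return value

    def replay(self):
        self.replaying = True
        self.cursor = 0

def run_room(fx):
    # deterministic given the effect log: three dice rolls, summed
    rolls = [fx.external(lambda: random.randint(1, 6)) for _ in range(3)]
    return sum(rolls)

fx = EffectLog()
first = run_room(fx)   # live run: records the three rolls
fx.replay()
second = run_room(fx)  # "after reload": replays them
print(first == second)  # True
```

The caveat from the text applies unchanged: this only works while the interface to the external events stays compatible across reloads.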
Re: [pypy-dev] Continuations and sandboxing
On 11 January 2011 07:18, Paolo Giarrusso p.giarru...@gmail.com wrote:

Hi all,

On Mon, Jan 10, 2011 at 09:22, William ML Leslie william.leslie@gmail.com wrote:

On 10 January 2011 15:24, Nathanael D. Jones nathanael.jo...@gmail.com wrote:

4) Dynamic code loading. Users will be able to 'branch' their own version of the world and share it with others. There may be thousands of versions of a class, and they need to be able to execute in separate sandboxes at the same time. Source code will be pulled from a Git repository or some kind of versioning database.

Quite like this idea. You do have to deal with a bunch of (fairly well known) problems, which any specific implementation of dynamic code loading is going to need to solve (or not). PyPy doesn't currently implement any hot-schema-change magic, and reloading has always been error prone in the presence of state. First-class mutable types make it particularly difficult (there is no sensible answer to what it means to reload a Python class).

You might want to reuse the solutions to those issues used in the Java (and maybe .NET) world. Java allows reloading a class in a different classloader, and that has been used inside OSGi (see http://www.osgi.org/About/Technology). I'm not sure about the solution in OSGi, but Java serialization makes it possible to serialize an instance of version 1 of a class and de-serialize it with version 2 of that class, if v2 takes extra care for that; one could use this to convert existing instances.

Sure, and there is also Dynamic Class Evolution for HotSpot, and many other VMs support some form of reloading and redefinition (GOOPS is an exceptional and accessible example; see http://wingolog.org/archives/2009/11/09/class-redefinition-in-guile for a post with lots of diagrams). What I am trying to say is that the *ability* to do reloading does not mean your code will work in the presence of interface changes.
You have to decide whether updating a closure in a way that captures more variables or changes the signature updates existing instances of that closure: if it does, there will be entries missing in the closure slot; if it doesn't, older closures won't be callable in the same way as the new closures. No matter what, here you need to write code to be explicitly reloadable, or provide some way to instruct the runtime what to do for each schema change.

The one issue that interests me is where you implement the persistence boundary - do you go with orthogonal persistence and act as if everything is preserved, or assume all user code is run within some sort of (fast and loose) transaction that can be re-entered at will, providing an API for persistent data access? The second case makes the reloading question a bit more reasonable, because you can always throw away the current delta and replay the external effects, assuming the interface for the external events hasn't changed significantly.

The key question is: when would you start and commit such transactions?

Apart from that, your idea looks very similar to Software Transactional Memory (STM). STM restarts explicitly-marked transactions in a thread when some other thread modifies the affected data (which would be a data race) and commits its transaction. In your case, a transaction is restarted when some other thread modifies the involved code.

There is a bit of ambiguity in the "some other thread modifies" here. I don't know what synchronisation and communication is going on in your game, but I suspect that it only rarely interacts with reloading code in an interesting way. I'll reply to this properly in another email, I'd better get back to work :)

-- William Leslie
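For plain Python (no VM support needed), one common shape of "converting existing instances" is a sketch like the following: retarget `__class__` on live objects and run an explicit migration hook that fills in whatever the schema change added. The class names are illustrative.

```python
# Explicit instance migration after a class is "reloaded": point the
# live object at the new class, then patch up any missing attributes.
# This is the hand-written analogue of what GOOPS/DCE do automatically.

class PlayerV1:
    def __init__(self, name):
        self.name = name

class PlayerV2:
    def __init__(self, name, score=0):
        self.name = name
        self.score = score

    @classmethod
    def migrate(cls, obj):
        obj.__class__ = cls           # retarget the live instance
        if not hasattr(obj, "score"):
            obj.score = 0             # fill in the slot v1 never had
        return obj

p = PlayerV1("alice")
PlayerV2.migrate(p)
print(type(p).__name__, p.score)  # PlayerV2 0
```

This is exactly the "write code to be explicitly reloadable" option: every schema change needs a corresponding migrate step, and the runtime does nothing for you.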
Re: [pypy-dev] Continuations and sandboxing
Regarding #1: Sandboxing is a major concern for me. Different code will need different sandboxing levels depending upon who created/approved the code. I can't have everything in one sandbox - I need isolated boxes on a per-request level.

I think I see a way to sidestep the need for #3 and also help with hot-swapping. If I try to empty the stack as often as possible, it should provide frequent opportunities for 'free' reloading, i.e. (pseudocode):

def garden():
    if btnHouse.clicked:
        return house

def house():
    if btnGarden.clicked:
        return garden

def loop():
    func = house
    while True:
        # TODO: update func to the latest version
        # TODO: build an execution sandbox appropriate to the
        #       security clearance func has
        func = sandbox.call(func)
        print "You are now in the " + func.__name__

The idea is that if functions return the next function to call instead of calling them, we have explicit tail-call elimination, and we have an explicit point at which we can rebuild the sandbox and upgrade the code.

Regarding the persistence boundary, I've seen some very good points come up. A certain amount of orthogonal persistence is needed in the form of a continuation, but that would only be function-scoped data. I don't intend to allow users to use global variables or suchlike to persist data; I want a tailored API.

Much data will be user scoped, and therefore lockable. However, some data will be shared across all users. I'm not sure what the best way to handle this is. With sandboxing and full control over the persistence API I could theoretically implement STM.

I want to make sure the architecture is very scalable, so I've been considering something like BigTable or SimpleDB as the persistence store. Here transaction and locking options are more limited. Thoughts?
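Stripped of the (hypothetical) sandbox, the driver-loop idea is runnable today; a minimal sketch with illustrative names:

```python
# Driver-loop ("trampoline") tail-call elimination: a step function
# returns (next_function, args...) as data instead of calling it, so
# the Python call stack stays one frame deep however long the chain.

def drive(func, *args):
    result = func(*args)
    # convention: a (callable, args...) tuple means "tail-call this next"
    while isinstance(result, tuple) and callable(result[0]):
        result = result[0](*result[1:])
    return result

def countdown(n):
    if n == 0:
        return "landed"            # terminal value, not a tail call
    return (countdown, n - 1)      # the tail call, expressed as data

print(drive(countdown, 100000))    # landed -- far past the recursion limit
```

The loop body is the natural place to swap `result[0]` for a freshly loaded version of the function, or to wrap the call in a per-step sandbox, since nothing is live on the stack between steps.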
Nathanael

On Mon, Jan 10, 2011 at 5:05 PM, William ML Leslie william.leslie@gmail.com wrote: [...]
Re: [pypy-dev] Continuations and sandboxing
First of all, congratulations on getting THE idea. I think user content is essential too, but I see why commercial games shy away from it. An open source game could do really well!

1, yes, but not in a way you want (you want many boxes that can't touch each other, right?)

2, yes, stackless is the name/misnomer.

3a, I doubt it is relevant; you have to show that your workload has a substantial number of tail calls. Stack depth is not that important with stackless.

3b, I imagine this is impossible in the general case - how do you see this done in any language?

3 again, yes of course, and I bet you knew this already.

4, indeed; unloading old code is a bit harder, as you have to make sure all references to the module or function or class are lost in a timely fashion.

P.S. I'm not an expert on PyPy yet.

On 9 January 2011 21:24, Nathanael D. Jones nathanael.jo...@gmail.com wrote:

Hi folks, I've got a challenging set of requirements that I think PyPy may be able to meet, but I'd like to get some feedback before I jump in. I'm developing a collaboratively edited game. It's not a MUD, but similar in concept, and also text based in nature. HogwartsLive.com and lotgd.net are examples of the genre I'm going for. I believe if a game of that genre makes user modification simple and rewarding enough, it could be self-sustaining and grow quickly enough to keep users interested perpetually, like Wikipedia. It's also a fascinating computer science challenge, because it requires a LOT from a computer language.

1) Sandboxing. To allow users to make changes to the game without being able to cheat at it or escalate privileges.

2) Serializable continuations. With gameplay being based on good plot and story flow, continuations are critical to allow straightforward implementation of 'workflows' that depend on user choice at every turn.

3) Tail-call elimination. By nature, players will accumulate a very large call stack.
While this isn't terribly bad at first glance, the following issue combines with it to cause a very big problem: when code changes underneath a continuation, we need to determine how to resume flow. One option is annotating a checkpoint method in each code 'file'. However, if a user's call stack includes every file in the system, each change will cause them to restart. Tail-call elimination would help eliminate unneeded stack frames and minimize re-spawning.

3) Dynamic, strong typing and metaprogramming are key for keeping the API simple.

4) Dynamic code loading. Users will be able to 'branch' their own version of the world and share it with others. There may be thousands of versions of a class, and they need to be able to execute in separate sandboxes at the same time. Source code will be pulled from a Git repository or some kind of versioning database.

I'm interested in knowing which of these PyPy already does, and which of them I can make it do. I appreciate your help!

Nathanael Jones http://nathanaeljones.com/