Re: [pypy-dev] Continuations and sandboxing

2011-01-11 Thread Laura Creighton
When I was last in the USA, I met Alessandro Warth, who now works for
the Viewpoints Research Institute and whose PhD thesis (which you can get
on this page: http://www.tinlizzie.org/~awarth/) is about experiments in
computer languages to do the sorts of things that you are interested in.
He had a neat implementation of something he called 'worlds' to show off,
and gave a talk about it.  It's chapter 4 of his thesis.

It was neat, but I got the distinct impression that the price he was
paying for having unlimited cheap undo was that it was going to be
quite difficult when users in different worlds, all of whom 'sprouted'
from a common but very distant ancestor rather than from each other,
wanted to share information.  Many other people in the room
had pressing needs for such things now, and some of them were
interested in PyPy.

They concluded that what they would really like is a programming language
with multiple heaps -- in the sense of chunks of memory that you get
when you malloc -- guaranteed to be separate from each other.  I couldn't
tell them how hard this would be to implement in PyPy.  Aside from needing
to write a custom garbage collector, what else would be needed?

Laura



Re: [pypy-dev] Continuations and sandboxing

2011-01-11 Thread William ML Leslie
Fijal is right, fwiw.  There are lots of wonderful things that *could*
be implemented, but what we *have* is the stackless transformation and
a low-level sandbox system.  Discussions of python architecture are
always enjoyable, but they don't have a proper place, as far as I
know; I often abuse #python for that.

On 11 January 2011 22:03, Laura Creighton l...@openend.se wrote:
 When I was last in the USA, I met Alessandro Warth, who now works for
 the Viewpoints Research Institute and whose PhD thesis (which you can get
 on this page: http://www.tinlizzie.org/~awarth/) is about experiments in
 computer languages to do the sorts of things that you are interested in.
 He had a neat implementation of something he called 'worlds' to show off,
 and gave a talk about it.  It's chapter 4 of his thesis.

I first read this paper as background for pymeta - it's one of those
that you just can't put down.  There is one thing I felt was missing
from the worlds thesis, though, namely handling arrays.  Treating them
like any other ECMAScript object results in strange semantics - if two
worlds push to the same array and both are merged, it is as if only one
of the pushes occurred.  Python lists have different usage patterns
than JavaScript objects, so handling their diffs correctly and
efficiently might require some experiments.
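
For concreteness, here is a minimal sketch of that problem.  This is not
Warth's implementation; World/sprout/commit are invented names, and the
whole-slot commit policy is exactly the assumption being questioned:

import copy

class World(object):
    def __init__(self, parent=None):
        self.parent = parent
        self.delta = {}                 # per-world overrides: name -> value

    def sprout(self):
        return World(parent=self)

    def read(self, name):
        world = self
        while world is not None:
            if name in world.delta:
                value = copy.deepcopy(world.delta[name])
                self.delta[name] = value    # copy-on-read into this world
                return value
            world = world.parent
        raise KeyError(name)

    def write(self, name, value):
        self.delta[name] = value

    def commit(self):
        # Whole-slot commit: the child's value simply replaces the parent's.
        self.parent.delta.update(self.delta)

top = World()
top.write("items", [])
a = top.sprout()
b = top.sprout()
a.read("items").append("pushed in world a")
b.read("items").append("pushed in world b")
a.commit()
b.commit()
print top.read("items")    # ['pushed in world b'] -- world a's push is gone

Recording element-level operations (appends, pops) in the delta instead of
whole list values is the sort of experiment I mean above.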

 They concluded that what they would really like is a programming language
 with multiple heaps -- in the sense of chunks of memory that you get
 when you malloc -- guaranteed to be separate from each other.  I couldn't
 tell them how hard this would be to implement in PyPy.  Aside from needing
 to write a custom garbage collector, what else would be needed?

How would that be different to separate pypy processes? Or is that
more like regions in cyclone, or like subinterpreters in TCL?

-- 
William Leslie

PS: worlds are an ideal use case for first-class algebraic effects!


Re: [pypy-dev] Continuations and sandboxing

2011-01-11 Thread holger krekel
Hi Maciej,

it's not clear to me what posts you are referring to.  Certainly
not the original post from Nathan?

If you think it's a general issue then do a top-level posting and please 
suggest how non-active (defined how?) pypy developers are supposed to 
know if some topic is interesting without posting about it.  And include 
some concrete rules about when side discussions are ok and when not.   
Probably all harder to do than improving your ignoring techniques?

cheers,
holger

On Tue, Jan 11, 2011 at 09:48 +0200, Maciej Fijalkowski wrote:
 Hello.
 
 This discussion have drifted considerably. In general this is the pypy
 development mailing list, so personally I would like to keep
 discussions only to things that are either very interesting for people
 actively working on PyPy or have remote possibility that someone will
 start working on them. Discussing STM without someone being interested
 in actually looking is out of topic.
 
 PS. Feel free to correct me if anyone working actively on pypy is
 finding this discussion useful for PyPy development.
 
 Cheers,
 fijal
 
 On Tue, Jan 11, 2011 at 9:43 AM, William ML Leslie
 william.leslie@gmail.com wrote:
  On 11 January 2011 15:25, Nathanael D. Jones nathanael.jo...@gmail.com 
  wrote:
  Regarding #1: Sandboxing is a major concern for me. Different code will need
  different sandboxing levels depending upon who created/approved the code.
  I can't have everything in one sandbox - I need isolated boxes on a
  per-request level.
 
  You will probably want to implement a capability architecture, where
  the authority to perform some action is bundled together with the
  method that invokes it.  The language E ( http://erights.org ) was
  designed for this sort of thing (I think it was written for a MUD
  originally), and would be worth looking at even if you didn't use the
  language itself.  Capability theory is sadly not well known,
  considering just how well it solves ever more common problems.
 
  The idea is that if functions return the next function to call instead of
  calling them, we have explicit tail-call elimination, and we have an
  explicit point at which we can rebuild the sandbox and upgrade the code.
 
  Right, this is the driver loop implementation of TCE.  There are
  several implementations of TCE kicking around, (including one I did
  for cpython some time ago, a decorator that would patch load ... call*
  ... return bytecodes with a call to a class that collects the function
  and arguments, and wraps the function in a driver loop that is
  bypassed when the target is marked as recursive, too).
 
  Unfortunately, using only implicit continuations, once you enter
  somebody's function, you have no way to get out of it - that person
  can except: and trap you there if they like.  E has a way to protect
  against this, but you can't protect against the function busy-looping
  without something more (a way to kill threads, specifically, which
  probably means taking down the process).
 
  Much data will be user scoped, and therefore lockable.
 
  What does lockable mean in this case?
 
  Under what conditions does reloading happen while a user continuation
  is in play?  How do you want continuations to interact with the
  locking?
 
  On Mon, Jan 10, 2011 at 5:05 PM, William ML Leslie
  william.leslie@gmail.com wrote:
 
  On 11 January 2011 07:18, Paolo Giarrusso p.giarru...@gmail.com wrote:
   The one issue that interests me is where you implement the persistence
   boundary - do you go with orthogonal persistence and act as if
   everything is preserved, or assume all user code is run within some
   sort of (fast and loose) transaction that can be re-entered at will,
   providing an API for persistent data access?  The second case makes
   the reloading question a bit more reasonable, because you can always
   throw away the current delta and replay the external effects, assuming
   the interface for the external events hasn't changed significantly.
  
   The key question is: when would you start and commit such transactions?
  
   Apart from that, your idea looks very similar to Software
   Transactional Memory (STM). STM restarts explicitly-marked
   transactions in a thread when some other thread modifies the affected
   data (which would be a data race) and commits its transaction. In your
   case, a transaction is restarted when some other thread modifies the
   involved code.
 
  I don't think it would be useful at all to have regular STM
  transactions delimited at the room boundary - there are surely going
  to be other people interacting with the room at the same time as you,
  and sending everyone else back to the start of the room every time
  someone modifies the room is going to get old very quickly.  The
  boundary introduced by modifying code is probably significantly more
  coarse.
 
  If you are storing (as the transaction log) only the external effects
  (user input, random.* output), however, you can build 

Re: [pypy-dev] Continuations and sandboxing

2011-01-11 Thread Nathanael D. Jones
@fijal:
I agree that parts of this topic have drifted off-topic for the pypy-dev
mailing list, so I've created a new mailing list specifically for
discussions of collaboratively-edited software.

I'm inviting everyone to join the new mailing list at
collaborative-ga...@googlegroups.com

As far as the discussion regarding PyPy's capabilities with
multiple-sandboxing, reloading, and continuation serialization goes, I think
that is a relevant discussion for this forum.

I'm still very interested in understanding (a) what PyPy can already do, and
(b) what is involved in adding the features I need. I'm itching to work on
PyPy :)

@Laura + William:

Alessandro's paper concisely describes what I was planning on doing, with
the exception that the sprout() method was taking place in Java.

After reading the paper I see the utility of also allowing sprouting within
JavaScript.

I am a bit concerned about data size, however. Loading all world data is an
expensive operation, even if it creates a simple way to perform
pseudo-transactions.

Nathanael

http://nathanaeljones.com

On Tue, Jan 11, 2011 at 6:51 AM, William ML Leslie 
william.leslie@gmail.com wrote:

 Fijal is right, fwiw.  There are lots of wonderful things that *could*
 be implemented, but what we *have* is the stackless transformation and
 a low-level sandbox system.  Discussions of python architecture are
 always enjoyable, but they don't have a proper place, as far as I
 know; I often abuse #python for that.

 On 11 January 2011 22:03, Laura Creighton l...@openend.se wrote:
  When I was last in the USA, I met Alessandro Warth, who now works for
  the Viewpoints Research Institute and whose PhD thesis (which you can get
  on this page: http://www.tinlizzie.org/~awarth/) is about experiments in
  computer languages to do the sorts of things that you are interested in.
  He had a neat implementation of something he called 'worlds' to show off,
  and gave a talk about it.  It's chapter 4 of his thesis.

 I first read this paper as background for pymeta - it's one of those
 that you just can't put down.  There is one thing I felt was missing
 from the worlds thesis, though, namely handling arrays.  Treating them
 like any other ECMAScript object results in strange semantics - if two
 worlds push to the same array and both are merged, it is as if only one
 of the pushes occurred.  Python lists have different usage patterns
 than JavaScript objects, so handling their diffs correctly and
 efficiently might require some experiments.

  They concluded that what they would really like is a programming language
  with multiple heaps -- in the sense of chunks of memory that you get
  when you malloc -- guaranteed to be separate from each other.  I couldn't
  tell them how hard this would be to implement in PyPy.  Aside from needing
  to write a custom garbage collector, what else would be needed?

 How would that be different to separate pypy processes? Or is that
 more like regions in cyclone, or like subinterpreters in TCL?

 --
 William Leslie

 PS: worlds are an ideal use case for first-class algebraic effects!


Re: [pypy-dev] Continuations and sandboxing

2011-01-11 Thread Carl Friedrich Bolz
On 01/11/2011 04:48 PM, Nathanael D. Jones wrote:
 As far as the discussion regarding PyPy's capabilities with
 multiple-sandboxing, reloading, and continuation serialization goes, I
 think that is a relevant discussion for this forum.

 I'm still very interested in understanding (a) what PyPy can already do,
 and (b) what is involved in adding the features I need. I'm itching to
 work on PyPy :)

PyPy implements two things that are related somehow:

1) process sandboxing, which is just a way to control what one instance 
of an interpreter is allowed to do in terms of system calls, CPU and 
memory used. This cannot be controlled in a fine-grained way: every 
sandbox needs to be its own process, and communication is not 
implemented yet.  (A toy sketch of the general idea follows below.)

2) stackless, which is a way for interpreters to get some control over 
the C stack, including being able to artificially create one. This does 
not mean that you get serialization for free; it's just that we 
implemented it carefully for Python.
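
To make (1) a bit more concrete, here is a toy sketch of the general shape
of that approach: the untrusted interpreter runs as a child process and
every OS-level request goes over a pipe to a trusted controller, which
applies the policy.  The wire format and names below are invented for
illustration only; the real sandboxed pypy-c has its own marshalling
protocol and controller script.

import json
import subprocess

ALLOWED_READS = set(["/tmp/game/motd.txt"])

def handle(request):
    # All policy decisions live in the trusted process.
    if request.get("op") == "read" and request.get("path") in ALLOWED_READS:
        return {"ok": True, "data": open(request["path"]).read()}
    return {"ok": False, "error": "operation not permitted"}

def run_controller(child_argv):
    child = subprocess.Popen(child_argv, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE)
    # The child is expected to write one JSON request per line and wait
    # for the matching reply before doing anything externally visible.
    for line in iter(child.stdout.readline, ""):
        reply = handle(json.loads(line))
        child.stdin.write(json.dumps(reply) + "\n")
        child.stdin.flush()
    return child.wait()

This is also why it is coarse-grained: the unit of isolation is a whole
child process, so per-request isolation means one such child (or a pool of
them) per security clearance.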


To go further, I really think you need careful language design to get 
the capability model and the versioning/updating right. IMO you won't be 
able to extend standard Python; its semantics are too complex to 
shoehorn something with (so far) vague goals on top of it.

This language design is partially orthogonal to what PyPy provides you 
with. After thinking about what your language should look like, you can 
consider using PyPy as an implementation platform for it, or not.

Cheers,

Carl Friedrich


Re: [pypy-dev] Continuations and sandboxing

2011-01-10 Thread William ML Leslie
On 10 January 2011 15:24, Nathanael D. Jones nathanael.jo...@gmail.com wrote:
 Hi folks,

Hi!

 2) Serializable continuations. With gameplay being based on good plot and
 story flow, continuations are critical to allow straightforward
 implementation of 'workflows' that depend on user choice at every turn.
 3) Tail-call elimination.  By nature, players will accumulate a very large
 call stack. While this isn't terribly bad at first glance, the following
 issue combines with it to cause a very big problem:
 When code changes underneath a continuation, we need to determine how to
 resume flow. One option is annotating a checkpoint method in each code
 'file'. However, if a user's call stack includes every file in the system,
 each change will cause them to restart.
 Tail-call elimination would help eliminate unneeded stack frames and
 minimize re-spawning.

I wonder if you don't want instead to manage the (user) dynamic
context yourself - if players already need to remember to write all
actions in a tail-recursive manner, it is probably worth making that
API explicit, and if you do that, the underlying implementation
doesn't need to do TCE at all.  But I guess the existing stackless
support should be good enough if you just want one-shot continuations.
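
As a sketch only (all names below are made up), the explicit API could be a
tiny driver that treats "call this next" as a return value rather than an
actual call:

def entrance(player):
    # ... game logic for this step ...
    return hallway, (player,)            # the "tail call", expressed as data

def hallway(player):
    print "%s is in the hallway" % player
    return None                          # None ends the interaction

def run(action, args):
    # Driver loop: the Python stack stays flat no matter how long the
    # chain of actions gets, so no TCE is needed underneath.
    result = (action, args)
    while result is not None:
        action, args = result
        result = action(*args)

run(entrance, ("alice",))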

 4) Dynamic code loading. Users will be able to 'branch' their own version of
 the world and share it with others. There may be thousands of versions of a
 class, and they need to be able to execute in separate sandboxes at the same
 time. Source code will be pulled from a Git repository or some kind of
 versioning database.

Quite like this idea.

You do have to deal with a bunch of (fairly well known) problems,
which any specific implementation of dynamic code loading is going to
need to solve (or not).  Pypy doesn't currently implement any
hot-schema-change magic, and reloading has always been error prone in
the presence of state.  First-class mutable types make it particularly
difficult (there is no sensible answer to what it means to reload a
python class).
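
A tiny example of the class problem, using nothing PyPy-specific: after a
redefinition, instances created earlier still point at the old class object,
so re-executing the class statement is not a reload in any useful sense:

class Room(object):
    def describe(self):
        return "an empty room"

old_room = Room()

class Room(object):                      # simulate reloading an edited module
    def describe(self):
        return "a room with a fireplace"

new_room = Room()
print old_room.describe()                # 'an empty room'
print new_room.describe()                # 'a room with a fireplace'
print type(old_room) is type(new_room)   # False: two distinct classes exist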

The one issue that interests me is where you implement the persistence
boundary - do you go with orthogonal persistence and act as if
everything is preserved, or assume all user code is run within some
sort of (fast and loose) transaction that can be re-entered at will,
providing an API for persistent data access?  The second case makes
the reloading question a bit more reasonable, because you can always
throw away the current delta and replay the external effects, assuming
the interface for the external events hasn't changed significantly.

-- 
William Leslie


Re: [pypy-dev] Continuations and sandboxing

2011-01-10 Thread William ML Leslie
On 11 January 2011 07:18, Paolo Giarrusso p.giarru...@gmail.com wrote:
 Hi all,

 On Mon, Jan 10, 2011 at 09:22, William ML Leslie
 william.leslie@gmail.com wrote:
 On 10 January 2011 15:24, Nathanael D. Jones nathanael.jo...@gmail.com 
 wrote:
 4) Dynamic code loading. Users will be able to 'branch' their own version of
 the world and share it with others. There may be thousands of versions of a
 class, and they need to be able to execute in separate sandboxes at the same
 time. Source code will be pulled from a Git repository or some kind of
 versioning database.

 Quite like this idea.

 You do have to deal with a bunch of (fairly well known) problems,
 which any specific implementation of dynamic code loading is going to
 need to solve (or not).  Pypy doesn't currently implement any
 hot-schema-change magic, and reloading has always been error prone in
 the presence of state.  First-class mutable types make it particularly
 difficult (there is no sensible answer to what it means to reload a
 python class).

 You might want to reuse the solutions to those issues used in the Java
 (and maybe .NET) world. Java allows reloading a class in a different
 classloader, and that has been used inside OSGi (see
 http://www.osgi.org/About/Technology).
 Not sure about the solution in OSGi, but Java serialization makes it
 possible to serialize an instance of version 1 of a class and to
 de-serialize it with version 2 of that class, if v2 takes extra care to
 support that; one could use this to convert existing instances.

Sure, and there is also Dynamic Class Evolution for Hotspot, and many
other VMs support some form of reloading and redefinition (goops is an
exceptional and accessible example, see
http://wingolog.org/archives/2009/11/09/class-redefinition-in-guile
for a post with lots of diagrams).

What I am trying to say is that the *ability* to do reloading does not
mean your code will work in the presence of interface changes.  You
have to decide whether updating a closure in a way that captures more
variables or changes the signature also updates existing instances of that
closure: if it does, there will be entries missing in the closure
slot; if it doesn't, older closures won't be callable in the same way
as the new closures.  No matter what, here you need to write code to
be explicitly reloadable, or provide some way to instruct the runtime
what to do for each schema change.
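
The closure case looks like this in plain Python: redefining the factory
does not, by itself, touch closures that already exist, which is the default
answer to the choice described above:

def make_counter():
    count = [0]
    def step():
        count[0] += 1
        return count[0]
    return step

old_step = make_counter()

def make_counter():                      # "reloaded" version: extra variable
    count = [0]
    stride = 2
    def step():
        count[0] += stride
        return count[0]
    return step

new_step = make_counter()
print old_step(), old_step()             # 1 2  -- old cells, old behaviour
print new_step(), new_step()             # 2 4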


 The one issue that interests me is where you implement the persistence
 boundary - do you go with orthogonal persistence and act as if
 everything is preserved, or assume all user code is run within some
 sort of (fast and loose) transaction that can be re-entered at will,
 providing an API for persistent data access?  The second case makes
 the reloading question a bit more reasonable, because you can always
 throw away the current delta and replay the external effects, assuming
 the interface for the external events hasn't changed significantly.

 The key question is: when would you start and commit such transactions?

 Apart from that, your idea looks very similar to Software
 Transactional Memory (STM). STM restarts explicitly-marked
 transactions in a thread when some other thread modifies the affected
 data (which would be a data race) and commits its transaction. In your
 case, a transaction is restarted when some other thread modifies the
 involved code.

There is a bit of ambiguity in the "some other thread modifies" part here.
I don't know what synchronisation and communication is going on in
your game, but I suspect that it only rarely interacts with reloading
code in an interesting way.  I'll reply to this properly in another
email; I'd better get back to work :)

-- 
William Leslie


Re: [pypy-dev] Continuations and sandboxing

2011-01-10 Thread Nathanael D. Jones
Regarding #1: Sandboxing is a major concern for me. Different code will need
different sandboxing levels depending upon who created/approved the code.
I can't have everything in one sandbox - I need isolated boxes on a
per-request level.

I think I see a way to sidestep the need for #3 and also help with
hot-swapping.

If I try to empty the stack as often as possible, it should provide frequent
opportunities for 'free' reloading.

i.e. (pseudocode):

def garden():
    if btnHouse.clicked:
        return house

def house():
    if btnGarden.clicked:
        return garden


def loop():
    func = house
    while True:
        # TODO: Update func to the latest version
        # TODO: Build an execution sandbox appropriate to the security
        #       clearance func has
        func = sandbox.call(func)
        print "You are now in the %s" % func.__name__


The idea is that if functions return the next function to call instead of
calling them, we have explicit tail-call elimination, and we have an
explicit point at which we can rebuild the sandbox and upgrade the code.
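
To be a little more concrete, here is a self-contained sketch of that loop
with the two TODOs stubbed out.  The registry and StubSandbox are invented
for illustration; a real version would refresh the registry from the
versioning database and build a real sandbox per call:

registry = {}                            # name -> newest version of the code

def garden():
    return "house"                       # next location, by name

def house():
    return None                          # None ends the session

registry["garden"] = garden
registry["house"] = house

class StubSandbox(object):
    def call(self, func):                # a real one would restrict authority
        return func()

def loop(start="garden"):
    sandbox = StubSandbox()
    name = start
    while name is not None:
        func = registry[name]            # always picks up the latest version
        print "You are now in the %s" % name
        name = sandbox.call(func)

loop()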


Regarding the persistence boundary, I've seen some very good points come up.

A certain amount of orthogonal persistence is needed in the form of a
continuation, but that would only be function-scoped data. I don't intend to
allow users to use global variables or suchlike to persist data; I want a
tailored API.

Much data will be user scoped, and therefore lockable.

However, some data will be shared across all users.  I'm not sure what the
best way to handle this is. With sandboxing and full control over the
persistence API I could theoretically implement STM.
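
As a sketch of what implementing STM on top of the persistence API could
mean in its simplest (optimistic, validate-then-commit) form -- everything
below is invented and ignores the real BigTable/SimpleDB APIs, which have
their own conditional-write primitives:

import threading

class Store(object):
    def __init__(self):
        self.lock = threading.Lock()
        self.data = {}                    # key -> (version, value)

class Transaction(object):
    def __init__(self, store):
        self.store = store
        self.reads = {}                   # key -> version observed
        self.writes = {}                  # key -> new value

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        version, value = self.store.data.get(key, (0, None))
        self.reads[key] = version
        return value

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        with self.store.lock:
            for key, seen in self.reads.items():
                if self.store.data.get(key, (0, None))[0] != seen:
                    return False          # conflict: caller re-runs the block
            for key, value in self.writes.items():
                version = self.store.data.get(key, (0, None))[0]
                self.store.data[key] = (version + 1, value)
            return True

store = Store()
tx = Transaction(store)
tx.set("gold", (tx.get("gold") or 0) + 1)
print tx.commit()                         # True, or False if someone raced us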

I want to make sure the architecture is very scalable, so I've been
considering something like BigTable or SimpleDB as the persistence store.
Here transaction and locking options are more limited.

Thoughts?

Nathanael

On Mon, Jan 10, 2011 at 5:05 PM, William ML Leslie 
william.leslie@gmail.com wrote:

 On 11 January 2011 07:18, Paolo Giarrusso p.giarru...@gmail.com wrote:
  Hi all,
 
  On Mon, Jan 10, 2011 at 09:22, William ML Leslie
  william.leslie@gmail.com wrote:
  On 10 January 2011 15:24, Nathanael D. Jones nathanael.jo...@gmail.com
 wrote:
  4) Dynamic code loading. Users will be able to 'branch' their own
 version of
  the world and share it with others. There may be thousands of versions
 of a
  class, and they need to be able to execute in separate sandboxes at the
 same
  time. Source code will be pulled from a Git repository or some kind of
  versioning database.
 
  Quite like this idea.
 
  You do have to deal with a bunch of (fairly well known) problems,
  which any specific implementation of dynamic code loading is going to
  need to solve (or not).  Pypy doesn't currently implement any
  hot-schema-change magic, and reloading has always been error prone in
  the presence of state.  First-class mutable types make it particularly
  difficult (there is no sensible answer to what it means to reload a
  python class).
 
  You might want to reuse the solutions to those issues used in the Java
  (and maybe .NET) world. Java allows reloading a class in a different
  classloader, and that has been used inside OSGi (see
  http://www.osgi.org/About/Technology).
  Not sure about the solution in OSGi, but Java serialization makes it
  possible to serialize an instance of version 1 of a class and to
  de-serialize it with version 2 of that class, if v2 takes extra care to
  support that; one could use this to convert existing instances.

 Sure, and there is also Dynamic Class Evolution for Hotspot, and many
 other VMs support some form of reloading and redefinition (goops is an
 exceptional and accessible example, see
 http://wingolog.org/archives/2009/11/09/class-redefinition-in-guile
 for a post with lots of diagrams).

 What I am trying to say is that the *ability* to do reloading does not
 mean your code will work in the presence of interface changes.  You
 have to decide whether updating a closure in a way that captures more
 variables or changes the signature also updates existing instances of that
 closure: if it does, there will be entries missing in the closure
 slot; if it doesn't, older closures won't be callable in the same way
 as the new closures.  No matter what, here you need to write code to
 be explicitly reloadable, or provide some way to instruct the runtime
 what to do for each schema change.

 
  The one issue that interests me is where you implement the persistence
  boundary - do you go with orthogonal persistence and act as if
  everything is preserved, or assume all user code is run within some
  sort of (fast and loose) transaction that can be re-entered at will,
  providing an API for persistent data access?  The second case makes
  the reloading question a bit more reasonable, because you can always
  throw away the current delta and replay the external effects, assuming
  the interface for the external events hasn't changed significantly.
 
  The key question is: when would you start and commit such transactions?
 
  

Re: [pypy-dev] Continuations and sandboxing

2011-01-09 Thread Dima Tisnek
First of all, congratulations on getting THE idea. I think user
content is essential too, but I see why commercial games shy away from
it.  An open source game could do really well!

1, yes, but not in a way you want (you want many boxes that can't
touch each other, right?)
2, yes, stackless is the name/misnomer
3a, I doubt this is relevant; you have to show that your workload has a
substantial number of tail calls. Stack depth is not that important
with stackless.
3b, I imagine this is impossible in the general case; how do you see this
done in any language?
3 again, yes of course, and I bet you knew this already.
4 indeed, unloading old code is a bit harder, as you have to make sure
all references to the module, function or class are lost in a timely
fashion.

p.s. I'm not an expert on pypy yet

On 9 January 2011 21:24, Nathanael D. Jones nathanael.jo...@gmail.com wrote:
 Hi folks,
 I've got a challenging set of requirements that I think PyPy may be able to
 meet, but I'd like to get some feedback before I jump in.
 I'm developing a collaboratively edited game. It's not a MUD, but similar in
 concept, and also text based in nature.
 HogwartsLive.com and lotgd.net are examples of the genre I'm going for.
 I believe if a game of that genre makes user modification simple and
 rewarding enough, it could be self-sustaining and grow quickly enough to
 keep users interested perpetually, like Wikipedia.
 It's also a fascinating computer science challenge, because it requires a
 LOT from a computer language.
 1) Sandboxing. To allow users to make changes to the game without being able
 to cheat at it or escalate privileges.
 2) Serializable continuations. With gameplay being based on good plot and
 story flow, continuations are critical to allow straightforward
 implementation of 'workflows' that depend on user choice at every turn.
 3) Tail-call elimination.  By nature, players will accumulate a very large
 call stack. While this isn't terribly bad at first glance, the following
 issue combines with it to cause a very big problem:
 When code changes underneath a continuation, we need to determine how to
 resume flow. One option is annotating a checkpoint method in each code
 'file'. However, if a user's call stack includes every file in the system,
 each change will cause them to restart.
 Tail-call elimination would help eliminate unneeded stack frames and
 minimize re-spawning.
 3) Dynamic, strong typing and metaprogramming are key for keeping the API
 simple.
 4) Dynamic code loading. Users will be able to 'branch' their own version of
 the world and share it with others. There may be thousands of versions of a
 class, and they need to be able to execute in separate sandboxes at the same
 time. Source code will be pulled from a Git repository or some kind of
 versioning database.
 I'm interested in knowing which of these PyPy already does, and which of
 them I can make it do.
 I appreciate your help!
 Nathanael Jones
 http://nathanaeljones.com/



