[sage-support] Re: [[email protected]: Re: sage server]

William Stein Thu, 29 Oct 2009 09:53:25 -0700

On Thu, Oct 29, 2009 at 9:05 AM, Jason Grout
<[email protected]> wrote:
>
> David Guichard wrote:
>> Thanks Robert--works fine.
>>
>> I sort of answered my question about simultaneous connections:
>> apparently one account in the server pool can be used by two users--it
>> appears to just log in twice as that user. There must be a good reason
>> to have more users in the server pool, though, right? What is it? I
>> did notice that when I stopped the server one of the logins stayed
>> alive and I had to kill it manually.
>>
>
>
> The server pool is just a load-balancing and security thing. When a user
> starts up a Sage session, Sage selects one of the entries in the server
> pool (using random or round-robin load balancing, I think) and uses that
> to start the user's session.
>
> At least, that's how it used to work.  I don't know if the new
> notebook code changed things.


This is all still true, and the relevant code hasn't changed at all.
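For readers unfamiliar with the two policies Jason mentions, here is a minimal Python sketch of random and round-robin selection over a server pool. The pool entries (sage0@localhost and so on) and the function names are hypothetical, and this is not the notebook's actual selection code.

```python
import itertools
import random

# Hypothetical server-pool entries, each a "user@host" string.
SERVER_POOL = ["sage0@localhost", "sage1@localhost", "sage2@localhost"]


def pick_random(pool):
    """Random load balancing: any entry may be chosen for a new session."""
    return random.choice(pool)


def make_round_robin(pool):
    """Round-robin load balancing: entries are handed out in a fixed cycle."""
    cycle = itertools.cycle(pool)
    return lambda: next(cycle)


next_entry = make_round_robin(SERVER_POOL)
for _ in range(4):
    # Prints sage0, sage1, sage2, then sage0 again (each @localhost).
    print(next_entry())
```

Either way, several simultaneous sessions can end up on the same entry, which matches the behavior David observed with a one-account pool.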

> In general, there is no connection between the number of notebook users
> and the number of server pool entries.  The nice security thing about
> having lots of server-pool entries even when on the same server is that
> you can make each a separate unix account.  Then simultaneous logins
> likely won't be running on the same unix account, and so can't mess with
> each other's files in a malicious (or nonmalicious) way.  At least, I
> think permissions are set up so the workspace is not writeable by other
> people.

This motivation has changed significantly.   When the notebook server
evaluates an input cell, the following now happens:

     (1) The notebook server writes the code to be evaluated to
/tmp/randomstuff/code.py (or something like that).
     (2) The worksheet process (which is being controlled over ssh)
changes its current directory to /tmp/randomstuff, which is world
writable.
     (3) The code is run which produces output, including output to
stdout and the creation of files (e.g., images).
     (4) The output is copied back to the private notebook server's
directory, which is *not* world writable.  Unfortunately, parts of this
directory are currently world-readable, solely because of the DATA
directories.
     (5) The /tmp/randomstuff directory is deleted.
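Steps (1) to (5) can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not Sage's implementation: a local subprocess stands in for the ssh-controlled worksheet process, evaluate_cell and the sage_cell_ prefix are hypothetical names, and tempfile.mkdtemp creates a directory readable only by its owner rather than the world-writable one described in step (2), since here a single account does everything.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def evaluate_cell(code, private_dir):
    """Hypothetical sketch of steps (1)-(5) for one input cell."""
    # (1) Write the code to a fresh scratch directory under the temp root.
    scratch = tempfile.mkdtemp(prefix="sage_cell_")
    try:
        with open(os.path.join(scratch, "code.py"), "w") as f:
            f.write(code)
        # (2) + (3) Run the code with the scratch directory as its working
        # directory, capturing stdout. A local Python subprocess stands in
        # for the ssh-controlled worksheet process.
        result = subprocess.run(
            [sys.executable, "code.py"],
            cwd=scratch,
            capture_output=True,
            text=True,
            timeout=60,
        )
        # (4) Copy every regular file the code produced (not code.py itself)
        # back to the private directory.
        os.makedirs(private_dir, exist_ok=True)
        for name in os.listdir(scratch):
            src = os.path.join(scratch, name)
            if name != "code.py" and os.path.isfile(src):
                shutil.copy2(src, private_dir)
        return result.stdout
    finally:
        # (5) Delete the scratch directory, even if evaluation failed.
        shutil.rmtree(scratch, ignore_errors=True)
```

The try/finally mirrors step (5): the scratch directory is removed even when the cell raises an error or times out.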

As you can see, the vandal-style damage that a user can do is vastly
more limited now.   It used to be the case that any user could delete
any files of any other user that happened to have been created by the
same worksheet process.  Now, even if the server pool has only one
account, there is only a very tiny window of opportunity for the user
to do something malicious (i.e., right when another user is evaluating
some code, maybe the code.py file could be changed, or the output
files could be deleted).   Moreover, the worksheet process can be set up so
it only has write permissions on /tmp/ (i.e., don't give it a home
directory), and /tmp/ can be made a RAM disk.
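Whether this reasoning holds on a given machine depends on the permission bits actually set: the scratch directory from step (2) is expected to be world-writable, while the private directory from step (4) should not be. A small audit helper in Python might look like this; permission_summary is a hypothetical name, not part of Sage. It also reports the sticky bit: on typical Unix systems /tmp itself has that bit set, so users cannot delete or rename each other's entries directly in /tmp, but a world-writable subdirectory without it offers no such protection.

```python
import os
import stat
import tempfile


def permission_summary(path):
    """Report which access bits 'other' users have on path."""
    mode = os.stat(path).st_mode
    return {
        "world_readable": bool(mode & stat.S_IROTH),
        "world_writable": bool(mode & stat.S_IWOTH),
        "sticky": bool(mode & stat.S_ISVTX),
    }


# Demo on a throwaway directory: open it to everyone, then lock it down.
demo = tempfile.mkdtemp()
os.chmod(demo, 0o777)
print("scratch-style:", permission_summary(demo))
os.chmod(demo, 0o700)
print("private-style:", permission_summary(demo))
os.rmdir(demo)
```

On a typical Linux machine, permission_summary("/tmp") would be expected to show a world-writable directory with the sticky bit set.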

In the near future I'll also fully support the worksheet processes
running on several completely separate virtual machines, which NFS
mount various /tmp directories, say /tmp0, /tmp1, /tmp2, etc.
Worksheet processes could then be assigned to the virtual machines on
a round-robin basis, and the virtual machines (and corresponding /tmp)
can be reset to their default state once per hour (say).    Moreover, I
can add a feature so that in step (4) above, any file beyond a certain
size is flagged and not copied (instead, it is replaced by a warning).
Moreover, the server could cap the number of worksheets a given user
has at some hard-coded limit.
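The two proposed limits, skipping oversized files during the copy in step (4) and capping worksheets per user, could be sketched in Python as below. Both limits (10 MiB and 50 worksheets), the .WARNING.txt naming, and all function names are hypothetical placeholders; the email does not specify values or a format for the warning.

```python
import os
import shutil
import tempfile

MAX_FILE_BYTES = 10 * 1024 * 1024  # hypothetical limit: 10 MiB
MAX_WORKSHEETS_PER_USER = 50       # hypothetical hard-coded limit


def copy_back_with_limit(scratch, private_dir, max_bytes=MAX_FILE_BYTES):
    """Step (4) with a size check: oversized files are replaced by a warning."""
    os.makedirs(private_dir, exist_ok=True)
    skipped = []
    for name in sorted(os.listdir(scratch)):
        src = os.path.join(scratch, name)
        if not os.path.isfile(src):
            continue  # this sketch ignores subdirectories
        if os.path.getsize(src) > max_bytes:
            skipped.append(name)
            warning = os.path.join(private_dir, name + ".WARNING.txt")
            with open(warning, "w") as f:
                f.write(f"{name} was not copied: larger than {max_bytes} bytes\n")
        else:
            shutil.copy2(src, private_dir)
    return skipped


def may_create_worksheet(user, worksheet_counts, limit=MAX_WORKSHEETS_PER_USER):
    """Enforce a hard cap on the number of worksheets a user can own."""
    return worksheet_counts.get(user, 0) < limit


# Demo: one small file is copied, one oversized file becomes a warning.
src_dir, dst_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(src_dir, "small.txt"), "w") as f:
    f.write("ok")
with open(os.path.join(src_dir, "big.bin"), "wb") as f:
    f.write(b"\0" * 2048)
print(copy_back_with_limit(src_dir, dst_dir, max_bytes=1024))  # ['big.bin']
```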

I think the above design would successfully mitigate, at least mostly,
every malicious attack on the notebook that I've personally heard of.
Obviously, somebody could do something nasty to one machine for up to
one hour, but that's it.   The design scales well, in that even if n
users are trying to factor huge numbers (i.e., a seriously CPU bound
computation), the machine on which the notebook server runs is not
slowed down at all by this, since all computations run on a different
machine.    I would also imagine that adding or removing machines from
the pool could be done dynamically without having to restart the
notebook server.
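Adding or removing machines without restarting the notebook server essentially means keeping the worker list in a structure that can change while assignments are in progress. Here is a minimal Python sketch assuming a single notebook-server process; MachinePool and the machine names are hypothetical, the lock only matters if several threads assign worksheet processes at once, and this is not how Sage implements (or planned to implement) the feature.

```python
import threading


class MachinePool:
    """Round-robin pool of worker machines that can change while running."""

    def __init__(self, machines=()):
        self._machines = list(machines)
        self._next = 0  # index of the machine to hand out next
        self._lock = threading.Lock()

    def add(self, machine):
        with self._lock:
            if machine not in self._machines:
                self._machines.append(machine)

    def remove(self, machine):
        with self._lock:
            if machine in self._machines:
                i = self._machines.index(machine)
                self._machines.pop(i)
                # Keep the rotation pointing at the same "next" machine.
                if i < self._next:
                    self._next -= 1
                if self._next >= len(self._machines):
                    self._next = 0

    def assign(self):
        """Return the machine that should run the next worksheet process."""
        with self._lock:
            if not self._machines:
                raise RuntimeError("no machines in the pool")
            machine = self._machines[self._next]
            self._next = (self._next + 1) % len(self._machines)
            return machine


pool = MachinePool(["vm0", "vm1"])
pool.add("vm2")     # a new machine joins while the server keeps running
pool.remove("vm0")  # a machine is taken out, e.g. for its hourly reset
print(pool.assign())
```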

If one removes support for the DATA directory, then the requirement of
having a shared NFS /tmp directory could be removed, which would
significantly increase flexibility.  (I only mean that there could be
a way to start the notebook server without the DATA functionality, but
with more flexible worksheet processes, not that DATA would be gone in
any other modes.)

The above design would be complementary to everything currently
available -- i.e., it doesn't require changing any existing setups, if
you don't want to.

CREDIT: Martin Albrecht, Yoav Aner, and I came up with the above
design over dinner in Barcelona this summer.

 -- William

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/sage-support
URL: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---
