Re: [Scikit-learn-general] Shared scikit/ipython server

2014-09-01 Thread Anders Aagaard
That is a good point. With persistent storage sync this isn't a major issue. Thanks a lot for the input so far, everyone. I'll probably end up spending some time on this over the next month or so; if I end up with some interesting scripts I'll stick them on GitHub. On Mon, Sep 1, 2014 at 4:34 P

Re: [Scikit-learn-general] Shared scikit/ipython server

2014-09-01 Thread Sujit Pal
Hi Anders, >> The problem as I see it is the "tearing it down" bit: I don't want the jobs shutting down before the user has had a chance to get the resulting data, but I suspect if we let users shut them down themselves a lot of them will sit around for no reason. With Amazon EMR you read and writ

Re: [Scikit-learn-general] Shared scikit/ipython server

2014-09-01 Thread Olivier Grisel
2014-09-01 12:39 GMT+02:00 Anders Aagaard : > Data sync is a very good point, and will vary greatly depending on how we > set things up. If we do a single major server thing we can probably get > people to scp things in; if we use containers that are started up and killed > off on VMs, that's not

Re: [Scikit-learn-general] Shared scikit/ipython server

2014-09-01 Thread Anders Aagaard
Data sync is a very good point, and will vary greatly depending on how we set things up. If we do a single major server thing we can probably get people to scp things in; if we use containers that are started up and killed off on VMs, that's not really a good option. I've used reverse sshfs (moun

Re: [Scikit-learn-general] Shared scikit/ipython server

2014-09-01 Thread Gavin Gray
I've used git-annex recently. It works basically like git, with a few caveats. I don't know if SparkleShare deals with large files in the same way, but git-annex has no problems with very large data files. -Gavin On Mon, Sep 1, 2014 at 9:36 AM, Gael Varoquaux <

Re: [Scikit-learn-general] Shared scikit/ipython server

2014-09-01 Thread Olivier Grisel
On 1 Sept. 2014 10:37, "Gael Varoquaux" wrote: > > On Mon, Sep 01, 2014 at 10:33:20AM +0200, Olivier Grisel wrote: > > However I do not know any ready-made equivalent to Dropbox that is > > vendor agnostic. > > I like SparkleShare: git-based distributed storage. > http://sparkleshare.org/ Base

Re: [Scikit-learn-general] Shared scikit/ipython server

2014-09-01 Thread Gael Varoquaux
On Mon, Sep 01, 2014 at 10:33:20AM +0200, Olivier Grisel wrote: > However I do not know any ready-made equivalent to Dropbox that is > vendor agnostic. I like SparkleShare: git-based distributed storage. http://sparkleshare.org/ Gaël

Re: [Scikit-learn-general] Shared scikit/ipython server

2014-09-01 Thread Olivier Grisel
This might help: http://file-syncer.readthedocs.org/en/latest/ It looks like an equivalent to rsync based on libcloud. -- Olivier
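
To make the "equivalent to rsync" idea concrete, here is a minimal sketch of how a one-way sync tool could decide which files to upload. It is not file-syncer's or libcloud's API; `plan_upload` and `remote_manifest` are hypothetical names invented for illustration, and the sketch compares content hashes (rsync's default quick check uses size and modification time instead).

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_upload(local_root: Path, remote_manifest: dict) -> list:
    """List relative paths that are missing or changed on the remote side.

    remote_manifest is a hypothetical {relative_path: sha256_hex} mapping
    describing what the remote store already holds.
    """
    to_upload = []
    for path in sorted(local_root.rglob("*")):
        if not path.is_file():
            continue  # directories are implied by the file paths
        rel = path.relative_to(local_root).as_posix()
        if remote_manifest.get(rel) != file_digest(path):
            to_upload.append(rel)
    return to_upload
```

Only the files returned by `plan_upload` would need to be transferred, which is what keeps repeated syncs of a large, mostly unchanged data directory cheap.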

Re: [Scikit-learn-general] Shared scikit/ipython server

2014-09-01 Thread Olivier Grisel
> The problem as I see it is the "tearing it down" bit: I don't want the jobs > shutting down before the user has had a chance to get the resulting data, but > I suspect if we let users shut them down themselves a lot of them will sit > around for no reason. I think it's important to provide th

[Scikit-learn-general] Shared scikit/ipython server

2014-08-31 Thread Anders Aagaard
Hi, my company is considering setting up some infrastructure for ML. Right now we're using either our own laptops or Google Compute Engine. Has anyone done this or found good tools for it? I was considering looking into GCE/Amazon and auto scaling, maybe having it set up another IPython notebook (probabl