2014-09-01 12:39 GMT+02:00 Anders Aagaard <[email protected]>:
> Data sync is a very good point, and will vary greatly depending on how we
> set things up. If we run a single major server, we can probably get
> people to scp things in; if we use containers that are started up and killed
> off on VMs, that's not really a good option.
>
> I've used reverse sshfs (mounting a local directory into a directory on the
> host) with success, but that's a fairly platform specific solution, and
> won't really work for a lot of the consumers...
>
> Another important point is data safety. We're doing dumps of massive
> amounts of company data, and I'd prefer that data not be available on any
> laptops. The Python code can (and should) be available, but it would be
> nice if the full data dumps were kept as safe as possible (while still, of
> course, granting developers raw access to them).

This is why I recommend using cloud storage for this, with a local
working copy synced into the container at startup and shutdown.
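As a rough sketch of that startup sync, here is what pulling a working copy down from an object store could look like. The bucket name, prefix, and directory layout are hypothetical, and the boto3 wiring assumes credentials are already available inside the container; the only part that is plain logic is the comparison of remote vs. local file listings.

```python
import os

def files_to_pull(remote, local):
    """Return remote keys that are missing locally or differ in size.

    `remote` maps key -> object size as reported by the store;
    `local` maps key -> file size on the container's disk.
    """
    return sorted(k for k, size in remote.items()
                  if local.get(k) != size)

def sync_down(bucket_name, prefix, workdir):
    # Hypothetical boto3 wiring; bucket/prefix/workdir are placeholders.
    import boto3
    s3 = boto3.client("s3")

    # List what the store holds under the prefix.
    remote = {}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get("Contents", []):
            remote[obj["Key"]] = obj["Size"]

    # List what the container already has.
    local = {}
    for root, _, names in os.walk(workdir):
        for name in names:
            path = os.path.join(root, name)
            local[os.path.relpath(path, workdir)] = os.path.getsize(path)

    # Fetch only what is missing or stale.
    for key in files_to_pull(remote, local):
        dest = os.path.join(workdir, key)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        s3.download_file(bucket_name, key, dest)
```

The shutdown half would mirror this with `upload_file` in the other direction, so laptops never need to hold the full dumps.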

Cloud storage services like Amazon S3, Google Storage, Rackspace Cloud
Files and Azure Blob Store allow highly concurrent access to replicated
data, with optional access control policies, encryption, and automated
(potentially cross-datacenter) replication.

Furthermore, the connection between cloud compute and cloud storage
(e.g. Amazon EC2 to S3) can sustain high throughput via concurrent
calls.
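The concurrent-calls pattern is simple to sketch: fan the object keys out over a thread pool and let each worker issue its own request. The `fetch` callable is injected here so the pattern stays independent of any particular storage SDK; with boto3 it might be a small wrapper around `download_file` (hypothetical wiring, not shown).

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(keys, fetch, max_workers=16):
    """Apply `fetch(key)` concurrently and return {key: result}.

    S3-style stores scale well with many parallel GETs, so the
    aggregate throughput is usually far higher than a sequential loop.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zipping back is safe.
        return dict(zip(keys, pool.map(fetch, keys)))
```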

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
