Hi,
We are looking for a Scikit-Learn/Python enthusiast to help us implement native persistence of Scikit-Learn estimators and data. The technology we plan to use is called NEO (http://www.neoppod.org/). It is a distributed object database that can store serialized Python objects on a redundant array of inexpensive computers. It is based on the ZODB protocol but supports high performance and a distributed architecture. NEO is already used by ERP5, an open source ERP/CRM that powers large companies and governments.
The main task will consist of making NumPy ndarrays implement ZODB's Persistent class interface. We are also considering adding a meta-object protocol that can be used to extend the representation of NumPy arrays and distribute them transparently across multiple storage nodes.
One application of this project is to analyze, with Scikit-Learn, large collections of logs from a cloud computing infrastructure in order to make predictive decisions that can help increase resiliency: predict process migration, predict disaster recovery, etc. However, the general goal of this work goes beyond this initial application: we intend to create a native distributed storage for Scikit-Learn that is flexible enough for a wide range of applications. Future applications under consideration include the Internet of Things, large-scale scientific data processing (neuroimaging, chemistry, genomics, physics, etc.), finance, discrete simulation, etc.
If you are interested, please send me a resume. If you have a GitHub login, please share it.
The position is available now for a period of 8 months (renewal possible). It will be located at Telecom ParisTech in downtown Paris. Prior experience with sklearn is a must. Experience with object databases is a plus.
Best,
Alex
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general