2013/10/5 ML Fan <[email protected]>:
> Hi,
>
> Do we know what's the best way to run scikit learn based code over hadoop
> streaming?
> Note that I don't have sudo access to the cluster and so only can install
> packages on my local directories.
>
> I tried this:
> http://stackoverflow.com/questions/6811549/how-can-i-include-a-python-package-with-hadoop-streaming-job
> but python zipimporter didn't work.
>
> Anyone tried this before? Any thoughts?

You can try hadoopy, which leverages PyInstaller to freeze scikit-learn, numpy,
scipy and the related shared libraries such as blas / lapack / atlas. I am not
sure it will work out of the box, but I am interested in detailed bug reports
if it does not.

http://www.hadoopy.com/en/latest/

--
Olivier
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
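
For reference, here is a minimal sketch of what a Hadoop streaming mapper for prediction could look like, independent of how the dependencies get shipped. It is not hadoopy's API and it uses only the standard library. The input format (`key<TAB>f1,f2,...`), the names `map_lines`, `threshold_model` and `main`, and the stand-in model are all assumptions made for illustration; in a real job, `predict` would presumably wrap a fitted scikit-learn estimator that is bundled with the job.

```python
import sys


def map_lines(lines, predict):
    """Yield 'key<TAB>prediction' for each 'key<TAB>f1,f2,...' input line.

    `predict` is any callable taking a list of floats (hypothetical
    interface; a fitted estimator would need a small wrapper).
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, raw = line.partition("\t")
        features = [float(x) for x in raw.split(",")]
        yield "%s\t%s" % (key, predict(features))


def threshold_model(features):
    # Hypothetical stand-in for a trained model: 1 if the feature sum
    # exceeds 1.0, else 0.
    return int(sum(features) > 1.0)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop streaming feeds records on stdin and collects them from stdout.
    for record in map_lines(stdin, threshold_model):
        stdout.write(record + "\n")

# When used as the streaming mapper script, end the file with: main()
```

Keeping the record parsing in a plain generator means the mapper logic can be checked locally on a few lines of text before it is submitted to the cluster.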
