I'm new to Spark and MLlib, coming from scikit-learn. I am launching my Spark 1.6 instance through AWS EMR and using PySpark. All the examples using MLlib work fine.
But I have seen a couple of examples where you can combine scikit-learn packages and syntax with MLlib, like this one: https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html However, PySpark on AWS EMR does not seem to come with scikit-learn (or other standard PyData packages) installed. Is this something you can or should install for PySpark, and if so, how would you do it? Thanks for assisting. Myles