The test I was referring to was the included KMeans algorithm, which uses
NumPy in PySpark but can be written in Scala without jblas, so it mostly
tests basic performance rather than matrix libraries.
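To make that concrete, here is a rough sketch of one KMeans iteration in Python with NumPy. It is loosely in the spirit of the bundled example but is not its actual code, and the function names are hypothetical. The only numeric work per point is an elementwise difference and sum over a short vector, which is why the benchmark mostly measures general per-record overhead rather than any matrix library.

```python
import numpy as np


def closest_point(p, centers):
    """Return the index of the center nearest to p (squared Euclidean distance)."""
    best_index = 0
    closest = float("inf")
    for i, c in enumerate(centers):
        # Small elementwise vector ops only; no matrix routines involved.
        temp_dist = float(np.sum((p - c) ** 2))
        if temp_dist < closest:
            closest = temp_dist
            best_index = i
    return best_index


def kmeans_step(points, centers):
    """One Lloyd iteration: assign points to nearest centers, then recompute means."""
    k = len(centers)
    sums = [np.zeros_like(centers[0]) for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = closest_point(p, centers)
        sums[j] = sums[j] + p
        counts[j] += 1
    # A center with no assigned points keeps its previous position.
    return [sums[j] / counts[j] if counts[j] else centers[j] for j in range(k)]
```

For example, with points at (0, 0), (0, 1), (10, 10), (10, 11) and starting centers (1, 1) and (9, 9), one step moves the centers to (0, 0.5) and (10, 10.5).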

I can certainly try the ALS test, though note that the Scala example you
pointed to uses Colt, whereas most of MLlib at this point uses jblas, so it's
probably most relevant to compare against something using jblas (or simply
to rewrite that example to use jblas).

I basically agree with Evan that if you're only using matrices, and not the
richer features of SciPy/NumPy, Scala is the way to go, but I'll report back
with more tests. I also like Josh's suggestion of adding proper PySpark
benchmarking; I'll take a stab at that.

-- Jeremy



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Python-API-Performance-tp1048p1099.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
