Hi,

We are trying to port some code that uses Mahout Logistic Regression to
MLlib Logistic Regression, and our preliminary tests indicate a performance
bottleneck. It is not clear to me which of the following three factors is
responsible:

o Comparing apples to oranges
o Inadequate tuning
o Insufficient parallelism

The test results and the code that produced the results are below. I am
hoping that someone can shed some light on the performance problem we are
having. 

Thanks much,
-Raj

P.S. Apologies if this is a duplicate posting. I received a response to an
earlier posting suggesting that it may not have registered correctly.

----- Mahout LR vs. MLlib LR -------------
                         MLlib                   Mahout
Data size  Cluster type  Train   Test   Rate     Train   Test   Rate
---------  ------------  -----   ----   ----     -----   ----   ----
100        local[*]      .03     .1     54       1.1     11     100
100        Cluster[6]    .036    .09    59       1       9      100
500,000    local[*]      32      9      83       326     1086   82
500,000    Cluster[6]    8       4      83       310     877    81

The Train and Test rates are in records per millisecond.
The 100-record dataset is sample_libsvm_data.txt.
My cluster was a set of 6 worker machines on AWS.
Rate indicates the % of the test set that was labeled correctly.
The latest versions of MLlib (1.6) and Mahout (0.9) were used in the tests.
-------------------------------------------- 
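
For anyone reproducing these measurements, here is a minimal sketch of how
figures like the ones above could be computed: throughput as records per
millisecond of wall-clock time, and Rate as the percentage of test labels
predicted correctly. It is written in Python for brevity and is not the code
in MllMahout.scala; the function names (timed_ms, throughput_records_per_ms,
accuracy_percent) are hypothetical.

```python
import time


def timed_ms(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def throughput_records_per_ms(num_records, elapsed_ms):
    """Records processed per millisecond (higher means faster)."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return num_records / elapsed_ms


def accuracy_percent(predicted, actual):
    """Percentage of test records whose predicted label matches the true label."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)
```

When timing Spark, the timed function should end in an action (for example
a count or collect), since transformations are lazy and would otherwise
return almost immediately.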
MllMahout.scala
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26346/MllMahout.scala>
  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Mllib-Logistic-Regression-performance-relative-to-Mahout-tp26346.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
