[GitHub] spark pull request: [WIP][MLLIB][SPARK-4675][SPARK-4823]RowSimilar...

debasish83 Sat, 23 May 2015 19:16:56 -0700

Github user debasish83 commented on the pull request:

    https://github.com/apache/spark/pull/6213#issuecomment-104968079
  
    Runtime comparisons are posted on SPARK-4823 for the MovieLens1m dataset, 
run on my laptop with 8 cores and 4 GB executor memory.
    
    Stages 24-35 are the row similarity flow. Total runtime ~20 s
    Stage 64 is the column similarity mapPartitions. Total runtime ~4.6 min
    
    I have not yet moved to gemv, which will decrease the runtime further but 
will introduce some approximations in RBFKernel. I think we should offer users 
both the vector-based flow and the gemv-based flow and let them choose 
whichever they want.
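
    To illustrate the trade-off between the two flows, here is a minimal 
sketch in plain Python, not the PR's code. It assumes the kernel has the form 
exp(-gamma * ||x - y||^2); the function names and the gamma parameter are 
hypothetical, and RBFKernel in the PR may be parameterized differently. The 
gemv-style version gets every row's dot product with x in one matrix-vector 
product and rebuilds the squared distance from norms, which can lose a little 
precision to floating-point cancellation.

```python
import math

def rbf_direct(x, y, gamma):
    """Vector-based flow: squared distance computed element by element."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)

def rbf_via_gemv(rows, x, gamma):
    """gemv-style flow: one matrix-vector product, then norms.

    Uses ||r - x||^2 = ||r||^2 + ||x||^2 - 2 * (r . x).
    """
    # This list comprehension stands in for a BLAS gemv call (rows times x).
    dots = [sum(a * b for a, b in zip(r, x)) for r in rows]
    x_norm2 = sum(a * a for a in x)
    out = []
    for r, d in zip(rows, dots):
        r_norm2 = sum(a * a for a in r)
        # Cancellation can produce a tiny negative value; clamp it to zero.
        d2 = max(r_norm2 + x_norm2 - 2.0 * d, 0.0)
        out.append(math.exp(-gamma * d2))
    return out
```

    On small, well-scaled inputs the two versions agree to within rounding 
error; the difference matters mostly when rows are nearly identical or have 
very large norms.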
    
    I updated the driver code in examples.mllib.MovieLensSimilarity.
    
    @MLnick @sowen could you please take a look at 
examples.mllib.MovieLensSimilarity? I am running ALS in implicit mode with no 
regularization (basically full RMSE optimization) and comparing the 
similarities generated from the raw features with the item similarities.
    
    I take the topK=50 from the raw features as golden labels and compute MAP 
on the top-50 predictions from MatrixFactorizationModel.similarItems(), which 
this PR adds.
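
    The evaluation described above can be sketched as follows. This is a 
plain-Python illustration under stated assumptions, not the driver code in 
examples.mllib.MovieLensSimilarity. The function names are hypothetical, the 
predicted lists are assumed to contain no duplicates, and the average 
precision is normalized by min(number of golden items, k), which is one common 
convention; other implementations normalize differently.

```python
def average_precision_at_k(predicted, relevant, k):
    """Average precision of a ranked list against a set of golden items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)

def mean_average_precision(predicted_by_item, golden_by_item, k):
    """Mean of average_precision_at_k over every item with golden labels."""
    items = list(golden_by_item)
    if not items:
        return 0.0
    total = sum(
        average_precision_at_k(predicted_by_item.get(i, []), golden_by_item[i], k)
        for i in items
    )
    return total / len(items)
```

    Here golden_by_item would map each item to its top-50 neighbours from the 
raw features, and predicted_by_item would map each item to the ranked top-50 
list returned by the model, with k=50.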
    
    I will add a test case for RBFKernel and update the 
PowerIterationClustering driver to use the IndexedRowMatrix.rowSimilarities 
code before removing the WIP label from the PR.


