[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

debasish83 Tue, 11 Nov 2014 21:04:30 -0800

Github user debasish83 commented on the pull request:

    https://github.com/apache/spark/pull/3098#issuecomment-62670769
  
    @mengxr I added a recommendAll API to MatrixFactorizationModel. For now, 
the Cartesian-based top-k search is also kept in the code for validation.
    
    Example run:
    
    ./bin/spark-submit --master spark://TUSCA09LMLVT00C.local:7077 \
      --jars /Users/v606014/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar \
      --total-executor-cores 4 --executor-memory 4g --driver-memory 1g \
      --class org.apache.spark.examples.mllib.MovieLensALS \
      ./examples/target/spark-examples_2.10-1.2.0-SNAPSHOT.jar \
      --kryo --lambda 0.065 --validateRecommendation 1.0 \
      hdfs://localhost:8020/sandbox/movielens/
    
    Got 1000209 ratings from 6040 users on 3706 movies.
    Training: 800670, test: 199539.
    Test RMSE = 0.8485243993052966.
    Using recommendAll API
    k 20 prec@k 0.04393839019542896
    k 40 prec@k 0.039640609473335565
    k 60 prec@k 0.03763387435133046
    k 80 prec@k 0.0337777409738324
    k 100 prec@k 0.0318681682676383
    k 120 prec@k 0.0318289720658054
    k 140 prec@k 0.030209861354280044
    k 160 prec@k 0.028638415038092092
    k 180 prec@k 0.02780078024364211
    k 200 prec@k 0.027149718449817808
    Test userMapAPI = 0.02964195032393224
    
    Using Cartesian
    k 20 prec@k 0.05635144087446176
    k 40 prec@k 0.052252401457436246
    k 60 prec@k 0.04856188583416142
    k 80 prec@k 0.0453461411063266
    k 100 prec@k 0.04296621397813845
    k 120 prec@k 0.040878602186154356
    k 140 prec@k 0.03914612217858326
    k 160 prec@k 0.03766768797615105
    k 180 prec@k 0.03648559125538258
    k 200 prec@k 0.03540990394170256
    Test userMap = 0.038507998497677914
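
    For readers unfamiliar with the metrics in the logs above, here is a minimal
sketch of prec@k and mean average precision, assuming the usual textbook
definitions. The function names are hypothetical and this is not the code from
the pull request; the exact normalization used there may differ.

```python
def precision_at_k(predicted, relevant, k):
    """Fraction of the first k predicted items that are relevant (assumed definition)."""
    hits = sum(1 for item in predicted[:k] if item in relevant)
    return hits / k


def average_precision(predicted, relevant):
    """Average of precision values taken at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_over_users(metric_values):
    """Average a per-user metric; MAP is this applied to average_precision."""
    values = list(metric_values)
    return sum(values) / len(values) if values else 0.0


# Toy data (made up for illustration): one user's ranked list and held-out items.
predicted = [10, 30, 20]
relevant = {10, 20}
p_at_2 = precision_at_k(predicted, relevant, 2)
ap = average_precision(predicted, relevant)
```

    With sparse test data such as the split above, most of the top-k items for a
user are not in that user's test set, which is why the values are small.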
    
    Results with recommendAll and Cartesian should match, but right now they 
do not. I am debugging it further.
    
    From the JIRA reference https://issues.apache.org/jira/browse/SPARK-3066, 
I implemented two of the three ideas:
    1) collect one side (either user or product) and broadcast it as a matrix
    3) use Utils.takeOrdered to find top-k
    
    The remaining idea, 2) use level-3 BLAS to compute inner products, will be 
added in a refactoring once dense distributed matrix multiplication comes online.
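
    As a rough illustration of ideas 1) and 3), and of why the two code paths
are expected to agree, here is a small single-machine sketch in plain Python.
It is not the Spark implementation: the dictionaries stand in for the factor
RDDs, the shared product dictionary stands in for the broadcast matrix, and
heapq.nlargest stands in for Utils.takeOrdered. All names and the toy factors
are hypothetical.

```python
import heapq
from itertools import product as cartesian


def dot(u, v):
    """Inner product of two equal-length factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def recommend_all(user_factors, product_factors, k):
    """Top-k products per user, scanning one shared copy of the product factors."""
    result = {}
    for user, uf in user_factors.items():
        scored = ((dot(uf, pf), pid) for pid, pf in product_factors.items())
        # Bounded top-k selection, playing the role of takeOrdered.
        result[user] = [pid for _, pid in heapq.nlargest(k, scored)]
    return result


def recommend_cartesian(user_factors, product_factors, k):
    """Validation path: score every (user, product) pair, then sort per user."""
    scores = {}
    pairs = cartesian(user_factors.items(), product_factors.items())
    for (user, uf), (pid, pf) in pairs:
        scores.setdefault(user, []).append((dot(uf, pf), pid))
    return {
        user: [pid for _, pid in sorted(scored, reverse=True)[:k]]
        for user, scored in scores.items()
    }


# Toy factors, made up for illustration only.
users = {1: [1.0, 0.0], 2: [0.0, 1.0]}
products = {10: [0.9, 0.1], 20: [0.2, 0.8], 30: [0.5, 0.5]}

top_broadcast = recommend_all(users, products, 2)
top_cartesian = recommend_cartesian(users, products, 2)
```

    In this sketch both functions rank the same scores with the same
tie-breaking, so their outputs are identical. A check of this kind is what the
Cartesian path in the example run is meant to provide.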


