Github user coderh commented on the pull request:
https://github.com/apache/spark/pull/597#issuecomment-45847264
Here are the values I have tried; the seed is set to 42.
"in" and "out" denote in-sample (training set) and out-of-sample (test set) metrics.
# #factor = 12, lambda = 1, alpha = 1
iter 20 =>
MAP_in = 0.035399855240788425
MAP_out = 0.007907455900941737
EPR_in = 0.4902389595686534
EPR_out = 0.4931204751436468
iter 40 =>
MAP_in = 0.033210624652830374
MAP_out = 0.007158070987320343
EPR_in = 0.4907502816419743
EPR_out = 0.49214166351173705
# #factor = 50, alpha = 1, iter = 30
lambda = 1 =>
MAP_in = 0.029096938174350682
MAP_out = 0.006634856811818636
EPR_in = 0.4928298931862564
EPR_out = 0.49328834081999423
lambda = 0.001 =>
MAP_in = 0.02903970778838223
MAP_out = 0.006569378517284138
EPR_in = 0.4929466287464198
EPR_out = 0.49337539845412665
I have not tried other metrics; as I said before, RMSE is not that good
here. I will give AUC and ROC a try.
I listed some code snippets here: the two evaluation methods and the main
program:
https://gist.github.com/coderh/05a83be081c1f713e15b
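For reference, here is a minimal sketch of the two metrics reported above, mean average precision (MAP) and expected percentile ranking (EPR, as in the Hu/Koren/Volinsky implicit-feedback paper, where ~0.5 means no better than random ordering). All function and variable names are my own illustration, not the ones used in the gist:

```python
def average_precision(recommended, relevant):
    """AP for one user: mean of precision@i taken at each hit
    in the ranked recommendation list."""
    hits, score = 0, 0.0
    for i, item in enumerate(recommended):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(recs_by_user, truth_by_user):
    """MAP: average of per-user AP over all users with ground truth."""
    aps = [average_precision(recs_by_user[u], truth_by_user[u])
           for u in truth_by_user]
    return sum(aps) / len(aps)

def expected_percentile_rank(scores_by_user, ratings_by_user):
    """EPR: rating-weighted mean of each held-out item's percentile
    rank in the user's list sorted by predicted score (0 = top)."""
    num, den = 0.0, 0.0
    for u, ratings in ratings_by_user.items():
        ranked = sorted(scores_by_user[u],
                        key=scores_by_user[u].get, reverse=True)
        n = len(ranked)
        for item, r in ratings.items():
            pct = ranked.index(item) / (n - 1) if n > 1 else 0.0
            num += r * pct
            den += r
    return num / den
```

A perfect recommender drives EPR toward 0 (held-out items ranked first) and MAP toward 1, which is why in-sample/out-of-sample gaps in both metrics are informative here.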