With the older Hadoop-based recommenders, you can use ratings or binary data. As
Ted says, binary data is pretty much always better. Your error is in treating
any rating as a preference. A rating of 1 is unlikely to indicate a preference.
Also you may have unresolved problems in your user and
On Sun, Nov 29, 2015 at 9:36 PM, Niklas Ekvall wrote:
> My conclusion is that recommenditembased in Mahout works better for ratings
> than for binary data. What are your conclusions?
>
Still operator error somewhere. Binary data works much better as a real
recommender.
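Why binary data fits: log-likelihood (LLR) similarity is computed only from co-occurrence counts (how many users touched item A, item B, both, or neither), so preference values never enter it. Below is a minimal Python sketch of the 2x2 LLR test in the unnormalized-entropy form that, as far as I know, Mahout uses; the final squashing `1 - 1/(1 + LLR)` is an assumption about how Mahout's LLR similarity maps the statistic into [0, 1), and all function names here are hypothetical.

```python
import math


def x_log_x(x: int) -> float:
    # x * ln(x), using the convention 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    # Unnormalized Shannon entropy: N ln N - sum(k ln k)
    total = sum(counts)
    return x_log_x(total) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 statistic for a 2x2 co-occurrence table.

    k11: users with both A and B    k12: users with A but not B
    k21: users with B but not A     k22: users with neither
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, 2.0 * (row_entropy + col_entropy - mat_entropy))


def llr_similarity(users_a: set, users_b: set, num_users: int) -> float:
    """Similarity of items A and B from the sets of users who interacted with them."""
    both = len(users_a & users_b)
    only_a = len(users_a - users_b)
    only_b = len(users_b - users_a)
    neither = num_users - both - only_a - only_b
    llr = log_likelihood_ratio(both, only_a, only_b, neither)
    # Assumed squashing into [0, 1); ratings play no part anywhere above
    return 1.0 - 1.0 / (1.0 + llr)
```

For example, in the test case further down the thread every user has item 101, so its co-occurrence with any other item carries no evidence and `log_likelihood_ratio(3, 2, 0, 0)` is 0.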
Hello again Pat!
I found a test case that I was able to recreate:
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
bin/mahout
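Pat's point above is that a low rating (such as 1) should not be fed in as a preference. A minimal sketch of turning the test case above into binary data, assuming a hypothetical cutoff of 4.0 (the right cutoff depends on your data) and a `userID,itemID` output line with no value column; how the job is told that the input is boolean is left to its own options.

```python
# Ratings from the test case above as (user, item, rating) triples
RATINGS = [
    (1, 101, 5.0), (1, 102, 3.0), (1, 103, 2.5),
    (2, 101, 2.0), (2, 102, 2.5), (2, 103, 5.0), (2, 104, 2.0),
    (3, 101, 2.5), (3, 104, 4.0), (3, 105, 4.5), (3, 107, 5.0),
    (4, 101, 5.0), (4, 103, 3.0), (4, 104, 4.5), (4, 106, 4.0),
    (5, 101, 4.0), (5, 102, 3.0), (5, 103, 2.0), (5, 104, 4.0),
    (5, 105, 3.5), (5, 106, 4.0),
]

# Hypothetical cutoff: only ratings at or above this count as a preference
THRESHOLD = 4.0


def to_binary(ratings, threshold=THRESHOLD):
    """Keep (user, item) pairs whose rating signals a preference; drop the value."""
    return [(u, i) for (u, i, r) in ratings if r >= threshold]


def to_csv_lines(pairs):
    """Format pairs as userID,itemID lines with no preference column."""
    return [f"{u},{i}" for (u, i) in pairs]


if __name__ == "__main__":
    for line in to_csv_lines(to_binary(RATINGS)):
        print(line)
```

With this cutoff, 11 of the 21 ratings survive, and items such as 102 (never rated above 3.0) disappear entirely.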
I wouldn’t pre-filter, but in any case the IDs input to hadoop-mahout need to
follow those rules.
The new recommender I mentioned has no such requirements; it uses string IDs.
On Nov 24, 2015, at 11:44 AM, Niklas Ekvall wrote:
> On Nov 24, 2015, at 12:21 PM, Niklas Ekvall wrote:
>
> Okay!
>
> No pre-filter, and the user/item IDs should start from 0 and run up to the
> number of users and items. So all the data we have should go into Mahout,
> and we filter inside Mahout, correct?
Yes, but I
No, it does not start from 0 and does not cover all numbers between 0 and
the number of items/users. We do a pre-filtering (a user must have bought at
least 5 products and a product must have been bought by 3 users) before we
use Mahout on the dataset. Therefore we start with user 3, then it jumps to
Okay!
No pre-filter, and the user/item IDs should start from 0 and run up to the
number of users and items. So all the data we have should go into Mahout,
and we filter inside Mahout, correct?
We do the same pre-filter for Spark item-similarity. Is that wrong too?
Best regards, Niklas
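For reference, the pre-filter Niklas describes (a user must have bought at least 5 products, a product must have been bought by at least 3 users) could look like the sketch below. Repeating until nothing changes is my own choice, not necessarily what their pipeline does: dropping a product can push a user below 5, and dropping a user can push a product below 3. Pat's advice in this thread is not to pre-filter at all; the function name is hypothetical.

```python
from collections import Counter


def prefilter(pairs, min_items_per_user=5, min_users_per_item=3):
    """Drop sparse users and items, repeating until the data set is stable.

    pairs: iterable of (user, item) purchases; duplicates are collapsed.
    """
    pairs = set(pairs)
    while True:
        user_counts = Counter(u for u, _ in pairs)
        item_counts = Counter(i for _, i in pairs)
        kept = {
            (u, i)
            for (u, i) in pairs
            if user_counts[u] >= min_items_per_user
            and item_counts[i] >= min_users_per_item
        }
        if kept == pairs:
            # No change in this pass, so both thresholds now hold everywhere
            return kept
        pairs = kept
```

A single pass would leave users or products behind that no longer meet the thresholds; the loop always terminates because the set only shrinks.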
On Tuesday,
Do your IDs start with 0 and cover all numbers between 0 and the number of
items - 1 (same for user IDs)?
The old hadoop-mahout code required ordinal IDs starting at 0
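If the old Hadoop code needs ordinal IDs as described here, one way to meet that without touching the source data is to remap every distinct user and item ID to a dense 0-based index before the job and keep the dictionaries to translate results back. A minimal sketch under that assumption; the function names are hypothetical, and the Spark-based recommender mentioned in this thread takes string IDs and would not need this step.

```python
def build_index(ids):
    """Map each distinct ID to an ordinal 0..n-1 in order of first appearance."""
    index = {}
    for x in ids:
        if x not in index:
            index[x] = len(index)
    return index


def remap(pairs):
    """Rewrite (user, item) pairs to dense 0-based ordinals.

    Returns the remapped pairs plus both dictionaries, so recommendations
    can be translated back to the original IDs afterward.
    """
    user_index = build_index(u for u, _ in pairs)
    item_index = build_index(i for _, i in pairs)
    remapped = [(user_index[u], item_index[i]) for u, i in pairs]
    return remapped, user_index, item_index
```

Applied to the first few lines of Niklas's sample (`3 7414`, `3 12682`, `5 3`, `5 2123`, `6 3`), users 3, 5, 6 become 0, 1, 2 and items 7414, 12682, 3, 2123 become 0, 1, 2, 3; inverting `item_index` maps a recommended ordinal back to the original item ID.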
On Nov 24, 2015, at 8:19 AM, Niklas Ekvall wrote:
Hi Pat,
Here is some input:
3 7414
3 12682
3 18947
3 19980
3 26975
3 54635
3 67789
3 73212
3 118932
3 138846
3 141268
5 3
5 2123
5 37955
5 39975
5 113289
6 3
6 456
6 2188
6
Hello Mahout Users!
Today I use Mahout's recommenditembased with log-likelihood similarity to
produce personalized recommendations for trigger emails in an offline mode.
But when I produce, e.g., 50 recommendations, the rank values of the
recommendations are always of magnitude 1. Why is this so? And, is the first
Sounds like you may not have the input right. Recommendations should be sorted
by the strength and so shouldn’t all be 1 unless the data is very odd.
Can you give us a small sample of the input?
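One way every score can come out as exactly 1: suppose the item-based estimate for a candidate item is a similarity-weighted average of the user's own preference values (an assumption about the non-boolean scoring path, not something stated in this thread). If the input has no real values, so every preference is 1, the average is 1 regardless of the similarities, and the ranking carries no information. The arithmetic, as a small hypothetical helper:

```python
def weighted_average_estimate(sims_and_prefs):
    """Estimate = sum(sim * pref) / sum(|sim|) over the user's items.

    sims_and_prefs: list of (similarity to candidate item, user's preference).
    """
    num = sum(s * p for s, p in sims_and_prefs)
    den = sum(abs(s) for s, _ in sims_and_prefs)
    return num / den if den else float("nan")
```

`weighted_average_estimate([(0.9, 1.0), (0.2, 1.0), (0.5, 1.0)])` gives 1.0, while real values such as `[(0.9, 5.0), (0.2, 1.0)]` give 4.7 / 1.1, roughly 4.27. This is consistent with the advice here that the input is probably not what the job expects.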
BTW, a newer recommender using Mahout’s Spark-based code and a search engine is
here: