Hello again Pat!

I did find a test case that I was able to recreate:

1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0

bin/mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i
/path/to/input/file -o /path/to/output/folder/ --numRecommendations 1

Output:

1 [104:2.8088317]
2 [105:3.5743618]
3 [103:4.336442]
4 [105:3.6903737]
5 [107:3.663558]

But when I change the ratings above to all ones (binary data), I get only
ones in the output file for the recommendation values:

1 [104:1.0]
2 [105:1.0]
3 [103:1.0]
4 [105:1.0]
5 [107:1.0]

My conclusion is that recommenditembased in Mahout works better for ratings
than for binary data. What is your conclusion?
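Those all-ones scores match how an item-based recommender typically estimates preferences: the predicted score is a similarity-weighted average of the user's existing ratings, so all-ones input yields all-ones estimates even though the ranking behind them can still be meaningful. A minimal sketch of that idea (not Mahout's actual code; the item ids and similarity values below are made up):

```python
# Minimal sketch of item-based preference estimation (not Mahout's code).
# The predicted preference for a candidate item is a similarity-weighted
# average of the user's existing ratings on items similar to the candidate.

def estimate_preference(user_ratings, similarities):
    """user_ratings: {item_id: rating} for items the user already rated.
    similarities: {item_id: similarity} between the candidate item and
    each of the user's items."""
    num = sum(similarities[i] * r for i, r in user_ratings.items()
              if i in similarities)
    den = sum(abs(similarities[i]) for i in user_ratings
              if i in similarities)
    return num / den if den else 0.0

# With real ratings the weighted average varies with the ratings:
rated = {101: 5.0, 102: 3.0, 103: 2.5}
sims = {101: 0.9, 102: 0.4, 103: 0.7}    # made-up similarity values
print(estimate_preference(rated, sims))   # ~3.725

# With all-ones (boolean) input every estimate collapses to 1.0, no
# matter what the similarities are; only the ranking order differs:
boolean = {101: 1.0, 102: 1.0, 103: 1.0}
print(estimate_preference(boolean, sims))  # 1.0
```

So constant 1.0 values do not necessarily mean the recommendations are bad, only that the score no longer carries rating information.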

Best, Niklas

2015-11-24 21:56 GMT+01:00 Pat Ferrel <p...@occamsmachete.com>:

>
>
> > On Nov 24, 2015, at 12:21 PM, Niklas Ekvall <niklas.ekv...@gmail.com>
> wrote:
> >
> > Okay!
> >
> > No pre-filter, and the user/item ids should start from 0 and run up to
> > the number of users and items there are. So, all the data we have should
> > go into Mahout and we filter inside Mahout... correct?
>
> Yes, but I wouldn't filter. The recs will very likely be better than
> random with only a small number of events.
>
> >
> > We do the same pre-filter for Spark item-similarity; is that wrong too?
>
> No, spark-itemsimilarity uses string ids.
>
> >
> > Best regards, Niklas
> >
> > On Tuesday, November 24, 2015, Pat Ferrel <p...@occamsmachete.com> wrote:
> >
> >> I wouldn’t pre-filter but in any case the ids input to hadoop-mahout
> need
> >> to follow those rules.
> >>
> >> The new recommender I mentioned has no such requirements, it uses string
> >> IDs.
> >>
> >> On Nov 24, 2015, at 11:44 AM, Niklas Ekvall <niklas.ekv...@gmail.com>
> >> wrote:
> >>
> >> No, they do not start from 0 and do not cover all numbers between 0 and
> >> the number of items/users. We do a pre-filtering (a user must have
> >> bought at least 5 products and a product must have been bought by 3
> >> users) before we use Mahout on the dataset. Therefore we start with
> >> user 3, then it jumps to user 5, etc.
> >>
> >> Is this wrong? Should we use all data as input to Mahout and do the
> >> filtering inside Mahout?
> >>
> >> We use the second latest version of Mahout!
> >>
> >> Best regards, Niklas
> >>
> >> On Tuesday, November 24, 2015, Pat Ferrel <p...@occamsmachete.com>
> >> wrote:
> >>
> >>> Do your ids start with 0 and cover all numbers between 0 and the
> >>> number of items - 1 (same for user ids)?
> >>> The old hadoop-mahout code required ordinal ids starting at 0.
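That requirement can be met with a small preprocessing step that remaps arbitrary user/item ids to contiguous 0-based ordinals before the data goes into Mahout. A hypothetical helper (not a Mahout utility), assuming the input is an iterable of (user, item) pairs:

```python
# Hypothetical preprocessing helper (not part of Mahout): remap arbitrary
# user/item ids to contiguous 0-based ordinals, as the old hadoop-mahout
# code expects, keeping the dictionaries needed to translate results back.

def remap_ids(rows):
    """rows: iterable of (user_id, item_id) pairs with any hashable ids.
    Returns (remapped_rows, user_id_map, item_id_map)."""
    users, items = {}, {}
    out = []
    for u, i in rows:
        # setdefault assigns the next ordinal the first time an id appears
        uu = users.setdefault(u, len(users))
        ii = items.setdefault(i, len(items))
        out.append((uu, ii))
    return out, users, items

rows = [(3, 7414), (3, 12682), (5, 3), (6, 3)]
mapped, user_map, item_map = remap_ids(rows)
print(mapped)  # [(0, 0), (0, 1), (1, 2), (2, 2)]
```

Inverting `user_map`/`item_map` afterwards translates Mahout's output ids back to the original ones.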
> >>>
> >>>
> >>> On Nov 24, 2015, at 8:19 AM, Niklas Ekvall <niklas.ekv...@gmail.com>
> >>> wrote:
> >>>
> >>> Hi Pat,
> >>>
> >>> Here is some input:
> >>>
> >>> 3       7414
> >>> 3       12682
> >>> 3       18947
> >>> 3       19980
> >>> 3       26975
> >>> 3       54635
> >>> 3       67789
> >>> 3       73212
> >>> 3       118932
> >>> 3       138846
> >>> 3       141268
> >>> 5       3
> >>> 5       2123
> >>> 5       37955
> >>> 5       39975
> >>> 5       113289
> >>> 6       3
> >>> 6       456
> >>> 6       2188
> >>> 6       2496
> >>> 6       6194
> >>> 6       6361
> >>> 6       6768
> >>> 6       6919
> >>> 6       6920
> >>> 6       7257
> >>> 6       7705
> >>> 6       7706
> >>> 6       11788
> >>>
> >>> And some output:
> >>>
> >>> 3
> >>>
> >>>
> >>
> [122086:1.0,1846:1.0,74638:1.0,63240:1.0,87540:1.0,2742:1.0,2981:1.0,8325:1.0,145598:1.0,49675:1.0,131388:1.0,72113:1.0,3493:1.0,56131:1.0,30422:1.0,87829:1.0,111190:1.0,13597:1.0,83436:1.0,61772:1.0]
> >>> 5
> >>>
> >>>
> >>
> [32349:1.0,29413:1.0,111896:1.0,61845:1.0,50016:1.0,1607:1.0,15237:1.0,133229:1.0,65805:1.0,34034:1.0,133071:1.0,28894:1.0,18658:1.0,32095:1.0,4402:1.0,47522:1.0,31022:1.0,23936:1.0,6243:1.0,53214:1.0]
> >>> 6
> >>>
> >>>
> >>
> [40756:1.0,34420:1.0,31153:1.0,114717:1.0,53945:1.0,71148:1.0,26095:1.0,112941:1.0,55284:1.0,111346:1.0,112201:1.0,65759:1.0,133127:1.0,61378:1.0,16413:1.0,113289:1.0,49675:1.0,14995:1.0,141028:1.0,27506:1.0]
> >>>
> >>> Best regards, Niklas
> >>>
> >>> 2015-11-24 16:48 GMT+01:00 Pat Ferrel <p...@occamsmachete.com>:
> >>>
> >>>> Sounds like you may not have the input right. Recommendations should
> be
> >>>> sorted by the strength and so shouldn’t all be 1 unless the data is
> very
> >>>> odd.
> >>>>
> >>>> Can you give us a small sample of the input?
> >>>>
> >>>>
> >>>> BTW a newer recommender using Mahout’s Spark based code and a search
> >>>> engine is here:
> >>>>
> >>>
> >>
> https://github.com/PredictionIO/template-scala-parallel-universal-recommendation
> >>>> a single machine install script is here:
> >>> https://docs.prediction.io/start/
> >>>>
> >>>> On Nov 24, 2015, at 2:16 AM, Niklas Ekvall <niklas.ekv...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> Hello Mahout Users!
> >>>>
> >>>> Today I use Mahout's recommenditembased with log-likelihood similarity
> >>>> to produce personal recommendations for Trigger Emails in offline
> >>>> mode. But when I produce e.g. 50 recommendations, the rank values of
> >>>> the recommendations are always of magnitude 1. Why is this so? And is
> >>>> the first recommendation in this list the best one, or is there some
> >>>> randomness in this list?
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Niklas Ekvall
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
>
