Hello again Pat! I did find a testcase that I was able to recreate:
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0

bin/mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i /path/to/input/file -o /path/to/output/folder/ --numRecommendations 1

Output:

1 [104:2.8088317]
2 [105:3.5743618]
3 [103:4.336442]
4 [105:3.6903737]
5 [107:3.663558]

But when I changed the ratings above to all ones, I got only ones in the output file for the recommendation values:

1 [104:1.0]
2 [105:1.0]
3 [103:1.0]
4 [105:1.0]
5 [107:1.0]

My conclusion is that recommenditembased in Mahout works better for ratings than for binary data. What are your conclusions?

Best, Niklas

2015-11-24 21:56 GMT+01:00 Pat Ferrel <p...@occamsmachete.com>:

> > On Nov 24, 2015, at 12:21 PM, Niklas Ekvall <niklas.ekv...@gmail.com> wrote:
> >
> > Okay!
> >
> > No pre-filter, and the user/item ids should start from 0 and go up to as many users and items as there are. So all the data we have should go into Mahout and we filter inside Mahout... correct?
>
> Yes, but I wouldn't filter. The recs will very likely be better than random with only a small number of events.
>
> > We do the same pre-filter for Spark item-similarity, is that wrong too?
>
> No, spark-itemsimilarity uses string ids.
>
> > Best regards, Niklas
> >
> > On Tuesday, November 24, 2015, Pat Ferrel <p...@occamsmachete.com> wrote:
> >
> >> I wouldn't pre-filter, but in any case the ids input to hadoop-mahout need to follow those rules.
> >>
> >> The new recommender I mentioned has no such requirements; it uses string IDs.
> >>
> >> On Nov 24, 2015, at 11:44 AM, Niklas Ekvall <niklas.ekv...@gmail.com> wrote:
> >>
> >> No, they do not start from 0 and do not cover all numbers between 0 and the number of items/users.
> >> We do a pre-filtering (a user must have bought at least 5 products and a product must have been bought by 3 users) before we use Mahout on the dataset. Therefore we start with user 3, then it jumps to user 5, etc.
> >>
> >> Is this wrong? Should we use all data as input to Mahout and do the filtering inside Mahout?
> >>
> >> We use the second latest version of Mahout!
> >>
> >> Best regards, Niklas
> >>
> >> On Tuesday, November 24, 2015, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>
> >>> Do your ids start with 0 and cover all numbers between 0 and the number of items - 1 (same for user ids)?
> >>> The old hadoop-mahout code required ordinal ids starting at 0.
> >>>
> >>> On Nov 24, 2015, at 8:19 AM, Niklas Ekvall <niklas.ekv...@gmail.com> wrote:
> >>>
> >>> Hi Pat,
> >>>
> >>> Here is some input:
> >>>
> >>> 3 7414
> >>> 3 12682
> >>> 3 18947
> >>> 3 19980
> >>> 3 26975
> >>> 3 54635
> >>> 3 67789
> >>> 3 73212
> >>> 3 118932
> >>> 3 138846
> >>> 3 141268
> >>> 5 3
> >>> 5 2123
> >>> 5 37955
> >>> 5 39975
> >>> 5 113289
> >>> 6 3
> >>> 6 456
> >>> 6 2188
> >>> 6 2496
> >>> 6 6194
> >>> 6 6361
> >>> 6 6768
> >>> 6 6919
> >>> 6 6920
> >>> 6 7257
> >>> 6 7705
> >>> 6 7706
> >>> 6 11788
> >>>
> >>> And some output:
> >>>
> >>> 3
> >>> [122086:1.0,1846:1.0,74638:1.0,63240:1.0,87540:1.0,2742:1.0,2981:1.0,8325:1.0,145598:1.0,49675:1.0,131388:1.0,72113:1.0,3493:1.0,56131:1.0,30422:1.0,87829:1.0,111190:1.0,13597:1.0,83436:1.0,61772:1.0]
> >>> 5
> >>> [32349:1.0,29413:1.0,111896:1.0,61845:1.0,50016:1.0,1607:1.0,15237:1.0,133229:1.0,65805:1.0,34034:1.0,133071:1.0,28894:1.0,18658:1.0,32095:1.0,4402:1.0,47522:1.0,31022:1.0,23936:1.0,6243:1.0,53214:1.0]
> >>> 6
> >>> [40756:1.0,34420:1.0,31153:1.0,114717:1.0,53945:1.0,71148:1.0,26095:1.0,112941:1.0,55284:1.0,111346:1.0,112201:1.0,65759:1.0,133127:1.0,61378:1.0,16413:1.0,113289:1.0,49675:1.0,14995:1.0,141028:1.0,27506:1.0]
> >>>
> >>> Best regards, Niklas
> >>>
> >>> 2015-11-24 16:48 GMT+01:00 Pat Ferrel <p...@occamsmachete.com>:
> >>>
> >>>> Sounds like you may not have the input right. Recommendations should be sorted by strength and so shouldn't all be 1 unless the data is very odd.
> >>>>
> >>>> Can you give us a small sample of the input?
> >>>>
> >>>> BTW, a newer recommender using Mahout's Spark-based code and a search engine is here:
> >>>> https://github.com/PredictionIO/template-scala-parallel-universal-recommendation
> >>>> A single-machine install script is here: https://docs.prediction.io/start/
> >>>>
> >>>> On Nov 24, 2015, at 2:16 AM, Niklas Ekvall <niklas.ekv...@gmail.com> wrote:
> >>>>
> >>>> Hello Mahout Users!
> >>>>
> >>>> Today I use Mahout's recommenditembased with log-likelihood similarity to produce personal recommendations for trigger emails in an offline mode. But when I produce e.g. 50 recommendations, the rank values of the recommendations are always of magnitude 1. Why is this so? And is the first recommendation in this list the best one, or is there some randomness in this list?
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Niklas Ekvall
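[Editor's note] The uniform 1.0 scores discussed above can be illustrated with a minimal sketch. This is not Mahout code, but item-based recommenders commonly estimate a preference as a similarity-weighted average of the user's known ratings; with boolean input every rating is 1.0, so the estimate collapses to exactly 1.0 for every candidate item, matching the `[item:1.0]` output in the thread. The ranking then comes from the similarity strengths alone, not from the printed score:

```python
def estimate_preference(similarities, ratings):
    """Similarity-weighted average of a user's ratings for neighboring items.

    similarities[k] is the similarity between the candidate item and the
    user's k-th rated item; ratings[k] is the user's rating of that item.
    """
    num = sum(s * r for s, r in zip(similarities, ratings))
    den = sum(similarities)
    return num / den

# With real ratings the estimate varies with the rating values.
print(estimate_preference([0.9, 0.4], [5.0, 2.5]))  # about 4.23

# With boolean data every rating is 1.0, so numerator == denominator
# and the estimate is exactly 1.0, regardless of the similarities.
print(estimate_preference([0.9, 0.4], [1.0, 1.0]))  # 1.0
```

This is consistent with Pat's point: the 1.0 values do not mean the recommendations are random, only that the score carries no information for boolean data.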
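[Editor's note] Pat's requirement that the old hadoop-mahout code needs ordinal ids starting at 0 can be sketched as a hypothetical pre-processing step (this helper is not part of Mahout): remap arbitrary user/item ids, such as the 3/5/6 users above, to contiguous 0-based integers before running the job, and keep the reverse tables to translate the output back:

```python
def remap(pairs):
    """Remap (user_id, item_id) pairs to contiguous 0-based ordinals.

    Returns the remapped pairs plus reverse lookup tables so original
    ids can be recovered from the recommender's output.
    """
    users, items = {}, {}
    remapped = []
    for u, i in pairs:
        uu = users.setdefault(u, len(users))  # assign next ordinal on first sight
        ii = items.setdefault(i, len(items))
        remapped.append((uu, ii))
    inv_users = {v: k for k, v in users.items()}
    inv_items = {v: k for k, v in items.items()}
    return remapped, inv_users, inv_items

data = [(3, 7414), (3, 12682), (5, 3), (6, 3), (6, 456)]
remapped, inv_users, inv_items = remap(data)
print(remapped)      # [(0, 0), (0, 1), (1, 2), (2, 2), (2, 3)]
print(inv_users[2])  # 6  (ordinal user 2 is original user 6)
```

Note that spark-itemsimilarity and the PredictionIO template mentioned above accept string ids directly, so this remapping is only needed for the old Hadoop code path.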