You need to create Mahout IDs; you can’t use your own unless they were created in the same way.
For example, if 3601420184132 is the first user ID in the set, give that person Mahout ID 0, the next user 1, and so on. You won’t need longs unless you have more users than the largest positive int. You have to do the same for products.

IDs in Mahout (at least in the recommenders) are the coordinates in a giant sparse matrix of user rows and product columns, so the IDs need to correspond to the row and column numbers. It is not sufficient that an ID be an int or a long.

Sorry that this is so hidden in the docs. I think it is the single most confusing thing about Mahout. For what it’s worth, this restriction will be removed in future versions.

On Jul 8, 2014, at 10:59 AM, Sneha Venkatesh <[email protected]> wrote:

Hi Pat,

Thank you for your response. My original input row (userID,productId,NoOfVisits) looks like this:

3601420184132,028V003838264000P,1

I transformed the product IDs into long values since the original input is a string. The transformed input looks something like this:

3601420184132,23423984,1

Since I need to use long IDs, I set the 'usesLongIDs' flag to true while running the 'parallelALS' job. While running the 'recommendfactorized' job, I passed a path to the 'userIDIndex' and 'itemIdIndex' and set the 'usesLongIDs' flag to true.

The resulting recommendations all have product ID '0'.

To check whether using 'long' IDs is the problem, I passed the same parameters as mentioned above to the “factorize-movielens-1M” example. Even that returns the value '0' for every product ID in the recommendations.

Regards,

Sneha

On Tue, Jul 8, 2014 at 10:35 AM, Pat Ferrel <[email protected]> wrote:

> I replied on Stack Overflow.
>
> Did you translate your IDs into Mahout IDs? Mahout IDs must be ordinal integers for users and items. You will need to translate into Mahout IDs before the data is prepared correctly, and translate back into your application-specific IDs when reading the output. I updated the page you referenced to note this, but it’s just a guess.
>
> Can you share a few lines of your input?
>
> On Jul 7, 2014, at 11:29 AM, Sneha Venkatesh <[email protected]> wrote:
>
> Hi,
>
> I am new to Mahout and I am building an implicit-feedback recommender using the parallelALS job given here <https://mahout.apache.org/users/recommender/intro-als-hadoop.html>. Each row of my dataset consists of user_id, product_id, preference_score (which is the number of visits made by the user for the product). The user and product IDs are of type long. I have a million data points of this kind after filtering out single or double visits.
>
> I have basically written a bash script that runs the two jobs “parallelALS” and “recommendfactorized” just as shown in the example “factorize-movielens-1M”. After running the script, the resulting recommendations seem to have a bug. The format of each row of the results (as explained in several blog posts) seems to be:
>
> user_id [product_id:score,…]
>
> However, all the product_ids in every row are 0. I am not sure what is going wrong here. Is this a problem with the dataset, a matter of tuning parameters (alpha, lambda, etc.), or something else?
>
> Regards,
>
> Sneha
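A minimal sketch of the ID translation Pat describes, written in Python purely as an illustration: external user and product IDs are numbered 0, 1, 2, ... in order of first appearance before the data goes to Mahout, and the same index is used to map the numbers in the output back. The names `OrdinalIndex`, `to_mahout_rows` and `from_mahout_recs` are hypothetical (not part of Mahout), the input is assumed to be plain `user,item,count` lines, and the output-line format is assumed from Sneha's description above rather than taken from the Mahout docs. The second input row in the usage example is made up.

```python
"""Hypothetical helpers: map application IDs to ordinal Mahout IDs and back.

Assumptions (not Mahout APIs): input rows are "user,item,count" lines, and
recommendation lines look like "user_id [item_id:score,...]".
"""
from typing import Dict, Iterable, List, Tuple


class OrdinalIndex:
    """Assigns 0, 1, 2, ... to external IDs in order of first appearance."""

    def __init__(self) -> None:
        self.to_int: Dict[str, int] = {}  # external ID -> row/column number
        self.to_ext: List[str] = []       # row/column number -> external ID

    def get(self, external_id: str) -> int:
        # An unseen ID gets the next free row/column number.
        if external_id not in self.to_int:
            self.to_int[external_id] = len(self.to_ext)
            self.to_ext.append(external_id)
        return self.to_int[external_id]


def to_mahout_rows(lines: Iterable[str], users: OrdinalIndex,
                   items: OrdinalIndex) -> List[str]:
    """Rewrite "user,item,count" lines using ordinal user and item IDs."""
    out = []
    for line in lines:
        user, item, count = line.strip().split(",")
        out.append(f"{users.get(user)},{items.get(item)},{count}")
    return out


def from_mahout_recs(line: str, users: OrdinalIndex,
                     items: OrdinalIndex) -> Tuple[str, List[Tuple[str, float]]]:
    """Map one "user_id [item_id:score,...]" line back to application IDs."""
    user_part, recs_part = line.strip().split(None, 1)  # split at first whitespace
    recs = []
    for pair in recs_part.strip("[]").split(","):
        if not pair:  # an empty recommendation list yields [""]
            continue
        item, score = pair.split(":")
        recs.append((items.to_ext[int(item)], float(score)))
    return users.to_ext[int(user_part)], recs


if __name__ == "__main__":
    users, items = OrdinalIndex(), OrdinalIndex()
    rows = to_mahout_rows(["3601420184132,028V003838264000P,1",
                           "3601420184199,028V003838264000P,2"], users, items)
    print(rows)  # ['0,0,1', '1,0,2']
    print(from_mahout_recs("1 [0:0.9]", users, items))
```

Keeping both a dict and a list gives constant-time lookups in each direction. The important part is that the same index is reused when reading the results; if it is rebuilt in a different order, the row and column numbers will no longer line up with the original IDs.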
