Thanks Sean,
So, do you suggest something like this?
LogLikelihoodSimilarity similarity = new LogLikelihoodSimilarity(fileDataModel);
PreferredItemsNeighborhoodCandidateItemsStrategy candidateStrategy =
    new PreferredItemsNeighborhoodCandidateItemsStrategy();
recommender = new GenericItemBasedRecommender(fileDataModel, similarity,
    candidateStrategy, candidateStrategy);
or this?
LogLikelihoodSimilarity similarity = new LogLikelihoodSimilarity(fileDataModel);
SamplingCandidateItemsStrategy candidateStrategy = new SamplingCandidateItemsStrategy();
recommender = new GenericItemBasedRecommender(fileDataModel, similarity,
    candidateStrategy, candidateStrategy);
-emilio
You need to apply a CandidateItemsStrategy to reduce the number of
elements you consider; otherwise it will take a very long time, because
almost the entire model is a candidate for recommendation.
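To illustrate why almost the entire model can end up as candidates, here is a minimal plain-Java sketch of a "preferred items neighborhood" style strategy: the candidates are every item rated by any other user who shares at least one item with the query user. It is an illustration only, not Mahout's CandidateItemsStrategy interface. The class CandidateSketch, the map-based data layout and the maxUsersPerItem cap are all hypothetical. The cap is a crude stand-in for the sampling idea behind SamplingCandidateItemsStrategy, not its actual algorithm.

```java
import java.util.*;

// Illustration only: NOT Mahout's CandidateItemsStrategy API.
class CandidateSketch {

    // Hypothetical helper. Candidates = items rated by any other user who shares at
    // least one item with queryUser, minus the items queryUser already rated.
    // maxUsersPerItem caps how many co-raters are visited per item, trading recall for speed.
    static Set<Long> candidates(Map<Long, Set<Long>> itemsByUser,
                                Map<Long, Set<Long>> usersByItem,
                                long queryUser,
                                int maxUsersPerItem) {
        Set<Long> own = itemsByUser.getOrDefault(queryUser, Collections.emptySet());
        Set<Long> result = new HashSet<>();
        for (long item : own) {
            int visited = 0;
            for (long other : usersByItem.getOrDefault(item, Collections.emptySet())) {
                if (other == queryUser) {
                    continue; // the query user's own row adds nothing new
                }
                if (visited >= maxUsersPerItem) {
                    break;
                }
                visited++;
                result.addAll(itemsByUser.getOrDefault(other, Collections.emptySet()));
            }
        }
        result.removeAll(own);
        return result;
    }

    // Builds the item -> users index from the user -> items data.
    static Map<Long, Set<Long>> invert(Map<Long, Set<Long>> itemsByUser) {
        Map<Long, Set<Long>> usersByItem = new TreeMap<>();
        for (Map.Entry<Long, Set<Long>> e : itemsByUser.entrySet()) {
            for (long item : e.getValue()) {
                usersByItem.computeIfAbsent(item, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return usersByItem;
    }

    // Tiny made-up dataset: user -> rated items.
    static Map<Long, Set<Long>> demo() {
        Map<Long, Set<Long>> m = new TreeMap<>();
        m.put(1L, new TreeSet<>(Arrays.asList(10L, 11L)));
        m.put(2L, new TreeSet<>(Arrays.asList(10L, 12L)));
        m.put(3L, new TreeSet<>(Arrays.asList(11L, 13L)));
        m.put(4L, new TreeSet<>(Arrays.asList(14L)));
        m.put(5L, new TreeSet<>(Arrays.asList(10L, 15L)));
        return m;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> data = demo();
        Map<Long, Set<Long>> index = invert(data);
        // Full neighborhood of user 1: [12, 13, 15]
        System.out.println(new TreeSet<>(candidates(data, index, 1L, Integer.MAX_VALUE)));
        // At most one co-rater per item: [12, 13]
        System.out.println(new TreeSet<>(candidates(data, index, 1L, 1)));
    }
}
```

In the transposed setup described below, the query "user" is item "24441" with 31,585 ratings, so its neighborhood can plausibly reach a large share of the model; capping or sampling that neighborhood is what shrinks the candidate set.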
On Fri, May 11, 2012 at 6:18 PM, Emilio Suarez <[email protected]> wrote:
Hi there,
The usual setting for the Mahout recommendation input file is:
user, item, rating
Now, for the purposes of my application, what I really wanted was a
recommendation of users for a specific item, so my input files are:
item, user, rating
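The column swap described above takes only a few lines of Java. This is a minimal sketch assuming a headerless, comma-separated file with exactly three fields per line and no quoting; the class name SwapColumns and the command-line file paths are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: turns "user,item,rating" lines into "item,user,rating" lines,
// so a recommender run on the output recommends users for an item.
class SwapColumns {

    // Swaps the first two comma-separated fields; assumes no header and no quoting.
    static String swap(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return f[1].trim() + "," + f[0].trim() + "," + f[2].trim();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SwapColumns <in.csv> <out.csv>");
            return;
        }
        Path in = Paths.get(args[0]);   // original user,item,rating file
        Path out = Paths.get(args[1]);  // transposed item,user,rating file
        List<String> swapped = Files.readAllLines(in, StandardCharsets.UTF_8).stream()
                .filter(l -> !l.trim().isEmpty())   // skip blank lines
                .map(SwapColumns::swap)
                .collect(Collectors.toList());
        Files.write(out, swapped, StandardCharsets.UTF_8);
    }
}
```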
My input CSV file contains the following stats:
model file: 560,901 records
item "24441": 31,585 records
rating contains one of 3 values: 1, 2 or 3
When I ask for a recommendation of users for item "24441", these are the
results:
total recommended "users": 50,162
Elapsed time: 3h 13m
As you can see, this is a very long processing time, and it all started when
I added "ratings" to the input files.
Before, I was using GenericBooleanPrefItemBasedRecommender,
and the process ran in minutes.
Now with the ratings, I am using the following:
LogLikelihoodSimilarity similarity = new LogLikelihoodSimilarity(fileDataModel);
AllSimilarItemsCandidateItemsStrategy candidateStrategy =
    new AllSimilarItemsCandidateItemsStrategy(similarity);
recommender = new GenericItemBasedRecommender(fileDataModel, similarity,
    candidateStrategy, candidateStrategy);
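One detail that may matter here: as far as I know, Mahout's LogLikelihoodSimilarity is computed from co-occurrence counts only, so the 1/2/3 rating values do not change the similarities it returns. The sketch below reproduces the standard log-likelihood ratio (Dunning's G² statistic) on a 2x2 count table in plain Java. The class name and the k11..k22 labels are illustrative, not Mahout's API, and the final 1 - 1/(1 + llr) mapping is how I recall Mahout turning the ratio into a similarity, so treat that method as an assumption.

```java
// Plain-Java sketch of the log-likelihood ratio on a 2x2 co-occurrence table.
// Only counts enter the computation: rating values (1, 2 or 3) play no part.
class LlrSketch {

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a set of counts.
    private static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    // k11: users who rated both items; k12: only item A; k21: only item B; k22: neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Assumed mapping from the ratio to a similarity in [0, 1).
    static double similarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        // Independent counts: ratio is (approximately) 0.
        System.out.println(logLikelihoodRatio(1, 1, 1, 1));
        // Perfect co-occurrence: 40 ln 2, about 27.73.
        System.out.println(logLikelihoodRatio(10, 0, 0, 10));
    }
}
```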
I have another input file with the following stats:
model file: 276,543 records
item "11205": 5,968 records
rating contains one of 3 values: 1, 2 or 3
and when I ask for a recommendation of users for item "11205", these are the
results:
total recommended "users": 26,083
Elapsed time: 23m
As you can see, the difference in size is only about 2x, but the difference in
time is about 8x!
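A rough back-of-the-envelope model may explain why the slowdown is not 2x. Assuming each candidate is scored by computing its similarity against every item the query row has rated (a reading of how an item-based recommender estimates a preference, not a measured profile), and taking the number of recommended users as a proxy for the candidate count, the work grows with candidates × ratings on the query row rather than with file size. The snippet only does that arithmetic on the numbers above; the class and method names are hypothetical.

```java
// Back-of-the-envelope cost model (an assumption, not a profile):
// similarity evaluations ~ candidate count x number of ratings on the query row.
class CostRatio {

    static long work(long candidates, long queryRatings) {
        return candidates * queryRatings;
    }

    // 560,901 vs 276,543 records in the two model files.
    static double fileRatio() {
        return 560_901.0 / 276_543.0;
    }

    // 3h 13m (193 min) vs 23 min.
    static double timeRatio() {
        return (3 * 60 + 13) / 23.0;
    }

    // (50,162 users x 31,585 ratings) vs (26,083 users x 5,968 ratings).
    static double workRatio() {
        return (double) work(50_162, 31_585) / work(26_083, 5_968);
    }

    public static void main(String[] args) {
        System.out.printf("file size ratio: %.2f%n", fileRatio());
        System.out.printf("elapsed ratio:   %.2f%n", timeRatio());
        System.out.printf("est. work ratio: %.2f%n", workRatio());
    }
}
```

Under this model the estimated work ratio (about 10x) is in the same range as the observed 8.4x, which is consistent with the cost being driven by the size of the candidate set and the number of ratings on item "24441" rather than by the file size.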
Is this the expected behavior for the recommender to take this long?
Is there anything I can do to speed up the process?
Thanks
-emilio