Re: Performance Issue using item-based approach!

Ted Dunning Thu, 17 Apr 2014 07:23:48 -0700

Najum,

You should also be able to use the ItemSimilarityJob to compute a limited
indicator set.


This is stepping off of the path you have been on, but it would allow you
to deploy the recommender via a search engine.

That makes a lot of code simply vanish.  This is also a well-trodden
production path.




On Thu, Apr 17, 2014 at 3:57 AM, Najum Ali <[email protected]> wrote:

> @Sebastian
>
> Wow … you are right. The original CSV file is about 21 MB and the
> corresponding precomputed item-item similarity file is about 260 MB!!
> And yes, there are far more than 50 "most similar items" for an item.
>
> Restricting this to the 50 (or so) most similar items per item could do
> the trick, as you said.
> OK, I will give it a try and reply later.
>
> By the way, what about the SamplingCandidateItemsStrategy or something
> similar, used through this constructor:
>
> GenericItemBasedRecommender(DataModel dataModel, ItemSimilarity similarity,
>     CandidateItemsStrategy candidateItemsStrategy,
>     MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy)
> <https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.html#GenericItemBasedRecommender(org.apache.mahout.cf.taste.model.DataModel,%20org.apache.mahout.cf.taste.similarity.ItemSimilarity,%20org.apache.mahout.cf.taste.recommender.CandidateItemsStrategy,%20org.apache.mahout.cf.taste.recommender.MostSimilarItemsCandidateItemsStrategy)>
>
>
> On 17.04.2014 at 12:41, Sebastian Schelter <[email protected]> wrote:
>
> Hi Najum,
>
> I think I found the problem. Remember: Two items are similar whenever at
> least one user interacted with both of them ("the items co-occur").
>
> In the MovieLens dataset this is, unfortunately, true for almost all pairs
> of items. From 3706 items, more than 11 million similarities are
> created. A common approach for that (which, unfortunately, is not yet
> implemented in our precomputation) is to retain only the top-k similar
> items per item.
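
To make the co-occurrence definition above concrete, here is a small plain-Java sketch (not Mahout code; `CoOccurrenceCount` and `countCoOccurringPairs` are hypothetical names) that counts the distinct unordered item pairs appearing together in at least one user's history. A single user who interacted with m items contributes m(m-1)/2 pairs alone, so users with long histories quickly connect a large fraction of all pairs. For the 3706 items in the 1M dataset, the upper bound is 3706 × 3705 / 2 = 6,865,365 unordered pairs, or 13,730,730 if each pair is stored once per direction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CoOccurrenceCount {

  /**
   * Counts distinct unordered item pairs that at least one user interacted with together.
   * Hypothetical helper for illustration only; it is not part of Mahout.
   */
  public static long countCoOccurringPairs(Map<Long, Set<Long>> itemsByUser) {
    Set<List<Long>> pairs = new HashSet<>();
    for (Set<Long> items : itemsByUser.values()) {
      // Sort so each pair is recorded exactly once as (smaller, larger).
      List<Long> sorted = new ArrayList<>(items);
      Collections.sort(sorted);
      for (int i = 0; i < sorted.size(); i++) {
        for (int j = i + 1; j < sorted.size(); j++) {
          pairs.add(Arrays.asList(sorted.get(i), sorted.get(j)));
        }
      }
    }
    return pairs.size();
  }

  public static void main(String[] args) {
    // Toy data: user 1 touched items 10, 20, 30; user 2 touched items 20 and 40.
    Map<Long, Set<Long>> itemsByUser = new HashMap<>();
    itemsByUser.put(1L, new HashSet<>(Arrays.asList(10L, 20L, 30L)));
    itemsByUser.put(2L, new HashSet<>(Arrays.asList(20L, 40L)));
    // User 1 contributes 3 pairs, user 2 contributes 1 pair.
    System.out.println(countCoOccurringPairs(itemsByUser)); // prints 4
  }
}
```

The pair set is held entirely in memory, which is fine for a toy example but would not scale to a real dataset; it only illustrates why the number of similarities grows so quickly.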
>
> A solution would be to take the csv file that is created by the
> MultithreadedBatchItemSimilarities and postprocess it so that only the 50
> most similar items per item are retained. That should help with your
> problem.
>
> Unfortunately, we don't have code for that yet, maybe you want to try to
> write that yourself?
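
A minimal sketch of that post-processing step, in plain Java rather than Mahout code: `TopKSimilarities` and `topKPerItem` are hypothetical names, and it assumes each line of the CSV has the form `itemID,otherItemID,similarity`, which is the format parsed by Najum's `mapToItemItemSimilarity` function in this thread. For every item it keeps a bounded min-heap of size k, so the least similar row is evicted whenever a better one arrives.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.stream.Stream;

public class TopKSimilarities {

  /** A parsed CSV row; the original line is kept so it can be written back unchanged. */
  private static final class Row {
    final double similarity;
    final String line;

    Row(double similarity, String line) {
      this.similarity = similarity;
      this.line = line;
    }
  }

  // Min-heap order: the least similar row kept so far sits at the head.
  private static final Comparator<Row> BY_SIMILARITY =
      Comparator.comparingDouble(row -> row.similarity);

  /** Keeps at most k rows per item ID (first column), choosing the highest similarities. */
  public static List<String> topKPerItem(Iterable<String> lines, int k) {
    Map<Long, PriorityQueue<Row>> heaps = new HashMap<>();
    for (String line : lines) {
      if (line.trim().isEmpty()) {
        continue; // skip blank lines
      }
      String[] fields = line.split(",");
      long itemID = Long.parseLong(fields[0].trim());
      double similarity = Double.parseDouble(fields[2].trim());
      PriorityQueue<Row> heap =
          heaps.computeIfAbsent(itemID, id -> new PriorityQueue<>(BY_SIMILARITY));
      heap.offer(new Row(similarity, line));
      if (heap.size() > k) {
        heap.poll(); // evict the least similar row for this item
      }
    }
    List<String> kept = new ArrayList<>();
    for (PriorityQueue<Row> heap : heaps.values()) {
      for (Row row : heap) {
        kept.add(row.line);
      }
    }
    return kept;
  }

  // Usage: java TopKSimilarities <similarities.csv> <output.csv> [k, default 50]
  public static void main(String[] args) throws IOException {
    Path input = Paths.get(args[0]);
    Path output = Paths.get(args[1]);
    int k = args.length > 2 ? Integer.parseInt(args[2]) : 50;
    List<String> kept;
    // Stream the input line by line instead of reading the whole file into memory.
    try (Stream<String> lines = Files.lines(input)) {
      kept = topKPerItem(lines::iterator, k);
    }
    Files.write(output, kept);
    System.out.println("Kept " + kept.size() + " similarities in " + output);
  }
}
```

It could be run as `java TopKSimilarities similarities.csv similarities-top50.csv 50`, and the smaller file loaded into `GenericItemSimilarity` as before. The bounded heap keeps the work at O(n log k) for n lines and the memory at roughly k rows per item, rather than sorting each item's full neighbor list.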
>
> Best,
> Sebastian
>
> PS: The user-based recommender restricts the number of similar users; I
> guess that's why it is so fast here.
>
>
> On 04/17/2014 12:18 PM, Najum Ali wrote:
>
> Ok, here you go:
>
> I have created a simple class with a main method (no server or other stuff):
>
> import java.io.BufferedReader;
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStreamReader;
> import java.util.Collection;
> import java.util.List;
> import java.util.function.Function;
> import java.util.stream.Collectors;
>
> import org.apache.mahout.cf.taste.common.TasteException;
> import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
> import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
> import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
> import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
> import org.apache.mahout.cf.taste.impl.similarity.precompute.FileSimilarItemsWriter;
> import org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities;
> import org.apache.mahout.cf.taste.model.DataModel;
> import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
> import org.apache.mahout.cf.taste.recommender.RecommendedItem;
> import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
> import org.apache.mahout.cf.taste.similarity.precompute.BatchItemSimilarities;
>
> public class RecommenderTest {
>
>     public static void main(String[] args) throws IOException, TasteException {
>         DataModel dataModel = new FileDataModel(new File(
>                 "/Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv"));
>         ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
>         ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, similarity);
>
>         // The second argument is the number of similar items written per item;
>         // passing getNumItems() keeps (up to) all of them, a smaller value such as 50 would cap it.
>         String pathToPreComputedFile = preComputeSimilarities(recommender, dataModel.getNumItems());
>
>         // Load the precomputed similarities back; try-with-resources closes the reader.
>         Collection<GenericItemSimilarity.ItemItemSimilarity> correlations;
>         try (BufferedReader bufferedReader = new BufferedReader(
>                 new InputStreamReader(new FileInputStream(new File(pathToPreComputedFile))))) {
>             correlations = bufferedReader.lines()
>                     .map(mapToItemItemSimilarity)
>                     .collect(Collectors.toList());
>         }
>         ItemSimilarity precomputedSimilarity = new GenericItemSimilarity(correlations);
>         ItemBasedRecommender recommenderWithPrecomputation =
>                 new GenericItemBasedRecommender(dataModel, precomputedSimilarity);
>
>         recommend(recommender);
>         recommend(recommenderWithPrecomputation);
>     }
>
>     private static String preComputeSimilarities(ItemBasedRecommender recommender,
>             int simItemsPerItem) throws TasteException, IOException {
>         File resultFile = new File(System.getProperty("java.io.tmpdir"), "similarities.csv");
>         if (resultFile.exists()) {
>             resultFile.delete();
>         }
>         BatchItemSimilarities batchJob =
>                 new MultithreadedBatchItemSimilarities(recommender, simItemsPerItem);
>         // Arguments: degree of parallelism, maximum duration in hours, output writer.
>         int numSimilarities = batchJob.computeItemSimilarities(
>                 Runtime.getRuntime().availableProcessors(), 1, new FileSimilarItemsWriter(resultFile));
>         String absolutePath = resultFile.getAbsolutePath();
>         System.out.println("Computed " + numSimilarities + " similarities and saved them to " + absolutePath);
>         return absolutePath;
>     }
>
>     private static void recommend(ItemBasedRecommender recommender) throws TasteException {
>         long start = System.nanoTime();
>         List<RecommendedItem> recommendations = recommender.recommend(1, 10); // user 1, top 10
>         long end = System.nanoTime();
>         System.out.println("Created recommendations in "
>                 + getCalculationTimeInMilliseconds(start, end) + " ms. Recommendations:" + recommendations);
>     }
>
>     private static double getCalculationTimeInMilliseconds(long start, long end) {
>         double calculationTime = (end - start);
>         return calculationTime / 1_000_000; // nanoseconds -> milliseconds
>     }
>
>     // Parses one "itemID,otherItemID,similarity" line of the precomputed CSV.
>     private static final Function<String, GenericItemSimilarity.ItemItemSimilarity> mapToItemItemSimilarity =
>             line -> {
>                 String[] row = line.split(",");
>                 return new GenericItemSimilarity.ItemItemSimilarity(
>                         Long.parseLong(row[0]), Long.parseLong(row[1]), Double.parseDouble(row[2]));
>             };
> }
>
> And here is the output log:
>
> 3 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file /Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv
> 63 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...
> 1207 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Processed 1000000 lines
> 1208 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 1000209
> 1475 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel - Processed 6040 users
> 1599 [main] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - Queued 3706 items in 38 batches
> 10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches
> 10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches. done.
> 10978 [pool-1-thread-5] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 4 processed 4 batches. done.
> 11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches
> 11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches. done.
> 11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches
> 11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches. done.
> 11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches
> 11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches. done.
> 11730 [pool-1-thread-3] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 2 processed 4 batches. done.
> 11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches
> 11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches. done.
> 11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches
> 11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches. done.
> Computed 9174333 similarities and saved them to /var/folders/9g/4h38v1tj3ps9j21skc72b56r0000gn/T/similarities.csv
> Created recommendations in *1683.613 ms*. Recommendations:[RecommendedItem[item:3890, value:4.6771617],
> RecommendedItem[item:3530, value:4.662509], RecommendedItem[item:127, value:4.660716],
> RecommendedItem[item:3323, value:4.660716], RecommendedItem[item:3382, value:4.660716],
> RecommendedItem[item:3123, value:4.603366], RecommendedItem[item:3233, value:4.5707765],
> RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989, value:4.5263577],
> RecommendedItem[item:2343, value:4.524066]]
> Created recommendations in *985.679 ms*. Recommendations:[RecommendedItem[item:3530, value:5.0],
> RecommendedItem[item:3382, value:5.0], RecommendedItem[item:3890, value:4.6771617],
> RecommendedItem[item:127, value:4.660716], RecommendedItem[item:3323, value:4.660716],
> RecommendedItem[item:3123, value:4.603366], RecommendedItem[item:3233, value:4.5707765],
> RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989, value:4.5263577],
> RecommendedItem[item:2343, value:4.524066]]
>
> Again, almost the same results. What I also don't understand is why I am
> getting different RecommendedItems.
> That really frustrates me…
>
> You can find the Java file in the attachment.
>
>
>
> Greetings from Germany,
> Najum
>
> On 17.04.2014 at 11:44, Sebastian Schelter <[email protected]> wrote:
>
> Yes, just to make sure the problem is in the mahout code and not in the
> surrounding environment.
>
> On 04/17/2014 11:43 AM, Najum Ali wrote:
>
> @Sebastian
> What do you mean by a standalone recommender? A simple offline Java main
> program?
>
> On 17.04.2014 at 11:41, Sebastian Schelter <[email protected]> wrote:
>
> Could you take the output of the precomputation, feed it into a standalone
> recommender and test it there?
>
>
> On 04/17/2014 11:37 AM, Najum Ali wrote:
>
> @Sebastian
>
> Are you sure that the precomputation is done only once and not in every
> request?
>
> Yes, a @Bean-annotated object in Spring is a singleton by default.
> I also just verified it with a System.out.println().
> Here is my log:
>
> System.out.println("----> precomputation done!") is called before returning
> the GenericItemSimilarity.
>
> The first two recommendations are item-based with Pearson similarity.
> The third and fourth log lines are also item-based, using the precomputed similarity.
> The last log line is the user-based recommender using Pearson.
>
> Look at the huge time difference!
>
> On 17.04.2014 at 11:23, Sebastian Schelter <[email protected]> wrote:
>
> Najum,
>
> This is really strange: feeding an item-based recommender with precomputed
> similarities should give you superfast recommendations.
>
> Are you sure that the precomputation is done only once and not in every
> request?
>
> --sebastian
>
> On 04/17/2014 11:17 AM, Najum Ali wrote:
>
> Hi guys,
>
> I have created a precomputed item-item similarity collection for a
> GenericItemBasedRecommender.
> Using the 1M MovieLens data, my item-based recommender is only 40-50%
> faster than without precomputation (about 589.5 ms instead of 1222.9 ms).
> The user-based recommender, on the other hand, is really fast: about 24.2 ms.
> How can this happen?
>
> Here are more details to my Implementation:
>
> CSV file: 1M preferences, 6040 users, 3706 items
>
> I'm sharing my implementation as screenshots because they have proper
> syntax highlighting.
> My recommender runs inside a web server (Jetty) using Spring 4 and Java 8. I
> receive recommendations from a web service (JSON).
>
> For the DataModel, I'm using FileDataModel.
>
>
> The code below creates a precomputed ItemSimilarity when I start the
> web server with the property isItemPreComputationEnabled set to true:
>
>
> For timing I'm using AOP. I measure the whole time from entering my
> controller to sending the response, taking System.nanoTime() at both
> points and computing the difference. The user-based recommender is
> measured the same way.
>
> I have tried caching the recommender and the similarity, with no big
> difference. I also tried using a CandidateItemsStrategy and a
> MostSimilarItemsCandidateItemsStrategy, but again no performance boost.
>
> public RecommenderBuilder createRecommenderBuilder(ItemSimilarity similarity)
>         throws TasteException {
>     final int numberOfUsers = dataModel.getNumUsers();
>     final int numberOfItems = dataModel.getNumItems();
>     CandidateItemsStrategy candidateItemsStrategy =
>             new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
>     MostSimilarItemsCandidateItemsStrategy mostSimilarStrategy =
>             new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
>     return model -> new GenericItemBasedRecommender(model, similarity,
>             candidateItemsStrategy, mostSimilarStrategy);
> }
>
> I don't know why item-based takes so much longer than user-based.
> User-based is fast as hell. I even tried the 100k and the 10M MovieLens
> datasets. Every time, user-based is much faster, for any similarity.
>
> I hope someone can help me understand this. Maybe I'm doing something
> wrong.
>
> Thanks!! :))
>
