Re: Performance Issue using item-based approach!

Najum Ali Thu, 17 Apr 2014 03:58:51 -0700

@Sebastian

wow … you are right. The original CSV file is about 21 MB and the corresponding 
precomputed item-item similarity file is about 260 MB!!
And yes, there are far more than 50 "most similar items" for an item.
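
A rough sketch of how the number of similar items per item could be checked. It assumes every line of the similarity file has the form itemA,itemB,similarity (the layout that mapToItemItemSimilarity in the quoted code parses) and that the first column is the item whose neighbour is listed. The class name NeighbourCounter is made up for illustration and is not part of Mahout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout.
class NeighbourCounter {

    /** Counts how many similarity lines exist for each item ID found in the first column. */
    static Map<Long, Integer> countPerItem(Iterable<String> lines) {
        Map<Long, Integer> counts = new HashMap<>();
        for (String line : lines) {
            // Assumed line layout: itemA,itemB,similarity
            long itemId = Long.parseLong(line.split(",")[0]);
            counts.merge(itemId, 1, Integer::sum);
        }
        return counts;
    }
}
```

Any item whose count is far above 50 is a candidate for trimming.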


Restricting this to the 50 (or so) most similar items per item could do the 
trick, as you said. 
OK, I will give it a try and reply later.
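
A minimal sketch of what that postprocessing might look like, using only the JDK. It assumes the same itemA,itemB,similarity line layout that mapToItemItemSimilarity parses in the quoted code, and it treats the first column as the item whose neighbours are being limited. The class name TopKSimilarityFilter and the method retainTopK are hypothetical, not Mahout API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Mahout.
class TopKSimilarityFilter {

    /** Keeps, for every item ID in the first column, only the k lines with the highest similarity. */
    static List<String> retainTopK(Iterable<String> lines, int k) {
        Comparator<String> bySimilarity =
                Comparator.comparingDouble(TopKSimilarityFilter::similarityOf);
        // One min-heap per item; the weakest retained similarity sits at the head.
        Map<Long, PriorityQueue<String>> perItem = new LinkedHashMap<>();
        for (String line : lines) {
            long itemId = Long.parseLong(line.split(",")[0]);
            PriorityQueue<String> heap =
                    perItem.computeIfAbsent(itemId, id -> new PriorityQueue<>(bySimilarity));
            heap.add(line);
            if (heap.size() > k) {
                heap.poll(); // drop the currently weakest similarity
            }
        }
        // Emit the retained lines, strongest similarity first within each item.
        List<String> result = new ArrayList<>();
        for (PriorityQueue<String> heap : perItem.values()) {
            List<String> kept = new ArrayList<>(heap);
            kept.sort(bySimilarity.reversed());
            result.addAll(kept);
        }
        return result;
    }

    // Assumed line layout: itemA,itemB,similarity
    private static double similarityOf(String line) {
        return Double.parseDouble(line.split(",")[2]);
    }
}
```

Since only k lines per item are held at any time, memory stays bounded by the number of items times k. The filtered lines could then be written to a new file (for example with java.nio.file.Files.write) and loaded into GenericItemSimilarity the same way as before.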

By the way, what about the SamplingCandidateItemsStrategy or something like 
it, using this constructor?
GenericItemBasedRecommender(DataModel dataModel, ItemSimilarity similarity,
    CandidateItemsStrategy candidateItemsStrategy,
    MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy)


On 17.04.2014 at 12:41, Sebastian Schelter <[email protected]> wrote:

> Hi Najum,
> 
> I think I found the problem. Remember: Two items are similar whenever at 
> least one user interacted with both of them ("the items co-occur").
> 
> In the movielens dataset this is true for almost all pairs of items, 
> unfortunately. From 3706 items, more than 11 million similarities are 
> created. A common approach for that (which is not yet implemented in our 
> precomputation unfortunately) is to only retain the top-k similar items per 
> item.
> 
> A solution would be to take the csv file that is created by the 
> MultithreadedBatchItemSimilarities and postprocess it so that only the 50 
> most similar items per item are retained. That should help with your problem.
> 
> Unfortunately, we don't have code for that yet, maybe you want to try to 
> write that yourself?
> 
> Best,
> Sebastian
> 
> PS: The user-based recommender restricts the number of similar users; I guess 
> that's why it is so fast here.
> 
> 
> On 04/17/2014 12:18 PM, Najum Ali wrote:
>> Ok, here you go:
>> 
>> I have created a simple class with a main method (no server and other stuff):
>> 
>> public class RecommenderTest {
>>
>>     public static void main(String[] args) throws IOException, TasteException {
>>         DataModel dataModel = new FileDataModel(new File(
>>                 "/Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv"));
>>         ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
>>         ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, similarity);
>>
>>         String pathToPreComputedFile = preComputeSimilarities(recommender, dataModel.getNumItems());
>>
>>         InputStream inputStream = new FileInputStream(new File(pathToPreComputedFile));
>>         BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
>>         Collection<GenericItemSimilarity.ItemItemSimilarity> correlations =
>>                 bufferedReader.lines().map(mapToItemItemSimilarity).collect(Collectors.toList());
>>         ItemSimilarity precomputedSimilarity = new GenericItemSimilarity(correlations);
>>         ItemBasedRecommender recommenderWithPrecomputation =
>>                 new GenericItemBasedRecommender(dataModel, precomputedSimilarity);
>>
>>         recommend(recommender);
>>         recommend(recommenderWithPrecomputation);
>>     }
>>
>>     private static String preComputeSimilarities(ItemBasedRecommender recommender,
>>             int simItemsPerItem) throws TasteException {
>>         String pathToAbsolutePath = "";
>>         try {
>>             File resultFile = new File(System.getProperty("java.io.tmpdir"), "similarities.csv");
>>             if (resultFile.exists()) {
>>                 resultFile.delete();
>>             }
>>             BatchItemSimilarities batchJob =
>>                     new MultithreadedBatchItemSimilarities(recommender, simItemsPerItem);
>>             int numSimilarities = batchJob.computeItemSimilarities(
>>                     Runtime.getRuntime().availableProcessors(), 1,
>>                     new FileSimilarItemsWriter(resultFile));
>>             pathToAbsolutePath = resultFile.getAbsolutePath();
>>             System.out.println("Computed " + numSimilarities
>>                     + " similarities and saved them to " + pathToAbsolutePath);
>>         } catch (IOException e) {
>>             System.out.println("Error while writing pre computed similarities to file");
>>         }
>>         return pathToAbsolutePath;
>>     }
>>
>>     private static void recommend(ItemBasedRecommender recommender) throws TasteException {
>>         long start = System.nanoTime();
>>         List<RecommendedItem> recommendations = recommender.recommend(1, 10);
>>         long end = System.nanoTime();
>>         System.out.println("Created recommendations in "
>>                 + getCalculationTimeInMilliseconds(start, end)
>>                 + " ms. Recommendations:" + recommendations);
>>     }
>>
>>     private static double getCalculationTimeInMilliseconds(long start, long end) {
>>         double calculationTime = (end - start);
>>         return (calculationTime / 1_000_000);
>>     }
>>
>>     private static Function<String, GenericItemSimilarity.ItemItemSimilarity>
>>             mapToItemItemSimilarity = (line) -> {
>>         String[] row = line.split(",");
>>         return new GenericItemSimilarity.ItemItemSimilarity(
>>                 Long.parseLong(row[0]), Long.parseLong(row[1]), Double.parseDouble(row[2]));
>>     };
>> }
>> 
>> And that's the output log:
>> 
>> 3 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel -
>> Creating FileDataModel for file
>> /Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv
>> 63 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel -
>> Reading file info...
>> 1207 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel -
>> Processed 1000000 lines
>> 1208 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - 
>> Read
>> lines: 1000209
>> 1475 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel -
>> Processed 6040 users
>> 1599 [main] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - Queued 3706 items in 38 batches
>> 10928 [pool-1-thread-8] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 7 processed 5 batches
>> 10928 [pool-1-thread-8] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 7 processed 5 batches. done.
>> 10978 [pool-1-thread-5] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 4 processed 4 batches. done.
>> 11589 [pool-1-thread-4] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 3 processed 5 batches
>> 11589 [pool-1-thread-4] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 3 processed 5 batches. done.
>> 11592 [pool-1-thread-6] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 5 processed 5 batches
>> 11592 [pool-1-thread-6] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 5 processed 5 batches. done.
>> 11707 [pool-1-thread-7] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 6 processed 5 batches
>> 11707 [pool-1-thread-7] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 6 processed 5 batches. done.
>> 11730 [pool-1-thread-3] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 2 processed 4 batches. done.
>> 11849 [pool-1-thread-1] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 0 processed 5 batches
>> 11849 [pool-1-thread-1] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 0 processed 5 batches. done.
>> 11854 [pool-1-thread-2] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 1 processed 5 batches
>> 11854 [pool-1-thread-2] INFO
>> org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
>> - worker 1 processed 5 batches. done.
>> Computed 9174333 similarities and saved them to
>> /var/folders/9g/4h38v1tj3ps9j21skc72b56r0000gn/T/similarities.csv
>> Created recommendations in *1683.613 ms*.
>> Recommendations:[RecommendedItem[item:3890, value:4.6771617],
>> RecommendedItem[item:3530, value:4.662509], RecommendedItem[item:127,
>> value:4.660716], RecommendedItem[item:3323, value:4.660716],
>> RecommendedItem[item:3382, value:4.660716], RecommendedItem[item:3123,
>> value:4.603366], RecommendedItem[item:3233, value:4.5707765],
>> RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989,
>> value:4.5263577], RecommendedItem[item:2343, value:4.524066]]
>> Created recommendations in *985.679 ms*.
>> Recommendations:[RecommendedItem[item:3530, value:5.0],
>> RecommendedItem[item:3382, value:5.0], RecommendedItem[item:3890,
>> value:4.6771617], RecommendedItem[item:127, value:4.660716],
>> RecommendedItem[item:3323, value:4.660716], RecommendedItem[item:3123,
>> value:4.603366], RecommendedItem[item:3233, value:4.5707765],
>> RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989,
>> value:4.5263577], RecommendedItem[item:2343, value:4.524066]]
>> 
>> Again, almost the same results. What I also don't understand is why I am
>> getting different recommended items.
>> That really frustrates me…
>> 
>> You can find the Java file in the attachment.
>> 
>> 
>> 
>> Greetings from Germany,
>> Najum
>> 
>> On 17.04.2014 at 11:44, Sebastian Schelter <[email protected]
>> <mailto:[email protected]>> wrote:
>> 
>>> Yes, just to make sure the problem is in the mahout code and not in the
>>> surrounding environment.
>>> 
>>> On 04/17/2014 11:43 AM, Najum Ali wrote:
>>>> @Sebastian
>>>> What do you mean by a standalone recommender? A simple offline Java main 
>>>> program?
>>>> 
>>>> On 17.04.2014 at 11:41, Sebastian Schelter <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>>> Could you take the output of the precomputation, feed it into a standalone
>>>>> recommender and test it there?
>>>>> 
>>>>> 
>>>>> On 04/17/2014 11:37 AM, Najum Ali wrote:
>>>>>> @sebastian
>>>>>> 
>>>>>>> Are you sure that the precomputation is done only once and not in every
>>>>>>> request?
>>>>>> Yes, in Spring a @Bean-annotated object is a singleton instance by 
>>>>>> default.
>>>>>> I also just tested it using a System.out.println().
>>>>>> Here is my log:
>>>>>> 
>>>>>> System.out.println("----> precomputation done!") is called before 
>>>>>> returning the
>>>>>> GenericItemSimilarity.
>>>>>> 
>>>>>> The first two recommendations are item-based with Pearson similarity.
>>>>>> The third and fourth logs are also item-based, using precomputed similarity.
>>>>>> The last log is the user-based recommender using Pearson.
>>>>>> 
>>>>>> Look at the huge time difference!
>>>>>> 
>>>>>> On 17.04.2014 at 11:23, Sebastian Schelter <[email protected]
>>>>>> <mailto:[email protected]>> wrote:
>>>>>> 
>>>>>>> Najum,
>>>>>>> 
>>>>>>> this is really strange, feeding an ItemBased Recommender with 
>>>>>>> precomputed
>>>>>>> similarities should give you superfast recommendations.
>>>>>>> 
>>>>>>> Are you sure that the precomputation is done only once and not in every
>>>>>>> request?
>>>>>>> 
>>>>>>> --sebastian
>>>>>>> 
>>>>>>> On 04/17/2014 11:17 AM, Najum Ali wrote:
>>>>>>>> Hi guys,
>>>>>>>> 
>>>>>>>> I have created a precomputed item-item-similarity collection for a
>>>>>>>> GenericItemBasedRecommender.
>>>>>>>> Using the 1M MovieLens data, my item-based recommender is only 40-50% 
>>>>>>>> faster
>>>>>>>> with precomputation than without (about 589.5 ms instead of 1222.9 ms).
>>>>>>>> The user-based recommender, however, is really fast: about 24.2 ms.
>>>>>>>> How can
>>>>>>>> this happen?
>>>>>>>> 
>>>>>>>> Here are more details about my implementation:
>>>>>>>> 
>>>>>>>> CSV file: 1M preferences, 6040 users, 3706 items
>>>>>>>> 
>>>>>>>> I'm showing my implementation as screenshots because of the better
>>>>>>>> syntax highlighting.
>>>>>>>> My recommender runs inside a web server (Jetty) using Spring 4 and 
>>>>>>>> Java 8. I
>>>>>>>> receive recommendations via a web service (JSON).
>>>>>>>> 
>>>>>>>> For the DataModel, I'm using FileDataModel.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> The code below creates a precomputed ItemSimilarity when I start 
>>>>>>>> the
>>>>>>>> web server and the property isItemPreComputationEnabled is set to true:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> For time measurement I'm using AOP. I measure the whole time from
>>>>>>>> entering my
>>>>>>>> controller to sending the response,
>>>>>>>> based on the difference between two System.nanoTime() calls. The same
>>>>>>>> measurement is used for
>>>>>>>> the user-based recommender.
>>>>>>>> 
>>>>>>>> I have tried caching the recommender and the similarity, with no big
>>>>>>>> difference. I also tried using CandidateItemsStrategy and
>>>>>>>> MostSimilarItemsCandidateItemsStrategy, but saw no performance boost either.
>>>>>>>> 
>>>>>>>> public RecommenderBuilder createRecommenderBuilder(ItemSimilarity similarity)
>>>>>>>>         throws TasteException {
>>>>>>>>     final int numberOfUsers = dataModel.getNumUsers();
>>>>>>>>     final int numberOfItems = dataModel.getNumItems();
>>>>>>>>     CandidateItemsStrategy candidateItemsStrategy =
>>>>>>>>             new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
>>>>>>>>     MostSimilarItemsCandidateItemsStrategy mostSimilarStrategy =
>>>>>>>>             new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
>>>>>>>>     return model -> new GenericItemBasedRecommender(model, similarity,
>>>>>>>>             candidateItemsStrategy, mostSimilarStrategy);
>>>>>>>> }
>>>>>>>> 
>>>>>>>> I don't know why item-based is taking so much longer than user-based.
>>>>>>>> User-based
>>>>>>>> is fast as hell. I even tried datasets with 100k and 
>>>>>>>> 10 million preferences
>>>>>>>> (MovieLens). Every time, user-based is so much faster for any 
>>>>>>>> similarity.
>>>>>>>> 
>>>>>>>> I hope anyone can help me understand this. Maybe I'm doing 
>>>>>>>> something
>>>>>>>> wrong.
>>>>>>>> 
>>>>>>>> Thanks!! :))
>>>> 
>>>> 
>>> 
>> 
>
