Re: Performance Issue using item-based approach!

Sebastian Schelter Thu, 17 Apr 2014 07:55:26 -0700

No, but MultithreadedBatchItemSimilarities is a non-distributed alternative. Unfortunately it does not have all the features of ItemSimilarityJob yet.


--sebastian


On 04/17/2014 04:43 PM, Najum Ali wrote:

Ted,

Is it also possible to use ItemSimilarityJob in a non-distributed environment?

On 17.04.2014 at 16:22, Ted Dunning <[email protected]> wrote:

Najum,

You should also be able to use the ItemSimilarityJob to compute a limited
indicator set.

This is stepping off of the path you have been on, but it would allow you
to deploy the recommender via a search engine.

That makes a lot of code simply vanish. This is also a well-trodden
production path.




On Thu, Apr 17, 2014 at 3:57 AM, Najum Ali <[email protected]> wrote:

@Sebastian

Wow … you are right. The original CSV file is about 21 MB and the
corresponding precomputed item-item similarity file is about 260 MB!
And yes, there are far more than 50 "most similar items" per item.

Restricting this to the 50 (or so) most similar items per item could do
the trick, as you said.
OK, I will give it a try and reply later.

By the way, what about SamplingCandidateItemsStrategy or something like
it, used via this constructor:
GenericItemBasedRecommender(DataModel dataModel,
    ItemSimilarity similarity,
    CandidateItemsStrategy candidateItemsStrategy,
    MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy)

Javadoc links:
GenericItemBasedRecommender
<https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.html#GenericItemBasedRecommender(org.apache.mahout.cf.taste.model.DataModel,%20org.apache.mahout.cf.taste.similarity.ItemSimilarity,%20org.apache.mahout.cf.taste.recommender.CandidateItemsStrategy,%20org.apache.mahout.cf.taste.recommender.MostSimilarItemsCandidateItemsStrategy)>
DataModel
<https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/model/DataModel.html>
ItemSimilarity
<https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/similarity/ItemSimilarity.html>
CandidateItemsStrategy
<https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/recommender/CandidateItemsStrategy.html>
MostSimilarItemsCandidateItemsStrategy
<https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/recommender/MostSimilarItemsCandidateItemsStrategy.html>


On 17.04.2014 at 12:41, Sebastian Schelter <[email protected]> wrote:

Hi Najum,

I think I found the problem. Remember: Two items are similar whenever at
least one user interacted with both of them ("the items co-occur").

In the MovieLens dataset this is unfortunately true for almost all pairs
of items. From 3706 items, more than 11 million similarities are
created. A common remedy (which is unfortunately not yet implemented in
our precomputation) is to retain only the top-k most similar items per
item.
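
To make the co-occurrence argument concrete, here is a minimal sketch in
plain Java (not Mahout code) that counts how many distinct item pairs share
at least one user. The class name CoOccurrenceCount, the method
countCoOccurringPairs, and the toy data are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Mahout.
final class CoOccurrenceCount {

    /** Counts distinct unordered item pairs that at least one user interacted with together. */
    static int countCoOccurringPairs(Map<Long, Set<Long>> itemsByUser) {
        Set<String> pairs = new HashSet<>();
        for (Set<Long> items : itemsByUser.values()) {
            // Sort so that each pair is always recorded as "smaller:larger".
            List<Long> sorted = new ArrayList<>(items);
            Collections.sort(sorted);
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    pairs.add(sorted.get(i) + ":" + sorted.get(j));
                }
            }
        }
        return pairs.size();
    }

    public static void main(String[] args) {
        // Toy data: user ID -> items the user interacted with.
        Map<Long, Set<Long>> itemsByUser = new HashMap<>();
        itemsByUser.put(1L, new HashSet<>(Arrays.asList(10L, 20L, 30L)));
        itemsByUser.put(2L, new HashSet<>(Arrays.asList(20L, 30L)));
        itemsByUser.put(3L, new HashSet<>(Arrays.asList(40L)));

        int numItems = 4;
        int possiblePairs = numItems * (numItems - 1) / 2;
        // prints "3 of 6 possible pairs co-occur"
        System.out.println(countCoOccurringPairs(itemsByUser)
            + " of " + possiblePairs + " possible pairs co-occur");
    }
}
```

A single user with n items contributes up to n * (n - 1) / 2 pairs, so a few
very active users are enough to connect almost every pair of items.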

A solution would be to take the csv file that is created by the
MultithreadedBatchItemSimilarities and postprocess it so that only the 50
most similar items per item are retained. That should help with your
problem.

Unfortunately, we don't have code for that yet, maybe you want to try to
write that yourself?
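
A minimal sketch of such a post-processing step in plain Java, not an
existing Mahout utility. It assumes each line of the file has the form
"itemID,similarItemID,similarity" (the same layout the parsing code further
down in this thread expects) and that the top-k should be taken per value of
the first column. The class name TopKSimilarities and the method retainTopK
are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Mahout.
final class TopKSimilarities {

    /** Keeps, for each item ID in the first column, only the k lines with the highest similarity. */
    static List<String> retainTopK(Iterable<String> lines, int k) {
        // Orders lines by the similarity in the third column, lowest first.
        Comparator<String> bySimilarity =
            Comparator.comparingDouble((String line) -> Double.parseDouble(line.split(",")[2]));

        // One bounded min-heap per item, so at most k lines per item are held in memory.
        Map<String, PriorityQueue<String>> topPerItem = new LinkedHashMap<>();
        for (String line : lines) {
            String itemId = line.split(",")[0];
            PriorityQueue<String> heap =
                topPerItem.computeIfAbsent(itemId, id -> new PriorityQueue<String>(bySimilarity));
            heap.add(line);
            if (heap.size() > k) {
                heap.poll(); // drop the currently weakest similarity
            }
        }

        // Emit the survivors, strongest similarity first within each item.
        List<String> result = new ArrayList<>();
        for (PriorityQueue<String> heap : topPerItem.values()) {
            List<String> kept = new ArrayList<>(heap);
            kept.sort(bySimilarity.reversed());
            result.addAll(kept);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "1,2,0.9", "1,3,0.5", "1,4,0.7", "2,1,0.9", "2,3,0.1");
        // prints [1,2,0.9, 1,4,0.7, 2,1,0.9, 2,3,0.1]
        System.out.println(retainTopK(lines, 2));
    }
}
```

For the real file, the lines could be read with java.nio.file.Files and the
result written back out before it is loaded into a GenericItemSimilarity;
with k = 50 and 3706 items that leaves at most 185,300 lines.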

Best,
Sebastian

PS: The user-based recommender restricts the number of similar users; I
guess that's why it is so fast here.


On 04/17/2014 12:18 PM, Najum Ali wrote:

Ok, here you go:

I have created a simple class with main-method (no server and other stuff):

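import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Collection;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.precompute.FileSimilarItemsWriter;
import org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.similarity.precompute.BatchItemSimilarities;

public class RecommenderTest {

  public static void main(String[] args) throws IOException, TasteException {
    DataModel dataModel = new FileDataModel(new File(
        "/Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv"));
    ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
    ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, similarity);

    // The total item count is passed as the per-item limit, so no similar items are cut off.
    String pathToPreComputedFile = preComputeSimilarities(recommender, dataModel.getNumItems());

    // Load the precomputed similarities from the CSV file written above.
    Collection<GenericItemSimilarity.ItemItemSimilarity> correlations;
    try (BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(new FileInputStream(new File(pathToPreComputedFile))))) {
      correlations = bufferedReader.lines()
          .map(mapToItemItemSimilarity)
          .collect(Collectors.toList());
    }
    ItemSimilarity precomputedSimilarity = new GenericItemSimilarity(correlations);
    ItemBasedRecommender recommenderWithPrecomputation =
        new GenericItemBasedRecommender(dataModel, precomputedSimilarity);

    // Compare on-the-fly similarity against the precomputed one.
    recommend(recommender);
    recommend(recommenderWithPrecomputation);
  }

  // Computes all item-item similarities in parallel and writes them to a temp CSV file.
  private static String preComputeSimilarities(ItemBasedRecommender recommender,
      int simItemsPerItem) throws TasteException {
    String pathToAbsolutePath = "";
    try {
      File resultFile = new File(System.getProperty("java.io.tmpdir"), "similarities.csv");
      if (resultFile.exists()) {
        resultFile.delete();
      }
      BatchItemSimilarities batchJob =
          new MultithreadedBatchItemSimilarities(recommender, simItemsPerItem);
      // Arguments: degree of parallelism, maximum duration in hours, output writer.
      int numSimilarities = batchJob.computeItemSimilarities(
          Runtime.getRuntime().availableProcessors(), 1, new FileSimilarItemsWriter(resultFile));
      pathToAbsolutePath = resultFile.getAbsolutePath();
      System.out.println("Computed " + numSimilarities + " similarities and saved them to "
          + pathToAbsolutePath);
    } catch (IOException e) {
      System.out.println("Error while writing pre computed similarities to file");
    }
    return pathToAbsolutePath;
  }

  // Times a single recommend() call for user 1, asking for 10 items.
  private static void recommend(ItemBasedRecommender recommender) throws TasteException {
    long start = System.nanoTime();
    List<RecommendedItem> recommendations = recommender.recommend(1, 10);
    long end = System.nanoTime();
    System.out.println("Created recommendations in "
        + getCalculationTimeInMilliseconds(start, end) + " ms. Recommendations:"
        + recommendations);
  }

  private static double getCalculationTimeInMilliseconds(long start, long end) {
    double calculationTime = (end - start);
    return (calculationTime / 1_000_000);
  }

  // Parses one CSV line of the form "itemID,similarItemID,similarity".
  private static Function<String, GenericItemSimilarity.ItemItemSimilarity>
      mapToItemItemSimilarity = (line) -> {
        String[] row = line.split(",");
        return new GenericItemSimilarity.ItemItemSimilarity(
            Long.parseLong(row[0]), Long.parseLong(row[1]), Double.parseDouble(row[2]));
      };
}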

And that's the output log:

3 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file /Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv
63 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...
1207 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Processed 1000000 lines
1208 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 1000209
1475 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel - Processed 6040 users
1599 [main] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - Queued 3706 items in 38 batches
10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches
10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches. done.
10978 [pool-1-thread-5] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 4 processed 4 batches. done.
11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches
11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches. done.
11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches
11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches. done.
11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches
11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches. done.
11730 [pool-1-thread-3] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 2 processed 4 batches. done.
11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches
11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches. done.
11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches
11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches. done.
Computed 9174333 similarities and saved them to /var/folders/9g/4h38v1tj3ps9j21skc72b56r0000gn/T/similarities.csv
Created recommendations in *1683.613 ms*. Recommendations:[RecommendedItem[item:3890, value:4.6771617], RecommendedItem[item:3530, value:4.662509], RecommendedItem[item:127, value:4.660716], RecommendedItem[item:3323, value:4.660716], RecommendedItem[item:3382, value:4.660716], RecommendedItem[item:3123, value:4.603366], RecommendedItem[item:3233, value:4.5707765], RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989, value:4.5263577], RecommendedItem[item:2343, value:4.524066]]
Created recommendations in *985.679 ms*. Recommendations:[RecommendedItem[item:3530, value:5.0], RecommendedItem[item:3382, value:5.0], RecommendedItem[item:3890, value:4.6771617], RecommendedItem[item:127, value:4.660716], RecommendedItem[item:3323, value:4.660716], RecommendedItem[item:3123, value:4.603366], RecommendedItem[item:3233, value:4.5707765], RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989, value:4.5263577], RecommendedItem[item:2343, value:4.524066]]

Again, almost the same results. What I also don't understand is why I am
getting different recommended items.
That really frustrates me…

You can find the Java file in the attachment.



Greetings from Germany,
Najum

On 17.04.2014 at 11:44, Sebastian Schelter <[email protected]> wrote:

Yes, just to make sure the problem is in the mahout code and not in the
surrounding environment.

On 04/17/2014 11:43 AM, Najum Ali wrote:

@Sebastian
What do you mean by a standalone recommender? A simple offline Java main
program?

On 17.04.2014 at 11:41, Sebastian Schelter <[email protected]> wrote:

Could you take the output of the precomputation, feed it into a standalone
recommender and test it there?


On 04/17/2014 11:37 AM, Najum Ali wrote:

@sebastian

Are you sure that the precomputation is done only once and not in every
request?

Yes, a @Bean-annotated object in Spring is a singleton instance by
default.
I also just tested it using a System.out.println().
Here is my log:

System.out.println("----> precomputation done!") is called before
returning the GenericItemSimilarity.

The first two recommendations are item-based, using Pearson similarity.
The third and fourth log entries are also item-based, using the
precomputed similarity.
The last log entry is the user-based recommender using Pearson similarity.

Look at the huge time difference!

On 17.04.2014 at 11:23, Sebastian Schelter <[email protected]> wrote:

Najum,

this is really strange. Feeding an item-based recommender with precomputed
similarities should give you superfast recommendations.

Are you sure that the precomputation is done only once and not in every
request?

--sebastian

On 04/17/2014 11:17 AM, Najum Ali wrote:

Hi guys,

I have created a precomputed item-item similarity collection for a
GenericItemBasedRecommender.
Using the 1M MovieLens data, my item-based recommender is only 40-50%
faster than without precomputation (about 589.5 ms instead of 1222.9 ms).
The user-based recommender, however, is really fast: about 24.2 ms.
How can this happen?

Here are more details on my implementation:

CSV file: 1M preferences, 6040 users, 3706 items

For my implementation I'm using screenshots, because of the better syntax
highlighting.
My recommender runs inside a web server (Jetty) using Spring 4 and Java 8.
I receive recommendations via a web service (JSON).

For the DataModel, I'm using FileDataModel.


The code below creates a precomputed ItemSimilarity when I start the
web server and the property isItemPreComputationEnabled is set to true:


For time measurement I'm using AOP. I measure the whole time from
entering my controller to sending the response, based on
System.nanoTime() and taking the difference. The same time measurement is
used for the user-based recommender.

I have tried to cache the recommender and the similarity, with no big
difference. I also tried to use CandidateItemsStrategy and
MostSimilarItemsCandidateItemsStrategy, but that gave no performance boost
either.

public RecommenderBuilder createRecommenderBuilder(ItemSimilarity similarity)
    throws TasteException {
  final int numberOfUsers = dataModel.getNumUsers();
  final int numberOfItems = dataModel.getNumItems();
  CandidateItemsStrategy candidateItemsStrategy =
      new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
  MostSimilarItemsCandidateItemsStrategy mostSimilarStrategy =
      new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
  return model -> new GenericItemBasedRecommender(model, similarity,
      candidateItemsStrategy, mostSimilarStrategy);
}

I don't know why item-based takes so much longer than user-based.
User-based is fast as hell. I even tried datasets with 100k and
10 million preferences (MovieLens). Every time, user-based is much faster,
for any similarity.

I hope someone can help me understand this. Maybe I'm doing something
wrong.

Thanks!! :))
