Yes, this time increases non-linearly. Also, check the memory usage of the Java VM: you might be spending all your time in garbage collection (GC).
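One way to check whether the JVM is GC-bound is to compare heap usage against the heap limit and measure how much of the JVM's uptime has gone into collection. The sketch below does this with the standard `java.lang.management` API. The class and method names are hypothetical, and where you call `printReport()` (for example, after `evaluator.evaluate(...)` returns) is an assumption about your setup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Hypothetical helper: reports heap usage and the share of JVM uptime
 * spent in garbage collection, to see whether the process is GC-bound.
 */
public class HeapAndGcCheck {

    private static final long MB = 1024L * 1024L;

    /** Heap currently in use, in MB (allocated heap minus its free part). */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** Upper limit of the heap, in MB (controlled by the -Xmx flag). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Cumulative milliseconds spent in GC, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means "not available" for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of JVM uptime spent in GC (0.0 if uptime is not yet measurable). */
    static double gcFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime > 0 ? (double) totalGcMillis() / uptime : 0.0;
    }

    /** Prints a one-off snapshot of heap usage and GC time. */
    static void printReport() {
        System.out.printf("Heap: %d MB used, %d MB max%n", usedHeapMb(), maxHeapMb());
        System.out.printf("GC: %d ms total, %.1f%% of uptime%n",
                totalGcMillis(), 100.0 * gcFraction());
    }

    public static void main(String[] args) {
        // Assumed placement: run the evaluation first, then report.
        printReport();
    }
}
```

If the GC share turns out to be large, running with `-verbose:gc` prints a line per collection, and raising the heap limit with `-Xmx` may help, within what the machine's physical RAM allows.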
On Sat, May 21, 2011 at 3:08 PM, gj <[email protected]> wrote:
> Hi,
> I'm new to Mahout. I'm using mahout-0.3 with Eclipse and jdk1.6.0_18 (no Hadoop).
> I'm trying to find the RMSE for a dataset, but it seems very slow: so far I
> have not been able to get the RMSE value for a single run. Hence, I was
> wondering if anybody can look at my setup and tell me what I am doing wrong or
> why it's so slow.
>
> Here's my code:
> public static void main(String[] args) {
>     RecommenderBuilder builder = new RecommenderBuilder() {
>         public Recommender buildRecommender(DataModel model) throws TasteException {
>             UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(model);
>             UserNeighborhood neighborhood =
>                 new NearestNUserNeighborhood(5, userSimilarity, model);
>             Recommender recommender =
>                 new GenericUserBasedRecommender(model, neighborhood, userSimilarity);
>             return new CachingRecommender(recommender);
>         }
>     };
>
>     RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
>     try {
>         DataModel model = new FileDataModel(new File("lf_playhistory_step1_ratings.dat"));
>         double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);
>         System.out.println(score);
>     } catch (Exception e) {
>         System.err.println("FileNotFoundException: " + e.getMessage());
>     }
> }
> }
>
> The dataset is 5,462,701 tuples of the form <userid,track,rating>:
> no. of tracks = 610,192
> no. of users = 2330
> ratings = 1 to 5
>
> This is the output that I got on the console:
>
> 21-May-2011 22:26:51 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Creating FileDataModel for file lf_playhistory_step1_ratings.dat
> 21-May-2011 22:26:51 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Beginning evaluation using 0.9 of FileDataModel[dataFile:C:\eclipse_workspace\LastFM\lf_playhistory_step1_ratings.dat]
> 21-May-2011 22:26:51 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Reading file info...
> 21-May-2011 22:28:19 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 1000000 lines
> 21-May-2011 22:29:53 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 2000000 lines
> 21-May-2011 22:32:09 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 3000000 lines
> 21-May-2011 22:34:03 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 4000000 lines
> 21-May-2011 22:36:19 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 5000000 lines
> 21-May-2011 22:37:08 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Read lines: 5462701
> 21-May-2011 22:37:08 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Reading file info...
> 21-May-2011 22:37:16 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Read lines: 100000
> 21-May-2011 22:37:21 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 2330 users
> 21-May-2011 22:37:28 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 2330 users
> 21-May-2011 22:37:29 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Beginning evaluation of 2323 users
> 21-May-2011 22:37:29 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Starting timing of 2323 tasks in 2 threads
> 21-May-2011 22:40:28 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Average time per recommendation: 178468ms
> 21-May-2011 22:40:28 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Approximate memory used: 585MB / 840MB
>
> From there on, I just waited for two hours, and there was no further output.
> The "Average time per recommendation: 178468ms" seems very high. I'm
> guessing it's 178 sec x 2330 users = 4.8 days!
> This is running on my laptop (Intel Core 2 Duo T7500 @ 2.2GHz, 2 GB RAM).
>
> Why is this taking so long? Is it too big a dataset? Is my laptop too slow?
>
> Can anybody help?
>
> Thanks,
> Gawesh

--
Lance Norskog
[email protected]
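The back-of-envelope extrapolation in the quoted message can be reproduced as below. It is a rough sketch that assumes the logged "Average time per recommendation: 178468ms" holds for every one of the 2323 timed tasks and that the 2 threads split the work perfectly; neither assumption is confirmed by the log. The reply above notes that the cost grows non-linearly, so a linear extrapolation like this is only a rough figure. `EvalTimeEstimate` and `estimateDays` are hypothetical names.

```java
/** Hypothetical helper: linear extrapolation of evaluation time from a logged average. */
public class EvalTimeEstimate {

    static final double MS_PER_DAY = 24.0 * 60 * 60 * 1000;

    /**
     * Estimated wall-clock days if every task takes avgMsPerTask and
     * `threads` tasks always run in parallel with no overhead (an idealization).
     */
    static double estimateDays(double avgMsPerTask, int tasks, int threads) {
        return avgMsPerTask * tasks / threads / MS_PER_DAY;
    }

    public static void main(String[] args) {
        double avgMs = 178468; // from "Average time per recommendation: 178468ms"
        int tasks = 2323;      // from "Starting timing of 2323 tasks in 2 threads"
        System.out.printf("Serial:    %.2f days%n", estimateDays(avgMs, tasks, 1));
        System.out.printf("2 threads: %.2f days%n", estimateDays(avgMs, tasks, 2));
    }
}
```

Running it prints about 4.80 days serially and about 2.40 days under the ideal two-thread assumption, consistent with the "4.8 days" guess in the quoted message.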
