My need is to run item-based recommendations on Hadoop, first in pseudo-distributed
mode and then on a real cluster. So I'm trying to use this class, and following
your suggestion it now looks like this:
package org.apache.mahout.cf.taste.impl.recommender;

import java.io.IOException;
import java.util.Collection;
import java.util.List;

import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

/**
 * Item-based recommender wrapper, meant to be handed to the
 * pseudo-distributed RecommenderJob via --recommenderClassName.
 */
public class ItemBased implements Recommender {

  private final Recommender delegate;

  public ItemBased(DataModel model) throws TasteException, IOException {
    ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
    delegate = new GenericItemBasedRecommender(model, similarity);
  }

  @Override
  public float estimatePreference(long userID, long itemID) throws TasteException {
    return delegate.estimatePreference(userID, itemID);
  }

  @Override
  public DataModel getDataModel() {
    return delegate.getDataModel();
  }

  @Override
  public List<RecommendedItem> recommend(long userID, int howMany) throws TasteException {
    return delegate.recommend(userID, howMany);
  }

  @Override
  public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer rescorer)
      throws TasteException {
    // Pass the rescorer through instead of silently dropping it
    return delegate.recommend(userID, howMany, rescorer);
  }

  @Override
  public void removePreference(long userID, long itemID) throws TasteException {
    delegate.removePreference(userID, itemID);
  }

  @Override
  public void setPreference(long userID, long itemID, float value) throws TasteException {
    delegate.setPreference(userID, itemID, value);
  }

  @Override
  public void refresh(Collection<Refreshable> alreadyRefreshed) {
    delegate.refresh(alreadyRefreshed);
  }
}
I run it with this command line:
../hadoop/bin/hadoop jar core/target/mahout-core-0.5-SNAPSHOT-job.jar \
  org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob \
  -i input/ratings.txt -o data/out1 \
  --recommenderClassName org.apache.mahout.cf.taste.hadoop.item.ItemBased
Everything seems to start well, and then I see this output with an error (before
all the Hadoop-related errors):
10/11/23 23:25:45 INFO common.AbstractJob: Command line arguments:
{--endPhase=2147483647, --input=input/ratings.txt, --numRecommendations=10,
--output=data/out1,
--recommenderClassName=org.apache.mahout.cf.taste.hadoop.item.ItemBased,
--startPhase=0, --tempDir=temp}
10/11/23 23:25:46 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
10/11/23 23:25:48 INFO input.FileInputFormat: Total input paths to process : 1
10/11/23 23:25:49 INFO input.FileInputFormat: Total input paths to process : 1
10/11/23 23:25:49 INFO mapred.JobClient: Running job: job_local_0001
10/11/23 23:25:49 INFO mapred.MapTask: io.sort.mb = 100
10/11/23 23:25:50 INFO mapred.MapTask: data buffer = 79691776/99614720
10/11/23 23:25:50 INFO mapred.MapTask: record buffer = 262144/327680
10/11/23 23:25:50 INFO mapred.JobClient: map 0% reduce 0%
10/11/23 23:25:51 INFO mapred.MapTask: Spilling map output: record full = true
10/11/23 23:25:51 INFO mapred.MapTask: bufstart = 0; bufend = 514677; bufvoid =
99614720
10/11/23 23:25:51 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length =
327680
10/11/23 23:25:51 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
10/11/23 23:25:51 INFO compress.CodecPool: Got brand-new compressor
10/11/23 23:25:52 INFO mapred.MapTask: Finished spill 0
10/11/23 23:25:53 INFO mapred.MapTask: Spilling map output: record full = true
10/11/23 23:25:53 INFO mapred.MapTask: bufstart = 514677; bufend = 1038963;
bufvoid = 99614720
10/11/23 23:25:53 INFO mapred.MapTask: kvstart = 262144; kvend = 196607; length
= 327680
10/11/23 23:25:54 INFO mapred.MapTask: Finished spill 1
10/11/23 23:25:54 INFO mapred.MapTask: Spilling map output: record full = true
10/11/23 23:25:54 INFO mapred.MapTask: bufstart = 1038963; bufend = 1563249;
bufvoid = 99614720
10/11/23 23:25:54 INFO mapred.MapTask: kvstart = 196607; kvend = 131070; length
= 327680
10/11/23 23:25:55 INFO mapred.MapTask: Finished spill 2
10/11/23 23:25:55 INFO mapred.MapTask: Starting flush of map output
10/11/23 23:25:55 INFO mapred.LocalJobRunner:
10/11/23 23:25:56 INFO mapred.JobClient: map 100% reduce 0%
10/11/23 23:25:57 INFO mapred.MapTask: Finished spill 3
10/11/23 23:25:57 INFO mapred.Merger: Merging 4 sorted segments
10/11/23 23:25:57 INFO compress.CodecPool: Got brand-new decompressor
10/11/23 23:25:57 INFO compress.CodecPool: Got brand-new decompressor
10/11/23 23:25:57 INFO compress.CodecPool: Got brand-new decompressor
10/11/23 23:25:57 INFO compress.CodecPool: Got brand-new decompressor
10/11/23 23:25:57 INFO mapred.Merger: Down to the last merge-pass, with 4
segments left of total size: 17968 bytes
10/11/23 23:25:58 INFO mapred.LocalJobRunner:
10/11/23 23:26:01 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is
done. And is in the process of commiting
10/11/23 23:26:01 INFO mapred.LocalJobRunner:
10/11/23 23:26:01 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0'
done.
10/11/23 23:26:01 INFO mapred.LocalJobRunner:
10/11/23 23:26:01 INFO mapred.Merger: Merging 1 sorted segments
10/11/23 23:26:01 INFO mapred.Merger: Down to the last merge-pass, with 1
segments left of total size: 17813 bytes
10/11/23 23:26:01 INFO mapred.LocalJobRunner:
10/11/23 23:26:02 INFO file.FileDataModel: Creating FileDataModel for file
/var/folders/2E/2ETlu9HiG5mqvGvJQNUF5U+++TQ/-Tmp-/mahout-taste-hadoop2686971206570947472txt
10/11/23 23:26:02 INFO file.FileDataModel: Reading file info...
10/11/23 23:26:06 INFO file.FileDataModel: Processed 1000000 lines
10/11/23 23:26:06 INFO file.FileDataModel: Read lines: 1000209
10/11/23 23:26:07 INFO mapred.LocalJobRunner: reduce > reduce
10/11/23 23:26:07 INFO mapred.JobClient: map 100% reduce 74%
10/11/23 23:26:07 INFO model.GenericDataModel: Processed 6040 users
10/11/23 23:26:07 INFO file.FileDataModel: Creating FileDataModel for file
/Users/hadoop/trunk/core/src/main/java/intro.csv
10/11/23 23:26:07 INFO file.FileDataModel: Reading file info...
10/11/23 23:26:07 INFO file.FileDataModel: Read lines: 21
10/11/23 23:26:08 INFO model.GenericDataModel: Processed 5 users
10/11/23 23:26:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting
timing of 4 tasks in 2 threads
10/11/23 23:26:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Average
time per recommendation: 0ms
10/11/23 23:26:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Approximate
memory used: 84MB / 208MB
10/11/23 23:26:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Unable to
recommend in 0 cases
RecommendedItem[item:104, value:5.0]
10/11/23 23:26:08 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalStateException:
org.apache.mahout.cf.taste.common.NoSuchUserException
at
org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderReducer.reduce(RecommenderReducer.java:103)
at
org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderReducer.reduce(RecommenderReducer.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
Caused by: org.apache.mahout.cf.taste.common.NoSuchUserException
at
org.apache.mahout.cf.taste.impl.model.GenericDataModel.getPreferencesFromUser(GenericDataModel.java:206)
at
org.apache.mahout.cf.taste.impl.model.file.FileDataModel.getPreferencesFromUser(FileDataModel.java:627)
at
org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender.getNumPreferences(GenericItemBasedRecommender.java:216)
at
org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender.recommend(GenericItemBasedRecommender.java:101)
at
org.apache.mahout.cf.taste.impl.recommender.AbstractRecommender.recommend(AbstractRecommender.java:64)
at
org.apache.mahout.cf.taste.hadoop.item.ItemBased.recommend(ItemBased.java:54)
at
org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderReducer.reduce(RecommenderReducer.java:101)
... 5 more
10/11/23 23:26:08 INFO mapred.JobClient: Job complete: job_local_0001
10/11/23 23:26:08 INFO mapred.JobClient: Counters: 14
10/11/23 23:26:08 INFO mapred.JobClient: FileSystemCounters
10/11/23 23:26:08 INFO mapred.JobClient: FILE_BYTES_READ=29391004
10/11/23 23:26:08 INFO mapred.JobClient: HDFS_BYTES_READ=34695852
10/11/23 23:26:08 INFO mapred.JobClient: FILE_BYTES_WRITTEN=11751438
10/11/23 23:26:08 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29386726
10/11/23 23:26:08 INFO mapred.JobClient: Map-Reduce Framework
10/11/23 23:26:08 INFO mapred.JobClient: Reduce input groups=0
10/11/23 23:26:08 INFO mapred.JobClient: Combine output records=0
10/11/23 23:26:08 INFO mapred.JobClient: Map input records=1000209
10/11/23 23:26:08 INFO mapred.JobClient: Reduce shuffle bytes=0
10/11/23 23:26:08 INFO mapred.JobClient: Reduce output records=0
10/11/23 23:26:08 INFO mapred.JobClient: Spilled Records=2000418
10/11/23 23:26:08 INFO mapred.JobClient: Map output bytes=1990807
10/11/23 23:26:08 INFO mapred.JobClient: Combine input records=0
10/11/23 23:26:08 INFO mapred.JobClient: Map output records=1000209
10/11/23 23:26:08 INFO mapred.JobClient: Reduce input records=0
10/11/23 23:26:08 WARN hdfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/user/hadoop/data/out1/_temporary/_attempt_local_0001_r_000000_0/part-r-00000.gz
File does not exist. Holder DFSClient_-658415795 does not have any open files.
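If I read the stack trace right, the reducer asks my recommender about user IDs that the DataModel it loaded has never seen, so getPreferencesFromUser throws NoSuchUserException. A toy stdlib-only sketch of the same failure mode (hypothetical names, no Mahout involved), just to make sure I understand it:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MissingUserDemo {
  // Toy stand-in for DataModel.getPreferencesFromUser: throws for an unknown user.
  static List<Float> preferencesFor(Map<Long, List<Float>> prefs, long userID) {
    List<Float> p = prefs.get(userID);
    if (p == null) {
      // mirrors Mahout's NoSuchUserException wrapped in IllegalStateException
      throw new IllegalStateException("NoSuchUser: " + userID);
    }
    return p;
  }

  public static void main(String[] args) {
    Map<Long, List<Float>> prefs = new HashMap<Long, List<Float>>();
    prefs.put(1L, Arrays.asList(4.0f, 5.0f)); // only user 1 is in the "model"

    System.out.println(preferencesFor(prefs, 1L).size()); // prints 2

    try {
      preferencesFor(prefs, 6040L); // a user ID the "model" never loaded
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // prints NoSuchUser: 6040
    }
  }
}
```

So it looks like the recommender built inside the job is working against a different (much smaller) data set than the user IDs arriving at the reducer.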
Do I need to add more arguments on the command line, or something else? I don't
see any file written by Hadoop. I'm thinking about a Cloudera setup on Linux
(I'm working on Mac OS), but first I'd like to know whether what I'm doing is
right, and how I should fix my code. Thanks again, really ;)
On 22 Nov 2010, at 20:41, Sean Owen wrote:
> I am not sure what you are trying to do.
>
> The class you wrote looks fine for the purpose below -- except that in
> the constructor you try to run a full evaluation of the recommender!
> That's definitely not right. Remove that.
>
> But if you are trying to run an eval, then that has nothing to do with
> Hadoop, so the command below doesn't make sense.
>
> On Mon, Nov 22, 2010 at 7:36 PM, Stefano Bellasio
> <[email protected]> wrote:
>> ok, thanks for your time Sean, im a little bit confused, when i wrote that
>> class can i use a command like this?
>>
>> ../hadoop/bin/hadoop jar core/target/mahout-core-0.5-SNAPSHOT.job.jar
>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -i input/ratings.txt
>> -o data/output --recommenderClassName
>> org.apache.mahout.cf.taste.hadoop.item.ItemBased
>>
>> thanks
>>
>> On 22 Nov 2010, at 20:22, Sean Owen wrote:
>>
>>> Mahout in Action
>>>
>>> (We now interrupt for an ad break: http://manning.com/owen)
>>>
>>> On Mon, Nov 22, 2010 at 7:17 PM, Thomas De Vos <[email protected]>
>>> wrote:
>>>> Sean,
>>>>
>>>> Which book are you referring to?
>>>>
>>>> Thanks
>>>>
>>>> Thomas
>>
>>