My need is to run item-based recommendations on Hadoop, first in pseudo-distributed
mode and then on a real cluster. So I'm trying to use this class, and following
your suggestion it now looks like this:
package org.apache.mahout.cf.taste.impl.recommender;

import java.io.IOException;
import java.util.Collection;
import java.util.List;

import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

/**
 * Item-based recommender wrapper, meant to be handed to the
 * pseudo-distributed RecommenderJob via --recommenderClassName.
 */
public class ItemBased implements Recommender {

  private final Recommender delegate;

  public ItemBased(DataModel model) throws TasteException, IOException {
    ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
    delegate = new GenericItemBasedRecommender(model, similarity);
  }

  @Override
  public float estimatePreference(long userID, long itemID) throws TasteException {
    return delegate.estimatePreference(userID, itemID);
  }

  @Override
  public DataModel getDataModel() {
    return delegate.getDataModel();
  }

  @Override
  public List<RecommendedItem> recommend(long userID, int howMany) throws TasteException {
    return delegate.recommend(userID, howMany);
  }

  @Override
  public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer rescorer)
      throws TasteException {
    // Pass the rescorer through instead of silently dropping it
    return delegate.recommend(userID, howMany, rescorer);
  }

  @Override
  public void removePreference(long userID, long itemID) throws TasteException {
    delegate.removePreference(userID, itemID);
  }

  @Override
  public void setPreference(long userID, long itemID, float value) throws TasteException {
    delegate.setPreference(userID, itemID, value);
  }

  @Override
  public void refresh(Collection<Refreshable> alreadyRefreshed) {
    delegate.refresh(alreadyRefreshed);
  }
}
I run it with this command line:
../hadoop/bin/hadoop jar core/target/mahout-core-0.5-SNAPSHOT-job.jar \
  org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob \
  -i input/ratings.txt -o data/out1 \
  --recommenderClassName org.apache.mahout.cf.taste.hadoop.item.ItemBased
Everything seems to start well, and then I see this output with an error (before
all the Hadoop-related errors):
10/11/23 23:25:45 INFO common.AbstractJob: Command line arguments:
{--endPhase=2147483647, --input=input/ratings.txt, --numRecommendations=10,
--output=data/out1,
--recommenderClassName=org.apache.mahout.cf.taste.hadoop.item.ItemBased,
--startPhase=0, --tempDir=temp}
10/11/23 23:25:46 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
10/11/23 23:25:48 INFO input.FileInputFormat: Total input paths to process : 1
10/11/23 23:25:49 INFO input.FileInputFormat: Total input paths to process : 1
10/11/23 23:25:49 INFO mapred.JobClient: Running job: job_local_0001
10/11/23 23:25:49 INFO mapred.MapTask: io.sort.mb = 100
10/11/23 23:25:50 INFO mapred.MapTask: data buffer = 79691776/99614720
10/11/23 23:25:50 INFO mapred.MapTask: record buffer = 262144/327680
10/11/23 23:25:50 INFO mapred.JobClient: map 0% reduce 0%
10/11/23 23:25:51 INFO mapred.MapTask: Spilling map output: record full = true
10/11/23 23:25:51 INFO mapred.MapTask: bufstart = 0; bufend = 514677; bufvoid =
99614720
10/11/23 23:25:51 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length =
327680
10/11/23 23:25:51 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
10/11/23 23:25:51 INFO compress.CodecPool: Got brand-new compressor
10/11/23 23:25:52 INFO mapred.MapTask: Finished spill 0
10/11/23 23:25:53 INFO mapred.MapTask: Spilling map output: record full = true
10/11/23 23:25:53 INFO mapred.MapTask: bufstart = 514677; bufend = 1038963;
bufvoid = 99614720
10/11/23 23:25:53 INFO mapred.MapTask: kvstart = 262144; kvend = 196607; length
= 327680
10/11/23 23:25:54 INFO mapred.MapTask: Finished spill 1
10/11/23 23:25:54 INFO mapred.MapTask: Spilling map output: record full = true
10/11/23 23:25:54 INFO mapred.MapTask: bufstart = 1038963; bufend = 1563249;
bufvoid = 99614720
10/11/23 23:25:54 INFO mapred.MapTask: kvstart = 196607; kvend = 131070; length
= 327680
10/11/23 23:25:55 INFO mapred.MapTask: Finished spill 2
10/11/23 23:25:55 INFO mapred.MapTask: Starting flush of map output
10/11/23 23:25:55 INFO mapred.LocalJobRunner:
10/11/23 23:25:56 INFO mapred.JobClient: map 100% reduce 0%
10/11/23 23:25:57 INFO mapred.MapTask: Finished spill 3
10/11/23 23:25:57 INFO mapred.Merger: Merging 4 sorted segments
10/11/23 23:25:57 INFO compress.CodecPool: Got brand-new decompressor
10/11/23 23:25:57 INFO compress.CodecPool: Got brand-new decompressor
10/11/23 23:25:57 INFO compress.CodecPool: Got brand-new decompressor
10/11/23 23:25:57 INFO compress.CodecPool: Got brand-new decompressor
10/11/23 23:25:57 INFO mapred.Merger: Down to the last merge-pass, with 4
segments left of total size: 17968 bytes
10/11/23 23:25:58 INFO mapred.LocalJobRunner:
10/11/23 23:26:01 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is
done. And is in the process of commiting
10/11/23 23:26:01 INFO mapred.LocalJobRunner:
10/11/23 23:26:01 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0'
done.
10/11/23 23:26:01 INFO mapred.LocalJobRunner:
10/11/23 23:26:01 INFO mapred.Merger: Merging 1 sorted segments
10/11/23 23:26:01 INFO mapred.Merger: Down to the last merge-pass, with 1
segments left of total size: 17813 bytes
10/11/23 23:26:01 INFO mapred.LocalJobRunner:
10/11/23 23:26:02 INFO file.FileDataModel: Creating FileDataModel for file
/var/folders/2E/2ETlu9HiG5mqvGvJQNUF5U+++TQ/-Tmp-/mahout-taste-hadoop2686971206570947472txt
10/11/23 23:26:02 INFO file.FileDataModel: Reading file info...
10/11/23 23:26:06 INFO file.FileDataModel: Processed 1000000 lines
10/11/23 23:26:06 INFO file.FileDataModel: Read lines: 1000209
10/11/23 23:26:07 INFO mapred.LocalJobRunner: reduce > reduce
10/11/23 23:26:07 INFO mapred.JobClient: map 100% reduce 74%
10/11/23 23:26:07 INFO model.GenericDataModel: Processed 6040 users
10/11/23 23:26:07 INFO file.FileDataModel: Creating FileDataModel for file
/Users/hadoop/trunk/core/src/main/java/intro.csv
10/11/23 23:26:07 INFO file.FileDataModel: Reading file info...
10/11/23 23:26:07 INFO file.FileDataModel: Read lines: 21
10/11/23 23:26:08 INFO model.GenericDataModel: Processed 5 users
10/11/23 23:26:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting
timing of 4 tasks in 2 threads
10/11/23 23:26:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Average
time per recommendation: 0ms
10/11/23 23:26:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Approximate
memory used: 84MB / 208MB
10/11/23 23:26:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Unable to
recommend in 0 cases
RecommendedItem[item:104, value:5.0]
10/11/23 23:26:08 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalStateException:
org.apache.mahout.cf.taste.common.NoSuchUserException
at
org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderReducer.reduce(RecommenderReducer.java:103)
at
org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderReducer.reduce(RecommenderReducer.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
Caused by: org.apache.mahout.cf.taste.common.NoSuchUserException
at
org.apache.mahout.cf.taste.impl.model.GenericDataModel.getPreferencesFromUser(GenericDataModel.java:206)
at
org.apache.mahout.cf.taste.impl.model.file.FileDataModel.getPreferencesFromUser(FileDataModel.java:627)
at
org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender.getNumPreferences(GenericItemBasedRecommender.java:216)
at
org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender.recommend(GenericItemBasedRecommender.java:101)
at
org.apache.mahout.cf.taste.impl.recommender.AbstractRecommender.recommend(AbstractRecommender.java:64)
at
org.apache.mahout.cf.taste.hadoop.item.ItemBased.recommend(ItemBased.java:54)
at
org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderReducer.reduce(RecommenderReducer.java:101)
... 5 more
10/11/23 23:26:08 INFO mapred.JobClient: Job complete: job_local_0001
10/11/23 23:26:08 INFO mapred.JobClient: Counters: 14
10/11/23 23:26:08 INFO mapred.JobClient: FileSystemCounters
10/11/23 23:26:08 INFO mapred.JobClient: FILE_BYTES_READ=29391004
10/11/23 23:26:08 INFO mapred.JobClient: HDFS_BYTES_READ=34695852
10/11/23 23:26:08 INFO mapred.JobClient: FILE_BYTES_WRITTEN=11751438
10/11/23 23:26:08 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29386726
10/11/23 23:26:08 INFO mapred.JobClient: Map-Reduce Framework
10/11/23 23:26:08 INFO mapred.JobClient: Reduce input groups=0
10/11/23 23:26:08 INFO mapred.JobClient: Combine output records=0
10/11/23 23:26:08 INFO mapred.JobClient: Map input records=1000209
10/11/23 23:26:08 INFO mapred.JobClient: Reduce shuffle bytes=0
10/11/23 23:26:08 INFO mapred.JobClient: Reduce output records=0
10/11/23 23:26:08 INFO mapred.JobClient: Spilled Records=2000418
10/11/23 23:26:08 INFO mapred.JobClient: Map output bytes=1990807
10/11/23 23:26:08 INFO mapred.JobClient: Combine input records=0
10/11/23 23:26:08 INFO mapred.JobClient: Map output records=1000209
10/11/23 23:26:08 INFO mapred.JobClient: Reduce input records=0
10/11/23 23:26:08 WARN hdfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/user/hadoop/data/out1/_temporary/_attempt_local_0001_r_000000_0/part-r-00000.gz
File does not exist. Holder DFSClient_-658415795 does not have any open files.
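If I read the stack trace right, the reducer asks my recommender about user IDs that the DataModel it loaded has never seen, so getPreferencesFromUser throws NoSuchUserException. A toy stdlib-only sketch of the same failure mode (hypothetical names, no Mahout involved), just to make sure I understand it:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MissingUserDemo {
  // Toy stand-in for DataModel.getPreferencesFromUser: throws for an unknown user.
  static List<Float> preferencesFor(Map<Long, List<Float>> prefs, long userID) {
    List<Float> p = prefs.get(userID);
    if (p == null) {
      // mirrors Mahout's NoSuchUserException wrapped in IllegalStateException
      throw new IllegalStateException("NoSuchUser: " + userID);
    }
    return p;
  }

  public static void main(String[] args) {
    Map<Long, List<Float>> prefs = new HashMap<Long, List<Float>>();
    prefs.put(1L, Arrays.asList(4.0f, 5.0f)); // only user 1 is in the "model"

    System.out.println(preferencesFor(prefs, 1L).size()); // prints 2

    try {
      preferencesFor(prefs, 6040L); // a user ID the "model" never loaded
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // prints NoSuchUser: 6040
    }
  }
}
```

So it looks like the recommender built inside the job is working against a different (much smaller) data set than the user IDs arriving at the reducer.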
Do I need to add more arguments on the command line, or something else? I don't
see any file written by Hadoop. I'm thinking about a Cloudera setup on Linux
(I'm working on Mac OS), but first I'd like to know whether what I'm doing is
right, and how I should fix my code. Thanks again, really ;)
On 22 Nov 2010, at 20:41, Sean Owen wrote:
> I am not sure what you are trying to do.
>
> The class you wrote looks fine for the purpose below -- except that in
> the constructor you try to run a full evaluation of the recommender!
> That's definitely not right. Remove that.
>
> But if you are trying to run an eval, then that has nothing to do with
> Hadoop, so the command below doesn't make sense.
>
> On Mon, Nov 22, 2010 at 7:36 PM, Stefano Bellasio
> <[email protected]> wrote:
>> ok, thanks for your time Sean, im a little bit confused, when i wrote that
>> class can i use a command like this?
>>
>> ../hadoop/bin/hadoop jar core/target/mahout-core-0.5-SNAPSHOT.job.jar
>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -i input/ratings.txt
>> -o data/output --recommenderClassName
>> org.apache.mahout.cf.taste.hadoop.item.ItemBased
>>
>> thanks
>>
>> On 22 Nov 2010, at 20:22, Sean Owen wrote:
>>
>>> Mahout in Action
>>>
>>> (We now interrupt for an ad break: http://manning.com/owen)
>>>
>>> On Mon, Nov 22, 2010 at 7:17 PM, Thomas De Vos <[email protected]>
>>> wrote:
>>>> Sean,
>>>>
>>>> Which book are you referring to?
>>>>
>>>> Thanks
>>>>
>>>> Thomas
>>
>>