Actually, the Mahout In Action example (using the Wikipedia article set)
states that the input file is of the form

user item1, item2, itemN

The first step of the job is described as splitting this line with a regex,
constructing lines of the form
user item1
..
user itemN

(no preferences as this is a boolean preference dataset)
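
As a rough sketch of that splitting step (this is not the job's actual code),
assuming the IDs on each line are separated by colons, commas, or whitespace,
something like the hypothetical `expand_line` helper below would emit one
"userID,itemID" line per pair. The delimiter regex and both function names are
my own assumptions, not taken from the book or from Mahout.

```python
import re

# Assumed delimiters between IDs: colon, comma, or whitespace.
# This is a guess at the file layout, not something the book or the job confirms.
_DELIMS = re.compile(r"[:,\s]+")


def expand_line(line):
    """Turn 'user item1, item2, itemN' into ['user,item1', ..., 'user,itemN']."""
    tokens = [t for t in _DELIMS.split(line.strip()) if t]
    if len(tokens) < 2:
        return []  # blank line, or a user with no items
    user, items = tokens[0], tokens[1:]
    return [f"{user},{item}" for item in items]


def convert(src_path, dst_path):
    """Rewrite a whole file, one 'userID,itemID' pair per output line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            for pair in expand_line(line):
                dst.write(pair + "\n")
```

Running `convert` over the input file before uploading it to HDFS would be one
way to test whether the missing split is really the cause.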

I haven't yet had time to dive into the code, but I suspect that either the
splitting step has been omitted from the example, or the author assumed the
job did it, and it no longer does.
It should be quite straightforward to investigate this tomorrow.

I turned to the mailing list in the hope that someone had already hit this
issue and found the cause. I'll dig into it and report back in the next few
days! ;)

2010/8/9 Sean Owen <[email protected]>

> The input file format looks wrong. It should be of the form
> "userID,itemID[,preference]". I think that's your problem here?
>
> On Mon, Aug 9, 2010 at 9:18 AM, Florent Empis <[email protected]>
> wrote:
> > Hi,
> >
> > I just tried to follow Mahout In Action, 6.4.2 Running recommendations
> > with Hadoop
> >
> > When I launch
> >  bin/hadoop jar ~/mahout/trunk/core/target/mahout-core-0.4-SNAPSHOT.job
> >  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
> > -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile
> > input/users.txt --booleanData
> >
> > It fails and the job's log contains many occurrences of the following
> > exception. I do understand why the number format fails, but I don't
> > understand why it's attempting it in the first place... Has anyone had
> > success running this example?
> >
> > MapAttempt TASK_TYPE="MAP" TASKID="task_201008091547_0003_m_000000"
> > TASK_ATTEMPT_ID="attempt_201008091547_0003_m_000000_0" TASK_STATUS="FAILED"
> > FINISH_TIME="1281362639468" HOSTNAME="localhost"
> > ERROR="java\.lang\.NumberFormatException: For input string: \"1: 1664968\"
> > at java\.lang\.NumberFormatException\.forInputString(NumberFormatException\.java:48)
> > at java\.lang\.Long\.parseLong(Long\.java:419)
> > at java\.lang\.Long\.parseLong(Long\.java:468)
> > at org\.apache\.mahout\.cf\.taste\.hadoop\.similarity\.item\.CountUsersMapper\.map(CountUsersMapper\.java:40)
> > at org\.apache\.mahout\.cf\.taste\.hadoop\.similarity\.item\.CountUsersMapper\.map(CountUsersMapper\.java:31)
> > at org\.apache\.hadoop\.mapreduce\.Mapper\.run(Mapper\.java:144)
> > at org\.apache\.hadoop\.mapred\.MapTask\.runNewMapper(MapTask\.java:621)
> > at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:305)
> > at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:170)
> >
> > Many thanks,
> >
> > Florent
> >
>
