Ah OK. You are trying to run the regular Mahout classes on the
Wikipedia data set. This won't work since the format is wrong.

The book presents, in listing 6.1, an alternate Mapper which parses
the Wikipedia format. You could simply substitute it for the usual
Mapper in RecommenderJob rather than recreating your own pipeline.
Everything else should work the same.
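To illustrate the idea (this is a hypothetical sketch, not the book's
listing 6.1 verbatim): assuming the Wikipedia links file uses lines like
"1: 1664968 3451", where the first number is the "user" (source article)
and the rest are the "items" it links to, the parsing the replacement
Mapper would need looks roughly like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turn one Wikipedia links line into the
// (userID, itemID) pairs that the rest of RecommenderJob expects.
public class WikipediaLineParser {

  private static final Pattern NUMBERS = Pattern.compile("\\d+");

  /** Returns a list of {userID, itemID} pairs parsed from one line. */
  public static List<long[]> parse(String line) {
    List<long[]> prefs = new ArrayList<long[]>();
    Matcher m = NUMBERS.matcher(line);
    if (!m.find()) {
      return prefs; // no IDs on this line at all
    }
    // First number on the line is the source article ("user")
    long userID = Long.parseLong(m.group());
    // Every subsequent number is a linked article ("item")
    while (m.find()) {
      long itemID = Long.parseLong(m.group());
      prefs.add(new long[] {userID, itemID});
    }
    return prefs;
  }
}
```

In an actual Mapper you'd call something like this from map() and emit
each pair via the context, rather than collecting them into a list.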

On Mon, Aug 9, 2010 at 12:29 PM, Florent Empis <[email protected]> wrote:
> Actually, the Mahout in Action example (using the Wikipedia article set)
> states that the input file is of the form
>
> user item1, item2, itemN
>
> The first step of the job is described as splitting this line with a regex,
> constructing lines of the form
> user item1
> ..
> user itemN
>
> (no preferences as this is a boolean preference dataset)
>
> I haven't yet had time to dive into the code, but I suspect either the
> splitting step has been omitted in the example, or the author assumed the
> job did it but it no longer does.
> It should be quite straightforward to investigate this tomorrow.
>
> I turned to the ML in the hope that someone had already hit the issue and
> found the problem. I'll dig into it and report back in the coming days! ;)
>
> 2010/8/9 Sean Owen <[email protected]>
>
>> The input file format looks wrong. It should be of the form
>> "userID,itemID[,preference]". I think that's your problem here?
>>
>> On Mon, Aug 9, 2010 at 9:18 AM, Florent Empis <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > I just tried to follow Mahout In Action, 6.4.2 Running recommendations
>> with
>> > Hadoop
>> >
>> > When I launch
>> >  bin/hadoop jar ~/mahout/trunk/core/target/mahout-core-0.4-SNAPSHOT.job
>> >  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>> > -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile
>> > input/users.txt --booleanData
>> >
>> > It fails, and the job's log contains many occurrences of the following
>> > exception. I do understand why the number format parse fails, but I
>> > don't understand why it's attempting it in the first place... Has anyone
>> had success
>> > running this example?
>> >
>> > MapAttempt TASK_TYPE="MAP" TASKID="task_201008091547_0003_m_000000"
>> > TASK_ATTEMPT_ID="attempt_201008091547_0003_m_000000_0"
>> TASK_STATUS="FAILED"
>> > FINISH_TIME="1281362639468" HOSTNAME="localhost"
>> > ERROR="java\.lang\.NumberFormatException: For input string: \"1:
>> 1664968\"
>> > at
>> >
>> java\.lang\.NumberFormatException\.forInputString(NumberFormatException\.java:48)
>> > at java\.lang\.Long\.parseLong(Long\.java:419)
>> > at java\.lang\.Long\.parseLong(Long\.java:468)
>> > at
>> >
>> org\.apache\.mahout\.cf\.taste\.hadoop\.similarity\.item\.CountUsersMapper\.map(CountUsersMapper\.java:40)
>> > at
>> >
>> org\.apache\.mahout\.cf\.taste\.hadoop\.similarity\.item\.CountUsersMapper\.map(CountUsersMapper\.java:31)
>> > at org\.apache\.hadoop\.mapreduce\.Mapper\.run(Mapper\.java:144)
>> > at org\.apache\.hadoop\.mapred\.MapTask\.runNewMapper(MapTask\.java:621)
>> > at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:305)
>> > at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:170)
>> >
>> > Many thanks,
>> >
>> > Florent
>> >
>>
>
