My input is CSV, of the form userid, itemid. I set booleanData=true.
So my script looks like this:

#!/bin/bash
# --input     = HDFS file/dir containing the history to process
# --output    = HDFS directory to put output into
# --usersFile = user IDs to produce recommendations for
# This runs a co-occurrence algorithm on the input.

mahoutdir=/home/sreavely/mahout-0.4
mahoutver=0.4-SNAPSHOT

hadoop jar "$mahoutdir/mahout-core-$mahoutver.job" \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  --input /user/sreavely/mahout-boolean-enduseraction-input.csv \
  --output /user/sreavely/mahout-output \
  --usersFile /user/sreavely/mahout-users-to-recommend-for.txt \
  --booleanData true

Cheers,
Simon

P.S. I also have a dataset with a preference column that I haven't tested yet.

On Tue, Aug 10, 2010 at 1:38 AM, Sean Owen wrote:
> I think your input is malformed. What does it look like?
> (But the error could be better.)
>
> On Mon, Aug 9, 2010 at 3:14 PM, Simon Reavely wrote:
> > I built and hacked together 0.4-SNAPSHOT from source.
> >
> > It now finds the class files - hurrah!
> >
> > However, I now get an ArrayIndexOutOfBoundsException:
> >
> > 10/08/09 16:07:14 INFO mapred.JobClient: Task Id : attempt_201005101218_0012_m_000000_2, Status : FAILED
> > java.lang.ArrayIndexOutOfBoundsException: 1
> >     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
> >     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> > Looking at the source code, the failure is in the indexing into tokens
> > below, which suggests a problem with the result of
> > TasteHadoopUtils.splitPrefTokens(value.toString()):
> >
> >   @Override
> >   protected void map(LongWritable key,
> >                      Text value,
> >                      Context context) throws IOException, InterruptedException {
> >     String[] tokens = TasteHadoopUtils.splitPrefTokens(value.toString());
> >     long itemID = Long.parseLong(tokens[transpose ? 0 : 1]);
> >     int index = TasteHadoopUtils.idToIndex(itemID);
> >     context.write(new VarIntWritable(index), new VarLongWritable(itemID));
> >   }
> >
> > Any ideas? Please note, I suspect this might be an issue with how I
> > hacked together my package, since I can't figure out how to create a
> > proper binary release from source.
> >
> > If not, I'm off to the debugger!
> >
> > Cheers,
> > Simon

-- 
Simon Reavely
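One way to follow up on Sean's question is to check the input file locally before submitting the job. The mapper above reads tokens[1] (when transpose is false), so any line that splits into fewer than two fields fails with exactly ArrayIndexOutOfBoundsException: 1. The sketch below is a minimal, hypothetical checker (the class PrefLineCheck and its messages are not part of Mahout). It assumes that splitPrefTokens splits each line on a single tab or comma; verify that against TasteHadoopUtils in your own build. Under that assumption, a blank line or space-separated IDs trigger the exception, and a line like "123, 456" yields the token " 456", which Long.parseLong rejects.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical local checker for RecommenderJob input lines.
 * ASSUMPTION: fields are split on a single tab or comma, as we believe
 * TasteHadoopUtils.splitPrefTokens does; adjust DELIMITER if yours differs.
 */
final class PrefLineCheck {

  // Assumed delimiter: one tab or one comma (no surrounding whitespace allowed).
  private static final Pattern DELIMITER = Pattern.compile("[\t,]");

  private PrefLineCheck() {}

  /** Returns null if the line looks usable, otherwise a short reason. */
  static String problem(String line) {
    String[] tokens = DELIMITER.split(line);
    if (tokens.length < 2) {
      // The condition under which tokens[1] throws ArrayIndexOutOfBoundsException: 1
      // (blank lines, space-separated IDs, a trailing delimiter with no item ID).
      return "fewer than 2 fields";
    }
    // Check userID and itemID; a third (preference) column, if any, is not checked here.
    for (int i = 0; i < 2; i++) {
      try {
        Long.parseLong(tokens[i]);
      } catch (NumberFormatException e) {
        // Catches header rows and stray spaces such as "123, 456".
        return "field " + i + " is not a long: '" + tokens[i] + "'";
      }
    }
    return null;
  }

  public static void main(String[] args) throws IOException {
    // Read a local copy of the input if a path is given, else use a few sample lines.
    List<String> lines = args.length > 0
        ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
        : Arrays.asList("123,456", "123 456", "", "userid,itemid", "123, 456", "1\t2\t3.5");
    int lineNo = 0;
    for (String line : lines) {
      lineNo++;
      String p = problem(line);
      if (p != null) {
        System.out.println("line " + lineNo + ": " + p + " [" + line + "]");
      }
    }
  }
}
```

To use it, copy the input down first (for example, hadoop fs -get /user/sreavely/mahout-boolean-enduseraction-input.csv .) and pass the local file as the first argument. Only lines that look malformed are printed, so an empty report means the first two columns parse under the assumed delimiter.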
