10 minutes after writing this, I found the answer, but thought I'd share anyway...
Hi, I'm attempting to follow the notes here:

http://svn.apache.org/repos/asf/mahout/trunk/examples/bin/factorize-movielens-1M.sh

I can successfully run the splitDataset job, but I get a failure when running parallelALS on my dataset:

    java.lang.NumberFormatException: For input string: "2937047778"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:495)
        at java.lang.Integer.parseInt(Integer.java:527)
        at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.readID(TasteHadoopUtils.java:61)

I can see that my ID is too large for Integer.parseInt - is that a bug? I'd have thought that if the splitDataset, recommenditembased and itemsimilarity jobs all work fine with long IDs, then the parallelALS job would as well.

Wait! --- Ten minutes later, after Googling and finding the source code here:

http://svn.apache.org/repos/asf/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/TasteHadoopUtils.java

I see that readID takes an additional argument, usesLongIDs. I then looked at the help for parallelALS and saw the "--usesLongIDs" option. I set it to true and bam! Things are working nicely.

Maybe this will help someone else :)

- Matt
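For anyone curious why the parse fails: the ID 2937047778 is larger than Integer.MAX_VALUE (2147483647), so Integer.parseInt throws, while Long.parseLong handles it fine. A quick standalone sketch in plain Java (not Mahout code, just reproducing the failure mode from the stack trace):

```java
public class LongIdDemo {
    public static void main(String[] args) {
        String id = "2937047778"; // the ID from the stack trace above

        // 2937047778 > Integer.MAX_VALUE (2147483647), so this throws
        // the same NumberFormatException seen in TasteHadoopUtils.readID.
        try {
            Integer.parseInt(id);
            System.out.println("parsed as int");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }

        // Parsing the same string as a long succeeds.
        long asLong = Long.parseLong(id);
        System.out.println("parsed as long: " + asLong);
    }
}
```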
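And for reference, the fix is just a matter of adding the flag to the parallelALS invocation. Something along these lines (paths and hyperparameter values here are placeholders, not the real ones; only --usesLongIDs is the point):

```shell
# Hypothetical invocation; substitute your own input/output paths and
# ALS hyperparameters. The --usesLongIDs flag is what fixes the
# NumberFormatException on IDs larger than Integer.MAX_VALUE.
mahout parallelALS \
  --input /path/to/ratings \
  --output /path/to/als-output \
  --numFeatures 20 \
  --numIterations 10 \
  --lambda 0.065 \
  --usesLongIDs true
```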
