Hi Matt, the ability to use long IDs for the ALS recommender was a long-outstanding user wish. Great that it turned out to be useful for you!
I'd love to hear about your dataset and use case, if you wanna share that information with us.

On 22.08.2013 02:36, Matt Mitchell wrote:
> 10 minutes after writing this, I found the answer, but thought I'd share
> anyway...
>
> Hi,
>
> I'm attempting to follow the notes here:
>
> http://svn.apache.org/repos/asf/mahout/trunk/examples/bin/factorize-movielens-1M.sh
>
> I can successfully run the splitDataset job, but I get a failure when
> running parallelALS on my dataset:
>
> java.lang.NumberFormatException: For input string: "2937047778"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     at java.lang.Integer.parseInt(Integer.java:495)
>     at java.lang.Integer.parseInt(Integer.java:527)
>     at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.readID(TasteHadoopUtils.java:61)
>
> I can see that my ID is too large for Integer.parseInt - is that a bug? I'd
> think that if the splitDataset, recommenditembased and itemsimilarity jobs
> all work fine with Long IDs, then the parallelALS job would as well?
>
> Wait!
>
> --- 10 minutes later, after Googling and finding the source code here:
>
> http://svn.apache.org/repos/asf/mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/TasteHadoopUtils.java
>
> I see that there's an additional argument to readID called usesLongIDs,
> then I look at the help for parallelALS and see the "--usesLongIDs" option.
> I set that to true and bam! Things are working nicely!
>
> Maybe this will help someone else :)
>
> - Matt
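
For anyone else who runs into the same exception, here is a small standalone sketch of why the ID from the stack trace cannot be parsed as an int. It uses only the JDK, not Mahout; the class name LongIdParsing and the helper fitsInInt are made up for illustration and are not part of TasteHadoopUtils.

```java
public class LongIdParsing {

    // Hypothetical helper: true if the token parses as a 32-bit signed int.
    static boolean fitsInInt(String token) {
        try {
            Integer.parseInt(token);
            return true;
        } catch (NumberFormatException e) {
            // Thrown for non-numeric input and for values outside the int range.
            return false;
        }
    }

    public static void main(String[] args) {
        String token = "2937047778"; // the ID from the stack trace above

        // Integer.MAX_VALUE is 2147483647, which is smaller than the ID.
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("fits in int: " + fitsInInt(token));

        // The same token parses without trouble as a 64-bit long.
        long id = Long.parseLong(token);
        System.out.println("parsed as long: " + id);
    }
}
```

If your user or item IDs can exceed 2147483647, that is the case where setting --usesLongIDs to true is needed for parallelALS.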
