Hi Adam,

On Oct 15, 2013, at 4:21am, Adam Warski wrote:
>
> On Oct 4, 2013, at 5:40 PM, Ken Krugler wrote:
>
>> Hi Adam,
>>
>> On Oct 4, 2013, at 4:38am, Adam Warski wrote:
>>
>>> Hello,
>>>
>>> I'm trying to run the hadoop-based recommender job
>>> (org.apache.mahout.cf.taste.hadoop.item.RecommenderJob) from Mahout 0.8 on
>>> EMR. I'm using the "Amazon Distribution" Hadoop, which is version 1.0.3.
>>> Locally running the job with that version works just fine - I get the
>>> expected output.
>>>
>>> On EMR, however, the job fails with the given exception:
>>> java.lang.NoSuchMethodError:
>>> org.apache.lucene.util.PriorityQueue.<init>(I)V (full stack trace:
>>> https://gist.github.com/adamw/6824585).
>>>
>>> Looking at the EMR documentation
>>> (http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-ami.html),
>>> the AMI contains Lucene 2.9.4, while Mahout uses 4.3.0. And indeed, in
>>> Lucene 2.x there's no PriorityQueue(int) constructor, while in Lucene 4.x
>>> there is.
>>>
>>> Is there some known way to solve this problem and run Mahout on EMR? I
>>> thought about using a bootstrap action, but then replacing lucene will
>>> probably trigger a long chain of dependencies which would have to be
>>> updated as well.
>>
>> We wound up in the same situation, and went ahead with updating everything
>> to Solr/Lucene 4.2.1, IIRC.
>>
>> The one oddity we ran into was a Solr (not Lucene) dependency on a newer
>> version of HttpClient (4.2.3) than what was installed on EMR's servers, so
>> we also had to update that jar and about 4 other friends from the HttpCore
>> family.
>>
>> If you go this route, you'll want to hop onto a slave in your EMR cluster
>> and take a look at all of the jars in the Hadoop /lib directory, as it's a
>> long (and somewhat odd) list that should be reviewed against what your
>> project depends on.
>>
>> -- Ken
>
> Thanks, did just that and described on my blog:
> http://www.warski.org/blog/2013/10/using-amazons-elastic-map-reduce-to-compute-recommendations-with-apache-mahout-0-8/

Excellent, glad it worked, and thanks for taking the time to write up the results.

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
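
For anyone landing on this thread with the same NoSuchMethodError: before swapping jars, it can help to confirm which jar a class is actually loaded from on the cluster and whether the expected constructor exists there. The following is only a rough sketch of such a check, not something taken from the thread. The class name ClasspathCheck and its two helper methods are made up for illustration, and it assumes you run it with the same classpath your job sees.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Mahout, Lucene or Hadoop.
final class ClasspathCheck {

    /** Returns the jar or directory a class was loaded from, or a note if the JVM does not report one. */
    public static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? "unknown (no code source reported)" : src.getLocation().toString();
    }

    /** True if the named class can be loaded and declares a constructor with exactly these parameter types. */
    public static boolean hasConstructor(String className, Class<?>... paramTypes) {
        try {
            // Load without running static initializers; we only want to inspect the class.
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            cls.getDeclaredConstructor(paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class and constructor named in the stack trace: PriorityQueue.<init>(I)V
        String name = "org.apache.lucene.util.PriorityQueue";
        try {
            Class<?> cls = Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            System.out.println(name + " loaded from " + locationOf(cls));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
        System.out.println("has (int) constructor: " + hasConstructor(name, int.class));
    }
}
```

If the reported location is a Lucene 2.x jar from the Hadoop lib directory rather than the one bundled with your job, that matches the conflict described above.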
