On Oct 4, 2013, at 5:40 PM, Ken Krugler <[email protected]> wrote:

> Hi Adam,
> 
> On Oct 4, 2013, at 4:38am, Adam Warski wrote:
> 
>> Hello,
>> 
>> I'm trying to run the hadoop-based recommender job 
>> (org.apache.mahout.cf.taste.hadoop.item.RecommenderJob) from Mahout 0.8 on 
>> EMR. I'm using the "Amazon Distribution" Hadoop, which is version 1.0.3. 
>> Locally running the job with that version works just fine - I get the 
>> expected output.
>> 
>> On EMR, however, the job fails with the following exception: 
>> java.lang.NoSuchMethodError: org.apache.lucene.util.PriorityQueue.<init>(I)V 
>> (full stack trace: https://gist.github.com/adamw/6824585).
>> 
>> Looking at the EMR documentation 
>> (http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-ami.html),
>>  the AMI contains Lucene 2.9.4, while Mahout uses 4.3.0. And indeed, in 
>> Lucene 2.x there's no PriorityQueue(int) constructor, while in Lucene 4.x 
>> there is.
>> 
>> Is there some known way to solve this problem and run Mahout on EMR? I 
>> thought about using a bootstrap action, but then replacing Lucene will 
>> probably trigger a long chain of dependencies which would have to be updated 
>> as well.
> 
> We wound up in the same situation, and went ahead with updating everything to 
> Solr/Lucene 4.2.1, IIRC.
> 
> The one oddity we ran into was a Solr (not Lucene) dependency on a newer 
> version of HttpClient (4.2.3) than what was installed on EMR's servers, so we 
> also had to update that jar and about 4 other friends from the HttpCore 
> family.
> 
> If you go this route, you'll want to hop onto a slave in your EMR cluster and 
> take a look at all of the jars in the Hadoop /lib directory, as it's a long 
> (and somewhat odd) list that should be reviewed against what your project 
> depends on.
> 
> -- Ken
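
For anyone else going this route: the review Ken describes can be partly 
mechanized. The following is only a rough sketch, not what either of us 
actually ran. JarInventory and its methods are hypothetical names, and it 
assumes jar files follow the usual name-version.jar pattern (jars that do 
not are silently skipped). It compares two lists of jar file names and 
reports artifacts present in both with different versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of Hadoop, Mahout, or EMR.
final class JarInventory {
    // Assumes the common "artifact-version.jar" naming, version starting with a digit.
    private static final Pattern JAR_NAME =
            Pattern.compile("^(.+?)-(\\d[\\w.\\-]*)\\.jar$");

    // Maps artifact name to version for every file name matching the pattern.
    static Map<String, String> versions(List<String> jarFileNames) {
        Map<String, String> result = new TreeMap<>();
        for (String name : jarFileNames) {
            Matcher m = JAR_NAME.matcher(name);
            if (m.matches()) {
                result.put(m.group(1), m.group(2));
            }
        }
        return result;
    }

    // Lists artifacts found on both sides whose versions differ, sorted by name.
    static List<String> conflicts(List<String> clusterJars, List<String> projectJars) {
        Map<String, String> cluster = versions(clusterJars);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : versions(projectJars).entrySet()) {
            String clusterVersion = cluster.get(e.getKey());
            if (clusterVersion != null && !clusterVersion.equals(e.getValue())) {
                out.add(e.getKey() + ": cluster " + clusterVersion
                        + ", project " + e.getValue());
            }
        }
        return out;
    }
}
```

The two lists could come from java.io.File.list() on the cluster's Hadoop 
lib directory and on the project's own dependency directory. A version 
mismatch is only a hint that a jar deserves a closer look, not proof that it 
will break anything.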

Thanks, I did just that and described it on my blog:
http://www.warski.org/blog/2013/10/using-amazons-elastic-map-reduce-to-compute-recommendations-with-apache-mahout-0-8/
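
For anyone hitting the same NoSuchMethodError, it can help to confirm which 
jar a class is actually loaded from before replacing anything. Below is a 
minimal sketch using only standard JDK reflection. ClasspathProbe is a 
made-up name, and the default class name in main is simply the one from the 
stack trace above.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports where a class comes from and
// whether it declares a constructor with the given parameter types.
final class ClasspathProbe {

    // Returns the jar or directory the class was loaded from, if the JVM exposes it.
    static String locationOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        return source == null
                ? "(bootstrap or unknown)"
                : String.valueOf(source.getLocation());
    }

    // True if the class is on the classpath and declares a matching constructor.
    static boolean hasConstructor(String className, Class<?>... parameterTypes) {
        try {
            // 'false' avoids running static initializers of the probed class.
            Class<?> clazz = Class.forName(
                    className, false, ClasspathProbe.class.getClassLoader());
            clazz.getDeclaredConstructor(parameterTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String name = args.length > 0
                ? args[0]
                : "org.apache.lucene.util.PriorityQueue";
        Class<?> clazz = Class.forName(
                name, false, ClasspathProbe.class.getClassLoader());
        System.out.println(name + " loaded from " + locationOf(clazz));
        System.out.println("has (int) constructor: " + hasConstructor(name, int.class));
    }
}
```

Run with the same classpath the job sees, this should show whether the older 
Lucene jar is shadowing the one bundled with the job.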

Adam

-- 
Adam Warski

http://twitter.com/#!/adamwarski
http://www.softwaremill.com
http://www.warski.org
