When I run my crawl on Hadoop, I get the error below. Googling suggests 
there's a version conflict between Lucene jars. How do I fix it?

attempt_201505041850_0035_r_000001_0: MahoutInterestClassifierPlugin startUp complete!
15/05/05 17:23:23 INFO mapred.JobClient: Task Id : attempt_201505041850_0035_r_000004_0, Status : FAILED
Error: LUCENE_36

I set up Nutch 1.9 on Amazon EMR 1.0.3 (Hadoop 1.x) by ssh'ing into the master 
and compiling the source. I defined the elastic.host, elastic.port, and 
elastic.index properties in nutch-site.xml, then ran ant to build the jar.
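
For reference, the three properties follow the usual Hadoop-style
<property> layout. The sketch below embeds a sample nutch-site.xml and reads
the elastic.* entries back out, which can help confirm the file parses and
holds the intended values. The values (localhost, 9300, nutch) and the helper
name elastic_properties are placeholders for illustration, not the settings
actually used here.

```python
import xml.etree.ElementTree as ET

# Sample nutch-site.xml content. The values are placeholders (assumptions),
# not the ones from the real cluster.
SAMPLE_NUTCH_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>elastic.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>elastic.port</name>
    <value>9300</value>
  </property>
  <property>
    <name>elastic.index</name>
    <value>nutch</value>
  </property>
</configuration>
"""


def elastic_properties(xml_text):
    """Return a dict of all elastic.* properties found in the config text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name.startswith("elastic."):
            props[name] = prop.findtext("value", default="").strip()
    return props
```

Calling elastic_properties(SAMPLE_NUTCH_SITE) returns the three entries as a
dict; the same function can be pointed at the text of the real file.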

I installed Elasticsearch 1.3.4 on the master by fetching the Debian package 
from elastic.co (http://elastic.co) and installing it via dpkg. I then started 
the elasticsearch service and created an index matching the value defined in 
elastic.index.
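
One way to check the suspected conflict is to list the Lucene jars bundled
into the job archive and group them by version. The following is a minimal
sketch, assuming the ant build produces a zip-format .job file (assumed to be
under runtime/deploy) and that Lucene jars follow the usual
lucene-<module>-<version>.jar naming. The function names are hypothetical
helpers, not part of Nutch.

```python
import re
import zipfile
from collections import defaultdict

# Matches names such as lucene-core-3.6.2.jar or
# lucene-analyzers-common-4.9.0.jar (assumed naming convention).
LUCENE_JAR = re.compile(r"(lucene-[a-z-]+?)-(\d[\w.-]*)\.jar$")


def lucene_versions(entry_names):
    """Map each Lucene version string to the archive entries that carry it."""
    found = defaultdict(list)
    for name in entry_names:
        basename = name.rsplit("/", 1)[-1]
        match = LUCENE_JAR.match(basename)
        if match:
            found[match.group(2)].append(name)
    return dict(found)


def scan_job_file(path):
    """List Lucene jars inside a zip-format archive, grouped by version."""
    with zipfile.ZipFile(path) as archive:
        return lucene_versions(archive.namelist())
```

If scan_job_file returns more than one version key, the archive mixes Lucene
releases, which would be consistent with the conflict described above. It
does not by itself show which plugin or dependency pulls in each version.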

Scott Lundgren
Software Engineer
(704) 973-7388

QuietStream Financial, LLC<http://www.quietstreamfinancial.com>
11121 Carmel Commons Boulevard | Suite 250
Charlotte, North Carolina 28226

