When I run my crawl in Hadoop, I get the error below. Googling suggests there is a version conflict between Lucene jars. How do I fix it?
    attempt_201505041850_0035_r_000001_0: MahoutInterestClassifierPlugin startUp complete!
    15/05/05 17:23:23 INFO mapred.JobClient: Task Id : attempt_201505041850_0035_r_000004_0, Status : FAILED
    Error: LUCENE_36

Nutch 1.9 is set up on Amazon EMR 1.0.3 (Hadoop 1.x): I ssh’d into the master node and compiled the source there. I defined the elastic.host, elastic.port and elastic.index properties in nutch-site.xml, then ran ant to compile the jar.

Elasticsearch 1.3.4 was installed on the master by downloading the Debian package from elastic.co and installing it with dpkg. I started the elasticsearch service, then created an index whose name matches the value of elastic.index.

Scott Lundgren
Software Engineer, QuietStream Financial, LLC
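A minimal Python sketch for locating the conflicting jars, assuming the conflict involves `org.apache.lucene.util.Version` (the Lucene class that defines constants such as `LUCENE_36`). It walks the directories or archives you pass, including jars nested inside a Nutch `.job` file, and prints every archive that bundles that class. The script and its function names are hypothetical; nothing here is part of Nutch or Hadoop.

```python
"""Hedged sketch: list every jar (including jars nested inside a .job file)
that bundles a given class, to spot duplicate Lucene versions."""
import io
import sys
import zipfile
from pathlib import Path

# Assumption: the conflict shows up in the class that defines LUCENE_xx constants.
DEFAULT_CLASS = "org/apache/lucene/util/Version.class"


def scan_archive(zf, label, target, hits):
    """Record `label` if `zf` contains `target`; recurse into nested jars."""
    names = zf.namelist()
    if target in names:
        hits.append(label)
    for name in names:
        if name.endswith(".jar"):
            # Nested jars (e.g. lib/*.jar inside a .job file) are read into memory.
            with zf.open(name) as inner:
                data = io.BytesIO(inner.read())
            try:
                with zipfile.ZipFile(data) as nested:
                    scan_archive(nested, f"{label}!/{name}", target, hits)
            except zipfile.BadZipFile:
                pass  # entry named .jar but not a valid zip; skip it


def find_class(paths, target=DEFAULT_CLASS):
    """Return labels of every .jar/.job archive under `paths` containing `target`."""
    hits = []
    for root in map(Path, paths):
        files = [root] if root.is_file() else sorted(root.rglob("*"))
        for f in files:
            if f.is_file() and f.suffix in (".jar", ".job") and zipfile.is_zipfile(f):
                with zipfile.ZipFile(f) as zf:
                    scan_archive(zf, str(f), target, hits)
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    for hit in find_class(sys.argv[1:]):
        print(hit)
```

Run it with, for example, your Hadoop lib directory and the Nutch job file you submit as arguments (paths vary by install). More than one hit means more than one copy of the class is visible, and the copy Hadoop loads first may be older than the one the plugin was compiled against.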

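To rule out a configuration problem alongside the jar conflict, here is a small Python sketch that checks whether nutch-site.xml actually sets the three elastic.* properties named in the post. It assumes the file uses the usual Hadoop-style `<configuration><property><name>…</name><value>…</value></property></configuration>` layout; the function names are hypothetical.

```python
"""Hedged sketch: confirm nutch-site.xml sets the elastic.* properties."""
import sys
import xml.etree.ElementTree as ET

# Property names taken from the post; nothing else about the plugin is assumed.
REQUIRED = ("elastic.host", "elastic.port", "elastic.index")


def read_properties(xml_text):
    """Parse Hadoop-style <property> entries (str or bytes) into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_properties(xml_text, required=REQUIRED):
    """Return required names that are absent or have an empty value."""
    props = read_properties(xml_text)
    return [name for name in required if not props.get(name)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read bytes so any encoding declaration in the file is honored by the parser.
    with open(sys.argv[1], "rb") as fh:
        missing = missing_properties(fh.read())
    print("missing: " + ", ".join(missing) if missing else "all elastic.* properties set")
```

The script only checks that each value is present and non-empty; it does not verify that the host, port or index actually match the running Elasticsearch instance.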
