Dedup still uses the old code because we can't run MapReduce jobs from plugins, so that didn't change.
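Since the dedup path is said to be unchanged, the pre-1.7-style invocation should still apply. A minimal sketch, assuming the old `org.apache.nutch.indexer.solr.SolrDeleteDuplicates` class is still shipped in your build (verify against your jar before relying on it); the script only prints the command so you can inspect it first:

```shell
#!/bin/sh
# Sketch of the old-style Solr dedup job. Class name, job path and Solr URL
# are assumptions taken from this thread; adjust for your cluster.
NUTCH_JOB=/opt/nutch/apache-nutch-1.7/build/apache-nutch-1.7.job
SOLR_URL=http://solr.server.tld:8088/solr/core1/

# Build the command line; SolrDeleteDuplicates takes the Solr URL as argument.
CMD="hadoop jar $NUTCH_JOB org.apache.nutch.indexer.solr.SolrDeleteDuplicates $SOLR_URL"

# Print it for review. To actually run it on the cluster:
#   sudo -u hdfs $CMD
echo "$CMD"
```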
-----Original message-----
> From: Nicholas Roberts <[email protected]>
> Sent: Wednesday 14th August 2013 22:03
> To: [email protected]
> Subject: Re: Nutch 1.7 on Hadoop Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.indexer.solr.SolrIndexer
>
> sure, I realize it's just the indexing backends plugin that's changed
>
> am about to run a full set again, but the only two I had problems with were
> de-duplication and indexing
>
> more soon
>
> ps: my next question will be how to script this, those Hadoop command lines
> are doing my head in
>
>
> On Wed, Aug 14, 2013 at 12:48 PM, Markus Jelsma <[email protected]> wrote:
>
> > Also, the webgraph is not part of indexing. That just has a ScoreUpdater
> > tool that writes scores back to the crawldb; those are still passed via the
> > boost field in IndexerMapReduce.
> >
> > -----Original message-----
> > > From: Nicholas Roberts <[email protected]>
> > > Sent: Wednesday 14th August 2013 21:44
> > > To: [email protected]
> > > Subject: Re: Nutch 1.7 on Hadoop Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.indexer.solr.SolrIndexer
> > >
> > > ok, cracked open the src, found IndexingJob, and this below works
> > >
> > > however, I read in that JIRA issue that there would be backwards
> > > compatibility? Webgraph, LinkDb etc. all work as before, so is it hard to
> > > be backwards compatible?
> > >
> > > sudo -u hdfs hadoop jar \
> > >   /opt/nutch/apache-nutch-1.7/build/apache-nutch-1.7.job \
> > >   org.apache.nutch.indexer.IndexingJob \
> > >   -D solr.server.url=http://solr.server.tld:8088/solr/core1/ \
> > >   /user/crawl-1.7-10-5000/crawldb \
> > >   -linkdb /user/crawl-1.7-10-5000/linkdb \
> > >   -dir /user/crawl-1.7-10-5000/segments
> > >
> > >
> > > On Wed, Aug 14, 2013 at 12:21 PM, Nicholas Roberts <[email protected]> wrote:
> > >
> > > > I read that previously, but I wasn't sure exactly how I was to run a
> > > > Hadoop job
> > > >
> > > > so, the old Hadoop methods are no longer supported?
> > > >
> > > > is there an equivalent to the below in the new indexer backend?
> > > >
> > > > sudo -u hdfs hadoop jar \
> > > >   /opt/nutch/apache-nutch-1.7/build/apache-nutch-1.7.job \
> > > >   org.apache.nutch.indexer.solr.SolrIndexer \
> > > >   -solr http://solr.server.tld:8088/solr/core1/ \
> > > >   /user/crawl-1.7-1/crawldb \
> > > >   -linkdb /user/crawl-1.7-1/linkdb \
> > > >   -dir /user/crawl-1.7-1/segments
> > > >
> > > >
> > > > On Wed, Aug 14, 2013 at 11:33 AM, Markus Jelsma <[email protected]> wrote:
> > > >
> > > >> That's right. Check NUTCH-1047, that is what changed:
> > > >> https://issues.apache.org/jira/browse/NUTCH-1047
> > > >>
> > > >> -----Original message-----
> > > >> > From: Nicholas Roberts <[email protected]>
> > > >> > Sent: Wednesday 14th August 2013 20:11
> > > >> > To: [email protected]
> > > >> > Subject: Nutch 1.7 on Hadoop Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.indexer.solr.SolrIndexer
> > > >> >
> > > >> > hi
> > > >> >
> > > >> > I am testing an upgrade from Nutch 1.6 to Nutch 1.7 and seem to have
> > > >> > a problem with the SolrIndexer
> > > >> >
> > > >> > on Nutch 1.6 this works fine
> > > >> >
> > > >> > sudo -u hdfs hadoop jar \
> > > >> >   /opt/nutch/apache-nutch-1.7/build/apache-nutch-1.7.job \
> > > >> >   org.apache.nutch.indexer.solr.SolrIndexer \
> > > >> >   -solr http://solr.server.tld:8088/solr/core1/ \
> > > >> >   /user/crawl-1.7-1/crawldb \
> > > >> >   -linkdb /user/crawl-1.7-1/linkdb \
> > > >> >   -dir /user/crawl-1.7-1/segments
> > > >> >
> > > >> > on Nutch 1.7 I get this error
> > > >> >
> > > >> > Exception in thread "main" java.lang.ClassNotFoundException:
> > > >> > org.apache.nutch.indexer.solr.SolrIndexer
> > > >> >         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > >> >         at java.security.AccessController.doPrivileged(Native Method)
> > > >> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > >> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > >> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > >> >         at java.lang.Class.forName0(Native Method)
> > > >> >         at java.lang.Class.forName(Class.java:247)
> > > >> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:201)
> > > >> >
> > > >> > --
> > > >> > Nicholas Roberts
> > > >> > US 510-684-8264
> > > >> > http://Permaculture.TV
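On the "how to script this" question: the long `hadoop jar` invocations in the thread can be wrapped in a small script. A rough sketch, not a definitive implementation; the job path, Solr URL and crawl directory are the placeholders used in this thread, and the script prints the command rather than running it so nothing touches the cluster by accident:

```shell
#!/bin/sh
# Wrapper sketch for the Nutch 1.7 IndexingJob command line discussed above.
# Values below come from the thread and are placeholders for your setup.
NUTCH_JOB=/opt/nutch/apache-nutch-1.7/build/apache-nutch-1.7.job
SOLR_URL=http://solr.server.tld:8088/solr/core1/

# First argument selects the crawl dir; defaults to the one from the thread.
CRAWL_DIR=${1:-/user/crawl-1.7-10-5000}

# Assemble the post-NUTCH-1047 indexing command (IndexingJob replaces
# the removed org.apache.nutch.indexer.solr.SolrIndexer entry point).
INDEX_CMD="hadoop jar $NUTCH_JOB org.apache.nutch.indexer.IndexingJob \
  -D solr.server.url=$SOLR_URL \
  $CRAWL_DIR/crawldb -linkdb $CRAWL_DIR/linkdb -dir $CRAWL_DIR/segments"

# Print for review. To actually submit the job:
#   sudo -u hdfs $INDEX_CMD
echo "$INDEX_CMD"
```

Invoked as `./index.sh /user/crawl-1.7-1`, it prints the equivalent command for that crawl directory, which keeps the long option strings out of your shell history.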

