I suspect the dedup problem is related to the digest field and the
Drupal ApacheSolr-specific solrmappings.xml, which maps digest to id.

This is from the Apache Solr Examples Nutch Multisite module on Drupal.org:
http://drupalcode.org/project/apachesolr_examples.git/tree/HEAD:/apachesolr_multinutch/templates

<fields>
  <field dest="content" source="content"/>
  <field dest="label" source="title"/>
  <field dest="site" source="host"/>
  <copyField source="host" dest="bundle"/>
  <field dest="ss_nutch_segment" source="segment"/>
  <field dest="boost" source="boost"/>
  <field dest="id" source="digest"/>
  <copyField source="digest" dest="ss_nutch_bundle"/>
  <field dest="timestamp" source="tstamp"/>
  <field dest="url" source="url"/>
  <field dest="entity_type" source="entity_type"/>
  <field dest="ts_metatag_description" source="metatag.description"/>
  <field dest="sm_vid_metatag.keywords" source="metatag.keywords"/>
</fields>
<uniqueKey>id</uniqueKey>
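
On scripting the Hadoop command lines (the question further down the
thread): below is a minimal sketch of a wrapper around the IndexingJob
invocation quoted there. The names NUTCH_JOB, SOLR_URL, CRAWL_DIR, RUN,
run and index_crawl are placeholders of my own, not settings that Nutch
or Hadoop read, and the paths are simply the ones used in this thread.
It assumes the Solr URL and the crawldb path are two separate arguments.
By default it only prints the command; with RUN=1 it would execute it.

```shell
#!/bin/sh
# Sketch only: wraps the IndexingJob command line quoted in this thread.
# NUTCH_JOB, SOLR_URL, CRAWL_DIR and RUN are placeholder names (assumptions),
# not settings that Nutch or Hadoop read themselves.
set -eu

NUTCH_JOB="${NUTCH_JOB:-/opt/nutch/apache-nutch-1.7/build/apache-nutch-1.7.job}"
SOLR_URL="${SOLR_URL:-http://solr.server.tld:8088/solr/core1/}"

# Execute the given command when RUN=1, otherwise just print it (dry run).
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Index one crawl directory ($1) that holds crawldb, linkdb and segments.
index_crawl() {
  run sudo -u hdfs hadoop jar "$NUTCH_JOB" \
    org.apache.nutch.indexer.IndexingJob \
    -D "solr.server.url=$SOLR_URL" \
    "$1/crawldb" -linkdb "$1/linkdb" -dir "$1/segments"
}

index_crawl "${CRAWL_DIR:-/user/crawl-1.7-10-5000}"
```

Saved under a name of your choosing, running it with CRAWL_DIR set to
another crawl directory prints the command for that crawl, and adding
RUN=1 runs it. Keeping the dry run as the default makes it easy to check
the assembled command line before anything is submitted to Hadoop.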


On Wed, Aug 14, 2013 at 1:06 PM, Markus Jelsma
<[email protected]>wrote:

> dedup still uses the old stuff because we can't run mapreduce jobs from
> plugins, so that didn't change.
>
>
>
> -----Original message-----
> > From:Nicholas Roberts <[email protected]>
> > Sent: Wednesday 14th August 2013 22:03
> > To: [email protected]
> > Subject: Re: Nutch 1.7 on Hadoop Exception in thread "main"
> > java.lang.ClassNotFoundException: org.apache.nutch.indexer.solr.SolrIndexer
> >
> > sure, I realize it's just the indexing backends plugin that's changed
> >
> > am about to run a full set again but the only two I had problems with
> > were de-duplication and indexing
> >
> > more soon
> >
> > ps: my next question will be how to script this, those Hadoop command
> > lines are doing my head in
> >
> >
> > On Wed, Aug 14, 2013 at 12:48 PM, Markus Jelsma
> > <[email protected]>wrote:
> >
> > > Also, the webgraph is not part of indexing. That just has a ScoreUpdater
> > > tool that writes scores back to the crawldb, those are still passed via
> > > the boost field in IndexerMapReduce.
> > >
> > > -----Original message-----
> > > > From:Nicholas Roberts <[email protected]>
> > > > Sent: Wednesday 14th August 2013 21:44
> > > > To: [email protected]
> > > > Subject: Re: Nutch 1.7 on Hadoop Exception in thread "main"
> > > > java.lang.ClassNotFoundException:
> > > > org.apache.nutch.indexer.solr.SolrIndexer
> > > >
> > > > ok, cracked open the src and found IndexingJob and this below works
> > > >
> > > > however, I read in that JIRA issue that there would be backwards
> > > > compatibility? Webgraph, Linkdb etc all work as before, so is it hard
> > > > to be backwards compatible?
> > > >
> > > > sudo -u hdfs hadoop jar
> > > > /opt/nutch/apache-nutch-1.7/build/apache-nutch-1.7.job
> > > > org.apache.nutch.indexer.IndexingJob -D
> > > > solr.server.url=http://solr.server.tld:8088/solr/core1/
> > > > /user/crawl-1.7-10-5000/crawldb
> > > > -linkdb /user/crawl-1.7-10-5000/linkdb -dir
> > > > /user/crawl-1.7-10-5000/segments
> > > >
> > > >
> > > >
> > > > On Wed, Aug 14, 2013 at 12:21 PM, Nicholas Roberts <
> > > > [email protected]> wrote:
> > > >
> > > > > I read that previously, but I wasn't sure exactly how I was to run a
> > > > > Hadoop job
> > > > >
> > > > > so, the old Hadoop methods are no longer supported?
> > > > >
> > > > > is there an equivalent to below in the new indexer backend ?
> > > > >
> > > > > > sudo -u hdfs hadoop jar
> > > > > > /opt/nutch/apache-nutch-1.7/build/apache-nutch-1.7.job
> > > > > > org.apache.nutch.indexer.solr.SolrIndexer -solr
> > > > > > http://solr.server.tld:8088/solr/core1/ /user/crawl-1.7-1/crawldb
> > > > > > -linkdb /user/crawl-1.7-1/linkdb -dir /user/crawl-1.7-1/segments
> > > > >
> > > > >
> > > > > On Wed, Aug 14, 2013 at 11:33 AM, Markus Jelsma <
> > > > > [email protected]> wrote:
> > > > >
> > > > >> That's right. Check NUTCH-1047, that is what changed:
> > > > >> https://issues.apache.org/jira/browse/NUTCH-1047
> > > > >>
> > > > >> -----Original message-----
> > > > >> > From:Nicholas Roberts <[email protected]>
> > > > >> > Sent: Wednesday 14th August 2013 20:11
> > > > >> > To: [email protected]
> > > > >> > Subject: Nutch 1.7 on Hadoop Exception in thread "main"
> > > > >> > java.lang.ClassNotFoundException:
> > > > >> > org.apache.nutch.indexer.solr.SolrIndexer
> > > > >> >
> > > > >> > hi
> > > > >> >
> > > > >> > I am testing upgrading from Nutch 1.6 to Nutch 1.7 and seem to have
> > > > >> > a problem with the SolrIndexer
> > > > >> >
> > > > >> > on Nutch 1.6 this works fine
> > > > >> >
> > > > >> > sudo -u hdfs hadoop jar
> > > > >> > /opt/nutch/apache-nutch-1.7/build/apache-nutch-1.7.job
> > > > >> > org.apache.nutch.indexer.solr.SolrIndexer -solr
> > > > >> > http://solr.server.tld:8088/solr/core1/ /user/crawl-1.7-1/crawldb
> > > > >> > -linkdb /user/crawl-1.7-1/linkdb -dir /user/crawl-1.7-1/segments
> > > > >> >
> > > > >> >
> > > > >> > on Nutch 1.7 I get error
> > > > >> >
> > > > >> >
> > > > >> > Exception in thread "main" java.lang.ClassNotFoundException:
> > > > >> > org.apache.nutch.indexer.solr.SolrIndexer
> > > > >> >
> > > > >> >
> > > > >> > Exception in thread "main" java.lang.ClassNotFoundException:
> > > > >> > org.apache.nutch.indexer.solr.SolrIndexer
> > > > >> >         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > > >> >         at java.security.AccessController.doPrivileged(Native Method)
> > > > >> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > > >> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > > >> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > > >> >         at java.lang.Class.forName0(Native Method)
> > > > >> >         at java.lang.Class.forName(Class.java:247)
> > > > >> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:201)
> > > > >> >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
>



-- 

-- 
Nicholas Roberts
US 510-684-8264
http://Permaculture.TV <http://permaculture.tv/>
