Hi Lewis,
On 9/30/15, 5:55 PM, "Lewis John Mcgibbney" <[email protected]> wrote:

>Hi Sherban,
>
>On Wed, Sep 30, 2015 at 5:41 PM, <[email protected]>
>wrote:
>
>> OK. I'm using SOLR 4.6.0.
>
>OK
>
>> Caused by: org.apache.solr.common.SolrException: copyField source
>> :'rawcontent' is not a glob and doesn't match any explicit field or
>> dynamicField.. Schema file is
>> /Users/sdrulea/Downloads/solr-4.6.0/example/solr/collection1/schema.xml
>
>[...snip]
>
>> The only changes I made to schema.xml were to comment out lines with
>> "protwords.txt" as the tutorial suggested. Has anyone tested the 2.3.1
>> schema.xml with SOLR 4.6.1?
>
>As what tutorial suggested?

This tutorial: https://wiki.apache.org/nutch/NutchTutorial

>I run custom schemas for all sorts of jobs. The short answer here is that
>the schema should work out of the box. If it does not then that is an
>issue. It looks like the copyField is giving us a problem.
>https://github.com/apache/nutch/blob/2.x/conf/schema.xml#L370
>This is not used within the field definitions but included as a copyField.
>This is an error/bug in the schema.
>Can you please open an issue on the issue tracker and submit a patch? If
>not then I will do it.
>Can you also remove this line, then restart the Solr server and view the
>log.

I removed this line and it worked. I’m not sure what to put in the issue tracker. Can you please do it?

>> One of my original URLs ended with "/". I added index.html and that
>> fixed the rejection.
>>
>> InjectorJob: total number of urls rejected by filters: 0
>> InjectorJob: total number of urls injected after normalization and
>> filtering: 11
>
>Great.
>
>> Nutch still doesn't parse any links. Any ideas?
>
>You can try the parsechecker tools. Do you get any outlinks?
>http://wiki.apache.org/nutch/QuickStartparseChecker

Yes, I see lots of outlinks running "nutch parsechecker" against my seed.txt URLs. It’s a mystery why nutch doesn’t parse any of them during the crawl.
Is it possible to enable a verbose mode to see if nutch sees outlinks during the crawl?

>> The nutch schema.xml doesn't work on my SOLR 4.6.0:
>>
>> IndexingJob: starting
>> No IndexWriters activated - check your configuration
>
>You need to add indexer-solr to the plugin.includes property.

Yup. I added that. No more SOLR errors.

>Lewis
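
For anyone hitting the same copyField error, here is a minimal Python sketch of a pre-flight check that lists copyField sources in a schema.xml that match neither an explicit field nor a dynamicField. It is not Solr's own validation code: the function name and the sample schema are hypothetical, and using fnmatch to compare against dynamicField patterns is only an approximation of how Solr matches them.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical, cut-down schema for illustration only; it is not the Nutch schema.
SAMPLE_SCHEMA = """
<schema name="sample" version="1.5">
  <fields>
    <field name="content" type="text" stored="false" indexed="true"/>
    <field name="text" type="text" stored="false" indexed="true" multiValued="true"/>
    <dynamicField name="*_s" type="string" stored="true" indexed="true"/>
  </fields>
  <copyField source="content" dest="text"/>
  <copyField source="title_s" dest="text"/>
  <copyField source="rawcontent" dest="text"/>
</schema>
"""


def unmatched_copyfield_sources(schema_xml):
    """Return copyField sources that match no <field> or <dynamicField>."""
    root = ET.fromstring(schema_xml)
    explicit = {f.get("name") for f in root.iter("field")}
    dynamic = [d.get("name") for d in root.iter("dynamicField") if d.get("name")]

    missing = []
    for copy_field in root.iter("copyField"):
        source = copy_field.get("source")
        if not source or "*" in source:
            continue  # glob sources are accepted; skip entries with no source
        if source in explicit:
            continue
        # Assumption: fnmatch-style matching approximates dynamicField patterns.
        if any(fnmatch.fnmatchcase(source, pattern) for pattern in dynamic):
            continue
        missing.append(source)
    return missing


if __name__ == "__main__":
    for name in unmatched_copyfield_sources(SAMPLE_SCHEMA):
        print("copyField source with no matching field:", name)
```

To run it against a real file, read the file contents and pass the string to the function; any name it reports is a candidate for either a missing field definition or a stray copyField line.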

