1) I resolved the issues with solrindex. It turned out to be a matter of
adding all the Nutch schema-specific fields to Solr's schema.xml. There was
one gotcha: the latest Solr schema does not define a default fieldtype
"text", as the schema.xml shipped with Nutch 1.3 does; you must use
"text_general" instead. A comment for developers: copying the Nutch schema
over the Solr one only works for people who are starting their indexing
with a crawl. Anyone with an existing Solr index needs to merge the fields
in by hand. More detailed instructions on how to modify solr/schema.xml for
Nutch would be helpful, or better yet, a script to add the appropriate
fields.
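
A rough sketch of what such a script could look like, in Python. It assumes
both schema files keep their field definitions as <field> elements under a
top-level <fields> element, and that the only type needing translation is
"text" to "text_general". The names merge_nutch_fields and TYPE_MAP are
made up for illustration; nothing like this ships with Nutch or Solr.

```python
import xml.etree.ElementTree as ET

# Assumption: the only Nutch field type missing from the Solr schema is "text".
TYPE_MAP = {"text": "text_general"}


def merge_nutch_fields(nutch_schema_xml, solr_schema_xml, type_map=TYPE_MAP):
    """Return the Solr schema XML with any missing Nutch <field>s appended."""
    nutch_root = ET.fromstring(nutch_schema_xml)
    solr_root = ET.fromstring(solr_schema_xml)

    solr_fields = solr_root.find("fields")
    if solr_fields is None:
        solr_fields = ET.SubElement(solr_root, "fields")

    # Field names and field types the Solr schema already defines.
    existing = {f.get("name") for f in solr_fields.findall("field")}
    defined_types = {t.get("name") for t in solr_root.iter("fieldType")}
    defined_types |= {t.get("name") for t in solr_root.iter("fieldtype")}

    for field in nutch_root.findall("./fields/field"):
        name = field.get("name")
        if name in existing:
            continue  # never overwrite a field Solr already has
        new_field = ET.SubElement(solr_fields, "field", dict(field.attrib))
        ftype = field.get("type")
        # Remap the type only when Solr does not define it under that name.
        if ftype is not None and ftype not in defined_types:
            new_field.set("type", type_map.get(ftype, ftype))
        existing.add(name)

    return ET.tostring(solr_root, encoding="unicode")
```

To use it, read both schema.xml files into strings, call
merge_nutch_fields(nutch_text, solr_text), and write the result to a new
file rather than over the original. ElementTree drops XML comments when
parsing, so the output will lose the comments in Solr's schema.xml; review
the result before deploying it.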

2) Is there a way to tell Nutch to index everything at a given site? I am
crawling a couple of my own sites, and it seems rather clumsy just to give
Nutch a big "TopN" value. Wouldn't an "all" value be helpful?
