Re: 1) success 2) how to tell Nutch "index everything"

Markus Jelsma Wed, 26 Oct 2011 07:48:05 -0700


On Wednesday 26 October 2011 16:37:14 Fred Zimmerman wrote:
> 1) I resolved the issues with solrindex. It turned out to be a matter of
> adding all the nutch schema-specific fields to solr's schema.xml.  there
> was one gotcha which is that the latest solr schema does not have a
> default fieldtype "text" as in Nutch 1.3/schema.xml; you must use
> "text_general".


You're free to use any naming convention you want in Solr. We ship a complete 
working Solr schema. The fieldType's name doesn't really matter. We do not 
intend to ship an advanced schema; developers must make changes that are 
appropriate for their specific environment, use cases and scenario.
 
> A comment for developers is that the use case of copying
> the nutch schema to overwrite the solr one only works for people who are
> beginning their indexing with a crawl.  More detailed instructions on how
> to modify solr/schema.xml for nutch would be helpful, or better yet, a
> script to add the appropriate fields.

The Solr schema provided with Nutch tells you exactly which fields are used. 
Detailed instructions on how to make it work with Solr are out of scope, in my 
opinion.
You're of course free to make changes to the wiki :)
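
For anyone who wants to check an existing Solr schema against the one shipped with Nutch, a rough sketch in Python follows. It only lists the field names that appear in the Nutch schema but not in the Solr one; it does not edit anything. The function names are made up for illustration, and it assumes both files declare fields as <field name="..."/> elements.

```python
import xml.etree.ElementTree as ET


def field_names(schema_xml):
    """Return the set of <field name="..."> values found in a schema.xml string."""
    root = ET.fromstring(schema_xml)
    # iter("field") matches only <field> elements, so <copyField> and
    # <dynamicField> entries are ignored.
    return {f.get("name") for f in root.iter("field") if f.get("name")}


def missing_fields(nutch_schema_xml, solr_schema_xml):
    """List the fields declared in the Nutch schema but absent from the Solr schema."""
    return sorted(field_names(nutch_schema_xml) - field_names(solr_schema_xml))


def missing_fields_from_files(nutch_path, solr_path):
    """Same comparison, reading both schema files from disk (paths are up to you)."""
    with open(nutch_path, encoding="utf-8") as n, open(solr_path, encoding="utf-8") as s:
        return missing_fields(n.read(), s.read())
```

The field types still have to be mapped by hand, since the type names in your Solr schema may differ from the ones in the Nutch schema.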

> 
> 2) is there a way to tell Nutch to index everything at a given site?  I am
> crawling a couple of my own sites and it seems rather clumsy just to give
> Nutch a big "TopN."  wouldn't an "all" value be helpful?

The only way to do this is to keep running crawl cycles until all existing 
URLs and URLs yet to be discovered are exhausted. After that, the generator 
selects nothing until the fetch interval tells it to refetch.
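
A minimal sketch of such a loop in Python, under a few assumptions: the generate, fetch, parse and updatedb commands of bin/nutch are called with the argument order shown, a round is considered productive only if generate leaves a new directory under the segments dir, and parsing is a separate step (skip it if your fetcher already parses). The function names, the topN value and the max_rounds safety cap are illustrative, not something Nutch provides.

```python
import subprocess
from pathlib import Path


def list_segments(segments_dir):
    """Return the names of the segment directories that currently exist."""
    p = Path(segments_dir)
    if not p.is_dir():
        return set()
    return {child.name for child in p.iterdir() if child.is_dir()}


def crawl_until_exhausted(crawldb, segments_dir, top_n=1000, max_rounds=50,
                          nutch="bin/nutch", run=subprocess.call):
    """Repeat generate/fetch/parse/updatedb until generate yields no new segment.

    Returns the number of completed rounds. `run` is injectable so the loop
    can be exercised without a Nutch installation.
    """
    rounds = 0
    while rounds < max_rounds:
        before = list_segments(segments_dir)
        run([nutch, "generate", crawldb, segments_dir, "-topN", str(top_n)])
        new = sorted(list_segments(segments_dir) - before)
        if not new:
            # Nothing is due for fetching: the crawl is exhausted for now.
            break
        segment = str(Path(segments_dir) / new[-1])
        run([nutch, "fetch", segment])
        run([nutch, "parse", segment])  # assumption: fetcher does not parse
        run([nutch, "updatedb", crawldb, segment])
        rounds += 1
    return rounds
```

Something like crawl_until_exhausted("crawl/crawldb", "crawl/segments") would then keep cycling; the paths here are placeholders. Exit codes are deliberately ignored in this sketch, so add error handling before relying on it.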

Cheers


-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
