On Nov 10, 2010, at 12:19 PM, Eric Martin wrote:

> I am using Solr 1.4.0 as my index, Nutch 1.2 as my crawler and Drupal 6.x as
> my interface. My objective is to increase my teaser/description in my search
> results.
> 
> 
> 
> My obstacles are:
> 
> 
> 
> 1.)    Does nutch pull the entire page when it crawls and store it? (If it
> does, then I can re-index crawled documents and get more description into my
> search results. That would be easy!)
> 
> 2.)    Does nutch truncate the page? If so, I can't find out where so I can
> modify it to get the character length I need.
> 
> 

You should look at http.content.limit. If a document is longer than the value
specified with that option (in bytes), then nutch truncates the page. Also,
make sure the "content" field is stored in your Solr schema if you want to
access it later.
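
As a rough sketch of the two settings (the file locations and the "text"
field type are assumptions about a typical Nutch 1.2 / Solr 1.4 setup, so
check them against your own install):

```xml
<!-- conf/nutch-site.xml: overrides the default from nutch-default.xml -->
<property>
  <name>http.content.limit</name>
  <!-- size limit in bytes; a negative value disables truncation -->
  <value>-1</value>
</property>

<!-- Solr schema.xml (a separate file): keep the page text retrievable,
     not just searchable. The type name "text" is an assumption. -->
<field name="content" type="text" stored="true" indexed="true"/>
```

Pages that were already fetched under the old limit stay truncated, so they
would most likely need to be re-fetched and re-indexed after the change.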

> 
> I guess my biggest question is, does nutch pull and keep the entire crawled
> page? If so, I know to look to Solr configuration to get my desired search
> results.
> 
> Thanks
> 
> 
> 
> Eric
> 
> 
> 
