OK. My content limit (http.content.limit) is 65536, the default, and I am storing the content. I assume the default applies because it is set in my nutch-default.xml and I did not override it in my nutch-site.xml.
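
For reference, an override in nutch-site.xml would look roughly like the fragment below. This is only a sketch: it assumes the relevant property is http.content.limit (the limit applied to pages fetched over HTTP), and the value 262144 is an arbitrary illustration, not a recommendation.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: values here take precedence over nutch-default.xml -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Illustrative value only (256 KB); the default is 65536 bytes.
         A negative value such as -1 should disable truncation entirely. -->
    <value>262144</value>
  </property>
</configuration>
```

Pages that were already fetched under the old limit would presumably need to be re-crawled before any additional content shows up in the index.
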
I can trim my output in Drupal using print substr($snippet, 0, 800). My Solr is set up to accept <copyField source="body" dest="teaser" maxChars="800"/>, as is my schema for Nutch. So I guess I should now run another Nutch instance and see the results. If I'm missing something obvious, let me know.

Thanks for your help. I really appreciate your time. It's not going to waste.

-----Original Message-----
From: Dogacan Guney [mailto:[email protected]]
Sent: Wednesday, November 10, 2010 12:26 PM
To: [email protected]
Subject: Re: Nutch Body Length

On Nov 10, 2010, at 12:19 PM, Eric Martin wrote:

> I am using Solr 1.4.0 as my index, Nutch 1.2 as my crawler and Drupal 6.x as
> my interface. My objective is to increase my teaser/description in my search
> results.
>
> My obstacles are:
>
> 1.) Does nutch pull the entire page when it crawls and store it? (If it
> does, then I can re-index crawled documents and get more description into my
> search results. That would be easy!)
>
> 2.) Does nutch truncate the page? If so, I can't find out where so I can
> modify it to get the character length I need.
>

You should look at http.content.length. If a document is longer than the value specified with that option, then nutch truncates the page. Also, make sure you store "content" if you want to access it later.

> I guess my biggest question is, does nutch pull and keep the entire crawled
> page? If so, I know to look to Solr configuration to get my desired search
> results.
>
> Thanks
>
> Eric

