Solr supports result summaries. If I am not mistaken, the feature is called "highlighting": you set the appropriate flag when making the request, specify the fragment size, etc. The response will then contain summaries for the docs.
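As a rough sketch of such a request (not tested against your setup): the parameter names hl, hl.fl, hl.fragsize and hl.snippets are Solr's standard highlighting parameters, but the select URL, the field name "content" and the two helper functions are assumptions made only for illustration. The first helper builds the query URL; the second pulls the fragments out of the "highlighting" section of a JSON response that has already been parsed into a dict.

```python
from urllib.parse import urlencode


def build_highlight_url(select_url, query, field="content", fragsize=100, snippets=1):
    """Build a Solr select URL with highlighting switched on.

    select_url and field are assumptions; adjust them to your installation.
    """
    params = {
        "q": query,
        "wt": "json",             # ask for a JSON response
        "hl": "true",             # the flag that enables highlighting
        "hl.fl": field,           # field to build summaries from (must be stored)
        "hl.fragsize": fragsize,  # approximate fragment size in characters
        "hl.snippets": snippets,  # maximum number of fragments per field
    }
    return select_url + "?" + urlencode(params)


def extract_summaries(response, field="content"):
    """Map each document key to its highlighted fragments, joined into one string.

    `response` is the parsed JSON response; documents without a fragment
    for `field` get an empty summary.
    """
    highlighting = response.get("highlighting", {})
    return {
        doc_id: " ... ".join(fields.get(field, []))
        for doc_id, fields in highlighting.items()
    }


if __name__ == "__main__":
    # Hypothetical host and port, for illustration only.
    print(build_highlight_url("http://localhost:8983/solr/select", "nutch"))
```

The keys of the "highlighting" section are the values of the schema's unique key field, so the summaries can be matched back to the docs in the normal result list.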
The only requirement is that the content must be stored, so you should specify stored="true" for that field in the schema description. By default content is stored in the "text" field if you don't override it.

I cannot say anything about doing the filtering in Solr. I would suggest filtering such pages by means of Nutch.

Alexander

2008/10/22 William Ortiz <[EMAIL PROTECTED]>

> Hi:
>
> I am a newcomer to the Solr/Nutch community and I have some questions.
>
> I was able to hook up Nutch for search and Solr for indexing, but I would
> like to know how (if it is possible) to surface something similar to the
> Nutch result summary in Solr. Should I store the value of the 'content'
> field in Solr and create the summary from it?
>
> Also, Nutch fetches some links that return a 404 error, and these are then
> indexed by Solr. Is there some way that I can filter these results in the
> SolrIndexer class before they are indexed? Is it possible to get either the
> Status, Metadata, Signature in the SolrIndexer?
> The last few fields I mentioned can be seen when doing a dump of the
> database and looking at the results...
>
> http://xxxx.xxxx..com/xxx/xxx-xxx Version: 6
> Status: 1 (db_unfetched)
> Fetch time: Tue Oct 21 10:45:36 EDT 2008
> Modified time: Wed Dec 31 19:00:00 EST 1969
> Retries since fetch: 1
> Retry interval: 2592000 seconds (30 days)
> Score: 7.0573883E-6
> Signature: null
> Metadata: _pst_:blocked(23), lastModified=0
>
> http://xxxx.xxxx..com/xxx/xxx-xxx Version: 6
> Status: 3 (db_gone)
> Fetch time: Fri Dec 05 09:04:13 EST 2008
> Modified time: Wed Dec 31 19:00:00 EST 1969
> Retries since fetch: 0
> Retry interval: 3888000 seconds (45 days)
> Score: 6.4350065E-4
> Signature: null
> Metadata: _pst_:notfound(14), lastModified=0: http://xxxx.xxxx.
> .com/xxx/xxx-xxx
>
> Thank you in advance for your help.
>
> William J Ortiz
>

--
Best Regards
Alexander Aristov
