Hi:

I am a newcomer to the Solr/Nutch community and I have some questions.

I was able to hook up Nutch for search and Solr for indexing, but I would like to know how (if it is possible) to surface something similar to the Nutch result summary in Solr. Should I store the value of the 'content' field in Solr and create the summary from it?

Also, Nutch fetches some links that return a 404 error, and these are then indexed by Solr. Is there some way that I can filter these results in the SolrIndexer class before they are indexed? Is it possible to get the Status, Metadata, or Signature in the SolrIndexer? These fields can be seen when doing a dump of the database and looking at the results:

http://xxxx.xxxx..com/xxx/xxx-xxx     Version: 6
Status: 1 (db_unfetched)
Fetch time: Tue Oct 21 10:45:36 EDT 2008
Modified time: Wed Dec 31 19:00:00 EST 1969
Retries since fetch: 1
Retry interval: 2592000 seconds (30 days)
Score: 7.0573883E-6
Signature: null
Metadata: _pst_:blocked(23), lastModified=0

http://xxxx.xxxx..com/xxx/xxx-xxx    Version: 6
Status: 3 (db_gone)
Fetch time: Fri Dec 05 09:04:13 EST 2008
Modified time: Wed Dec 31 19:00:00 EST 1969
Retries since fetch: 0
Retry interval: 3888000 seconds (45 days)
Score: 6.4350065E-4
Signature: null
Metadata: _pst_:notfound(14), lastModified=0: http://xxxx.xxxx..com/xxx/xxx-xxx
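
To make the kind of filtering I mean concrete, here is a rough sketch in Python. It is not Nutch or Solr code: it only parses text dump records shaped like the ones above and drops those whose status is db_gone. The function names (parse_records, indexable) and the example.com URLs are made up for illustration, and I am assuming that records are separated by blank lines and that the URL is the first token of each record.

```python
import re

# Matches a line such as "Status: 3 (db_gone)" anywhere in a record.
STATUS_RE = re.compile(r"^Status:\s*(\d+)\s*\((\w+)\)", re.MULTILINE)


def parse_records(dump_text):
    """Split a text dump into blank-line-separated records and pull out
    the URL and status of each one (hypothetical helper)."""
    records = []
    for block in re.split(r"\n\s*\n", dump_text.strip()):
        match = STATUS_RE.search(block)
        if match is None:
            continue  # not a record we understand, so skip it
        # Assumption: the first token on the first line is the URL.
        url = block.splitlines()[0].split()[0]
        records.append(
            {"url": url, "code": int(match.group(1)), "name": match.group(2)}
        )
    return records


def indexable(records, skip=("db_gone",)):
    """Keep only the records whose status name is not in `skip`."""
    return [r for r in records if r["name"] not in skip]


# Made-up sample in the same layout as the dump above.
SAMPLE_DUMP = """\
http://example.com/a\tVersion: 6
Status: 1 (db_unfetched)
Signature: null
Metadata: _pst_:blocked(23), lastModified=0

http://example.com/b\tVersion: 6
Status: 3 (db_gone)
Signature: null
Metadata: _pst_:notfound(14), lastModified=0
"""

if __name__ == "__main__":
    for record in indexable(parse_records(SAMPLE_DUMP)):
        print(record["url"], record["name"])
```

What I am looking for is the equivalent of the indexable() check inside the SolrIndexer, applied before a document is sent to Solr.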

Thank you in advance for your help.

William J Ortiz
