You can look at the class org.apache.nutch.indexer.basic.BasicIndexingFilter. The fields Nutch indexes into the Lucene index are: host, site, url, content, anchors and title. The class shows in more detail which field is indexed under which circumstances.
I've accessed them directly through the Lucene API, using NutchDocumentAnalyzer.

On 11/1/05, Byron Miller <[EMAIL PROTECTED]> wrote:
>
> I'm looking to see if i can pull a meta description in
> lieu of summary for some content and wondering if this
> is indexed - is there an easy way to see the fields
> indexed by default and how they're exposed through
> nutch bean?
>

--
"Minds are like parachutes, they work best when open."

Bruno Patini Furtado
Software Developer
webpage: www.bpfurtado.net
blog: http://www.livejournal.com/users/bpfurtado/
