Take a look at the class org.apache.nutch.indexer.basic.BasicIndexingFilter.

The fields Nutch indexes into the Lucene index are: host, site, url, content,
anchors and title. The class shows in more detail which field is indexed
under which circumstances.

I've accessed them directly through the Lucene API, using NutchDocumentAnalyzer.
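
For the meta description case, here is a rough sketch of the fallback logic in plain Java. It makes no Nutch or Lucene calls, so everything in it is an assumption for illustration: the class FieldFallbackSketch, the helper names, and above all the "description" field, which is not in the default list and would have to be added by your own indexing filter. The field names are copied from the list in this message; check BasicIndexingFilter for the exact strings used in the index.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FieldFallbackSketch {

    // Field names as listed in this message; verify the exact
    // strings against BasicIndexingFilter before relying on them.
    static final Set<String> DEFAULT_FIELDS = new HashSet<String>(
            Arrays.asList("host", "site", "url", "content", "anchors", "title"));

    // True if the name is one of the fields listed as indexed by default.
    public static boolean isDefaultField(String name) {
        return DEFAULT_FIELDS.contains(name);
    }

    // storedFields stands in for the field values read back for one hit.
    // "description" is a hypothetical extra field, not a default one.
    public static String snippetFor(Map<String, String> storedFields, String summary) {
        String description = storedFields.get("description");
        if (description != null && description.trim().length() > 0) {
            return description.trim();
        }
        // No usable description: fall back to the generated summary.
        return summary;
    }

    public static void main(String[] args) {
        Map<String, String> hit = new HashMap<String, String>();
        hit.put("title", "Example page");

        System.out.println(isDefaultField("description"));        // prints false
        System.out.println(snippetFor(hit, "generated summary")); // prints generated summary

        hit.put("description", "  Hand-written meta description.  ");
        System.out.println(snippetFor(hit, "generated summary")); // prints Hand-written meta description.
    }
}
```

The point is only that a description field has to be indexed and stored by something other than the default filter before it can replace the summary.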



On 11/1/05, Byron Miller <[EMAIL PROTECTED]> wrote:
>
> I'm looking to see if I can pull a meta description in
> lieu of summary for some content and wondering if this
> is indexed - is there an easy way to see the fields
> indexed by default and how they're exposed through
> nutch bean?
>



--
"Minds are like parachutes, they work best when open."

Bruno Patini Furtado
Software Developer
webpage: http://www.bpfurtado.net
blog: http://www.livejournal.com/users/bpfurtado/
