Francesco,

Yes, it's a Lucene index.
Maybe this document can help you:
http://wiki.media-style.com/display/nutchDocu/Home

Stefan

On 15.06.2005 at 18:41, Francesco Cipriani wrote:

Hi all,
I'm trying to understand how Nutch stores its indexes by reading the
source code, but it's not easy, so I'm asking for your help.
I saw that each segment is composed of some data structures, such as the
fetchlist entries, the parse_data etc, and they are handled by the
ArrayFile class.
ArrayFile inherits from MapFile and uses simple integers as keys, so
the index we find in each segment subdir is composed of pairs
<integer, position in the data file>
But where is an index like <url -> <segment, position inside segment> > ?
I see that each segment has an index dir; is that index a Lucene
one? And how is it related to the index in the "index" dir at root level
(the same level as the segment dir)?
Where does Nutch look to retrieve the content of a page, given
its url?
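
The structure described above, and the kind of url mapping being asked
about, can be sketched in plain Java. This is only an illustration and
not Nutch code: RecordFile is a hypothetical stand-in for an
ArrayFile-style store (integer key -> position in a data file), and the
HashMap called urlIndex is a hypothetical stand-in for whatever search
index maps a url to <segment, position inside segment>. Whether Nutch
resolves urls exactly this way is an assumption, not something
confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these classes exist in Nutch.
final class SegmentLookupSketch {

    /** Hypothetical ArrayFile-like store: records addressed by integer key. */
    static final class RecordFile {
        private final StringBuilder data = new StringBuilder(); // the "data file"
        private final List<int[]> index = new ArrayList<>();    // key -> {offset, length}

        /** Appends a record and returns its integer key. */
        int append(String record) {
            int key = index.size();
            index.add(new int[] { data.length(), record.length() });
            data.append(record);
            return key;
        }

        /** Looks up the position for the key, then reads the record. */
        String get(int key) {
            int[] entry = index.get(key);
            return data.substring(entry[0], entry[0] + entry[1]);
        }
    }

    /** Hypothetical index entry: which segment, and which record inside it. */
    static final class Hit {
        final String segment;
        final int docNo;

        Hit(String segment, int docNo) {
            this.segment = segment;
            this.docNo = docNo;
        }
    }

    private final Map<String, RecordFile> segments = new HashMap<>();
    private final Map<String, Hit> urlIndex = new HashMap<>();

    /** Stores page content in a segment and records url -> <segment, docNo>. */
    void store(String segment, String url, String content) {
        RecordFile file = segments.computeIfAbsent(segment, s -> new RecordFile());
        int docNo = file.append(content);
        urlIndex.put(url, new Hit(segment, docNo));
    }

    /** Two-step lookup: url -> <segment, docNo>, then docNo -> content. */
    String contentFor(String url) {
        Hit hit = urlIndex.get(url);
        if (hit == null) {
            return null; // url was never stored
        }
        return segments.get(hit.segment).get(hit.docNo);
    }

    public static void main(String[] args) {
        SegmentLookupSketch sketch = new SegmentLookupSketch();
        sketch.store("segment-a", "http://example.org/", "<html>home</html>");
        sketch.store("segment-b", "http://example.org/about", "<html>about</html>");
        System.out.println(sketch.contentFor("http://example.org/about"));
    }
}
```

The point of the sketch is the split of responsibilities: the
per-segment store only knows integer keys, so some separate index has
to translate a url into a segment name plus an integer key.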

Thanks
--
Francesco



---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net

