Naess, Ronny wrote:
> No, because the content is not stored, only indexed, in the index itself.
> As I have found out, the content is cached elsewhere, and I am trying to
> figure out how to get it from a Lucene client just now.

Lucene can store the full text, but Nutch doesn't use this feature (for
performance reasons). Whenever the full text is needed, it's retrieved
from the Nutch segment data. Please see the logic in o.a.n.s.FetchedSegment
for details - this process doesn't use Lucene at all, it simply
retrieves records from a Hadoop MapFile, using the URL as the document ID.
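
To illustrate the kind of lookup described above, here is a toy Python
sketch of the general MapFile idea: records kept sorted by key, plus a
sparse index that narrows the search before a short forward scan. This is
not Nutch or Hadoop code and does not call their APIs; the class name
TinyMapFile, the index_interval parameter and the example URLs are all
made up for illustration.

```python
from bisect import bisect_right


class TinyMapFile:
    """Toy in-memory stand-in for a MapFile: sorted records plus a sparse index.

    Hypothetical illustration only; not the Hadoop or Nutch API.
    """

    def __init__(self, records, index_interval=2):
        # A MapFile keeps its records sorted by key; here the key is the URL.
        self.data = sorted(records.items())
        # Sparse index: remember every index_interval-th key and its position.
        self.index_pos = list(range(0, len(self.data), index_interval))
        self.index_keys = [self.data[i][0] for i in self.index_pos]

    def get(self, url):
        """Return the stored content for url, or None if it is absent."""
        # Binary-search the sparse index for the last indexed key <= url.
        slot = bisect_right(self.index_keys, url) - 1
        if slot < 0:
            return None
        # Scan forward from that position until the key is found or passed.
        for key, value in self.data[self.index_pos[slot]:]:
            if key == url:
                return value
            if key > url:
                break
        return None


segment = TinyMapFile({
    "http://example.com/a": "content of page a",
    "http://example.com/b": "content of page b",
    "http://example.com/c": "content of page c",
})
page = segment.get("http://example.com/b")
missing = segment.get("http://example.com/zzz")
```

The point of the sketch is only that the URL itself acts as the key, so no
Lucene query is involved in fetching the content.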


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general