RE: where nutch store crawled data

POIRIER David Mon, 16 Jun 2008 08:00:05 -0700

When executing a crawl, Nutch creates segments, based on the crawl
depth if I'm not mistaken, in which the fetched content is stored. For
example, if you crawl a web site named site-xyz into the directory
$nutch_home/crawls/crawl-xyz, you will find the segments in the
following directory: $nutch_home/crawls/crawl-xyz/segments. Each
segment directory contains a content directory.
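To make the layout above concrete, here is a minimal sketch that lists the content directory of every segment under a crawl directory. It assumes the layout described above (<crawl_dir>/segments/<segment>/content); the script name and function name find_content_dirs are hypothetical, chosen only for illustration. It only locates the directories and does not try to decode the files inside them.

```python
from pathlib import Path


def find_content_dirs(crawl_dir):
    """Return the content/ directory of every segment under <crawl_dir>/segments.

    Assumes the layout <crawl_dir>/segments/<segment>/content; segments
    without a content/ subdirectory are skipped.
    """
    segments = Path(crawl_dir) / "segments"
    # Each segment is a subdirectory of segments/; keep only those with content/.
    return sorted(
        seg / "content"
        for seg in segments.iterdir()
        if seg.is_dir() and (seg / "content").is_dir()
    )


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name):
    #   python find_content_dirs.py $nutch_home/crawls/crawl-xyz
    for d in find_content_dirs(sys.argv[1]):
        print(d)
```

Run against the example crawl directory, it prints one path per segment, for instance $nutch_home/crawls/crawl-xyz/segments/<segment>/content.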


To be honest, I don't think you can directly access the stored content
found in those directories, the idea being to index it, not
necessarily to store it.

David



-----Original Message-----
From: beansproud [mailto:[EMAIL PROTECTED] 
Sent: Monday, 16 June 2008 16:42
To: [email protected]
Subject: where nutch store crawled data


Hi,
    I'm new to Nutch. When I use Nutch to crawl pages, I can get
the crawled data by using the command: nutch readseg.
    My question is: can I get the data directly? I just can't find where
Nutch puts it.
    Can anybody tell me?
    Thanks very much!
-- 
View this message in context:
http://www.nabble.com/where-nutch-store-crawled-data-tp17865961p17865961.html
Sent from the Nutch - User mailing list archive at Nabble.com.
