When you run a crawl, Nutch creates segments, based on the crawl depth if I'm not mistaken, in which the fetched content is stored. For example, if you crawl a web site named site-xyz into the directory $nutch_home/crawls/crawl-xyz, you will find the segments in the following directory: $nutch_home/crawls/crawl-xyz/segments. Each segment directory contains a content directory.
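As a rough sketch of that layout, and assuming your crawl directory follows the structure described above, the Python snippet below lists each segment together with its content directory. The function name and the example path are only illustrative, not part of Nutch.

```python
from pathlib import Path


def list_segment_content_dirs(crawl_dir):
    """Return (segment_name, content_path) pairs found under crawl_dir/segments.

    Hypothetical helper: it only walks the directory layout described above
    and does not parse the files Nutch writes inside each content directory.
    """
    segments_dir = Path(crawl_dir) / "segments"
    if not segments_dir.is_dir():
        return []
    results = []
    # sorted() keeps the output order stable across runs
    for segment in sorted(segments_dir.iterdir()):
        content = segment / "content"
        if content.is_dir():
            results.append((segment.name, content))
    return results


if __name__ == "__main__":
    # Assumed location; replace with your own crawl directory.
    for name, path in list_segment_content_dirs("crawls/crawl-xyz"):
        print(name, path)
```

This only tells you where the data sits on disk. To see what is inside a content directory, the nutch readseg command mentioned in the original question is still the tool to use.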
To be honest, I don't think you can directly access the stored content in those directories; the idea is to index it, not necessarily to store it.

David

-----Original Message-----
From: beansproud [mailto:[EMAIL PROTECTED]
Sent: Monday, 16 June 2008 16:42
To: [email protected]
Subject: where nutch store crawled data

Hi,

I'm new to Nutch. When I use Nutch to crawl pages, I can get the crawled data by using the command: nutch readseg. My question is: can I get the data directly? I just can't find where Nutch puts it. Can anybody tell me?

Thanks very much!

--
View this message in context: http://www.nabble.com/where-nutch-store-crawled-data-tp17865961p17865961.html
Sent from the Nutch - User mailing list archive at Nabble.com.
