You can read it directly, but it is not pretty. The data is stored as Hadoop SequenceFiles, which you read through SequenceFileInputFormat: http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html
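A rough sketch of how one might confirm this from outside Hadoop. It is not part of Nutch, and the function names are made up for illustration. It walks the segments/*/content directories David describes below and reads only the first bytes of each file, assuming the usual SequenceFile header layout: the bytes "SEQ", a version byte, then the key and value class names, each prefixed by a one-byte length. Reading the actual records still needs the Hadoop classes.

```python
from pathlib import Path


def _read_short_text(f):
    # Assumption: the class name is shorter than 128 bytes, so its
    # variable-length integer prefix fits in a single byte.
    raw = f.read(1)
    if not raw or raw[0] > 127:
        raise ValueError("unsupported length prefix")
    length = raw[0]
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated header")
    return data.decode("utf-8")


def read_sequence_file_header(path):
    """Return (version, key_class, value_class), or None if the file
    does not start with the SequenceFile magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if len(magic) < 4 or magic[:3] != b"SEQ":
            return None
        version = magic[3]
        key_class = _read_short_text(f)
        value_class = _read_short_text(f)
        return version, key_class, value_class


def list_segment_content(crawl_dir):
    """Yield (file, header) for every SequenceFile found under
    <crawl_dir>/segments/<segment>/content."""
    for content_dir in sorted(Path(crawl_dir).glob("segments/*/content")):
        files = sorted(p for p in content_dir.rglob("*") if p.is_file())
        for file in files:
            header = read_sequence_file_header(file)
            if header is not None:
                yield file, header
```

Calling list_segment_content("crawls/crawl-xyz") and printing each result should show which files hold the fetched content and which key and value classes a reader would have to deserialize.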
Look in the org.apache.nutch.crawl.Crawl class, specifically at how it uses the Indexer.

Kindly
//Marcus

On Tue, Jun 17, 2008 at 3:57 PM, beansproud <[EMAIL PROTECTED]> wrote:

>
> oh, you are right.
> thanks
>
>
> POIRIER David wrote:
> >
> > When executing a crawl, Nutch creates segments, based on the crawl
> > depth if I'm not mistaken, in which the fetched content is stored. For
> > example, if crawling a web site named site-xyz into the directory
> > $nutch_home/crawls/crawl-xyz, you will find the segments in the
> > following directory: $nutch_home/crawls/crawl-xyz/segments. In each
> > segment directory you will find a content directory.
> >
> > To be honest, I don't think you can directly access the stored content
> > found in those directories, the idea being to index it and not
> > necessarily store it.
> >
> > David
> >
> >
> >
> > -----Original Message-----
> > From: beansproud [mailto:[EMAIL PROTECTED]
> > Sent: Monday, 16 June 2008 16:42
> > To: [email protected]
> > Subject: where nutch store crawled data
> >
> >
> > Hi,
> > I'm new to Nutch. When I use Nutch to crawl pages, I can get
> > the crawled data by using the command: nutch readseg.
> > My question is: can I get the data directly? I just can't find where
> > Nutch puts it.
> > Can anybody tell me?
> > Thanks very much!
> > --
> > View this message in context:
> > http://www.nabble.com/where-nutch-store-crawled-data-tp17865961p17865961.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/where-nutch-store-crawled-data-tp17865961p17905486.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>

--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/
