By the way, when I modify the urlfilter, the change only works once I recompile. I don't know why, but that is just how it is.

Winton Davies-4 wrote:
>
> I have a follow-up question: is it possible to write directly to the
> Crawl DB? I have several million HTML pages that are stored in a single
> concatenated flat file, and I'd like to just run a utility over them to
> feed them to Nutch parsing/indexing rather than have to dump them as
> individual files. Looking at the API documentation, I couldn't find any
> obvious capabilities.
>
> I've no idea whether the fetch -> crawldb step does the parse and URL
> extraction before it writes anyway. If it's not possible, then it
> doesn't matter, but if it is possible, it would save having to write
> out lots of files.
>
> Winton
>
> On Tue, Jun 17, 2008 at 6:57 AM, beansproud <[EMAIL PROTECTED]> wrote:
>
>> oh, you are right.
>> thanks
>>
>> POIRIER David wrote:
>> >
>> > When executing a crawl, Nutch creates segments, based on the crawl
>> > depth if I'm not mistaken, in which the fetched content is stored. For
>> > example, if crawling a web site named site-xyz into the directory
>> > $nutch_home/crawls/crawl-xyz, you will find the segments in the
>> > following directory: $nutch_home/crawls/crawl-xyz/segments. For each
>> > segment directory you will find a content directory.
>> >
>> > To be honest, I don't think you can directly access the stored content
>> > found in those directories, the idea being to index it and not
>> > necessarily store it.
>> >
>> > David
>> >
>> > -----Original Message-----
>> > From: beansproud [mailto:[EMAIL PROTECTED]
>> > Sent: Monday, 16 June 2008 16:42
>> > To: [email protected]
>> > Subject: where nutch store crawled data
>> >
>> > Hi,
>> > I'm new to Nutch. When I use Nutch to crawl pages, I can get the
>> > crawled data by using the command: nutch readseg.
>> > My question is: can I get the data directly? I just can't find where
>> > Nutch puts them.
>> > Can anybody tell me?
>> > Thanks very much!
>> > --
>> > View this message in context:
>> > http://www.nabble.com/where-nutch-store-crawled-data-tp17865961p17865961.html
>> > Sent from the Nutch - User mailing list archive at Nabble.com.
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/where-nutch-store-crawled-data-tp17865961p17905486.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>

--
View this message in context: http://www.nabble.com/where-nutch-store-crawled-data-tp17865961p18021969.html
Sent from the Nutch - User mailing list archive at Nabble.com.
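
For anyone landing on this thread later: the segment layout David describes in the quoted message can be checked with a few lines of Python. This is only a rough sketch, not part of Nutch. It assumes the crawl directory is on the local filesystem (not HDFS), and the function name list_segments and the default path crawls/crawl-xyz are made up for illustration, following David's example. It only tells you which segment directories exist and whether each one has a content subdirectory; it does not read the stored pages, for which nutch readseg is still the tool mentioned above.

```python
from pathlib import Path
import sys


def list_segments(crawl_dir):
    """Return (segment_name, has_content_dir) pairs for a crawl directory.

    Hypothetical helper: assumes the layout <crawl_dir>/segments/<segment>/content
    described in the thread, on a local filesystem.
    """
    segments_dir = Path(crawl_dir) / "segments"
    if not segments_dir.is_dir():
        # No segments directory, so there is nothing to report.
        return []
    result = []
    for seg in sorted(p for p in segments_dir.iterdir() if p.is_dir()):
        result.append((seg.name, (seg / "content").is_dir()))
    return result


if __name__ == "__main__":
    # Placeholder default path taken from the example in the thread.
    crawl_dir = sys.argv[1] if len(sys.argv) > 1 else "crawls/crawl-xyz"
    for name, has_content in list_segments(crawl_dir):
        status = "content/ present" if has_content else "content/ missing"
        print(name, status)
```

Run it from $nutch_home with the crawl directory as the only argument, for example crawls/crawl-xyz.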
