By the way, when I modified the urlfilter, the change only took effect after
I recompiled. I don't know why, but that's just how it is.


Winton Davies-4 wrote:
> 
> I have a follow-up question: is it possible to write directly to the
> CrawlDb? I have several million HTML pages stored in a single
> concatenated flat file, and I'd like to run a utility over them to feed
> them to Nutch for parsing/indexing rather than having to dump them as
> individual files. Looking at the API documentation, I couldn't find any
> obvious way to do this.
> 
> I've no idea whether the fetch -> crawldb step does the parsing and URL
> extraction before it writes anyway. If it's not possible, it doesn't
> matter, but if it is, it would save having to write out lots of files.
> 
> Winton
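On Winton's question, here is a rough Java sketch of streaming pages out of one concatenated file without writing each page to its own file. It assumes a hypothetical record format in which every page starts with a line "==PAGE== <url>"; the real file almost certainly uses a different delimiter, and the FlatFilePages class and forEachPage method are illustrative names, not Nutch API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.BiConsumer;

/**
 * Streams pages out of one concatenated flat file.
 * ASSUMPTION: each page begins with a header line "==PAGE== <url>"
 * (a hypothetical delimiter; adapt it to the real file format).
 */
public class FlatFilePages {

    static final String DELIMITER = "==PAGE== "; // hypothetical record marker

    /**
     * Calls handler.accept(url, html) once per page and returns the page count.
     * The caller is responsible for closing the input reader.
     */
    static int forEachPage(Reader input, BiConsumer<String, String> handler) throws IOException {
        BufferedReader reader = new BufferedReader(input);
        String url = null;
        StringBuilder html = new StringBuilder();
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith(DELIMITER)) {
                if (url != null) { // a new header ends the previous page
                    handler.accept(url, html.toString());
                    count++;
                }
                url = line.substring(DELIMITER.length()).trim();
                html.setLength(0);
            } else if (url != null) {
                html.append(line).append('\n');
            } // text before the first header is ignored
        }
        if (url != null) { // flush the last page
            handler.accept(url, html.toString());
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String sample = "==PAGE== http://example.com/a\n<html>A</html>\n"
                      + "==PAGE== http://example.com/b\n<html>B</html>\n";
        int n = forEachPage(new StringReader(sample),
                (url, html) -> System.out.println(url + " -> " + html.length() + " chars"));
        System.out.println(n + " pages");
    }
}
```

Each (url, html) pair is handed to a callback. What that callback should do (write into a segment, the CrawlDb, or an index) is exactly the open question in this thread, so it is left out rather than guessed at.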
> 
> 
> 
> On Tue, Jun 17, 2008 at 6:57 AM, beansproud <[EMAIL PROTECTED]> wrote:
> 
>>
>> Oh, you are right.
>> Thanks.
>>
>>
>> POIRIER David wrote:
>> >
>> > When executing a crawl, Nutch creates segments, based on the crawl
>> > depth if I'm not mistaken, in which the fetched content is stored. For
>> > example, if you crawl a web site named site-xyz into the directory
>> > $nutch_home/crawls/crawl-xyz, you will find the segments in the
>> > following directory: $nutch_home/crawls/crawl-xyz/segments. Each
>> > segment directory contains a content directory.
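A minimal Java sketch of the layout David describes: it walks <crawl dir>/segments and prints the content directory of each segment. The crawls/crawl-xyz default is only the example path from the message above, and the SegmentLister class and findContentDirs method are made-up names for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Lists <crawlDir>/segments/<segment>/content for every segment that has one. */
public class SegmentLister {

    /** Returns the content directories of all segments, sorted by segment name. */
    static List<Path> findContentDirs(Path crawlDir) throws IOException {
        Path segmentsDir = crawlDir.resolve("segments");
        List<Path> result = new ArrayList<>();
        if (!Files.isDirectory(segmentsDir)) {
            return result; // no segments directory yet, e.g. nothing fetched
        }
        try (Stream<Path> segments = Files.list(segmentsDir)) {
            segments.filter(Files::isDirectory)
                    .sorted()
                    .map(segment -> segment.resolve("content"))
                    .filter(Files::isDirectory) // skip segments without fetched content
                    .forEach(result::add);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Example path only (from the crawl-xyz example); pass your own crawl dir as args[0].
        Path crawlDir = Paths.get(args.length > 0 ? args[0] : "crawls/crawl-xyz");
        for (Path content : findContentDirs(crawlDir)) {
            System.out.println(content);
        }
    }
}
```

Segment directories are normally named after the time they were generated (a yyyyMMddHHmmss timestamp), so sorting by name follows generation order. The files inside each content directory are Hadoop MapFile data rather than individual HTML pages, which is why nutch readseg (for example nutch readseg -dump <segment> <outdir>) is the usual way to read them.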
>> >
>> > To be honest, I don't think you can directly access the stored content
>> > found in those directories, the idea being to index it and not
>> > necessarily store it.
>> >
>> > David
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: beansproud [mailto:[EMAIL PROTECTED]
>> > Sent: Monday, 16 June 2008 16:42
>> > To: [email protected]
>> > Subject: where nutch store crawled data
>> >
>> >
>> > Hi,
>> >     I'm new to Nutch. When I use Nutch to crawl pages, I can get
>> > the crawled data with the command: nutch readseg.
>> >     My question is: can I get at the data directly? I just can't find
>> > where Nutch puts it.
>> >     Can anybody tell me?
>> >     Thanks very much!
>> > --
>> > View this message in context:
>> > http://www.nabble.com/where-nutch-store-crawled-data-tp17865961p17865961.html
>> > Sent from the Nutch - User mailing list archive at Nabble.com.
>> >
>> >
>> >
>>
>>
>>
> 
> 

