Re: my own crawlscript.sh

Matthias W. Mon, 08 Dec 2008 04:52:32 -0800


Dennis Kubes-2 wrote:
> 
> Just having the urls isn't the same as having an index.  You would still 
> need to crawl them.  You can inject your url list into a clean crawldb 
> and fetch only those urls with the inject, generate, fetch commands. 
> Then you can use the index command to index them.
> 
OK, thanks.
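
For reference, the sequence Dennis describes could look roughly like the sketch below. It is not a tested crawl script. The function name crawl_url_list, the NUTCH variable and the directory layout are made up for illustration. The invertlinks step and the argument order of the index command are assumptions based on the Nutch tutorial of that era, so check them against the usage output of your own bin/nutch.

```shell
#!/bin/sh
# Sketch: crawl only a fixed URL list, then index it.
# NUTCH, crawl_url_list and the directory names are illustrative assumptions.

NUTCH="${NUTCH:-bin/nutch}"

crawl_url_list() {
    url_dir="$1"      # directory holding the seed URL text file(s)
    crawl_dir="$2"    # fresh output directory, e.g. "crawl"

    # 1. Inject the URL list into a clean crawldb.
    $NUTCH inject "$crawl_dir/crawldb" "$url_dir" || return 1

    # 2. Generate a fetch list; this should create a new segment directory.
    $NUTCH generate "$crawl_dir/crawldb" "$crawl_dir/segments" || return 1
    segment=$(ls -d "$crawl_dir"/segments/* 2>/dev/null | tail -n 1)
    [ -n "$segment" ] || return 1

    # 3. Fetch the pages listed in that segment.
    $NUTCH fetch "$segment" || return 1

    # 4. Build the link database (assumed to be required by the index step),
    #    then index the fetched segment.
    $NUTCH invertlinks "$crawl_dir/linkdb" -dir "$crawl_dir/segments" || return 1
    $NUTCH index "$crawl_dir/indexes" "$crawl_dir/crawldb" \
        "$crawl_dir/linkdb" "$segment" || return 1
}

# Example call (commented out so that sourcing this file runs nothing):
# crawl_url_list urls crawl || echo "crawl failed" >&2
```

Each step returns early when the command before it reports a non-zero exit status, so a failure partway through is signalled by the function's return value and does not depend on how large the crawl folder has become.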


Dennis Kubes-2 wrote:
> 
> You could check size.  You could also check it programmatically through 
> lucene.
> 
> Dennis
> 
You mean that I should check the size of my crawl folder and its contents?
But wouldn't that be very error-prone? What if an error occurs during the crawl
after the crawl folder has already reached the expected size? Or is that impossible?

And how can I check this programmatically?
-- 
View this message in context: 
http://www.nabble.com/my-own-crawlscript.sh-tp20853413p20894745.html
Sent from the Nutch - User mailing list archive at Nabble.com.
