Gabriele Kahlout wrote:
>
> On Wed, May 4, 2011 at 6:22 PM, Kelvin <[email protected]> wrote:
>
>> Hi Gabriele,
>>
>> Thank you for your help. I am sorry, I am a newbie to nutch. If I crawl
>> the whole wikipedia, will the whole wikipedia be stored in the crawldb of
>> my server?
>>
>
> i think so (I'm also a newbie).
>

Wikipedia will get stored in the segments. Once they have been indexed (and all the crawldb/linkdb updates have been run), you should delete them. Only information about the fetch/parse status of each link is saved to the crawldb. The link structure is kept in the linkdb.
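To make the split between the three stores concrete, here is a toy in-memory model in Java. It is a sketch under stated assumptions, not Nutch code: the class `CrawlStores`, its `Status` enum and the methods `fetch` and `indexAndDelete` are all hypothetical. `fetch` collapses what Nutch does in separate steps (fetch, parse, updatedb, invertlinks) into one call. The point it illustrates is that page content only lives in segments, so deleting a segment after indexing leaves the per-URL status in crawldb and the link structure in linkdb intact.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of the Nutch stores; NOT the Nutch API.
public class CrawlStores {
    enum Status { UNFETCHED, FETCHED }

    // crawldb: one small status record per known URL, no page content.
    final Map<String, Status> crawlDb = new HashMap<>();
    // linkdb: inverted links, i.e. target URL -> URLs that link to it.
    final Map<String, Set<String>> linkDb = new HashMap<>();
    // segments: segment name -> (URL -> fetched content); the bulky part.
    final Map<String, Map<String, String>> segments = new HashMap<>();
    // index: stand-in for the search index built from the segments.
    final Map<String, String> index = new HashMap<>();

    // Collapses fetch + parse + updatedb + invertlinks for one page (assumption for brevity).
    void fetch(String segment, String url, String content, List<String> outlinks) {
        segments.computeIfAbsent(segment, k -> new HashMap<>()).put(url, content);
        crawlDb.put(url, Status.FETCHED);
        for (String target : outlinks) {
            crawlDb.putIfAbsent(target, Status.UNFETCHED); // newly discovered URL
            linkDb.computeIfAbsent(target, k -> new TreeSet<>()).add(url);
        }
    }

    // Index a segment's content, then delete the segment; crawldb and linkdb are untouched.
    void indexAndDelete(String segment) {
        Map<String, String> pages = segments.remove(segment);
        if (pages != null) {
            index.putAll(pages);
        }
    }

    public static void main(String[] args) {
        CrawlStores s = new CrawlStores();
        s.fetch("seg-1", "https://en.wikipedia.org/wiki/A", "<html>A</html>",
                List.of("https://en.wikipedia.org/wiki/B"));
        s.indexAndDelete("seg-1");
        System.out.println("segments left: " + s.segments.size());
        System.out.println("crawldb: " + s.crawlDb);
        System.out.println("linkdb: " + s.linkDb);
        System.out.println("indexed: " + s.index.keySet());
    }
}
```

After `indexAndDelete`, `segments` is empty while `crawlDb` still knows both URLs (one fetched, one pending) and `linkDb` still records that B is linked from A, which mirrors why deleting indexed segments does not lose the crawl state.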
Sent from the Nutch - User mailing list archive at Nabble.com.

