Hi Sebastian, I think the problem is with the fetch not returning any results. I checked your suggestion, but it did not work.
Cheers,
Leo

On Thu, 2011-07-21 at 22:16 +0200, Sebastian Nagel wrote:
> Hi Leo, hi Lewis,
>
> > From the times both the fetching and parsing took, I suspect that maybe
> > Nutch didn't actually fetch the URL,
>
> This may be the reason. "Empty" segments may break some of the crawler steps.
>
> But if I'm not wrong, it looks like the updatedb command
> is not quite correct:
>
> > llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch
> > updatedb /home/llist/nutchData/crawl/crawldb
> > -dir /home/llist/nutchData/crawl/segments/20110721122519
> > CrawlDb update: starting at 2011-07-21 12:28:03
> > CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
> > CrawlDb update: segments:
> > [file:/home/llist/nutchData/crawl/segments/20110721122519/parse_text,
> > file:/home/llist/nutchData/crawl/segments/20110721122519/content,
> > file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_parse,
> > file:/home/llist/nutchData/crawl/segments/20110721122519/parse_data,
> > file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_fetch,
> > file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_generate]
> > CrawlDb update: additions allowed: true
>
> As with other commands that read segments, there are two ways to pass
> segments as arguments: 1) enumerate every segment, or 2) give the parent
> directory of all segments via -dir. See:
>
> % $NUTCH_HOME/bin/nutch updatedb
> Usage: CrawlDb <crawldb> (-dir <segments> | <seg1> <seg2> ...) [-force] [-normalize] [-filter] [-noAdditions]
>   crawldb        CrawlDb to update
>   -dir segments  parent directory containing all segments to update from
>   seg1 seg2 ...  list of segment names to update from
>
> Try your updatedb command without -dir; it should work.
>
> Sebastian
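For reference, a sketch of the two invocation forms Sebastian describes, using the paths from the thread (assuming the same crawl layout; not tested against a live Nutch install). The original command failed because -dir was pointed at a single segment, so updatedb treated that segment's subdirectories (parse_text, content, crawl_parse, ...) as if they were segments.

```shell
# Form 1: name the segment(s) directly, no -dir
/usr/share/nutch/runtime/local/bin/nutch updatedb \
    /home/llist/nutchData/crawl/crawldb \
    /home/llist/nutchData/crawl/segments/20110721122519

# Form 2: pass the PARENT directory of all segments with -dir
/usr/share/nutch/runtime/local/bin/nutch updatedb \
    /home/llist/nutchData/crawl/crawldb \
    -dir /home/llist/nutchData/crawl/segments
```

Either form should leave updatedb reading crawl_fetch/crawl_parse from the segment itself rather than descending into its subdirectories.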