On Jan 19, 2007, at 4:29 AM, Andrzej Bialecki wrote:
>
> Could you guys come up with exact data that causes this bug
> (primarily I'm interested in a seed list, because then I can see
> that you simply use the crawl tool, and finally try to run
> mergesegs). Thanks!

My seed list is simply my personal website, http://variogr.am/, one line in urls/urls.

I don't use the crawl command; I use a variation on the whole-internet script from the wiki. The crash is at mergesegs.

rm -rf crawl
bin/nutch inject crawl/crawldb urls/  # a single URL is in urls/urls
bin/nutch generate crawl/crawldb crawl/segments
bin/nutch fetch crawl/segments/2007...
bin/nutch updatedb crawl/crawldb crawl/segments/2007...

# generate a new segment with 10 URIs
bin/nutch generate crawl/crawldb crawl/segments -topN 10
bin/nutch fetch crawl/segments/2007...  # new segment
bin/nutch updatedb crawl/crawldb crawl/segments/2007...  # new segment

# merge the segments and index
bin/nutch mergesegs crawl/merged -dir crawl/segments
..

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers