On Jan 19, 2007, at 4:29 AM, Andrzej Bialecki wrote:
>

> Could you guys come up with exact data that causes this bug  
> (primarily I'm interested in a seed list, because then I can see  
> that you simply use the crawl tool, and finally try to run  
> mergesegs). Thanks!

My seed list is just my personal website, http://variogr.am/, as a
single line in urls/urls.

I don't use the crawl command; I use a variation on the whole-internet
script from the wiki.  The crash occurs at mergesegs.

rm -rf crawl
bin/nutch inject crawl/crawldb urls/  # a single URL is in urls/urls
bin/nutch generate crawl/crawldb crawl/segments
bin/nutch fetch crawl/segments/2007...
bin/nutch updatedb crawl/crawldb crawl/segments/2007...

# generate a new segment with 10 URIs
bin/nutch generate crawl/crawldb crawl/segments -topN 10
bin/nutch fetch crawl/segments/2007... # new segment
bin/nutch updatedb crawl/crawldb crawl/segments/2007... # new segment

# merge the segments and index
bin/nutch mergesegs crawl/merged -dir crawl/segments
..
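
For anyone reproducing this: the "2007..." above stands for the timestamped segment directory that generate creates. Here is a minimal sketch of picking the newest one automatically. The latest_segment function is a hypothetical helper of mine, not part of Nutch, and it assumes segment names are timestamps so that lexical order matches creation order.

```shell
# Hypothetical helper (not part of Nutch): print the newest segment directory.
# Assumes segment directory names are timestamps starting with "2",
# so sorting them lexically also sorts them chronologically.
latest_segment() {
    ls -d "$1"/2* | tail -1
}

# Example use with the steps above (commented out so nothing runs by accident):
# seg=$(latest_segment crawl/segments)
# bin/nutch fetch "$seg"
# bin/nutch updatedb crawl/crawldb "$seg"
```

Calling it again after the second generate step would return the new segment, since that one has the later timestamp.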

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers