> Yes, your description makes sense to me. So if I want to fetch a list with
> only 3k urls, I just have to run:
>
> ./nutch parse $seg -topN 3000
>
> right?
>
yes

> But I still don't get this message:
>
> 2011-08-16 13:55:55,087 INFO crawl.Generator - Host or domain
> cms.uni-kassel.de has more than 3000 URLs for all 1 segments - skipping
>
> What is meant by "more than 3000 URLs for all 1 segments"? Does skipping
> then mean that "it will skip after 3k urls"?

yes

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
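A simplified model of the two limits involved may make the "skipping" message easier to read. This is a hypothetical Python sketch, not Nutch code. It assumes that -topN caps the size of the whole fetchlist, and that the 3000 in the log line is a separate per-host cap (presumably the generate.max.count property, with URLs counted per host). It also assumes a single segment, matching "all 1 segments" in the log line. Once a host already has that many URLs in the segment, its remaining URLs are skipped and the message is logged once for that host. The function name `select_fetchlist` is made up for illustration.

```python
from urllib.parse import urlsplit


def select_fetchlist(urls, top_n, max_count):
    """Hypothetical, simplified model of fetchlist selection for ONE segment.

    urls      -- candidate URLs, assumed already sorted by descending score
    top_n     -- overall cap on the fetchlist size (like -topN)
    max_count -- per-host cap (assumed to correspond to generate.max.count)
    Returns the selected URLs and the log lines this model would emit.
    """
    selected = []
    per_host = {}   # host -> number of its URLs already selected
    logged = set()  # hosts for which the "skipping" line was already logged
    log = []
    for url in urls:
        if len(selected) == top_n:
            break  # the whole fetchlist is full
        host = urlsplit(url).hostname
        if per_host.get(host, 0) >= max_count:
            # This host already fills its quota in the only segment:
            # skip the URL, logging only the first time for this host.
            if host not in logged:
                logged.add(host)
                log.append(f"Host or domain {host} has more than {max_count} "
                           "URLs for all 1 segments - skipping")
            continue
        per_host[host] = per_host.get(host, 0) + 1
        selected.append(url)
    return selected, log


if __name__ == "__main__":
    # Five URLs from one host plus one from another, per-host cap of 3.
    urls = [f"http://cms.uni-kassel.de/page{i}" for i in range(5)]
    urls.append("http://example.org/a")
    selected, log = select_fetchlist(urls, top_n=10, max_count=3)
    print(len(selected))  # 4: three from cms.uni-kassel.de, one from example.org
    print(log[0])
```

In this model, -topN alone does not stop one large host from filling the list. The per-host cap is what triggers the log line, and later URLs from other hosts can still be selected.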

