> Yes, your description makes sense to me. So if I want to fetch a list with
> only 3k URLs, I just have to run:
> ./nutch parse $seg -topN 3000
>
> right?
>

yes

>
> But I still don't get this message:
>
> 2011-08-16 13:55:55,087 INFO  crawl.Generator - Host or domain
> cms.uni-kassel.de has more than 3000 URLs for all 1 segments - skipping
>
> What is meant by "more than 3000 URLs for all 1 segments"? Does "skipping"
> mean that it will skip after 3k URLs?
>

 yes



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
