Hi,
You ran one crawl cycle. Depending on the generator and fetcher settings, you
are not guaranteed to fetch 200,000 URLs with only topN specified. Check the
logs: the generator will tell you if there are too many URLs for a host or
domain. Also check all fetcher logs; they will tell you how
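The following minimal Java sketch illustrates why a topN run can come up short. It assumes a simplified generator that ranks candidates by score and applies a per-host cap while filling the list. The `Candidate` record, the `select` method and the `maxPerHost` parameter are hypothetical names for illustration, loosely modeled on Nutch's `generate.max.count` setting. This is not the actual Nutch Generator code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GeneratorCapSketch {

    // Hypothetical stand-in for a CrawlDb entry: a URL and its score.
    public record Candidate(String url, double score) {}

    // Select up to topN candidates, highest score first, keeping at most
    // maxPerHost URLs per host. A negative maxPerHost disables the cap
    // (an assumption for this sketch, loosely modeled on generate.max.count).
    public static List<Candidate> select(List<Candidate> candidates, int topN, int maxPerHost) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(b.score(), a.score()));

        Map<String, Integer> perHost = new HashMap<>();
        List<Candidate> selected = new ArrayList<>();
        for (Candidate c : sorted) {
            if (selected.size() >= topN) {
                break; // topN reached
            }
            String host = URI.create(c.url()).getHost();
            int count = perHost.getOrDefault(host, 0);
            if (maxPerHost >= 0 && count >= maxPerHost) {
                continue; // host is full: this URL is left for a later cycle
            }
            perHost.put(host, count + 1);
            selected.add(c);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("http://a.example/1", 0.9),
                new Candidate("http://a.example/2", 0.8),
                new Candidate("http://a.example/3", 0.7),
                new Candidate("http://b.example/1", 0.6));
        // topN = 4, but with at most 2 URLs per host only 3 are selected.
        System.out.println(select(candidates, 4, 2).size());
    }
}
```

With topN = 4 and a cap of 2 URLs per host, the third `a.example` URL is skipped, so only 3 URLs are selected (the program prints `3`). The real generator and fetcher have more settings than this sketch models, so the logs remain the place to see what actually limited a run.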
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859275#comment-13859275 ]
Tejas Patil commented on NUTCH-1687:
This is a good point by [~tiennm]. Although
See https://builds.apache.org/job/Nutch-trunk/2469/
--
[...truncated 3407 lines...]
init:
[mkdir] Created dir: https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-host/classes
[mkdir] Created dir:
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859358#comment-13859358 ]
Tejas Patil commented on NUTCH-1687:
Created a review request:
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859364#comment-13859364 ]
Tien Nguyen Manh commented on NUTCH-1687:
It is nice!
Pick queue in Round Robin