RE: Nutch Crawl a Specific List Of URLs (150K)

2013-12-30 Thread Markus Jelsma
Hi, You ran one crawl cycle. Depending on the generator and fetcher settings, you are not guaranteed to fetch 200,000 URLs with only topN specified. Check the logs; the generator will tell you if there are too many URLs for a host or domain. Also check all fetcher logs; they will tell you how

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2013-12-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859275#comment-13859275 ] Tejas Patil commented on NUTCH-1687: This is one good point by [~tiennm]. Although

Build failed in Jenkins: Nutch-trunk #2469

2013-12-30 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/2469/ -- [...truncated 3407 lines...] init: [mkdir] Created dir: https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-host/classes [mkdir] Created dir:

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2013-12-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859358#comment-13859358 ] Tejas Patil commented on NUTCH-1687: Created a review request:

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2013-12-30 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859364#comment-13859364 ] Tien Nguyen Manh commented on NUTCH-1687: It is nice! Pick queue in Round Robin