Build failed in Jenkins: Nutch-trunk #2466

2013-12-27 Thread Apache Jenkins Server
See -- [...truncated 3386 lines...] init: [mkdir] Created dir: [mkdir] Created dir:

Nutch Crawl a Specific List Of URLs (150K)

2013-12-27 Thread Bin Wang
Hi, I have a very specific list of URLs, about 140K in total. I switched off `db.update.additions.allowed` so that it will not update the crawldb... and I was assuming I could feed all the URLs to Nutch, and after one round of fetching, it would finish and leave all the raw HTML files in the segme
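The workflow in the question starts with getting the ~140K URLs into Nutch. In Nutch 1.x the injector reads plain-text seed files with one URL per line from a directory. The following is a minimal sketch, assuming Python is available; the function name `write_seed_list` and the `seed.txt` filename are hypothetical, not part of Nutch. It cleans the list before injection by dropping blank lines and duplicates.

```python
from pathlib import Path


def write_seed_list(urls, seed_dir, filename="seed.txt"):
    """Write URLs to a plain-text seed file, one URL per line.

    Hypothetical helper: surrounding whitespace is stripped, blank lines and
    duplicates are dropped, and first-seen order is kept.
    Returns the number of URLs written.
    """
    seen = set()
    unique = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            unique.append(url)

    out_dir = Path(seed_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # One URL per line, each terminated by a newline.
    (out_dir / filename).write_text("".join(u + "\n" for u in unique), encoding="utf-8")
    return len(unique)
```

The resulting directory would then be handed to the injector, for example `bin/nutch inject crawl/crawldb urls/` (both paths are assumptions). Removing duplicates up front means the number of injected entries matches the number of distinct URLs, which makes it easier to check afterwards whether one fetch round covered the whole list.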