I'm having the opposite problem: links that should get indexed aren't showing up, and I'm having trouble figuring out why. I'm using the command line options (http://wiki.apache.org/nutch/CommandLineOptions) to dump the data in the segments and link db, but the dumps haven't shown me why items aren't appearing in the index.
Anybody have other tools or examples or anything to help figure out these types of problems?

-- Chris

On Fri, Dec 16, 2011 at 3:40 PM, remi tassing wrote:
> Same question here, while (re)running the script, I see URLs that are
> supposed to be filtered out. Not sure how to make it right...
>
> On Friday, December 16, 2011, Christopher Gross wrote:
>> http://wiki.apache.org/nutch/Crawl
>> This script no longer works. See:
>>
>> echo "----- Index (Step 5 of $steps) -----"
>> $NUTCH_HOME/bin/nutch index crawl/NEWindexes crawl/crawldb crawl/linkdb \
>>     crawl/segments/*
>>
>> The "index" call doesn't exist... so what does this line get
>> replaced with? Is there an updated runbot.sh script? Has anyone
>> created a new one that will work? I've done some changes on it, but I
>> just don't know what to do for this part.
>>
>> Thanks!
>>
>> -- Chris

