I'm having the opposite problem: links that should get indexed aren't
showing up, and I'm having trouble figuring out why.  I'm trying to
use the command line options
(http://wiki.apache.org/nutch/CommandLineOptions) to dump out the data
in the segments and linkdb, but I still can't see why items aren't
appearing in the index.

Anybody have other tools or examples or anything to help figure out
these types of problems?

-- Chris



On Fri, Dec 16, 2011 at 3:40 PM, remi tassing <[email protected]> wrote:
> Same question here. While (re)running the script, I see URLs that are
> supposed to be filtered out. Not sure how to make it right...
>
> On Friday, December 16, 2011, Christopher Gross <[email protected]> wrote:
>> http://wiki.apache.org/nutch/Crawl
>> This script no longer works.  See:
>>
>> echo "----- Index (Step 5 of $steps) -----"
>> $NUTCH_HOME/bin/nutch index crawl/NEWindexes crawl/crawldb crawl/linkdb \
>>   crawl/segments/*
>>
>> The "index" call doesn't exist... so what does this line get
>> replaced with?  Is there an updated runbot.sh script?  Has anyone
>> created a new one that will work?  I've done some changes on it, but I
>> just don't know what to do for this part.
>> Thanks!
>>
>> -- Chris
>>
