Hello people,

I was trying to do a manual crawl as described in the Nutch tutorial at http://wiki.apache.org/nutch/NutchTutorial

First of all: if I crawl the same seed URLs with the "nutch crawl" command, everything works fine.

Here's what I was trying to do:

1.) Trying to create a new crawlDB with:

    ./nutch inject crawl/crawldb seedUrls

The crawl directory was empty, and the seedUrls directory contains a single file, "urls", with this content:
    http://www.uni-kassel.de
    http://portal.uni-kassel.de
    http://www.asta-kassel.de
    http://www.uni-kassel.de/fb16
    http://www.cs.uni-kassel.de
    http://www.studentenwerk-kassel.de

    The command runs without any error:
    ./nutch inject crawl/crawldb seedUrls
    Injector: starting
    Injector: crawlDb: crawl/crawldb
    Injector: urlDir: seedUrls
    Injector: Converting injected urls to crawl db entries.
    Injector: Merging injected urls into crawl db.
    Injector: done

    After that, a new directory named crawldb exists in crawl/
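
To rule out an empty or malformed seed file, the URL lines in it can be counted before injecting. This is only a minimal sketch, assuming a POSIX shell with grep; count_seed_urls is a hypothetical helper name and not a Nutch command:

```shell
# Hypothetical helper (not part of Nutch): count the lines in a seed file
# that look like http(s) URLs, allowing for leading whitespace.
count_seed_urls() {
    grep -cE '^[[:space:]]*https?://[^[:space:]]+' "$1"
}

# Usage, with the seed file from this mail:
#   count_seed_urls seedUrls/urls
```

For the six URLs listed above, this should print 6. It only checks the shape of the lines, so it says nothing about how the injector filters or normalizes them.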

2.) Trying to generate new segments:

    ./nutch generate crawl/crawldb/ crawl/segments -noFilter
    Generator: Selecting best-scoring urls due for fetch.
    Generator: starting
    Generator: filtering: false
    Generator: normalizing: true
    Generator: jobtracker is 'local', generating exactly one partition.
    Generator: 0 records selected for fetching, exiting ...

So I am wondering why the generator does not create any segments. It reports that 0 records were selected for fetching, so it seems to me that the injector did not actually inject the URLs into the db.

When I run:
    ./nutch readdb crawl/crawldb/ -stats

It outputs:
    CrawlDb statistics start: crawl/crawldb/
    Statistics for CrawlDb: crawl/crawldb/
    Exception in thread "main" java.lang.NullPointerException
        at org.apache.nutch.crawl.CrawlDbReader.processStatJob(CrawlDbReader.java:352)
        at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:502)

Does anybody have an idea what I am doing wrong?

Is there any way to get more verbose output or logging from these commands?







