Hello people,
I was trying to do a manual crawl as described in the Nutch tutorial
on http://wiki.apache.org/nutch/NutchTutorial
First of all: if I do a crawl with the same seed URLs using the "nutch
crawl" command, everything works fine.
Here's what I was trying to do:
1.) Trying to create a new crawlDB with:
./nutch inject crawl/crawldb seedUrls
The directory crawl was empty, and the directory seedUrls contains
one file, "urls", with this content:
http://www.uni-kassel.de
http://portal.uni-kassel.de
http://www.asta-kassel.de
http://www.uni-kassel.de/fb16
http://www.cs.uni-kassel.de
http://www.studentenwerk-kassel.de
The command runs without any error:
./nutch inject crawl/crawldb seedUrls
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: seedUrls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
After that, a new directory named crawldb exists in crawl/.
2.) Trying to generate new segments:
./nutch generate crawl/crawldb/ crawl/segments -noFilter
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: filtering: false
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
So I am wondering why the generator does not create segments. It says
that 0 records were selected for fetching. It seems to me that the
injector didn't actually inject the URLs into the db.
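One thing I am not sure about: the "nutch crawl" command uses
conf/crawl-urlfilter.txt, while the individual tools read
conf/regex-urlfilter.txt — could the latter be rejecting my seeds? (As
far as I know, -noFilter only affects generate; the injector still
applies the filters.) The tutorial's crawl-urlfilter.txt contains a
domain-restricting line like the one below, where MY.DOMAIN.NAME is the
tutorial's placeholder, not something from my setup:

```
# From the tutorial's conf/crawl-urlfilter.txt: accept only URLs in one domain.
# The per-step tools read conf/regex-urlfilter.txt instead, which — as far
# as I can tell — ends with a catch-all "+." that accepts everything.
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
```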
When I run:
./nutch readdb crawl/crawldb/ -stats
It outputs:
CrawlDb statistics start: crawl/crawldb/
Statistics for CrawlDb: crawl/crawldb/
Exception in thread "main" java.lang.NullPointerException
at
org.apache.nutch.crawl.CrawlDbReader.processStatJob(CrawlDbReader.java:352)
at
org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:502)
Does anybody have an idea what I am doing wrong?
Is there any way to get more verbose output / logging from these
commands?
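I assume conf/log4j.properties controls this and that the details end up
in logs/hadoop.log, so perhaps something like the snippet below — but I
am guessing that DEBUG is the right level to set there:

```
# conf/log4j.properties — raising the Nutch and Hadoop loggers to DEBUG
# (assumption: the shipped default is INFO); output goes to logs/hadoop.log
log4j.logger.org.apache.nutch=DEBUG
log4j.logger.org.apache.hadoop=DEBUG
```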