Hi everybody,

I have been working with Nutch for a few days but cannot get it to work. Below
is the output from a crawl run with nutch crawl. I have no idea why the fetcher
fails with a NullPointerException. I have searched but found no answer for
this failure. Can anyone help me?

 

Thanks for reading.

 

I’m using Solaris 10 on SPARC, running on an SF V440. Here is my configuration:

 

The URL directory (/export/home/nutch/urls) contains 2 files:

 

      Netmode: contains: http://netmode.vietnamnet.vn

      Localhost: contains: http://localhost:8080

 

The crawl-urlfilter.txt contains this:

 

# accept hosts in MY.DOMAIN.NAME

+^http://netmode.vietnamnet.vn

+^http://localhost:8080/nutch
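
As a sanity check on the two accept rules above, the patterns can be tried against a few URLs with grep. This is only a rough sketch: it assumes a POSIX grep with -E and -q, it treats each "+" rule as a plain regular-expression match, and matches_filter is a hypothetical helper name, not part of Nutch. It does not reproduce how Nutch itself evaluates crawl-urlfilter.txt.

```shell
#!/bin/sh
# Sketch only: approximate the two "+" accept rules with grep -E.
# Assumption: this is NOT Nutch's own filter code, just a quick pattern check.

# matches_filter is a hypothetical helper, not a Nutch command.
# It returns 0 (success) if the URL matches either accept pattern.
matches_filter() {
    printf '%s\n' "$1" | grep -E -q \
        -e '^http://netmode.vietnamnet.vn' \
        -e '^http://localhost:8080/nutch'
}

# Try both crawl targets plus the bare localhost URL.
for url in http://netmode.vietnamnet.vn/ \
           http://localhost:8080/nutch \
           http://localhost:8080; do
    if matches_filter "$url"; then
        echo "accept $url"
    else
        echo "reject $url"
    fi
done
```

Under these assumptions, the first two URLs match an accept pattern and the bare http://localhost:8080 does not, because the second rule requires the /nutch path.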

 

I run the crawl with this shell script:

 

crawls=/export/home/nutch/crawls

urldir=/export/home/nutch/urls

 

rm -r $crawls

nutch crawl $urldir -dir $crawls -depth 1

 

It prints the following output:

 

crawl started in: /export/home/nutch/crawls

rootUrlDir = /export/home/nutch/urls

threads = 10

depth = 1

Injector: starting

Injector: crawlDb: /export/home/nutch/crawls/crawldb

Injector: urlDir: /export/home/nutch/urls

Injector: Converting injected urls to crawl db entries.

Injector: Merging injected urls into crawl db.

Injector: done

Generator: starting

Generator: segment: /export/home/nutch/crawls/segments/20070110144113

Generator: Selecting best-scoring urls due for fetch.

Generator: Partitioning selected urls by host, for politeness.

Generator: done.

Fetcher: starting

Fetcher: segment: /export/home/nutch/crawls/segments/20070110144113

Fetcher: threads: 10

fetching http://localhost:8080/nutch

fetching http://netmode.vietnamnet.vn/

fetch of http://localhost:8080/nutch failed with: java.lang.NullPointerException

fetch of http://netmode.vietnamnet.vn/ failed with: java.lang.NullPointerException

Fetcher: done

CrawlDb update: starting

CrawlDb update: db: /export/home/nutch/crawls/crawldb

CrawlDb update: segment: /export/home/nutch/crawls/segments/20070110144113

CrawlDb update: Merging segment data into db.

CrawlDb update: done

LinkDb: starting

LinkDb: linkdb: /export/home/nutch/crawls/linkdb

LinkDb: adding segment: /export/home/nutch/crawls/segments/20070110144113

LinkDb: done

Indexer: starting

Indexer: linkdb: /export/home/nutch/crawls/linkdb

Indexer: adding segment: /export/home/nutch/crawls/segments/20070110144113

Optimizing index.

Indexer: done

Dedup: starting

Dedup: adding indexes in: /export/home/nutch/crawls/indexes

Dedup: done

Adding /export/home/nutch/crawls/indexes/part-00000

crawl finished: /export/home/nutch/crawls

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
