parse_data

Höchstötter Nadine Mon, 02 Mar 2009 01:42:11 -0800

Hi,
you probably had a problem parsing your data. Did you use the crawl command?
Each segment directory should contain the following:
content  crawl_fetch  crawl_generate  crawl_parse  parse_data  parse_text
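
One way to see which segments are incomplete is to loop over them and report every expected subdirectory that is absent. This is only a minimal sketch: the function name list_missing_parts is made up for illustration and is not part of Nutch, and the list of subdirectories is simply the one given above. A segment that shows only crawl_generate has presumably been generated but not yet fetched or parsed.

```shell
#!/bin/sh
# Sketch: report which expected subdirectories are missing from each segment.
# list_missing_parts is a hypothetical helper, not a Nutch command.
list_missing_parts() {
    segments_dir=$1
    for seg in "$segments_dir"/*/; do
        # Skip the unexpanded glob when there are no segments at all.
        [ -d "$seg" ] || continue
        seg=${seg%/}
        for part in content crawl_fetch crawl_generate crawl_parse parse_data parse_text; do
            [ -d "$seg/$part" ] || echo "$seg: missing $part"
        done
    done
}
```

Calling it as `list_missing_parts crawl/segments` prints one line per missing subdirectory, so an incomplete segment can be spotted before it is passed to invertlinks.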
Cheers, Nadine.


-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Monday, 2 March 2009 09:28
To: [email protected]
Subject: Input path doesnt exist : XYZ/crawl/segments/20090302092003/parse_data

Hi,

I tried the command

#> bin/nutch invertlinks crawl/linkdb crawl/segments/*
LinkDb: starting
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: crawl/segments/20090302090941
LinkDb: adding segment: crawl/segments/20090302091438
LinkDb: adding segment: crawl/segments/20090302092003
LinkDb: org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : /root/nutch-0.9/crawl/segments/20090302092003/parse_data

All previous steps completed without errors. Why is "parse_data" missing
from the directory, and what should it contain?

Listing of that segment directory:
XYZ#> cd crawl/segments/20090302092003
XYZ/crawl/segments/20090302092003 # ls -l
drwxr-xr-x 2 root root 4096 Mar  2 09:20 crawl_generate

Contents of crawl_generate:
.part-00000.crc  part-00000

Can anyone help me?

Joerg
