Hi,

You probably had problems parsing your data. Did you use the crawl command? A complete segment directory should contain the following:

content
crawl_fetch
crawl_generate
crawl_parse
parse_data
parse_text

Cheers,
Nadine
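
A quick way to see which segments are incomplete before running invertlinks is to check each one for the six subdirectories listed above. The following is only a minimal sketch: the function name check_segments is made up for illustration, and the default path crawl/segments is taken from the command quoted below.

```sh
#!/bin/sh
# Sketch: report segments that lack any of the expected subdirectories.
# The list of names comes from the reply above; check_segments is a
# hypothetical helper, not part of Nutch.

REQUIRED="content crawl_fetch crawl_generate crawl_parse parse_data parse_text"

check_segments() {
    segments_dir="$1"
    for seg in "$segments_dir"/*; do
        # Skip anything that is not a directory (including an unmatched glob).
        [ -d "$seg" ] || continue
        missing=""
        for sub in $REQUIRED; do
            [ -d "$seg/$sub" ] || missing="$missing $sub"
        done
        # Print only the segments that are incomplete.
        if [ -n "$missing" ]; then
            echo "$seg: missing$missing"
        fi
    done
}

# Default to the segments path used in this thread if no argument is given.
check_segments "${1:-crawl/segments}"
```

A segment that contains only crawl_generate, like 20090302092003 in the listing below, has most likely been generated but never fetched or parsed. It would need to be fetched and parsed first, or left out of the arguments passed to invertlinks.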
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Monday, 2 March 2009 09:28
To: [email protected]
Subject: Input path doesnt exist : XYZ/crawl/segments/20090302092003/parse_data

Hi,

I tried the command

#> bin/nutch invertlinks crawl/linkdb crawl/segments/*
LinkDb: starting
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: crawl/segments/20090302090941
LinkDb: adding segment: crawl/segments/20090302091438
LinkDb: adding segment: crawl/segments/20090302092003
LinkDb: org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : /root/nutch-0.9/crawl/segments/20090302092003/parse_data

All the steps before this were OK. Why is "parse_data" not in the directory, and what should be inside it?

Listing of my directory:

XYZ#> cd crawl/segments/20090302092003
XYZ/crawl/segments/20090302092003 # ls -l
drwxr-xr-x 2 root root 4096 Mar 2 09:20 crawl_generate

In crawl_generate:

.part-00000.crc part-00000

Can anyone help me?

Joerg
