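Hi Wildan,

the example you posted doesn't show a readseg command. You are running the crawl command, which treats crawl-tobethink/segments/20090306002848 as the directory containing the seed URLs to inject. That directory is a segment, not a set of URL files, which is why the Injector fails with "Not a file" on its crawl_fetch subdirectory.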
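As a rough illustration of the difference, here is a minimal shell sketch that guesses whether a directory looks like a segment (it contains subdirectories, such as the crawl_fetch named in the error) or like a seed directory (plain files only). The function name classify_dir and the heuristic are my own assumptions for illustration and are not part of Nutch.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): guess what kind of directory
# was passed on the command line. The heuristic is an assumption.
classify_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 1
    fi
    # A segment holds subdirectories (crawl_fetch is the one named in the
    # error message); a seed directory is expected to hold plain files.
    for entry in "$dir"/*; do
        if [ -d "$entry" ]; then
            echo "segment-like"
            return 0
        fi
    done
    echo "seed-like"
}

# Example call (path taken from the original post):
#   classify_dir crawl-tobethink/segments/20090306002848
```

If this reports "segment-like" for the directory you are passing, the crawl command is the wrong tool for it.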
If you want to read segment information, use a command like the following:

    bin/nutch readseg -dump crawl-tobethink/segments/20090306002848 crawl-tobethink/dumpsegment

In this case the dump is written to crawl-tobethink/dumpsegment, but you can specify any output directory you like.

You can see all options of the readseg command by running it without arguments:

    bin/nutch readseg

Kind regards,
Martina

-----Original Message-----
From: W [mailto:wilda...@gmail.com]
Sent: Friday, 6 March 2009 12:47
To: nutch-user@lucene.apache.org
Subject: readseg error

Hello Nutch User,

I just read a tutorial on how to get information from a segment, and then I got an error when running the readseg command. Can anybody tell me why this happens?

wil...@tobethink:/opt/nutch-trunk$ ./bin/nutch crawl -dump crawl-tobethink/segments/20090306002848/
crawl started in: crawl-20090306184109
rootUrlDir = crawl-tobethink/segments/20090306002848
threads = 10
depth = 5
Injector: starting
Injector: crawlDb: crawl-20090306184109/crawldb
Injector: urlDir: crawl-tobethink/segments/20090306002848
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Not a file: file:/opt/nutch-trunk/crawl-tobethink/segments/20090306002848/crawl_fetch
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:195)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:782)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:112)

Regards,
Wildan

--
---
OpenThink Labs
www.tobethink.com
Aligning IT and Education
>> 021-99325243
Y! : hawking_123
LinkedIn : http://www.linkedin.com/in/wildanmaulana