I need to extract the PDF files fetched by a crawl. I can extract the text with the following command:
bin/nutch readseg -dump crawl-test/segments/20110201114/ dump -nogenerate -noparse -noparsedata -noparsetext

But I need the raw PDF files, not plain text.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-extract-fetched-files-pdf-tp4022202.html
Sent from the Nutch - User mailing list archive at Nabble.com.

