I need to extract the PDF files that Nutch has fetched. I can extract their
text with the following command:

bin/nutch readseg -dump crawl-test/segments/20110201114/ dump -nogenerate -noparse -noparsedata -noparsetext

But I need the raw PDF files themselves, not just their plain text.



