How to extract fetched files (PDF)?

hudvin Sat, 24 Nov 2012 12:30:49 -0800

I need to extract the PDF files that Nutch has fetched. I can extract their
text with the following command:


bin/nutch readseg -dump crawl-test/segments/20110201114/ dump -nogenerate -noparse -noparsedata -noparsetext

But I need the raw PDF files themselves, not plain text.
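
For reference, here is a minimal sketch of how the readseg call above can be assembled from a script. The function name readseg_dump_command and its keep parameter are made up for illustration and are not part of Nutch; only the -no* switches come from the readseg usage message. It only reproduces the text dump, so it does not produce the raw PDF bytes I am after.

```python
import shlex

# Switches accepted by `bin/nutch readseg -dump`; each one suppresses
# one part of the segment in the dump output.
SKIP_FLAGS = {
    "content": "-nocontent",
    "fetch": "-nofetch",
    "generate": "-nogenerate",
    "parse": "-noparse",
    "parsedata": "-noparsedata",
    "parsetext": "-noparsetext",
}


def readseg_dump_command(segment_dir, output_dir, keep=("content", "fetch")):
    """Build the argument list for a readseg dump that keeps only `keep`.

    Hypothetical helper: the name and the `keep` parameter are assumptions
    made for this sketch, not a Nutch API.
    """
    unknown = set(keep) - set(SKIP_FLAGS)
    if unknown:
        raise ValueError("unknown segment parts: %s" % sorted(unknown))
    args = ["bin/nutch", "readseg", "-dump", segment_dir, output_dir]
    # Add a -no* switch for every segment part that should be left out.
    args += [flag for part, flag in SKIP_FLAGS.items() if part not in keep]
    return args


if __name__ == "__main__":
    # Same segment path and output directory as in the command above.
    cmd = readseg_dump_command("crawl-test/segments/20110201114/", "dump")
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run from the Nutch installation directory.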



