Hello:
Apart from readPDF in the tm package, you can use the PDF-to-text converter
command in Linux, which is pdftotext. Say file.pdf is your file; from R
you'd use:
system("pdftotext -layout file.pdf")
This invokes the pdftotext command from within R and creates a file called
file.txt with
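
For anyone who wants the text back in R right away, here is a minimal sketch of the same idea wrapped in a function. It assumes pdftotext (from xpdf or poppler-utils) is installed and on the PATH; the names txt_name and pdf_to_text are made up for illustration and are not part of tm or any other package.

```r
# Hypothetical helper: the .txt file name that corresponds to a PDF file name.
txt_name <- function(pdf_path) {
  paste0(tools::file_path_sans_ext(pdf_path), ".txt")
}

# Hypothetical helper: run pdftotext on a PDF and return the text as lines.
# Assumes the external pdftotext program is installed and on the PATH.
pdf_to_text <- function(pdf_path, layout = TRUE) {
  if (!nzchar(Sys.which("pdftotext"))) {
    stop("pdftotext was not found on the PATH")
  }
  txt_path <- txt_name(pdf_path)
  # -layout asks pdftotext to keep the physical layout of the page.
  # The output file is named explicitly so its location is predictable.
  args <- c(if (layout) "-layout", shQuote(pdf_path), shQuote(txt_path))
  status <- system2("pdftotext", args)
  if (status != 0) {
    stop("pdftotext exited with status ", status)
  }
  readLines(txt_path, warn = FALSE)
}
```

With file.pdf in the working directory, lines <- pdf_to_text("file.pdf") should leave file.txt next to it and return its contents as a character vector. system2() is used here instead of system() only so that the arguments can be quoted separately, which helps with file names that contain spaces.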
I need to do text mining on PDF files. I understand there is a readPDF
command in tm that can be used. Have read the 2008 posts on converting
PDF files to text by Tony Breyal and others.
Wondering if the procedure has been standardized in any tutorial or
otherwise? Being new to R, I was
Thank you, Tony.
Even in 2012, I still found your post useful.
--
View this message in context:
http://r.789695.n4.nabble.com/Reading-PDF-files-tp977248p4650374.html
Sent from the R help mailing list archive at Nabble.com.
Greetings Zaki,
You should really post this question on the R-help forum so that
others might benefit from any responses. It's been a while since I've
done this, but if memory serves, the basic process was to download
xpdf and add it to the Windows path, thus making it accessible from
within R.
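
A quick way to check the PATH step from inside R is sketched below. The folder C:/xpdf/bin is only a placeholder for wherever xpdf was unpacked, and add_to_path is a made-up helper name; the change lasts for the current R session only.

```r
# Hypothetical install location; replace with wherever xpdf was unpacked.
xpdf_dir <- "C:/xpdf/bin"

# Hypothetical helper: prepend a directory to PATH for this R session only.
# Returns the previous PATH invisibly so it can be restored later.
add_to_path <- function(dir) {
  old <- Sys.getenv("PATH")
  Sys.setenv(PATH = paste(dir, old, sep = .Platform$path.sep))
  invisible(old)
}

add_to_path(xpdf_dir)

# Sys.which() returns "" when the program cannot be found,
# so this is TRUE only if R can now see pdftotext.
found <- nzchar(Sys.which("pdftotext"))
print(found)
```

If found is FALSE, the folder is probably wrong or does not contain pdftotext. Setting the path through the Windows system settings instead makes the change permanent, at the cost of needing to restart R.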
Copied/pasted from my earlier reply:
It's been a while since I've
done this, but if memory serves, the basic process was to download
xpdf and add it to the Windows path, thus making it accessible from
within R. Two methods follow:
Method One (easiest) - using the awesome ?system command:
(1)