Re: [R] reading data from a pdf

Jean Eid Mon, 24 Oct 2005 08:10:26 -0700

Hi,

In my experience pdftotext does not do a very good job at this because it 
garbles the formatting of tables. How badly depends, of course, on which 
program originally produced the PDF. What I have found works best is to 
cut and paste the text into XEmacs or Emacs, select it, and run 
M-x canonically-space-region, which removes any extra spaces. However, 
if the PDF was produced by scanning and one has to rely on a character 
recognition program, all bets are off and the tables have to be 
reformatted by hand.
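
For anyone who would rather script the space clean-up than do it in an editor, here is a minimal sketch in Python. It only roughly mimics what canonically-space-region does (it collapses every run of whitespace to a single space), and the function names and sample table are made up for illustration. It also assumes that no table cell contains a space of its own, which will not hold for every PDF.

```python
import re


def normalize_spacing(line):
    """Collapse each run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", line).strip()


def parse_pasted_table(text):
    """Turn pasted text into a list of rows, one list of fields per line.

    Assumes fields are separated by whitespace and contain no spaces
    themselves; blank lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        cleaned = normalize_spacing(line)
        if cleaned:
            rows.append(cleaned.split(" "))
    return rows


# Hypothetical text as it might look after pasting from a PDF viewer.
sample = """
  Year    Sales     Cost
  2003     10.5      7.25
  2004     12.0      8.00
"""

rows = parse_pasted_table(sample)
header, data = rows[0], rows[1:]

# Write the cleaned table one row per line, single-space separated.
cleaned_text = "\n".join(" ".join(row) for row in rows)
```

The cleaned text can then be saved to a file and read as an ordinary whitespace-delimited table. Fields are kept as strings here, so any numeric conversion is left to whatever reads the file.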

Jean
[EMAIL PROTECTED] wrote:

>>Hi, I'm trying to read data from a PDF file. Is it possible to do it
>>with R? Thanks, Marco
>>    
>>
>
>If cut and paste to a text file fails, try this:
>
>pdftotext (from the xpdf project)
>
>or
>
>http://pdftohtml.sourceforge.net
>pdftohtml is a utility which converts PDF files into HTML and
>XML formats
>
>In addition, pdftk, the command-line PDF toolkit, may be useful:
>http://www.accesspdf.com/pdftk/
>

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
