Hi,

In my experience, pdftotext did not do a very good job at this, because it screws up the formatting of tables. Of course, this depends on which program the PDF document was originally created with. What I found most appealing is to cut and paste the text into XEmacs or Emacs and then run M-x canonically-space-region, which eliminates any extra spaces. However, if the PDF document was prepared by scanning and a character-recognition program was used, then all is up in the air, and the formatting of tables has to be done by hand.
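If Emacs is not at hand, a rough equivalent of that cleanup step is easy to script. The sketch below is an illustration only and is not the Emacs command: it collapses every run of spaces or tabs on each line to a single space, whereas canonically-space-region also keeps two spaces after sentence-ending punctuation. The function names `collapse_spaces` and `to_rows` and the sample table are hypothetical.

```python
import re


def collapse_spaces(text: str) -> str:
    """Collapse runs of spaces/tabs within each line to one space.

    Hypothetical, simplified stand-in for Emacs's M-x canonically-space-region
    (which additionally leaves two spaces after sentence ends).
    """
    cleaned = []
    for line in text.splitlines():
        # Replace each run of horizontal whitespace with a single space,
        # then trim blanks at both ends of the line.
        cleaned.append(re.sub(r"[ \t]+", " ", line).strip())
    return "\n".join(cleaned)


def to_rows(text: str) -> list[list[str]]:
    """Split cleaned text into space-delimited fields, skipping blank lines."""
    return [line.split(" ") for line in collapse_spaces(text).splitlines() if line]


# Example: a table pasted from a PDF with ragged spacing (made-up data).
pasted = "year    sales\t  region\n2004     12.5    north\n\n2005   13.1  south  "
print(to_rows(pasted))
```

Note that a cell containing a space (e.g. "New York") would still be split into two fields, which is one reason some tables end up needing hand editing anyway.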
Jean [EMAIL PROTECTED] wrote:
>>Hi, I'm trying to read data from a PDF file. Is it possible to do it
>>with R? Thanks, Marco
>>
>>
>
>If cut and paste to a text file fails, try this:
>
>pdftotext (from the xpdf project)
>
>or
>
>http://pdftohtml.sourceforge.net
>pdftohtml is a utility which converts PDF files into HTML and
>XML formats
>
>In addition, pdftk, the command line pdf toolkit may be useful
>http://www.accesspdf.com/pdftk/

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
