2005/10/21, Ted Harding <[EMAIL PROTECTED]>:
> On 21-Oct-05 Marco Venanzi wrote:
> > Hi, I'm trying to read data from a PDF file. Is it possible to do it
> > with R? Thanks, Marco
>
> Basically, No.
>
> But you may be lucky with "copy&paste" using the mouse, from
> the display generated in Acrobat Reader to a text file.
>
> The basic procedure here is
>
> 1. Click on the "Text Select Tool" (a button usually marked with a "T");
>
> 2. Use the mouse to highlight the block of text you want to copy;
>
> 3. Depending on your operating system/graphics display: In Windows
>    you have (IIRC) to go to "Edit"->"Copy"; in Unix/Linux with
>    X Windows do nothing;
>
> 4. "Paste" it into your text file, again as appropriate for your
>    operating system.
>
> However, you may not be lucky.
>
> PDF can store its content in strange ways, and what may look on the
> screen like contiguous and consecutive text is stored internally
> in separate "blocks" (what PDF calls "objects"). And this can apply
> even to little bits of text in a paragraph.
>
> When you paste the marked text, it will go in in the order that
> PDF finds the blocks in the file. As a result, your text file
> may contain bits of text in random order.
>
> This especially applies to things arranged in tables. But it
> very much depends on the software that generated the PDF in
> the first place.
>
> Since often the data in a PDF file which you may want to copy
> in this way will be tabular, you are likely to encounter this
> problem!
>
> You can tell this is going to happen when you use the mouse to
> highlight the text you intend to copy: starting with the mouse
> in say the top LH corner, move it slowly towards the lower
> RH corner of the block. If the highlighting jumps all over the
> screen, and/or outside the area you are trying to highlight,
> then this is what's happening.
>
> In that case I have sometimes done it by copying lots of little
> blocks, too small to provoke the effect. But this is very tedious.
>
> There are other things one can try, such as printing from the
> PDF file to a PostScript file, and then using a program like
> ps2ascii (which can deal directly with PDF) or pstotext; but frankly
> no such program is likely to make a good job of this, because of
> the way PS and PDF work.
>
> Sorry to appear unhelpful! But you may get somewhere.
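To make the block-ordering problem concrete, here is a minimal Python sketch. It is an illustration only and not something described in the posts above. It assumes that each text fragment comes with an (x, y) position on the page. The fragment list, the coordinates and the `reading_order` helper are all made up for this example; with a real PDF you would first need some extraction tool that reports positions. The idea is simply to group fragments into rows by their vertical position and then read each row from left to right.

```python
from typing import List, Tuple

# Hypothetical fragment type: (x, y, text), with y growing down the page.
Fragment = Tuple[float, float, str]


def reading_order(fragments: List[Fragment], row_tolerance: float = 2.0) -> List[str]:
    """Group fragments into rows by y position, then sort each row by x."""
    rows: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: f[1]):
        # Same row if the fragment is vertically close to the row's first fragment.
        if rows and abs(frag[1] - rows[-1][0][1]) <= row_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return [
        " ".join(text for _, _, text in sorted(row, key=lambda f: f[0]))
        for row in rows
    ]


# A small made-up table stored column by column, the way a PDF might hold it.
fragments = [
    (10.0, 100.0, "height"), (10.0, 112.0, "1.70"), (10.0, 124.0, "1.82"),
    (60.0, 100.0, "weight"), (60.0, 112.5, "65"), (60.0, 124.0, "80"),
]

# What a plain copy&paste in file order would give.
pasted = " ".join(text for _, _, text in fragments)

# What the position-based reordering gives, one string per table row.
rows = reading_order(fragments)
```

Here `pasted` comes out as "height 1.70 1.82 weight 65 80", while `rows` holds "height weight", "1.70 65" and "1.82 80". The `row_tolerance` value is a guess; real documents may need a different threshold, and text that is rotated or set in several columns would defeat this simple approach.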
Hmm, if this doesn't work you should have a look at pdftolpe, which is
supposed to convert arbitrary PDF files to some LPE-readable format. LPE is
a lightweight programmer's editor that should be able to save the converted
file in txt format. I have never used this myself, though. In case you are
running Windows my reply might not be of much help, sorry for that!

Good luck,
Thomas

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html