On 5. 1. 2012., at 20:00, Andreas Lehmkuehler wrote:

> Hi,
>
> On 03.01.2012 12:25, Ilija Pavlic wrote:
>>
>> On-and-off, I spent two weeks with pdfbox, reading what documentation
>> exists on the website, skimming through source files and
>> try-miss-repeat programming using pdfbox. I am willing to write
>> tutorials/documentation on what little I learned along the way, but
> Patches are always welcome, especially those addressing the docs

I will try and figure out the code. I am most often confused by the coordinates used, so that should be the first stop for me.

>> I realize that you do not have time to write answers to all
>> beginners' questions. I also realize that you are trying to be
>> helpful. Thank you for that.
> You have to get a digital copy of the pdf specs [1] to understand the format
> of PDFs. It'll become your new "bible".

I'll do that, thank you. I already skimmed through it in the past two weeks (enough for a very rough idea), but I see that some things cannot be avoided :).

> The class PDFStreamEngine processes those streams: it executes every operator
> as long as it is supported/needed for the given use case. The mapping from an
> operator to the implementing class is done within a property file, e.g.
> PageDrawer.properties contains the mapping for all operators which are used
> for rendering. PDFTextStripper.properties contains a smaller subset of
> mappings as some of the supported operators aren't useful for text extraction.

Let me check if I understood correctly: PDFStreamEngine processes a stream. When it encounters an operator, it looks the operator up in the mapping. If a class is defined in the mapping for that operator, it will be constructed, and its "process" function will be called with the encountered operator and its arguments.

> The operators for graphics objects are explained in chapter 8.2. There is no
> simple command like drawLine, it's a little bit more complicated:
>
> - 0 G -> set the stroking color to black
> - x y m -> move to the starting point (x,y)
> - x y l -> draw a line to the endpoint (x,y)
> - s -> close and stroke the path
>
> But be aware path objects can be used to stroke a path, to fill a path or as
> a clipping path. There is a transformation matrix which has to be taken into
> account for scaling or translation and last but not least PDFs are using a
> graphics stack with different states holding different graphics parameters.
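
To make sure I have the dispatch idea right, here is a minimal sketch in plain Java of how I picture it, applied to the four line-drawing operators quoted above. None of this is PDFBox code: the class `OperatorDispatchSketch`, the `OperatorHandler` interface and the four handler classes are names I made up, the tokenizer only understands numbers and bare operator names, and constructing a handler each time its operator is encountered simply follows my description above, which may not be how PDFBox actually does it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: none of these names are PDFBox classes.
class OperatorDispatchSketch {

    /** Made-up stand-in for an operator implementation class. */
    interface OperatorHandler {
        String process(String operator, List<String> operands);
    }

    static class SetStrokingGray implements OperatorHandler {
        public String process(String operator, List<String> operands) {
            return "stroking gray = " + operands.get(0);
        }
    }

    static class MoveTo implements OperatorHandler {
        public String process(String operator, List<String> operands) {
            return "move to (" + operands.get(0) + ", " + operands.get(1) + ")";
        }
    }

    static class LineTo implements OperatorHandler {
        public String process(String operator, List<String> operands) {
            return "line to (" + operands.get(0) + ", " + operands.get(1) + ")";
        }
    }

    static class CloseAndStroke implements OperatorHandler {
        public String process(String operator, List<String> operands) {
            return "close and stroke path";
        }
    }

    /** Builds an operator-to-class mapping in .properties syntax and loads it. */
    static Properties loadMapping() throws IOException {
        String prefix = OperatorDispatchSketch.class.getName() + "$";
        String text = "G=" + prefix + "SetStrokingGray\n"
                + "m=" + prefix + "MoveTo\n"
                + "l=" + prefix + "LineTo\n"
                + "s=" + prefix + "CloseAndStroke\n";
        Properties mapping = new Properties();
        mapping.load(new StringReader(text));
        return mapping;
    }

    /** Walks the tokens; numbers are operands, anything else is an operator. */
    static List<String> run(String contentStream) {
        try {
            Properties mapping = loadMapping();
            List<String> log = new ArrayList<>();
            List<String> operands = new ArrayList<>();
            for (String token : contentStream.trim().split("\\s+")) {
                if (token.matches("[-+]?\\d*\\.?\\d+")) {
                    operands.add(token);
                    continue;
                }
                String className = mapping.getProperty(token);
                if (className != null) {
                    // A class is mapped to this operator: construct it and call process.
                    OperatorHandler handler = (OperatorHandler)
                            Class.forName(className).getDeclaredConstructor().newInstance();
                    log.add(handler.process(token, operands));
                }
                // Operands belong to one operator only, whether it was mapped or not.
                operands = new ArrayList<>();
            }
            return log;
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : run("0 G 10 20 m 110 20 l s")) {
            System.out.println(line);
        }
    }
}
```

An operator without an entry in the mapping (for example `w` in `1 w`) is skipped together with its operands, which is how I understand the "as long as it is supported/needed" part.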
>
> That sounds really complicated, but IMHO if you get used to it it won't be
> that hard anymore. :-)

Up to the graphics stack, everything sounds clear. Thank you very much for your effort answering my question.

BR,
Ilija.
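
P.S. For the graphics stack, this is a minimal sketch of how I currently picture it, assuming the PDF operators `q` and `Q` save and restore the current graphics state. The class and field names are made up and are not PDFBox's, and a real state holds much more than these two values (the transformation matrix and the clipping path, for example).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a graphics state stack; not PDFBox code.
class GraphicsStackSketch {

    /** A tiny subset of the parameters a real graphics state carries. */
    static class GraphicsState {
        double strokingGray = 0.0; // black
        double lineWidth = 1.0;

        GraphicsState copy() {
            GraphicsState c = new GraphicsState();
            c.strokingGray = strokingGray;
            c.lineWidth = lineWidth;
            return c;
        }
    }

    private final Deque<GraphicsState> saved = new ArrayDeque<>();
    GraphicsState current = new GraphicsState();

    /** Corresponds to "q": push a copy of the current state. */
    void save() {
        saved.push(current.copy());
    }

    /** Corresponds to "Q": discard the current state and pop the saved one. */
    void restore() {
        current = saved.pop();
    }

    public static void main(String[] args) {
        GraphicsStackSketch g = new GraphicsStackSketch();
        g.current.strokingGray = 0.5;
        g.save();                      // q
        g.current.strokingGray = 1.0;  // changes made after q ...
        g.current.lineWidth = 3.0;
        g.restore();                   // Q ... are undone here
        System.out.println(g.current.strokingGray + " " + g.current.lineWidth);
    }
}
```

If I have this right, anything changed between a `q` and its matching `Q` only affects what is drawn in between.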

