On 5. 1. 2012., at 20:00, Andreas Lehmkuehler wrote:

> Hi,
> 
> On 03.01.2012 12:25, Ilija Pavlic wrote:
>> 
>> On-and-off, I spent two weeks with pdfbox, reading what documentation
>> exists on the website, skimming through source files and
>> try-miss-repeat programming using pdfbox. I am willing to write
>> tutorials/documentation on what little I learned along the way, but
> Patches are always welcome, especially those addressing the docs.

I will try and figure out the code. I am most often confused by the coordinates 
used, so that should be the first stop for me.

>> I realize that you do not have time to write answers to all
>> beginners' questions. I also realize that you are trying to be
>> helpful. Thank you for that.
> You have to get a digital copy of the pdf specs [1] to understand the format 
> of PDFs. It'll become your new "bible".

I'll do that, thank you. I already skimmed through it in the past two weeks 
(enough for a very rough idea), but I see that some things cannot be avoided :).

> The class PDFStreamEngine processes those streams; it executes every operator 
> as long as it is supported/needed for the given use case. The mapping from an 
> operator to the implementing class is done within a properties file, e.g. 
> PageDrawer.properties contains the mapping for all operators which are used 
> for rendering. PDFTextStripper.properties contains a smaller subset of 
> mappings as some of the supported operators aren't useful for text extraction.

Let me check if I understood correctly: PDFStreamEngine processes a stream. 
When it encounters an operator, it looks up the mapping to find the class that 
implements it. If a class is defined in the mapping, it will be constructed, and 
its "process" function will be called with the encountered operator and its 
arguments.
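
To make sure I have the idea right, here is a minimal sketch of that lookup in 
plain Java. It is not pdfbox code: the OperatorHandler interface, the handler 
classes and the run method are all names I made up, and the in-memory 
Properties object only stands in for a file like PDFTextStripper.properties. 
I also assume a handler is constructed the first time its operator shows up; 
pdfbox may well do that at a different point.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class OperatorDispatchSketch {

    /** Hypothetical handler interface; stands in for the real operator classes. */
    public interface OperatorHandler {
        void process(String operator, List<String> operands, List<String> log);
    }

    /** Hypothetical handler that only records what it was called with. */
    public static class RecordingHandler implements OperatorHandler {
        public void process(String operator, List<String> operands, List<String> log) {
            log.add(operator + " " + operands);
        }
    }

    /** Stand-in for the properties file: operator -> implementing class name. */
    static Properties mapping() {
        Properties props = new Properties();
        // "G" is deliberately left out, the way a text-extraction mapping
        // would omit operators it has no use for.
        props.setProperty("m", RecordingHandler.class.getName());
        props.setProperty("l", RecordingHandler.class.getName());
        props.setProperty("s", RecordingHandler.class.getName());
        return props;
    }

    /** Walks a whitespace-separated stream: operands first, then the operator. */
    public static List<String> run(String content) {
        Properties props = mapping();
        Map<String, OperatorHandler> handlers = new HashMap<>();
        List<String> operands = new ArrayList<>();
        List<String> log = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (isNumber(token)) {
                operands.add(token);          // collect operands until an operator arrives
                continue;
            }
            String className = props.getProperty(token);
            if (className == null) {
                log.add("skipped " + token);  // no mapping: operator is ignored
            } else {
                OperatorHandler handler = handlers.get(token);
                if (handler == null) {
                    handler = construct(className);
                    handlers.put(token, handler);
                }
                handler.process(token, new ArrayList<>(operands), log);
            }
            operands.clear();
        }
        return log;
    }

    private static OperatorHandler construct(String className) {
        try {
            return (OperatorHandler) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot construct " + className, e);
        }
    }

    private static boolean isNumber(String token) {
        try {
            Double.parseDouble(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String line : run("0 G 10 10 m 100 50 l s")) {
            System.out.println(line);
        }
    }
}
```

If I read your explanation correctly, leaving "G" out of the mapping is the 
same thing PDFTextStripper.properties does for operators it does not need.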

> The operators for graphics objects are explained in chapter 8.2. There is no 
> simple command like drawLine, it's a little bit more complicated:
> 
> - 0 G -> set the stroking color to black
> - x y m -> move to the starting point (x,y)
> - x y l -> draw a line to the endpoint (x,y)
> - s -> close and stroke the path
> 
> 
> But be aware path objects can be used to stroke a path, to fill a path or as 
> a clipping path. There is a transformation matrix which has to be taken into 
> account for scaling or translation and last but not least PDFs are using a 
> graphics stack with different states holding different graphics parameters.
> 
> That sounds really complicated, but IMHO once you get used to it, it won't be 
> that hard anymore. :-)

Up to the graphics stack, everything sounds clear. Thank you very much for your 
effort answering my question.
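
For the graphics stack, this is how I currently picture it, written as a small 
plain-Java sketch rather than anything taken from pdfbox. The class and method 
names are hypothetical, and the state is reduced to two parameters (stroking 
gray level and line width). Going by the spec, "q" saves the current state on 
the stack, "Q" restores the last saved one, and "w" sets the line width; path 
construction ("m", "l") is ignored here, and any other operator would make the 
sketch fail.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class GraphicsStackSketch {

    /** Hypothetical, heavily reduced graphics state with just two parameters. */
    static final class State {
        double strokingGray = 0.0; // 0 = black, 1 = white
        double lineWidth = 1.0;

        State copy() {
            State s = new State();
            s.strokingGray = strokingGray;
            s.lineWidth = lineWidth;
            return s;
        }
    }

    /** Returns one entry per stroked path, describing the state it was stroked with. */
    public static List<String> interpret(String content) {
        Deque<State> stack = new ArrayDeque<>();
        State current = new State();
        List<Double> operands = new ArrayList<>();
        List<String> strokes = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            switch (token) {
                case "q": // save a copy of the current state
                    stack.push(current.copy());
                    break;
                case "Q": // restore the most recently saved state
                    current = stack.pop();
                    break;
                case "G": // set stroking gray level
                    current.strokingGray = operands.get(0);
                    break;
                case "w": // set line width
                    current.lineWidth = operands.get(0);
                    break;
                case "m": // move to: path geometry is not tracked in this sketch
                case "l": // line to: likewise ignored
                    break;
                case "s": // close and stroke the path
                    strokes.add("gray=" + current.strokingGray
                            + " width=" + current.lineWidth);
                    break;
                default:  // anything else must be a numeric operand
                    operands.add(Double.parseDouble(token));
                    continue; // keep collecting operands for the next operator
            }
            operands.clear();
        }
        return strokes;
    }

    public static void main(String[] args) {
        String content = "0 G 1 w "
                + "q 0.5 G 4 w 10 10 m 100 50 l s Q "
                + "10 20 m 100 60 l s";
        for (String stroke : interpret(content)) {
            System.out.println(stroke);
        }
    }
}
```

The first line is stroked with the gray level and width set between "q" and 
"Q"; the second one gets the values from before the "q" back. Please correct 
me if that is not what the stack is for.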

BR,
Ilija.
