I’m trying to collect some notes on what I’m learning while working with
PDFBox.

I’m putting them here so you can correct me if I am wrong, and it might help
someone else.

 

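1)      The PDStream obtained with page.getContents() holds the drawing
instructions (the content stream) of a page. Resources it refers to, such as
fonts and images, are stored separately in the page's resources.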

2)      Each token in the list collected with PDFStreamParser represents one
piece of data in the stream, either an operand or an operator.

3)      Removing a single token from the list collected above and writing the
remaining tokens back into a PDStream can corrupt the stream.

4)      A PDF object drawn on the page is described by more than one token: an
operator together with its operands. (That is probably why point 3 holds.)

5)      A stream obtained with getUnfilteredStream() on an object gives the
object's raw, decoded data (with filters such as FlateDecode removed). It does
not carry PDF placement information like coordinates and scale.

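6)      A stream obtained with getFilteredStream() on an object gives the
object's data as stored in the file, still encoded by its filters (for
example, compressed). As far as I can tell it does not hold coordinates and
scale either; those are set by operators such as cm in the page's content
stream.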
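
To make points 2 to 4 concrete, here is a minimal sketch in plain Java. It is
not PDFBox code and does not use PDFStreamParser. It assumes a simplified
content stream whose tokens are separated only by whitespace (real PDF syntax
also has strings, arrays and dictionaries, which this ignores). The class
ContentStreamTokens, its methods and the tiny OPERAND_COUNT table are
hypothetical names made up for the illustration. The idea it shows is that an
instruction is an operator plus the operands before it, so removing one token
leaves the rest misaligned, while removing the whole instruction does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class ContentStreamTokens {
    // Hypothetical, deliberately tiny table: operator -> number of operands it takes.
    static final Map<String, Integer> OPERAND_COUNT = Map.of(
            "q", 0, "Q", 0, "cm", 6, "Do", 1, "re", 4, "f", 0);

    // Simplified tokenizer (assumption): tokens are separated by whitespace only.
    static List<String> tokenize(String content) {
        return new ArrayList<>(Arrays.asList(content.trim().split("\\s+")));
    }

    // True if every operator is preceded by exactly the operands it expects.
    static boolean isWellFormed(List<String> tokens) {
        int pendingOperands = 0;
        for (String token : tokens) {
            Integer expected = OPERAND_COUNT.get(token);
            if (expected == null) {
                pendingOperands++;      // not a known operator, so count it as an operand
            } else if (expected != pendingOperands) {
                return false;           // operator has too few or too many operands
            } else {
                pendingOperands = 0;    // instruction complete
            }
        }
        return pendingOperands == 0;    // no dangling operands at the end
    }

    // Removes a whole instruction: the operator at operatorIndex plus its operands.
    static void removeInstruction(List<String> tokens, int operatorIndex) {
        int operands = OPERAND_COUNT.get(tokens.get(operatorIndex));
        tokens.subList(operatorIndex - operands, operatorIndex + 1).clear();
    }

    public static void main(String[] args) {
        String content = "q 200 0 0 100 50 600 cm /Im1 Do Q";

        List<String> tokens = tokenize(content);
        System.out.println(isWellFormed(tokens));

        List<String> broken = tokenize(content);
        broken.remove(1);               // drop only the first operand of cm
        System.out.println(isWellFormed(broken));

        List<String> cleaned = tokenize(content);
        removeInstruction(cleaned, cleaned.indexOf("Do"));
        System.out.println(isWellFormed(cleaned));
    }
}
```

Running main prints true, false, true: the original list is consistent,
dropping one operand of cm breaks it, and dropping "/Im1 Do" together keeps
it consistent.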

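On points 5 and 6: as far as I understand the terms, "filtered" and
"unfiltered" refer to the stream's filters (such as FlateDecode compression),
not to coordinates or scale. The sketch below uses only java.util.zip, not
PDFBox, to show the two views of the same bytes. FilterDemo, encode and decode
are hypothetical names, and the sample bytes are made up; the only assumption
about PDF is that FlateDecode data is in zlib/deflate format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FilterDemo {
    // Produces the "filtered" view: bytes compressed in zlib/deflate format.
    static byte[] encode(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Produces the "unfiltered" view: the decoded bytes.
    static byte[] decode(byte[] encoded) {
        Inflater inflater = new Inflater();
        inflater.setInput(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "q 200 0 0 100 50 600 cm /Im1 Do Q"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] filtered = encode(raw);         // what would sit in the file
        byte[] unfiltered = decode(filtered);  // what a decoder hands back
        System.out.println(Arrays.equals(raw, unfiltered));
    }
}
```

The round trip gives back the original bytes, so both views carry the same
content; only the encoding differs.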
 
