I'm trying to collect some guidelines I'm picking up while working with PDFBox.
I'm posting them here so you can correct me if I'm wrong, and they might help someone else.
1) The PDStream obtained with page.getContents() contains all the data of a page.
2) Each token in the list collected with PDFStreamParser represents a piece of data in the stream.
3) Removing a token from that list and writing the remaining tokens back into a PDStream can corrupt the stream.
4) A PDF object is made up of more than one token (which is probably why point 3 holds).
5) The stream obtained with getUnfilteredStream() of an object represents the raw data of the object; it does not contain PDF information such as coordinates and scale.
6) The stream obtained with getFilteredStream() of an object represents the raw data of the object; it does contain PDF information such as coordinates and scale.
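One way to check points 2 to 4: in a content stream the tokens come in operations, where zero or more operands are followed by one operator (for example `/F1 12 Tf` sets font /F1 at size 12). As far as I know, PDFStreamParser returns exactly this kind of flat list (operands as COS objects, operators as PDFOperator objects in PDFBox 1.x). The sketch below is a simplified model in plain Java, not PDFBox code. The class name, the whitespace tokenizer and the small operand-count table are my own assumptions for illustration, and the sample stream is made up. It shows why dropping one operand breaks the stream while dropping a whole operation does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContentOps {

    // One content-stream operation: an operator plus the operands written before it.
    public static final class Operation {
        public final String operator;
        public final List<String> operands;

        Operation(String operator, List<String> operands) {
            this.operator = operator;
            this.operands = operands;
        }

        @Override
        public String toString() {
            return operands + " " + operator;
        }
    }

    // Expected operand counts for a few operators (a small illustrative subset, not the full spec).
    static final Map<String, Integer> ARITY = new HashMap<>();
    static {
        ARITY.put("BT", 0);
        ARITY.put("ET", 0);
        ARITY.put("Tf", 2);
        ARITY.put("Td", 2);
        ARITY.put("Tj", 1);
        ARITY.put("re", 4);
        ARITY.put("cm", 6);
    }

    // Simplified tokenizer: splits on whitespace only, so it does not handle
    // strings containing spaces, arrays, dictionaries or inline images.
    public static List<String> tokenize(String content) {
        return new ArrayList<>(Arrays.asList(content.trim().split("\\s+")));
    }

    // Names, numbers, strings and true/false/null are operands; anything else is treated as an operator.
    static boolean isOperand(String token) {
        char c = token.charAt(0);
        return c == '/' || c == '(' || c == '-' || c == '+' || c == '.' || Character.isDigit(c)
                || token.equals("true") || token.equals("false") || token.equals("null");
    }

    // Groups the flat token list into operations: operands accumulate until an operator closes them.
    public static List<Operation> group(List<String> tokens) {
        List<Operation> ops = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String token : tokens) {
            if (isOperand(token)) {
                pending.add(token);
            } else {
                ops.add(new Operation(token, pending));
                pending = new ArrayList<>();
            }
        }
        if (!pending.isEmpty()) {
            ops.add(new Operation(null, pending)); // leftover operands with no operator
        }
        return ops;
    }

    // Well-formed here means: every operation has a known operator with the expected operand count.
    public static boolean isWellFormed(List<Operation> ops) {
        for (Operation op : ops) {
            if (op.operator == null) return false;
            Integer expected = ARITY.get(op.operator);
            if (expected == null || expected != op.operands.size()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("BT /F1 12 Tf 100 700 Td (Hello) Tj ET");
        System.out.println(group(tokens));

        // Removing a single operand ("12") leaves Tf with only one operand.
        List<String> oneTokenRemoved = new ArrayList<>(tokens);
        oneTokenRemoved.remove("12");
        System.out.println(isWellFormed(group(oneTokenRemoved))); // false

        // Removing the whole operation "100 700 Td" keeps every remaining operation intact.
        List<String> wholeOpRemoved = new ArrayList<>(tokens);
        wholeOpRemoved.subList(4, 7).clear();
        System.out.println(isWellFormed(group(wholeOpRemoved))); // true
    }
}
```

Note that removing a whole operation only keeps the operand counts valid; it can still change what is drawn, or break pairing if you remove, say, a BT without its ET.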

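For points 5 and 6, a quick experiment may help. As far as I know, in PDFBox 1.x getFilteredStream() returns the bytes as they are stored in the file, with the stream's filters (for example FlateDecode compression) still applied, while getUnfilteredStream() returns the decoded bytes. If that is right, both hold the same operators and coordinates, and only the encoding differs. The sketch below imitates this with java.util.zip, whose Deflater/Inflater write and read the zlib format that FlateDecode uses; it does not call PDFBox, and the class name and sample content are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateDemo {

    // Encodes content the way a FlateDecode stream is stored in the file (zlib format).
    public static byte[] encode(byte[] decoded) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(decoded);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Decodes the stored bytes back into the readable content-stream operators.
    public static byte[] decode(byte[] encoded) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(encoded);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported Flate data");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        String content = "BT /F1 12 Tf 100 700 Td (Hello) Tj ET";
        byte[] stored = encode(content.getBytes(StandardCharsets.US_ASCII));
        String decoded = new String(decode(stored), StandardCharsets.US_ASCII);

        // The stored (encoded) bytes are compressed and not readable as text.
        System.out.println(stored.length + " encoded bytes");
        // Decoding gives back the same operators, including the coordinates 100 700.
        System.out.println(decoded);
        System.out.println(decoded.equals(content)); // true
    }
}
```

If PDFBox behaves this way, the difference between points 5 and 6 would be the encoding of the bytes rather than which information (coordinates, scale) is present.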
