Hi all,
I just came across this older thread and wanted to add my two cents.
First of all, Francesco's observation is correct IMHO: PoDoFo's
linearization support overall (read and write) has always been incomplete
at best, and was almost certainly broken/buggy enough to be disabled quite
early in PoDoFo development, so nothing related to PDF linearization works
now.
I think it was a good decision to remove this code in pdfmm: it does not
work, and an API that does not work is just misleading. I would agree to
have this code removed in PoDoFo as well, as I do not think it is possible
to fix it in its current state. If someone wanted to reintroduce PDF
linearization for writing, a complete re-engineering would be required.
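As a side note for anyone who wants to check whether a given file claims
to be linearized at all: as Francesco describes below, the linearization
dictionary should be the first object of the document, and Annex F requires
it to sit within the first 1024 bytes. The following is only a minimal
Python sketch of such a check, independent of PoDoFo and pdfmm. The function
name linearization_info and the simple regex-based parsing are my own
assumptions for illustration; this is not a real PDF parser.

```python
import re


def linearization_info(data: bytes):
    """Return selected linearization parameters of a PDF, or None.

    Hypothetical helper for illustration only: it scans the first 1024
    bytes for the first indirect object and checks for a /Linearized key.
    """
    head = data[:1024]
    # First indirect object of the file: "<num> <gen> obj << ... >>".
    # The linearization dictionary holds no nested dictionaries, so a
    # non-greedy match up to the first ">>" is enough for this sketch.
    match = re.search(rb"\d+\s+\d+\s+obj\s*<<(.*?)>>", head, re.DOTALL)
    if match is None or b"/Linearized" not in match.group(1):
        return None

    body = match.group(1)
    info = {}
    # /L file length, /O object number of the first page, /E end offset of
    # the first page, /N number of pages, /T offset of the main xref table.
    for key in (b"L", b"O", b"E", b"N", b"T"):
        key_match = re.search(rb"/" + key + rb"\s+(\d+)", body)
        if key_match:
            info[key.decode()] = int(key_match.group(1))

    # /L must equal the actual file length; otherwise the file is to be
    # treated as an ordinary (non-linearized) PDF.
    info["length_matches"] = info.get("L") == len(data)
    return info
```

A None result means no linearization dictionary was found in the first
object. A result with length_matches set to False means /L does not match
the file size, which according to Annex F should make a reader treat the
file as an ordinary PDF (this typically happens after an incremental update
has been appended). Passing this check says nothing about whether the hint
tables or the object ordering are actually valid.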
Best regards,
Dominik
On Fri, Feb 18, 2022 at 9:25 AM Francesco Pretto wrote:
> Hello Connor,
>
> I'm the maintainer of pdfmm, a PoDoFo fork, but I had to evaluate what
> was done in PoDoFo about PDF linearization. I will try to answer you
> in a fair way, to the best of my knowledge. First, let's clarify what
> main capabilities PDF Linearization should enable, among others.
> According to Annex F of PDF 32000-1:2008, PDF linearization:
> - allows a reader to "display the first page as quickly as possible"
> (not necessarily page 0);
> - when the user requests another page of an open document, allows the
> reader to "display that page as quickly as possible".
>
> PDF linearization as described by Annex F is implemented by
> encapsulating the content of the document's first page in an
> "Incremental Update"-like serialization that must be at the beginning
> of the document, together with a "linearization dictionary" that should
> be the first object of the document. The rest of the document is appended
> after this fake "incremental update" and "the pages shall be
> contiguous and shall be ordered by page number", and "the objects
> required to display that page shall be grouped together" and "the
> order of objects referenced from the page object should facilitate
> [...] incremental display of the page data as it arrives".
>
> Let's distinguish between PDF linearization read support, intended as
> the ability to exploit the organization of a linearized PDF document
> and write support as the ability to create a compliant linearized PDF.
> PoDoFo attempted to have linearization read support but it was
> disabled in 2009[1]. Also, just reading the document structure (and not
> the object content) performs a lot of seeks, which would defeat
> the purpose of linearization (I actually removed those seeks in
> pdfmm).
>
> About PDF linearization write support, which I think you are most
> interested in, PoDoFo appears to do some work related to linearization
> in the PdfVecObjects class[2], but all the work related to creating the
> linearization dictionary was disabled in PdfWriter[3] even earlier, in
> 2007. Also, there's no sign of the needed fake incremental update that
> contains the content of the first page.
>
> My conclusion is that PoDoFo linearization support overall (read and
> write) has always been quite incomplete at best, and quite certainly
> broken/buggy enough to be disabled quite early in PoDoFo development,
> so that nothing is working about PDF linearization now, and the
> leftover API that seems to enable linearization is just code that has
> rotted (that's why I decided to remove it completely in pdfmm). If
> someone decided to work on revamping the PDF linearization support, I
> would recommend reading the specification and starting from scratch
> rather than building on the leftover code in PoDoFo, but it's weeks or
> months of full-time work. Of course I would love to reintroduce it in
> pdfmm, where the situation is much cleaner than in PoDoFo, but
> unfortunately that work is not among my top priorities.
>
> I hope I was factually correct about the current state of PoDoFo.
> Other people may add further details or correct me if I was wrong.
>
> Regards,
> Francesco
>
> [1]
> https://sourceforge.net/p/podofo/code/HEAD/tree/podofo/trunk/src/podofo/base/PdfParser.cpp#l300
> [2]
> https://sourceforge.net/p/podofo/code/HEAD/tree/podofo/trunk/src/podofo/base/PdfVecObjects.cpp#l308
> [3]
> https://sourceforge.net/p/podofo/code/HEAD/tree/podofo/trunk/src/podofo/base/PdfWriter.cpp#l274
>
>
> On Thu, 17 Feb 2022 at 23:00, Connor Black wrote:
> >
> > Hey,
> >
> >
> >
> > I am currently evaluating this library for use in a commercial product
> > and I was curious what linearization would look like using PoDoFo. I
> > have spent the last couple of days looking through documentation and
> > trying to work out how it would work, but the most I can grasp is that
> > PdfWriter has the option to set linearization through
> > SetLineralization – but I have not been able to find any examples or
> > successfully use the PdfWriter class to produce these results. I was
> > wondering if you could provide a little code snippet showing how
> > PdfWriter would be used