Re: [Podofo-users] PoDoFo & PDF Linearization

2022-04-08 Thread Dominik Seichter via Podofo-users
Hi all,

I just came across this older thread and wanted to give my two cents on it.

First of all, IMHO Francesco's conclusion is correct: PoDoFo's linearization
support overall (read and write) has always been incomplete at best, and
almost certainly broken enough that it was disabled early in PoDoFo
development, so nothing related to PDF linearization works now.

I think it was a good decision to remove this code from pdfmm: it does not
work, and an API that does not work is just misleading. I would agree to
removing this code from PoDoFo as well, as I do not think it can be fixed in
its current state. If someone wanted to reintroduce PDF linearization for
writing, a complete re-engineering would be required.

Best regards,
 Dominik


On Fri, Feb 18, 2022 at 9:25 AM Francesco Pretto wrote:

> Hello Connor,
>
> I'm the maintainer of pdfmm, a PoDoFo fork, and I had to evaluate what
> was done in PoDoFo regarding PDF linearization. I will try to answer
> you fairly, to the best of my knowledge. First, let's clarify the main
> capabilities that PDF linearization should enable, among others.
> According to Annex F of PDF 32000-1:2008, PDF linearization:
> - allows a viewer to "display the first page as quickly as possible"
> (not necessarily page 0);
> - when the user requests another page of an open document, allows the
> viewer to "display that page as quickly as possible".
>
> PDF linearization as described by Annex F is implemented by
> encapsulating the content of the document's first page in an
> "incremental update"-like serialization that must be at the beginning
> of the file, together with a "linearization dictionary" that should be
> the first object of the document. The rest of the document is appended
> after this fake "incremental update": "the pages shall be contiguous
> and shall be ordered by page number", "the objects required to display
> that page shall be grouped together", and "the order of objects
> referenced from the page object should facilitate [...] incremental
> display of the page data as it arrives".
>
> Let's distinguish between PDF linearization read support, meaning the
> ability to exploit the organization of a linearized PDF document, and
> write support, meaning the ability to create a compliant linearized
> PDF. PoDoFo attempted to provide linearization read support, but it
> was disabled in 2009 [1]. Also, merely reading the document structure
> (not the object content) performs a lot of seeks, which defeats the
> purpose of linearization (I actually removed those seeks in pdfmm).
>
> As for PDF linearization write support, which I think you are most
> interested in, PoDoFo appears to do some linearization-related work in
> the PdfVecObjects class [2], but all the work related to creating the
> linearization dictionary was disabled in PdfWriter [3] even earlier,
> in 2007. Also, there is no sign of the required fake incremental
> update containing the content of the first page.
>
> My conclusion is that PoDoFo's linearization support overall (read and
> write) has always been incomplete at best, and almost certainly broken
> enough that it was disabled early in PoDoFo development. Nothing
> related to PDF linearization works now, and the leftover API that
> seems to enable linearization is just code that has rotted (that's why
> I removed it completely from pdfmm). If someone decided to revamp PDF
> linearization support, I would recommend reading the specification and
> starting from scratch rather than building on the leftover code in
> PoDoFo, but it is weeks or months of full-time work. Of course I would
> love to reintroduce it in pdfmm, where the situation is much cleaner
> than in PoDoFo, but unfortunately that work is not among my top
> priorities.
>
> I hope I was factually correct about the current state of PoDoFo.
> Others may add further details or correct me if I am wrong.
>
> Regards,
> Francesco
>
> [1] https://sourceforge.net/p/podofo/code/HEAD/tree/podofo/trunk/src/podofo/base/PdfParser.cpp#l300
> [2] https://sourceforge.net/p/podofo/code/HEAD/tree/podofo/trunk/src/podofo/base/PdfVecObjects.cpp#l308
> [3] https://sourceforge.net/p/podofo/code/HEAD/tree/podofo/trunk/src/podofo/base/PdfWriter.cpp#l274
>
>
> On Thu, 17 Feb 2022 at 23:00, Connor Black wrote:
> >
> > Hey,
> >
> > I am currently evaluating this library for use in a commercial product,
> > and I was curious what linearization would look like using PoDoFo. I
> > have spent the last couple of days looking through the documentation
> > and trying to work out how it would work, but the most I can grasp is
> > that PdfWriter has the option to set linearization through
> > SetLineralization. However, I have not been able to find any examples
> > or to use the PdfWriter class successfully to produce these results. I
> > was wondering if you could provide a little code snippet showing how
> > PdfWriter would be used

Re: [Podofo-users] [RFC] pdfmm 0.9.20 released and offer to merge back to PoDoFo

2022-04-08 Thread Dominik Seichter via Podofo-users
Hi Zyx, Hi Francesco,

I also wanted to continue this discussion. Sorry again for the long delay.
If you prefer, we could also have a phone/video chat about these points
(anyone else who is interested could join in, too).

From my point of view, the most important points/next steps are as follows
(feel free to correct me if I am wrong):

1) Decision on splitting PoDoFo and PoDoFo Tools:
I would prefer to keep them in the same repository, to make sure they are
always in a buildable state and can also serve as examples of how to work
with PoDoFo.
@Francesco Pretto: It would be great if you could make the additional effort
to port them to pdfmm. The diff could also serve as a nice migration guide
for the future.

2) Decision on the license of PoDoFo Tools:
PoDoFo Tools are currently GPL, whereas the library itself is LGPL. This
makes it hard, e.g., to copy code from the tools into the library. From my
point of view, we should align the licenses so that the tools have the same
license as the library (currently LGPL; for the future, see the separate
discussion).

3) Merging pdfmm back into PoDoFo / replacing PoDoFo with pdfmm:
I agree with Francesco's proposal here. We should make one last release,
PoDoFo 0.9.8, based on the current svn trunk, and then replace it.
@Zyx: When would be a good time for this? Can I just package the trunk, run
some quick tests and upload it? I am not sure about its current state.

4) Move to Git:
Yes, let's move to Git* and keep the website and mailing list on SourceForge.

5) Relicensing to MPL:
I am generally open to that, but let's keep the discussion in a separate mail
thread.


Best regards,
 Dominik




On Tue, Feb 22, 2022 at 9:39 AM zyx wrote:

> On Mon, 2022-02-21 at 12:17 +0100, Francesco Pretto wrote:
> > Please understand that after the port I would still need to "recruit"
> > people to ensure they actually work as intended. In short: they
> > need a maintainer who cares about them.
>
> Hi,
> Sure, that's fine. As far as I can tell, that is no change from the
> current state.
> Bye,
> zyx
>
>
> ___
> Podofo-users mailing list
> Podofo-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/podofo-users
>