The Apple Preview application does a VERY lossy save by rewriting every
page in the document and throwing away all metadata and marginalia in the
document.  It's not a good example of how to properly do PDF size
reduction.

Leonard

On 5/8/14, 6:15 PM, "Christophe Meyer" <christophe.meyer.2...@gmail.com>
wrote:

>Thank you for your answers.
>
>@Dennis Jenkins: I use a Macintosh, but indeed when I save the 1 GB PDF
>file with the native application on the Mac, it shrinks back to 10 MB, the
>"normal" size.
>I will install podofobrowser and study it.
>
>@zyx: I will study what you explained to me.
>
>As soon as I find an answer to my problem, I will share it with you.
>
>Christophe
>
>
>On 7 May 2014, at 07:59, podofo-users-requ...@lists.sourceforge.net
>wrote:
>
>> Send Podofo-users mailing list submissions to
>>      podofo-users@lists.sourceforge.net
>> 
>> To subscribe or unsubscribe via the World Wide Web, visit
>>      https://lists.sourceforge.net/lists/listinfo/podofo-users
>> or, via email, send a message with subject or body 'help' to
>>      podofo-users-requ...@lists.sourceforge.net
>> 
>> You can reach the person managing the list at
>>      podofo-users-ow...@lists.sourceforge.net
>> 
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Podofo-users digest..."
>> 
>> 
>> Today's Topics:
>> 
>>  1. Re: How to reduce Pdf size (zyx)
>>  2.  PdfWriter as a base class (Ilan Zisser)
>>  3. Re: How to reduce Pdf size (Leonard Rosenthol)
>>  4. Re: How to reduce Pdf size (Leonard Rosenthol)
>>  5. Splicing PDFs with AcroForms, NeedsAppearances, mysterious
>>     file size shrinkage, Adobe Reader behavior (Dennis Jenkins)
>> 
>> 
>> ----------------------------------------------------------------------
>> 
>> Message: 1
>> Date: Mon, 05 May 2014 09:07:19 +0200
>> From: zyx <z...@litepdf.cz>
>> Subject: Re: [Podofo-users] How to reduce Pdf size
>> To: podofo-users@lists.sourceforge.net
>> Message-ID: <1399273639.1876.9.camel@zyxPad>
>> Content-Type: text/plain; charset="UTF-8"
>> 
>> On Sun, 2014-05-04 at 14:13 +0200, Christophe Meyer wrote:
>>> I am developing a simple piece of software. As a first step, I would
>>> like to duplicate a PDF file that has been created by my printer (a
>>> scanned document). It consists of 40 pages. It weighs 10 MB and
>>> each page of the PDF document is an image of a scanned document's
>>> page.
>>> 
>>> In my program, I am just copying each page from the original PDF (I
>>> load it in a PdfMemDocument) and then inserting it into another
>>> PdfMemDocument with InsertPage.
>>> 
>>> I just do a Write() at the end.
>>> 
>>> The file created at the end weighs more than 500 MB!!
>> 
>>      Hi,
>> check the documentation and comments around the functions you use for
>> the page insertion. PoDoFo doesn't merge resources, so whenever you
>> add a single page to a new document it copies the whole document (or
>> "only" all resources, I don't recall precisely) to the new file, so that
>> nothing is missing when the inserted page is drawn.
>> 
>> I suggest copying all pages into the destination document at once,
>> converting each into an XObject, then deleting them all and recreating
>> the pages in the order you wish by drawing each XObject onto a new page
>> (you can even shrink it and so on). This way you won't duplicate the
>> resources (by the way, inner images are also resources, which explains
>> the size increase).
>>      Bye,
>>      zyx
>> 
>> -- 
>> http://www.litePDF.cz                                 i...@litepdf.cz
>> 
>> 
>> 
>> 
>> ------------------------------
>> 
>> Message: 2
>> Date: Mon, 05 May 2014 10:20:39 +0300
>> From: Ilan Zisser <ilanzis...@gmail.com>
>> Subject: [Podofo-users]  PdfWriter as a base class
>> To: podofo-users@lists.sourceforge.net
>> Message-ID: <53673bc7.4070...@gmail.com>
>> Content-Type: text/plain; charset="windows-1255"
>> 
>> An HTML attachment was scrubbed...
>> -------------- next part --------------
>> Index: PdfWriter.h
>> ===================================================================
>> --- PdfWriter.h      (revision 1598)
>> +++ PdfWriter.h      (working copy)
>> @@ -100,7 +100,7 @@
>>     *
>>     *  \param pDevice write to the specified device
>>     */
>> -    void Write( PdfOutputDevice* pDevice );
>> +    virtual void Write( PdfOutputDevice* pDevice );
>> 
>>    /** Set the write mode to use when writing the PDF.
>>     *  \param eWriteMode write mode
>> @@ -192,7 +192,7 @@
>>     *  \param bPrevEntry if true a prev entry is added to the trailer object with a value of 0
>>     *  \param bOnlySizeKey write only the size key
>>     */
>> -    void FillTrailerObject( PdfObject* pTrailer, pdf_long lSize, bool bPrevEntry, bool bOnlySizeKey ) const;
>> +    virtual void FillTrailerObject( PdfObject* pTrailer, pdf_long lSize, bool bPrevEntry, bool bOnlySizeKey ) const;
>> 
>> protected:
>>    /**
>> @@ -202,15 +202,16 @@
>> 
>>    /** Writes the pdf header to the current file.
>>     *  \param pDevice write to this output device
>> -     */       
>> -    void PODOFO_LOCAL WritePdfHeader( PdfOutputDevice* pDevice );
>> +     */
>> 
>> +    virtual void PODOFO_LOCAL WritePdfHeader( PdfOutputDevice* pDevice );
>> +
>>    /** Write pdf objects to file
>>     *  \param pDevice write to this output device
>>     *  \param vecObjects write all objects in this vector to the file
>>     *  \param pXref add all written objects to this XRefTable
>> -     */ 
>> -    void WritePdfObjects( PdfOutputDevice* pDevice, const PdfVecObjects& vecObjects, PdfXRef* pXref ) PODOFO_LOCAL;
>> +     */
>> +    virtual void WritePdfObjects( PdfOutputDevice* pDevice, const PdfVecObjects& vecObjects, PdfXRef* pXref ) PODOFO_LOCAL;
>> 
>>    /** Creates a file identifier which is required in several
>>     *  PDF workflows.
>> 
>> ------------------------------
>> 
>> Message: 3
>> Date: Mon, 5 May 2014 11:32:34 +0000
>> From: Leonard Rosenthol <lrose...@adobe.com>
>> Subject: Re: [Podofo-users] How to reduce Pdf size
>> To: Dennis Jenkins <dennis.jenkins...@gmail.com>,
>>      "podofo-users@lists.sourceforge.net"
>>      <podofo-users@lists.sourceforge.net>
>> Message-ID: <cf8cee9e.5aae6%lrose...@adobe.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>> 
>> That message from Reader means that the file is damaged in some way so
>> that Reader had to repair it when it opened it.  Something you are doing
>> in the editing/modification process is creating an invalid PDF.  And
>> yes, in that case, it does a (full) save.
>> 
>> Leonard
>> 
>> From: Dennis Jenkins <dennis.jenkins...@gmail.com>
>> Date: Monday, May 5, 2014 at 12:03 AM
>> To: "podofo-users@lists.sourceforge.net" <podofo-users@lists.sourceforge.net>
>> Subject: Re: [Podofo-users] How to reduce Pdf size
>> 
>> 
>> 
>> 
>> On Sun, May 4, 2014 at 5:06 PM, Leonard Rosenthol <lrose...@adobe.com> wrote:
>> Adobe Reader doesn't re-save PDFs - so perhaps you mean Adobe Acrobat??
>> 
>> Leonard
>> 
>> 
>> Hello Leonard,
>> 
>>   I do mean "Adobe Reader XI" on 32-bit Windows XP.  I'm not editing
>> the PDF, exactly.  Let me provide an example.
>> 
>>   The IRS provides a four-page PDF for the "941" report, and a second
>> report (an addendum) called the "Schedule B".  Only two pages of the 941
>> have actual "PDF form" data.
>> 
>>   My program will create a new (empty) PDF, open the 941, splice in two
>> of the four pages, splice in the Schedule B (if needed), and then fill
>> in the form fields with the proper data.  I must also embed another font
>> and create an appearance stream (you helped me with this logic a few
>> years ago).  The software will then save the PDF.
>> 
>>   If I open this PDF in Adobe Reader, it looks correct (the form fields
>> are filled in).  However, if I attempt to exit/close Adobe Reader, it
>> prompts me "Do you want to save changes to XXX.pdf before closing?"
>> (even if I changed nothing while Adobe Reader was open).  If I decline,
>> then Adobe Reader exits and nothing special happens.  If I elect to
>> "save my changes", then the resulting PDF on disk is smaller than the
>> original, a new top-level section called "/Metadata" is created, and the
>> "/AcroForm" is altered.  I have yet to determine what gets removed from
>> the PDF that makes it smaller, but I suspect that it is the font that I
>> had to add earlier.  If I don't add that font, then the fields that I
>> filled in are not visible in Adobe Reader unless the individual field is
>> selected by the user (input focus).
>> 
>>   I can repeat the above with the other forms that my software will
>>populate (Arizona A1-QRT and Arizona UC-018).
>> 
>> (Federal 941 report, file size difference is not much)
>> $ ls -l ./tmp/report*.pdf
>> -rw-r--r-- 1 djenkins djenkins 654315 May  4 22:59 ./tmp/report.pdf
>> -rw-r--r-- 1 djenkins djenkins 606551 May  4 23:00 ./tmp/report2.pdf
>> 
>> (AZ UC-018 report, size difference is significant)
>> $ ls -l ./tmp/report*.pdf
>> -rw-r--r-- 1 djenkins djenkins 415754 May  4 23:01 ./tmp/report.pdf
>> -rw-r--r-- 1 djenkins djenkins 206989 May  4 23:01 ./tmp/report2.pdf
>> 
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> 
>> ------------------------------
>> 
>> Message: 4
>> Date: Mon, 5 May 2014 11:34:05 +0000
>> From: Leonard Rosenthol <lrose...@adobe.com>
>> Subject: Re: [Podofo-users] How to reduce Pdf size
>> To: zyx <z...@litepdf.cz>, "podofo-users@lists.sourceforge.net"
>>      <podofo-users@lists.sourceforge.net>
>> Message-ID: <cf8cef0d.5aaec%lrose...@adobe.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>> 
>> It's been a long time since I looked at the page copying code in PoDoFo,
>> but it should only be copying those resources referenced by the page in
>> question.  Of course, if those resources are shared across pages - and you
>> copy multiple pages - you get multiple copies (since they are no longer
>> shared when copied page by page).
>> 
>> Even better than what zyx suggested is to start with the larger document,
>> add your smaller document to it, and then delete.
>> 
>> Leonard
>> 
>> 
>> 
>> 
>> 
>> ------------------------------
>> 
>> Message: 5
>> Date: Wed, 7 May 2014 00:59:13 -0500
>> From: Dennis Jenkins <dennis.jenkins...@gmail.com>
>> Subject: [Podofo-users] Splicing PDFs with AcroForms,
>>      NeedsAppearances, mysterious file size shrinkage, Adobe Reader
>>      behavior
>> To: "podofo-users@lists.sourceforge.net"
>>      <podofo-users@lists.sourceforge.net>
>> Message-ID:
>>      <CAAEzAp9Rfd1=zQeaja7m8VNz68++cpwWicokHVwhRL=sclp...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>> 
>> Hello all (but mostly directed to Leonard),
>> 
>>  A few days ago I described [1] some odd behavior that I am having with
>> Adobe Reader consuming PDFs generated by my project.  To avoid hijacking
>> Christophe's original thread, I am starting a new one.
>> 
>>  At a high level, my goal is to use PoDoFo to splice together pages from
>> various PDFs which are US tax forms, fill in the data, save the resulting
>> PDF and have the filled-in form fields "just work" in Adobe Reader (e.g.,
>> be visible and still editable) and have Adobe Reader NOT prompt the user
>> to save the file when the user attempts to exit.  Secondly, I noticed that
>> if I allow Adobe Reader to save the PDF, it shrinks by half (sometimes).
>> I want to know why, so that I can optimize the size of my PDFs without
>> needing Adobe Reader (my code runs on Linux as part of a web service).
>> 
>>  Leonard suggested that my PDF is malformed and that Adobe Reader is
>> offering to repair/save it in this case.  After much experimentation and
>> staring at "podofobrowser" and "podofopdfinfo" diffs of the PDFs before
>> and after saving, I am not 100% convinced that this is the case.
>> 
>> In my code, I must set the "NeedsAppearances" key of the "/AcroForm"
>> dictionary to "true", or my fields will not be visible in Adobe Reader.
>> I then need to populate the appearance stream, per section 12.7.3.3 of
>> ISO 32000-1:2008 (herein referred to as "the spec").  When Adobe Reader
>> saves my PDF, this dictionary key disappears, and every field element
>> gains a key called "AP", with a child key of "N".  This is discussed in
>> 12.7.3.3 of the spec on page #435, first complete paragraph.
>> 
>> If I omit the "NeedsAppearances" key from the AcroForm, Adobe Reader
>> no longer offers to save my PDF, but my field values are no longer
>> visible.  Therefore, I suspect that Adobe Reader wants to save the PDF in
>> order to apply/generate the per-field appearance streams.
>> 
>> QUESTION 1: Is the above hypothesis valid?
>> 
>> I generate my PDFs by creating an empty PDF in memory, and "inserting"
>> pages from other PDFs.  This results in a PDF with no fields in the
>> "/AcroForm/Fields" array.  Adobe Reader populates the "Fields" array when
>> it saves the PDF.  However, the count of elements in the "Fields" array
>> does not match the actual count of fields.  For example, Adobe Reader
>> places 176 elements into this array, but when I enumerate all fields on
>> all pages using the PoDoFo API (with my patch to handle inherited fields),
>> I count 212.  I have not yet completed an exhaustive comparison of the
>> "Fields" arrays to determine whether the discrepancy is due to the
>> inherited form fields (typically check boxes).  I wrote a routine to
>> populate the "Fields" array myself (with all 212 items), but Adobe Reader
>> rebuilds it with only 176 items.  If I do not set the "NeedsAppearances"
>> flag, Adobe Reader never offers to save the PDF on exit, so this array is
>> not rebuilt in that case.
>> 
>> QUESTION 2: How does Adobe Reader determine which fields need to be in the
>> "/AcroForm/Fields" array?
>> 
>>   Adobe Reader seems to not care that the "/AcroForm" is missing (its
>> presence or absence does not affect when Adobe Reader offers to save the
>> form).  Yet section 12.7.2 of the spec states that the "/AcroForm" is
>> required.
>> 
>> QUESTION 3: How do we reconcile section 12.7.2 with Adobe Reader's
>> behavior?  Which is "correct" (or did I misunderstand the ISO)?
>> 
>>   The content of the "Fields -> element -> AP -> N" key is an
>> "/XObject".  The data stream created by Adobe Reader for it looks
>> complicated.
>> 
>> QUESTION 4: Assuming the answer to Question #1 is "yes", do you have any
>> suggestions on how I can compute the required XObject in code?  I just
>> want to check a checkbox or place simple text into a text field.
>> 
>>   When Adobe Reader does save the PDF, and depending on which source
>> form(s) are in it, the resulting PDF might shrink in size considerably.
>> A cursory look with podofobrowser shows that Adobe Reader has heavily
>> modified "Pages -> Kids[page] -> Contents[]".  In my current testing PDF,
>> the original has one element in page #0 Contents, with a compressed length
>> of 20443.  Adobe Reader's version has 8 array elements, each with
>> approximately 2K of compressed XObject data.
>> 
>> QUESTION 5:  Why does Adobe Reader tinker with this part of a PDF when
>> saving it?  OK, that was rhetorical - I assume that it does this so that
>> the file will be smaller, and it also sets the "linearized" flag.  The
>> question should be stated: What rules does Adobe Reader follow when
>> deciding if/how to refactor the actual page layout?
>> 
>> QUESTION 6: Why does refactoring the XObject components make the file so
>> much smaller (200K vs 450K, for example)?
>> 
>>  In some cases, the file size savings are significant.  If I knew what
>> rules Adobe Reader followed, I might attempt to write a routine to apply
>> the same changes using PoDoFo (and share it with the community).
>> 
>>  Thank you for your time.
>> 
>> [1] http://sourceforge.net/p/podofo/mailman/message/32302847/
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> 
>> ------------------------------
>> 
>> 
>> ------------------------------------------------------------------------------
>> Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
>> • 3 signs your SCM is hindering your productivity
>> • Requirements for releasing software faster
>> • Expert tips and advice for migrating your SCM now
>> http://p.sf.net/sfu/perforce
>> 
>> ------------------------------
>> 
>> _______________________________________________
>> Podofo-users mailing list
>> Podofo-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/podofo-users
>> 
>> 
>> End of Podofo-users Digest, Vol 95, Issue 3
>> *******************************************
>
>

