The Apple Preview application does a VERY lossy save by rewriting every page in the document and throwing away all metadata and marginalia in the document. It's not a good example of how to properly do PDF size reduction.
Leonard

On 5/8/14, 6:15 PM, "Christophe Meyer" <christophe.meyer.2...@gmail.com> wrote:

>Thank you for your answers.
>
>@Dennis Jenkins: I use a Macintosh, and indeed when I save the 1 GB PDF
>file with the native application on the Mac, it shrinks back to 10 MB,
>the "normal" size.
>I will install podofobrowser and study it.
>
>@zyx: I will study what you explained to me.
>
>As soon as I find an answer to my problem, I will share it with you.
>
>Christophe
>
>
>On 7 May 2014 at 07:59, podofo-users-requ...@lists.sourceforge.net wrote:
>
>> Send Podofo-users mailing list submissions to
>>     podofo-users@lists.sourceforge.net
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>     https://lists.sourceforge.net/lists/listinfo/podofo-users
>> or, via email, send a message with subject or body 'help' to
>>     podofo-users-requ...@lists.sourceforge.net
>>
>> You can reach the person managing the list at
>>     podofo-users-ow...@lists.sourceforge.net
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Podofo-users digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: How to reduce Pdf size (zyx)
>>    2. PdfWriter as a base class (Ilan Zisser)
>>    3. Re: How to reduce Pdf size (Leonard Rosenthol)
>>    4. Re: How to reduce Pdf size (Leonard Rosenthol)
>>    5. Splicing PDFs with AcroForms, NeedsAppearances, mysterious
>>       file size shrinkage, Adobe Reader behavior (Dennis Jenkins)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Mon, 05 May 2014 09:07:19 +0200
>> From: zyx <z...@litepdf.cz>
>> Subject: Re: [Podofo-users] How to reduce Pdf size
>> To: podofo-users@lists.sourceforge.net
>> Message-ID: <1399273639.1876.9.camel@zyxPad>
>> Content-Type: text/plain; charset="UTF-8"
>>
>> On Sun, 2014-05-04 at 14:13 +0200, Christophe Meyer wrote:
>>> I am developing a simple piece of software.
>>> As a first step, I would like to
>>> duplicate a PDF file that was created by my printer (a scanned
>>> document). The file has 40 pages and weighs 10 MB, and
>>> each page of the PDF document is an image of a scanned document's
>>> page.
>>>
>>> In my program, I just copy each page from the original PDF (which I
>>> load into a PdfMemDocument) and insert it into another
>>> PdfMemDocument with InsertPage.
>>>
>>> I just do a Write() at the end.
>>>
>>> The file created at the end weighs more than 500 MB!!
>>
>> Hi,
>> check the documentation and comments around the functions you use for
>> the page insertion. PoDoFo doesn't merge resources, so whenever
>> you add a single page to a new document it copies the whole document (or
>> "only" all resources, I don't recall precisely) to the new file, so that
>> nothing is missing when the inserted page is drawn.
>>
>> I suggest copying all pages into the destination document at once,
>> converting each into an XObject, then deleting them all and reordering
>> them as you wish by drawing the XObject onto a new page (you can even
>> shrink it and so on). This way you will not duplicate the resources (by
>> the way, inner images are also resources, which explains the size
>> increase).
>>     Bye,
>>     zyx
>>
>> --
>> http://www.litePDF.cz i...@litepdf.cz
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Mon, 05 May 2014 10:20:39 +0300
>> From: Ilan Zisser <ilanzis...@gmail.com>
>> Subject: [Podofo-users] PdfWriter as a base class
>> To: podofo-users@lists.sourceforge.net
>> Message-ID: <53673bc7.4070...@gmail.com>
>> Content-Type: text/plain; charset="windows-1255"
>>
>> An HTML attachment was scrubbed...
>> -------------- next part --------------
>> Index: PdfWriter.h
>> ===================================================================
>> --- PdfWriter.h (revision 1598)
>> +++ PdfWriter.h (working copy)
>> @@ -100,7 +100,7 @@
>>       *
>>       *  \param pDevice write to the specified device
>>       */
>> -    void Write( PdfOutputDevice* pDevice );
>> +    virtual void Write( PdfOutputDevice* pDevice );
>>
>>      /** Set the write mode to use when writing the PDF.
>>       *  \param eWriteMode write mode
>> @@ -192,7 +192,7 @@
>>       *  \param bPrevEntry if true a prev entry is added to the trailer object with a value of 0
>>       *  \param bOnlySizeKey write only the size key
>>       */
>> -    void FillTrailerObject( PdfObject* pTrailer, pdf_long lSize, bool bPrevEntry, bool bOnlySizeKey ) const;
>> +    virtual void FillTrailerObject( PdfObject* pTrailer, pdf_long lSize, bool bPrevEntry, bool bOnlySizeKey ) const;
>>
>>   protected:
>>      /**
>> @@ -202,15 +202,16 @@
>>
>>      /** Writes the pdf header to the current file.
>>       *  \param pDevice write to this output device
>> -     */
>> -    void PODOFO_LOCAL WritePdfHeader( PdfOutputDevice* pDevice );
>> +     */
>>
>> +    virtual void PODOFO_LOCAL WritePdfHeader( PdfOutputDevice* pDevice );
>> +
>>      /** Write pdf objects to file
>>       *  \param pDevice write to this output device
>>       *  \param vecObjects write all objects in this vector to the file
>>       *  \param pXref add all written objects to this XRefTable
>> -     */
>> -    void WritePdfObjects( PdfOutputDevice* pDevice, const PdfVecObjects& vecObjects, PdfXRef* pXref ) PODOFO_LOCAL;
>> +     */
>> +    virtual void WritePdfObjects( PdfOutputDevice* pDevice, const PdfVecObjects& vecObjects, PdfXRef* pXref ) PODOFO_LOCAL;
>>
>>      /** Creates a file identifier which is required in several
>>       *  PDF workflows.
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Mon, 5 May 2014 11:32:34 +0000
>> From: Leonard Rosenthol <lrose...@adobe.com>
>> Subject: Re: [Podofo-users] How to reduce Pdf size
>> To: Dennis Jenkins <dennis.jenkins...@gmail.com>,
>>     "podofo-users@lists.sourceforge.net"
>>     <podofo-users@lists.sourceforge.net>
>> Message-ID: <cf8cee9e.5aae6%lrose...@adobe.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> That message from Reader means that the file is damaged in some way, so
>> that Reader had to repair it when it opened it. Something you are doing
>> in the editing/modification process is creating an invalid PDF. And
>> yes, in that case, it does a (full) save.
>>
>> Leonard
>>
>> From: Dennis Jenkins <dennis.jenkins...@gmail.com>
>> Date: Monday, May 5, 2014 at 12:03 AM
>> To: "podofo-users@lists.sourceforge.net"
>>     <podofo-users@lists.sourceforge.net>
>> Subject: Re: [Podofo-users] How to reduce Pdf size
>>
>>
>> On Sun, May 4, 2014 at 5:06 PM, Leonard Rosenthol
>> <lrose...@adobe.com> wrote:
>> Adobe Reader doesn't re-save PDFs - so perhaps you mean Adobe Acrobat??
>>
>> Leonard
>>
>>
>> Hello Leonard,
>>
>> I do mean "Adobe Reader XI" on 32-bit Windows XP. I'm not editing
>> the PDF, exactly. Let me provide an example.
>>
>> The IRS provides a four-page PDF for the "941" report, and a second
>> report (addenda) called the "Schedule B". Only two pages of the 941
>> have actual "pdf form" data.
>>
>> My program will create a new (empty) PDF, open the 941, splice in two
>> of the four pages, splice in the Schedule B (if needed), and then fill
>> in the form fields with the proper data. I must also embed another font
>> and create an appearance stream (you helped me with this logic a few
>> years ago). The software will then save the PDF.
>>
>> If I open this PDF in Adobe Reader, it looks correct (form fields
>> are filled in). However, if I attempt to exit/close Adobe Reader, it
>> prompts me "Do you want to save changes to XXX.pdf before closing?"
>> (even if I changed nothing while Adobe Reader was open). If I decline,
>> then Adobe Reader exits and nothing special happens. If I elect to
>> "save my changes", then the resulting PDF on disk is smaller than the
>> original, a new top-level section called "/Metadata" is created, and the
>> "/AcroForm" is altered. I have yet to determine what gets removed from
>> the PDF that makes it smaller, but I suspect that it is the font that I
>> had to add earlier. If I don't add that font, then the fields that I
>> filled in are not visible in Adobe Reader unless the individual field is
>> selected by the user (input focus).
>>
>> I can repeat the above with the other forms that my software will
>> populate (Arizona A1-QRT and Arizona UC-018).
>>
>> (Federal 941 report, file size difference is not much)
>> $ ls -l ./tmp/report*.pdf
>> -rw-r--r-- 1 djenkins djenkins 654315 May 4 22:59 ./tmp/report.pdf
>> -rw-r--r-- 1 djenkins djenkins 606551 May 4 23:00 ./tmp/report2.pdf
>>
>> (AZ UC-018 report, size difference is significant)
>> $ ls -l ./tmp/report*.pdf
>> -rw-r--r-- 1 djenkins djenkins 415754 May 4 23:01 ./tmp/report.pdf
>> -rw-r--r-- 1 djenkins djenkins 206989 May 4 23:01 ./tmp/report2.pdf
>>
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Mon, 5 May 2014 11:34:05 +0000
>> From: Leonard Rosenthol <lrose...@adobe.com>
>> Subject: Re: [Podofo-users] How to reduce Pdf size
>> To: zyx <z...@litepdf.cz>, "podofo-users@lists.sourceforge.net"
>>     <podofo-users@lists.sourceforge.net>
>> Message-ID: <cf8cef0d.5aaec%lrose...@adobe.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> It's been a long time since I looked at the page copying code in PoDoFo,
>> but it should only be copying those resources referenced by the page in
>> question. Of course, if those resources are shared across pages - and
>> you copy multiple pages - you get multiple copies (since they are no
>> longer shared when copied page by page).
>>
>> Even better than suggested below is to start with the larger document,
>> add your smaller document to it, and then delete.
>>
>> Leonard
>>
>> On 5/5/14, 3:07 AM, "zyx" <z...@litepdf.cz> wrote:
>>
>>> On Sun, 2014-05-04 at 14:13 +0200, Christophe Meyer wrote:
>>>> I am developing a simple piece of software. As a first step, I would
>>>> like to duplicate a PDF file that was created by my printer (a scanned
>>>> document). The file has 40 pages and weighs 10 MB, and
>>>> each page of the PDF document is an image of a scanned document's
>>>> page.
>>>>
>>>> In my program, I just copy each page from the original PDF (which I
>>>> load into a PdfMemDocument) and insert it into another
>>>> PdfMemDocument with InsertPage.
>>>>
>>>> I just do a Write() at the end.
>>>>
>>>> The file created at the end weighs more than 500 MB!!
>>>
>>> Hi,
>>> check the documentation and comments around the functions you use for
>>> the page insertion. PoDoFo doesn't merge resources, so whenever
>>> you add a single page to a new document it copies the whole document (or
>>> "only" all resources, I don't recall precisely) to the new file, so that
>>> nothing is missing when the inserted page is drawn.
>>>
>>> I suggest copying all pages into the destination document at once,
>>> converting each into an XObject, then deleting them all and reordering
>>> them as you wish by drawing the XObject onto a new page (you can even
>>> shrink it and so on). This way you will not duplicate the resources (by
>>> the way, inner images are also resources, which explains the size
>>> increase).
>>>     Bye,
>>>     zyx
>>>
>>> --
>>> http://www.litePDF.cz i...@litepdf.cz
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Is your legacy SCM system holding you back? Join Perforce May 7 to find
>>> out:
>>> • 3 signs your SCM is hindering your productivity
>>> • Requirements for releasing software faster
>>> • Expert tips and advice for migrating your SCM now
>>> http://p.sf.net/sfu/perforce
>>> _______________________________________________
>>> Podofo-users mailing list
>>> Podofo-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/podofo-users
>>
>>
>> ------------------------------
>>
>> Message: 5
>> Date: Wed, 7 May 2014 00:59:13 -0500
>> From: Dennis Jenkins <dennis.jenkins...@gmail.com>
>> Subject: [Podofo-users] Splicing PDFs with AcroForms,
>>     NeedsAppearances, mysterious file size shrinkage, Adobe Reader
>>     behavior
>> To: "podofo-users@lists.sourceforge.net"
>>     <podofo-users@lists.sourceforge.net>
>> Message-ID:
>>     <CAAEzAp9Rfd1=zQeaja7m8VNz68++cpwWicokHVwhRL=sclp...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hello all (but mostly directed to Leonard),
>>
>> A few days ago I described [1] some odd behavior that I am having with
>> Adobe Reader consuming PDFs generated by my project. To avoid hijacking
>> Christophe's original thread, I am starting a new one.
>>
>> At a high level, my goal is to use PoDoFo to splice together pages from
>> various PDFs which are US tax forms, fill in the data, save the
>> resulting PDF and have the filled-in form fields "just work" in Adobe
>> Reader (e.g., be visible and still editable) and have Adobe Reader NOT
>> prompt the user to save the file when the user attempts to exit.
>> Secondly, I noticed that if I allow Adobe Reader to save the PDF, it
>> shrinks by half (sometimes). I want to know why, so that I can optimize
>> the size of my PDFs without needing Adobe Reader (my code runs on Linux
>> as part of a web service).
>>
>> Leonard suggested that my PDF is malformed and that Adobe Reader is
>> offering to repair/save it in this case. After much experimentation and
>> staring at "podofobrowser" and "podofopdfinfo diffs" of the pre- and
>> post- PDFs, I am not 100% convinced that this is the case.
>>
>> In my code, I must set the "NeedAppearances" dictionary element of the
>> "/AcroForm" to "true", or my fields will not be visible in Adobe
>> Reader. I then need to populate the appearance stream, per section
>> 12.7.3.3 of ISO 32000-1:2008 (herein referred to as "the spec"). When
>> Adobe Reader saves my PDF, this dictionary key disappears, and every
>> field element gains a key called "AP", with a child key of "N". This is
>> discussed in 12.7.3.3 of the spec on page #435, first complete
>> paragraph.
>>
>> If I omit adding the key for "NeedAppearances" to the AcroForm, Adobe
>> Reader will no longer offer to save my PDF, but my field values are no
>> longer visible. Therefore, I suspect that Adobe Reader wants to save the
>> PDF in order to apply/generate the per-field appearance stream.
>>
>> QUESTION 1: Is the above hypothesis valid?
>>
>> I generate my PDFs by creating an empty PDF in memory, and "inserting"
>> pages from other PDFs. This results in a PDF with no "Fields" in the
>> "/AcroForm/Fields" array. Adobe Reader populates the "Fields" array
>> when it saves the PDF.
>> However, the count of elements in the "Fields" array
>> does not match the actual count of fields. For example, Adobe Reader
>> places 176 elements into this array, but when I enumerate all fields on
>> all pages using the PoDoFo API (with my patch to handle inherited
>> fields), I count 212. I have not yet completed an exhaustive comparison
>> of the "Fields" arrays to determine whether the discrepancy is due to
>> the inherited form fields (typically check boxes) or not. I wrote a
>> routine to populate the "Fields" array myself (with all 212 items), but
>> Adobe Reader rebuilds it with only 176 items. If I do not set the
>> "NeedAppearances" flag, Adobe Reader never offers to save the PDF on
>> exit, so this array is not rebuilt in this case.
>>
>> QUESTION 2: How does Adobe Reader determine which fields need to be in
>> the "/AcroForm/Fields" array?
>>
>> Adobe Reader seems to not care that the "/AcroForm" is missing (its
>> presence or absence does not affect when Adobe Reader offers to save the
>> form). Yet section 12.7.2 of the spec states that the "/AcroForm" is
>> required.
>>
>> QUESTION 3: How do we reconcile section 12.7.2 with Adobe Reader's
>> behavior? Which is "correct" (or did I misunderstand the ISO)?
>>
>> The content of the "Fields -> element -> AP -> N" key is an
>> "/XObject". The data stream created by Adobe Reader for it looks
>> complicated.
>>
>> QUESTION 4: Assuming the answer to Question #1 is "yes", do you have any
>> suggestions on how I can compute the required XObject in code? I just
>> want to check a checkbox or place simple text into a text field.
>>
>> When Adobe Reader does save the PDF, and depending on which source
>> form(s) are in it, the resulting PDF might shrink in size considerably.
>> A cursory look with podofobrowser shows that Adobe Reader has heavily
>> modified "Pages -> Kids[page] -> Contents[]".
>> In my current testing PDF,
>> the original has one element in page #0 Contents, with a compressed
>> length of 20443. Adobe Reader's version has 8 array elements, each with
>> approximately 2K of compressed XObject data.
>>
>> QUESTION 5: Why does Adobe Reader tinker with this part of a PDF when
>> saving it? OK, that was rhetorical - I assume that it does so so that
>> the file will be smaller, and it also sets the "linearized" flag. The
>> question should be stated: What rules does Adobe Reader follow when
>> deciding if/how to refactor the actual page layout?
>>
>> QUESTION 6: Why does refactoring the XObject components make the file so
>> much smaller (200K vs 450K for example)?
>>
>> In some cases, the file size savings are significant. If I knew what
>> rules Adobe Reader followed, I might attempt to write a routine to apply
>> the same changes using PoDoFo (and share it with the community).
>>
>> Thank you for your time.
>>
>> [1] http://sourceforge.net/p/podofo/mailman/message/32302847/
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>>
>> ------------------------------
>>
>> End of Podofo-users Digest, Vol 95, Issue 3
>> *******************************************
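
The size blow-up discussed in the quoted thread (a 10 MB, 40-page scan growing to several hundred MB when copied page by page) can be illustrated with a toy model. The sketch below is not PoDoFo code: `Resource`, `Page`, `Document`, `copyPageNaive`, `copyPageShared` and `makeSample` are hypothetical names invented for this illustration, and it assumes the worst case mentioned in the thread, where every page can reach every resource of the source document. It only shows why a copy that forgets what it has already copied multiplies shared resources, while a remap table kept across calls copies each resource once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical toy model, NOT the PoDoFo API.
struct Resource { std::size_t bytes; };
struct Page { std::vector<std::size_t> resourceIds; };

struct Document {
    std::vector<Resource> resources;
    std::vector<Page> pages;

    // Sum of the sizes of all stored resources.
    std::size_t totalBytes() const {
        std::size_t sum = 0;
        for (const Resource& r : resources) sum += r.bytes;
        return sum;
    }
};

// Naive copy: every call copies all resources reachable from the page,
// without remembering what earlier calls already copied.
void copyPageNaive(const Document& src, std::size_t pageIndex, Document& dst) {
    Page copied;
    for (std::size_t id : src.pages[pageIndex].resourceIds) {
        dst.resources.push_back(src.resources[id]);
        copied.resourceIds.push_back(dst.resources.size() - 1);
    }
    dst.pages.push_back(copied);
}

// Copy with a remap table (source id -> destination id) shared across
// calls, so each source resource is copied at most once.
void copyPageShared(const Document& src, std::size_t pageIndex, Document& dst,
                    std::map<std::size_t, std::size_t>& remap) {
    Page copied;
    for (std::size_t id : src.pages[pageIndex].resourceIds) {
        auto it = remap.find(id);
        if (it == remap.end()) {
            dst.resources.push_back(src.resources[id]);
            it = remap.emplace(id, dst.resources.size() - 1).first;
        }
        copied.resourceIds.push_back(it->second);
    }
    dst.pages.push_back(copied);
}

// Assumed worst case: one image per page, but every page can reach
// every image of the document.
Document makeSample(std::size_t pageCount, std::size_t bytesPerImage) {
    Document doc;
    for (std::size_t i = 0; i < pageCount; ++i)
        doc.resources.push_back(Resource{bytesPerImage});
    for (std::size_t i = 0; i < pageCount; ++i) {
        Page p;
        for (std::size_t id = 0; id < pageCount; ++id)
            p.resourceIds.push_back(id);
        doc.pages.push_back(p);
    }
    return doc;
}
```

With `makeSample(40, 250000)` as the source (40 images, 10 MB in total), calling `copyPageNaive` once per page leaves the destination holding 40 copies of every image, while calling `copyPageShared` with one `remap` reused for all pages leaves it at the original 10 MB. Copying all pages in a single operation, as suggested in the thread, plays the role of that shared table.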