*Technical Issue Report: Intermittent Data Loss and Font Missing
     Errors during PDF Merging*

*Environment:*

 * *Library:* PDFBox 2.0.33
 * *Operating System:* Windows Server 2019
 * *Application Context:* Windchill MethodServer
 * *Affected Fonts:* Japanese fonts, specifically *MS Gothic* (subsetted)


*Issue Summary:* We are experiencing a high failure rate (*80%*) when merging technical CAD drawings into a single PDF package. Although the source PDFs have their fonts embedded, the merged PDF frequently shows missing fonts or garbled text on specific pages.

*Key Observations:*

 * *Reproducibility:* In a test of 10 consecutive merge attempts, the
   output was correct only *2 times*, while the remaining *8 attempts*
   resulted in missing fonts.
 * *Specific Error:* The issue is almost exclusively linked to Japanese
   *MS Gothic* variants (e.g., |ACWDKT+MS Gothic|).
 * *Error Logs:* We frequently encounter
   |Format 14 cmap table is not supported| and
   |Format 12 cmap contains an invalid glyph index| warnings during the
   process.
 * *Client Behavior:* Adobe Acrobat fails to render the text,
   displaying the error: /"Cannot extract the embedded font 'ACWDKT+MS
   Gothic'. Some characters may not display or print correctly."/
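
The name |ACWDKT+MS Gothic| follows the PDF subset naming convention: a tag of six uppercase letters, a plus sign, then the base font name. A minimal sketch of a helper that splits such names, for example to group the fonts of a merged file by base name while diagnosing, is shown here. The class |SubsetFontName| and its methods are hypothetical and are not part of PDFBox.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a PDFBox class): splits "ABCDEF+BaseName" subset font names. */
final class SubsetFontName {
    // A PDF subset tag is exactly six uppercase ASCII letters followed by '+'.
    private static final Pattern SUBSET = Pattern.compile("^([A-Z]{6})\\+(.+)$");

    private SubsetFontName() { }

    /** Returns true if the name carries a subset tag such as "ACWDKT+". */
    static boolean isSubset(String fontName) {
        return fontName != null && SUBSET.matcher(fontName).matches();
    }

    /** Returns the name without its subset tag, or the input unchanged if it has none. */
    static String baseName(String fontName) {
        if (fontName == null) {
            return null;
        }
        Matcher m = SUBSET.matcher(fontName);
        return m.matches() ? m.group(2) : fontName;
    }
}
```

A check of this kind is stricter than testing whether a name merely contains |+|, which would also match names that are not subsets.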

*Suspected Causes:*

1. *I/O Race Condition:* We intermittently receive
   |java.io.IOException: Missing root object specification in trailer|,
   suggesting the merger may be accessing files before they are fully
   flushed to disk or while they are still locked by the external
   converter.
2. *Resource Clashing:* We suspect the |PDFMergerUtility| may be
   clashing font resource aliases (like |/F1|) across different
   subsetted drawings, leading to corrupted Character Maps (CMaps) in
   the final document.
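
To test the race-condition theory independently of PDFBox, one option is a cheap structural check before a file is queued for merging. The sketch below assumes that a fully written PDF starts with |%PDF-| and has the |%%EOF| marker near its end; the class |PdfCompletenessCheck| and the 1024-byte tail window are assumptions made for illustration. It is only a heuristic and does not prove that the cross-reference table or trailer is valid.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

/** Hypothetical pre-merge check (not a PDFBox class). */
final class PdfCompletenessCheck {
    // Assumed window in which the "%%EOF" marker is expected at the end of the file.
    private static final int TAIL_BYTES = 1024;

    private PdfCompletenessCheck() { }

    /**
     * Returns true if the file starts with "%PDF-" and contains "%%EOF" within
     * its last TAIL_BYTES bytes. Returns false for empty, truncated, missing,
     * or unreadable files.
     */
    static boolean looksComplete(Path pdf) {
        try (RandomAccessFile raf = new RandomAccessFile(pdf.toFile(), "r")) {
            long length = raf.length();
            if (length < 16) {
                return false; // too small to hold a header and an end marker
            }
            byte[] head = new byte[5];
            raf.readFully(head);
            if (!"%PDF-".equals(new String(head, StandardCharsets.ISO_8859_1))) {
                return false;
            }
            int tailSize = (int) Math.min(TAIL_BYTES, length);
            byte[] tail = new byte[tailSize];
            raf.seek(length - tailSize);
            raf.readFully(tail);
            return new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF");
        } catch (IOException e) {
            return false; // missing, locked, or unreadable: treat as not ready
        }
    }
}
```

If files that fail this check correlate with the |Missing root object specification in trailer| exceptions, that would point toward the converter still writing when the merge starts.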

*Current Code Implementation:* We have attempted to fix this by implementing a *Targeted Healing Pass*. We re-open the merged PDF, scan for corrupted subsets, and re-embed a full English *Century Gothic* font using |PDResources.put()| and |PDPageContentStream.AppendMode.APPEND|. Despite this, the inconsistency persists.


*Questions:*

1. Is there a built-in mechanism in |PDFMergerUtility| to "flatten" or
   deduplicate subsetted fonts during the merge to prevent CMap clashing?
2. Given that Format 14/12 warnings are logged but don't throw
   exceptions, is there a recommended way to programmatically detect
   this "data loss" state before the file is saved?
3. Are there known issues with |setupTempFileOnly()| vs
   |setupMainMemoryOnly()| when dealing with large, complex vector
   drawings that might contribute to trailer parsing failures?



=======================Code used for the merge=====================


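import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

private void mergeUsingPDFBox(List<String> pdfFiles, String outputFile) throws IOException {
    PDFMergerUtility merger = new PDFMergerUtility();
    merger.setDestinationFileName(outputFile);

    // Queue every source file in the order given
    for (String file : pdfFiles) {
        merger.addSource(new File(file));
    }

    // Buffer intermediate data in temp files rather than on the heap
    merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
}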

=====================================================
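
The merge above queues each file without checking whether the external converter has finished with it. One way to probe this, sketched below under the assumption that the converter keeps the file open or locked while writing, is to try to open the file for writing and take a lock. The class |FileReleaseProbe| is hypothetical. On Windows an open-for-write attempt typically fails while another process holds the file without sharing, but this depends on how the converter opens it, and on other platforms file locks are often only advisory.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical probe (not a PDFBox class): is another writer still holding the file? */
final class FileReleaseProbe {
    private FileReleaseProbe() { }

    /**
     * Returns true if the file exists, can be opened for read/write, and an
     * exclusive lock can be acquired. The lock is released again immediately.
     */
    static boolean isReleased(Path file) {
        try (FileChannel channel = FileChannel.open(
                     file, StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = channel.tryLock()) {
            // tryLock() returns null when another process holds an overlapping lock
            return lock != null;
        } catch (IOException | OverlappingFileLockException e) {
            // Missing file, sharing violation, or a lock held inside this JVM
            return false;
        }
    }
}
```

A caller could skip or retry any source for which |FileReleaseProbe.isReleased(...)| returns false before calling |merger.addSource(...)|.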




===================== Attempted fix: re-embedding the missing fonts after the merge (the issue still persists) ===============================

private void mergeUsingPDFBox(List<String> pdfFiles, String outputPath) throws IOException {
        org.apache.pdfbox.multipdf.PDFMergerUtility merger = new org.apache.pdfbox.multipdf.PDFMergerUtility();
        merger.setDestinationFileName(outputPath);

        System.out.println("\n[PHASE 1] Initial PDF Merging...");

        for (String filePath : pdfFiles) {
            File sourceFile = new File(filePath);

            // LOGIC: Prevent "Missing root object specification in trailer" error
            // This happens if we try to merge a file that is still 0-bytes (locked by converter)
            if (sourceFile.exists() && sourceFile.length() > 100) {
                merger.addSource(sourceFile);
                System.out.println("  --> Added to merge queue: " + sourceFile.getName() + " [" + sourceFile.length() + " bytes]");
            } else {
                System.out.println("  WARN: Skipping empty/invalid file (might be locked by conversion): " + filePath);
            }
        }

        // Execute merge buffering in main memory only (note: this uses the JVM heap, not temp files)
        merger.mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting.setupMainMemoryOnly());
        System.out.println("[PHASE 2] Merge Saved to Disk. Starting CMap Corruption Scan...");

        // RE-OPEN the result to identify and heal Japanese encoding issues
        try (PDDocument mergedDoc = PDDocument.load(new File(outputPath))) {

            // LOGIC: Remove security. Modified content streams are blocked if owner password exists.
            if (mergedDoc.isEncrypted()) {
                System.out.println("DEBUG: Removing encryption to allow font re-embedding...");
                mergedDoc.setAllSecurityToBeRemoved(true);
            }

            // Load replacement font ONCE per document loop for memory efficiency
            String fontPath = getFontFileFor("GOTHIC");
            try (InputStream fontStream = new FileInputStream(new File(fontPath))) {

                // Load FULL font (no subsetting) to ensure all English character indices exist
                PDType0Font englishFont = PDType0Font.load(mergedDoc, fontStream, false);

                int repairCount = 0;
                for (int i = 0; i < mergedDoc.getNumberOfPages(); i++) {
                    PDPage page = mergedDoc.getPage(i);
                    PDResources res = page.getResources();
                    if (res == null) continue;

                    boolean pageHasError = false;

                    // STEP A: Detect CMap Corruption (Format 12/14 warnings)
                    for (COSName fontAlias : res.getFontNames()) {
                        PDFont font = res.getFont(fontAlias);

                        // Use our helper to force a check of the internal font mapping
                        if (isFontCorrupted(font)) {
                            System.out.println("  ALERT: Page " + (i + 1) + " contains corrupted Japanese CMap/Subsets. Repairing...");
                            pageHasError = true;
                            break;
                        }
                    }

                    // STEP B: Heal the problematic page
                    if (pageHasError) {
                        for (COSName fontAlias : res.getFontNames()) {
                            String name = res.getFont(fontAlias).getName().toUpperCase();

                            // LOGIC: Target Gothic subsets (+) that failed the validation check
                            if (name.contains("GOTHIC") || name.contains("+") || name.contains("MS-")) {
                                res.put(fontAlias, englishFont);
                                repairCount++;
                            }
                        }

                        // LOGIC: Re-render the operational stream
                        // Adding a space forces the PDF viewer to reload the character map using our new font
                        try (PDPageContentStream cs = new PDPageContentStream(mergedDoc, page,
                                PDPageContentStream.AppendMode.APPEND, true, true)) {
                            cs.beginText();
                            cs.setFont(englishFont, 1);
                            cs.newLineAtOffset(0, 0);
                            cs.showText(" ");
                            cs.endText();
                        }
                    }
                }
                System.out.println("INFO : Total font mappings repaired during final pass: " + repairCount);
            }

            // Final Save: Overwrite the merged file with the high-fidelity English version
            mergedDoc.save(outputPath);
            System.out.println("[PHASE 3] Final healing pass complete. Output verified.");

        } catch (Exception e) {
            System.err.println("CRITICAL ERROR: Failed to heal the merged PDF: " + e.getMessage());
            e.printStackTrace();
        }

        System.out.println(">>> SUCCESS! High-Fidelity PDF saved to: " + outputPath + "\n");
    }

=================================================================
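
The healing pass above saves to |outputPath| while the document loaded from that same path is still open, which may be fragile if the document is still reading from that file. A minimal sketch of an alternative is to write the result to a sibling temp file first and move it over the target only after the document has been closed. The class |SafeOverwrite|, its |ContentWriter| callback, and the method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not a PDFBox class): write next to the target, then swap. */
final class SafeOverwrite {
    /** Hypothetical callback: writes the new file content to the given stream. */
    interface ContentWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    private SafeOverwrite() { }

    /** Writes the content to a new temp file in the target's directory and returns it. */
    static Path writeSibling(Path target, ContentWriter writer) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "merge-", ".tmp");
        try (OutputStream out = Files.newOutputStream(temp)) {
            writer.writeTo(out);
        } catch (IOException e) {
            Files.deleteIfExists(temp); // do not leave a partial temp file behind
            throw e;
        }
        return temp;
    }

    /** Replaces the target with the temp file; call this after the source is closed. */
    static void promote(Path temp, Path target) throws IOException {
        Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With PDFBox 2.x this could be used as |Path temp = SafeOverwrite.writeSibling(target, mergedDoc::save);| inside the try-with-resources block, followed by |SafeOverwrite.promote(temp, target);| once the block has closed the document.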


*Logs Captured during Merge:* Our internal diagnostic tools show the following warnings from FontBox during the failing merges:

 * |org.apache.fontbox.ttf.CmapSubtable: Format 14 cmap table is not supported and will be ignored|
 * |org.apache.fontbox.ttf.CmapSubtable: Format 12 cmap contains an invalid glyph index|
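
Since these warnings do not raise exceptions, one way to make them visible to the calling code is to count them while a merge runs. The sketch below assumes that PDFBox's logging ends up in |java.util.logging|, which is where Apache Commons Logging falls back to when no other backend such as Log4j is configured. In a Windchill MethodServer a different backend may be active, in which case an equivalent appender would be needed. The class |FontWarningCounter| is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical helper: counts WARNING (or higher) records under a logger name. */
final class FontWarningCounter extends Handler implements AutoCloseable {
    private final Logger logger; // strong reference so the logger is not collected
    private final AtomicInteger warnings = new AtomicInteger();

    FontWarningCounter(String loggerName) {
        this.logger = Logger.getLogger(loggerName);
        this.logger.addHandler(this); // also receives records from child loggers
    }

    @Override
    public void publish(LogRecord record) {
        if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
            warnings.incrementAndGet();
        }
    }

    @Override
    public void flush() {
        // nothing is buffered
    }

    @Override
    public void close() {
        logger.removeHandler(this);
    }

    /** Number of warnings seen since this counter was attached. */
    int count() {
        return warnings.get();
    }
}
```

Wrapping a merge in |try (FontWarningCounter c = new FontWarningCounter("org.apache.fontbox")) { ... }| and reading |c.count()| afterwards would flag runs in which FontBox logged warnings. A non-zero count does not by itself prove that the output is corrupted.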

*Questions for the Community:*

1. Why would the merger intermittently corrupt the CMap of a subsetted
   font that is already valid in the source document?
2. Is there a way to force |PDFMergerUtility| to *not* rename font
   subsets during merging, as we suspect alias clashing is causing the
   80% failure rate?
3. Is there a more reliable way to "flatten" these Japanese fonts
   during the merge process to ensure 100% rendering success?

*How can we resolve this issue? Please help us.*


Thanks and Regards
Mruthyunjaya S

Sumedhas Tech Solutions Pvt. Ltd. <https://www.sumedhastech.com>

