Technical Issue Report: Intermittent Data Loss and Font Missing Errors during PDF Merging

Mruthyunjaya s Mon, 02 Feb 2026 01:23:07 -0800


     *Technical Issue Report: Intermittent Data Loss and Font Missing
     Errors during PDF Merging*


*Environment:*

 * *Library:* PDFBox 2.0.33
 * *Operating System:* Windows Server 2019
 * *Application Context:* Windchill MethodServer
 * *Affected Fonts:* Japanese fonts, specifically *MS Gothic* (subsetted)

*Issue Summary:* We are experiencing a high failure rate (*80%*) when merging technical CAD drawings into a single PDF package. Even though the source PDFs have their fonts embedded, the merged PDF frequently has missing fonts or garbled text on specific pages.


*Key Observations:*

 * *Reproducibility:* In a test of 10 consecutive merge attempts, the
   output was correct only *2 times*, while the remaining *8 attempts*
   resulted in missing fonts.
 * *Specific Error:* The issue is almost exclusively linked to Japanese
   *MS Gothic* variants (e.g., |ACWDKT+MS Gothic|).
 * *Error Logs:* We frequently encounter |Format 14 cmap table is not
   supported| and |Format 12 cmap contains an invalid glyph index|
   warnings during the process.
 * *Client Behavior:* Adobe Acrobat fails to render the text,
   displaying the error: /"Cannot extract the embedded font 'ACWDKT+MS
   Gothic'. Some characters may not display or print correctly."/
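
For reference, a name such as |ACWDKT+MS Gothic| follows the PDF convention for subsetted fonts: a tag of exactly six uppercase letters, a plus sign, then the base font name. The sketch below shows one way to recognise that pattern more strictly than a plain check for "+". The class |SubsetFontName| and its methods are hypothetical helpers written for illustration; they are not part of PDFBox.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a PDFBox class): recognises subset-tagged font names. */
final class SubsetFontName {
    // Subset tag: exactly six uppercase ASCII letters, then '+', then the base name.
    private static final Pattern SUBSET = Pattern.compile("^([A-Z]{6})\\+(.+)$");

    /** Returns true if the name carries a six-letter subset tag. */
    static boolean isSubset(String fontName) {
        return fontName != null && SUBSET.matcher(fontName).matches();
    }

    /** Returns the name without its subset tag, or the name unchanged if it has none. */
    static String baseName(String fontName) {
        if (fontName == null) {
            return null;
        }
        Matcher m = SUBSET.matcher(fontName);
        return m.matches() ? m.group(2) : fontName;
    }
}
```

With this, |SubsetFontName.baseName("ACWDKT+MS Gothic")| yields |MS Gothic|, while a name that merely contains a plus sign is not treated as a subset.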

*Suspected Causes:*

1. *I/O Race Condition:* We intermittently receive
   |java.io.IOException: Missing root object specification in trailer|,
   suggesting the merger may be accessing files before they are fully
   flushed to disk or while they are still locked by the external
   converter.

2. *Resource Clashing:* We suspect the |PDFMergerUtility| may be
   clashing font resource aliases (like |/F1|) across different
   subsetted drawings, leading to corrupted Character Maps (CMaps) in
   the final document.
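
To test the first suspicion in isolation, a stricter readiness check than a file-size threshold could be tried. The following is a minimal sketch, assuming the converter writes the file sequentially and finishes it with the usual |%%EOF| marker: it waits until the size is unchanged between two polls and the marker appears in the last 1024 bytes. The class |PdfReadyCheck|, its method names, and the 1024-byte window are illustrative assumptions, not PDFBox API. On Windows, opening a file that the converter still holds exclusively may throw an |IOException|, which the caller would treat as "not ready".

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: heuristics for "has the converter finished writing this PDF?" */
final class PdfReadyCheck {
    private static final int TAIL_BYTES = 1024; // assumed search window for the marker

    /** Returns true if the last TAIL_BYTES of the file contain the %%EOF marker. */
    static boolean endsWithEofMarker(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            int tail = (int) Math.min(TAIL_BYTES, length);
            if (tail == 0) {
                return false; // empty file: definitely not complete
            }
            byte[] buf = new byte[tail];
            raf.seek(length - tail);
            raf.readFully(buf);
            return new String(buf, StandardCharsets.ISO_8859_1).contains("%%EOF");
        }
    }

    /**
     * Polls until the file size is unchanged between two consecutive checks and the
     * %%EOF marker is present, or until maxAttempts polls have been made.
     */
    static boolean waitUntilReady(Path file, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        long previousSize = -1;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (Files.exists(file)) {
                long size = Files.size(file);
                if (size > 0 && size == previousSize && endsWithEofMarker(file)) {
                    return true;
                }
                previousSize = size;
            }
            Thread.sleep(pauseMillis);
        }
        return false;
    }
}
```

A caller could run |PdfReadyCheck.waitUntilReady(path, 20, 500)| before |merger.addSource(...)| and skip or retry the file if it returns false. This is only a heuristic: a stable size and a trailing marker do not prove the file is a valid PDF.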

*Current Code Implementation:* We have attempted to fix this by implementing a *Targeted Healing Pass*. We re-open the merged PDF, scan for corrupted subsets, and re-embed a full English *Century Gothic* font using |PDResources.put()| and |PDPageContentStream.AppendMode.APPEND|. Despite this, the inconsistency persists.



1. Is there a built-in mechanism in |PDFMergerUtility| to "flatten" or
   deduplicate subsetted fonts during the merge to prevent CMap clashing?

2. Given that Format 14/12 warnings are logged but don't throw
   exceptions, is there a recommended way to programmatically detect
   this "data loss" state before the file is saved?

3. Are there known issues with |setupTempFileOnly()| vs
   |setupMainMemoryOnly()| when dealing with large, complex vector
   drawings that might contribute to trailer parsing failures?



======================= Code used for the merge =====================

    private void mergeUsingPDFBox(List<String> pdfFiles, String outputFile) throws IOException {

        PDFMergerUtility merger = new PDFMergerUtility();
        merger.setDestinationFileName(outputFile);

        for (String file : pdfFiles) {
            merger.addSource(new File(file));
        }

        merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
    }

=====================================================

===================== I also tried to fix the missing fonts by re-embedding them, but the issue persists =====================

    private void mergeUsingPDFBox(List<String> pdfFiles, String outputPath) throws IOException {
        org.apache.pdfbox.multipdf.PDFMergerUtility merger = new org.apache.pdfbox.multipdf.PDFMergerUtility();

        merger.setDestinationFileName(outputPath);

        System.out.println("\n[PHASE 1] Initial PDF Merging...");

        for (String filePath : pdfFiles) {
            File sourceFile = new File(filePath);

            // LOGIC: Prevent "Missing root object specification in trailer" error.
            // This happens if we try to merge a file that is still 0 bytes (locked by converter).
            if (sourceFile.exists() && sourceFile.length() > 100) {
                merger.addSource(sourceFile);
                System.out.println(" --> Added to merge queue: " + sourceFile.getName()
                        + " [" + sourceFile.length() + " bytes]");
            } else {
                System.out.println(" WARN: Skipping empty/invalid file (might be locked by conversion): " + filePath);
            }
        }

        // Execute merge using main memory only (buffers are held on the Java heap)
        merger.mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting.setupMainMemoryOnly());

        System.out.println("[PHASE 2] Merge Saved to Disk. Starting CMap Corruption Scan...");

        // RE-OPEN the result to identify and heal Japanese encoding issues
        try (PDDocument mergedDoc = PDDocument.load(new File(outputPath))) {

            // LOGIC: Remove security. Modified content streams are blocked if owner password exists.
            if (mergedDoc.isEncrypted()) {
                System.out.println("DEBUG: Removing encryption to allow font re-embedding...");
                mergedDoc.setAllSecurityToBeRemoved(true);
            }

            // Load replacement font ONCE per document for memory efficiency.
            // getFontFileFor(...) is our own helper (not shown here).
            String fontPath = getFontFileFor("GOTHIC");

            try (InputStream fontStream = new FileInputStream(new File(fontPath))) {

                // Load FULL font (no subsetting) to ensure all English character indices exist
                PDType0Font englishFont = PDType0Font.load(mergedDoc, fontStream, false);

                int repairCount = 0;
                for (int i = 0; i < mergedDoc.getNumberOfPages(); i++) {
                    PDPage page = mergedDoc.getPage(i);
                    PDResources res = page.getResources();
                    if (res == null) continue;

                    boolean pageHasError = false;

                    // STEP A: Detect CMap Corruption (Format 12/14 warnings)
                    for (COSName fontAlias : res.getFontNames()) {
                        PDFont font = res.getFont(fontAlias);

                        // Use our helper (not shown here) to force a check of the internal font mapping
                        if (isFontCorrupted(font)) {
                            System.out.println(" ALERT: Page " + (i + 1)
                                    + " contains corrupted Japanese CMap/Subsets. Repairing...");
                            pageHasError = true;
                            break;
                        }
                    }

                    // STEP B: Heal the problematic page
                    if (pageHasError) {
                        for (COSName fontAlias : res.getFontNames()) {
                            String name = res.getFont(fontAlias).getName().toUpperCase();

                            // LOGIC: Target Gothic subsets (+) that failed the validation check
                            if (name.contains("GOTHIC") || name.contains("+") || name.contains("MS-")) {
                                res.put(fontAlias, englishFont);
                                repairCount++;
                            }
                        }

                        // LOGIC: Re-render the operational stream.
                        // Adding a space forces the PDF viewer to reload the character map using our new font
                        try (PDPageContentStream cs = new PDPageContentStream(mergedDoc, page,
                                PDPageContentStream.AppendMode.APPEND, true, true)) {
                            cs.beginText();
                            cs.setFont(englishFont, 1);
                            cs.newLineAtOffset(0, 0);
                            cs.showText(" ");
                            cs.endText();
                        }
                    }
                }

                System.out.println("INFO : Total font mappings repaired during final pass: " + repairCount);
            }

            // Final Save: Overwrite the merged file with the high-fidelity English version
            mergedDoc.save(outputPath);

            System.out.println("[PHASE 3] Final healing pass complete. Output verified.");

        } catch (Exception e) {
            System.err.println("CRITICAL ERROR: Failed to heal the merged PDF: " + e.getMessage());
            e.printStackTrace();
        }

        System.out.println(">>> SUCCESS! High-Fidelity PDF saved to: " + outputPath + "\n");
    }

=================================================================

--

*Logs Captured during Merge:* Our internal diagnostic tools show the following warnings from FontBox during the failing merges:


 * |org.apache.fontbox.ttf.CmapSubtable: Format 14 cmap table is not
   supported and will be ignored|
 * |org.apache.fontbox.ttf.CmapSubtable: Format 12 cmap contains an
   invalid glyph index|
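
Since these warnings are only logged, one idea for detecting them in code is to attach a log handler to the |org.apache.fontbox.ttf.CmapSubtable| logger and count matching records around the merge. The sketch below does that with |java.util.logging|. It rests on an assumption we have not verified for our setup: that Commons Logging (used by PDFBox 2.x) is routed to |java.util.logging|. If the MethodServer routes it to Log4j instead, the equivalent would be a Log4j appender. The class |CmapWarningCounter| is a hypothetical helper, not PDFBox API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical helper: counts WARNING-or-higher log records that mention "cmap". */
final class CmapWarningCounter extends Handler {
    private final AtomicInteger count = new AtomicInteger();
    private final Logger logger; // strong reference so the logger is not garbage-collected

    private CmapWarningCounter(Logger logger) {
        this.logger = logger;
    }

    /** Attaches a new counter to the named java.util.logging logger. */
    static CmapWarningCounter attach(String loggerName) {
        Logger target = Logger.getLogger(loggerName);
        CmapWarningCounter handler = new CmapWarningCounter(target);
        target.addHandler(handler);
        return handler;
    }

    @Override
    public void publish(LogRecord record) {
        String message = record.getMessage();
        if (record.getLevel().intValue() >= Level.WARNING.intValue()
                && message != null && message.contains("cmap")) {
            count.incrementAndGet();
        }
    }

    @Override
    public void flush() {
        // nothing is buffered
    }

    @Override
    public void close() {
        // no resources to release
    }

    /** Number of matching warnings seen so far. */
    int count() {
        return count.get();
    }

    /** Removes this handler from the logger it was attached to. */
    void detach() {
        logger.removeHandler(this);
    }
}
```

A caller would attach the counter before |mergeDocuments(...)| (and before re-loading the merged file), check |count()| afterwards, and call |detach()| when done. A non-zero count only says that FontBox complained while parsing a font, not which page or font is affected.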

*Questions for the Community:*

1. Why would the merger intermittently corrupt the CMap of a subsetted
   font that is already valid in the source document?

2. Is there a way to force |PDFMergerUtility| to *not* rename font
   subsets during merging, as we suspect alias clashing is causing the
   80% failure rate?

3. Is there a more reliable way to "flatten" these Japanese fonts
   during the merge process to ensure 100% rendering success?

*How can we resolve this issue? Please help us.*


Thanks and Regards
Mruthyunjaya S

Sumedhas Tech Solutions Pvt. Ltd. <https://www.sumedhastech.com>


