*Technical Issue Report: Intermittent Data Loss and Font Missing
Errors during PDF Merging*
*Environment:*
* *Library:* PDFBox 2.0.33
* *Operating System:* Windows Server 2019
* *Application Context:* Windchill MethodServer
* *Affected Fonts:* Japanese fonts, specifically *MS Gothic* (subsetted)
*Issue Summary:* We are experiencing a high failure rate (*80% failure*)
when merging technical CAD drawings into a single PDF package. Even
though the source PDFs have fonts embedded, the generated "Merged PDF"
frequently suffers from missing fonts or garbled text on specific pages.
*Key Observations:*
* *Reproducibility:* In a test of 10 consecutive merge attempts, the
output was correct only *2 times*, while the remaining *8 attempts*
resulted in missing fonts.
* *Specific Error:* The issue is almost exclusively linked to Japanese
*MS Gothic* variants (e.g., |ACWDKT+MS Gothic|).
* *Error Logs:* We frequently encounter |Format 14 cmap table is not
supported| and |Format 12 cmap contains an invalid glyph index|
warnings during the process.
* *Client Behavior:* Adobe Acrobat fails to render the text,
displaying the error: /"Cannot extract the embedded font 'ACWDKT+MS
Gothic'. Some characters may not display or print correctly."/
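The font name quoted above follows the PDF convention for subsetted fonts: a tag of six uppercase letters, a plus sign, then the base font name. A minimal sketch of a check for that pattern is shown here, since it is stricter than testing for a "+" anywhere in the name. The class and method names (|SubsetFontName|, |isSubset|, |baseName|) are hypothetical and not part of PDFBox.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Splits a PDF font name such as "ACWDKT+MS Gothic" into subset tag and base name. */
final class SubsetFontName {
    // A subset tag is exactly six uppercase ASCII letters followed by '+'.
    private static final Pattern SUBSET = Pattern.compile("^([A-Z]{6})\\+(.+)$");

    /** Returns true only when the name starts with a six-letter subset tag and '+'. */
    static boolean isSubset(String fontName) {
        return fontName != null && SUBSET.matcher(fontName).matches();
    }

    /** Returns the name without the subset tag, or the name unchanged if there is no tag. */
    static String baseName(String fontName) {
        Matcher m = SUBSET.matcher(fontName);
        return m.matches() ? m.group(2) : fontName;
    }
}
```

With this helper, two subsets of the same base font (different tags, same base name) can be grouped when inspecting the merged output.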
*Suspected Causes:*
1. *I/O Race Condition:* We intermittently receive
|java.io.IOException: Missing root object specification in trailer|,
suggesting the merger may be accessing files before they are fully
flushed to disk or while they are still locked by the external
converter.
2. *Resource Clashing:* We suspect the |PDFMergerUtility| may be
clashing font resource aliases (like |/F1|) across different
subsetted drawings, leading to corrupted Character Maps (CMaps) in
the final document.
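To help confirm or rule out the first suspicion, a source file can be checked for completeness before it is queued for merging. The sketch below is a heuristic only, written under these assumptions: a finished PDF begins with |%PDF-| and has |%%EOF| within its last 1024 bytes, and a file whose size is unchanged between two polls is no longer being written. The class |PdfReadiness| and its methods are hypothetical helpers that use only the JDK, not PDFBox.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class PdfReadiness {
    private static final int TAIL_BYTES = 1024;

    /** True if the file starts with "%PDF-" and contains "%%EOF" in its last TAIL_BYTES bytes. */
    static boolean looksComplete(Path file) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long len = raf.length();
            if (len < 16) return false;
            byte[] head = new byte[5];
            raf.readFully(head);
            if (!"%PDF-".equals(new String(head, StandardCharsets.ISO_8859_1))) return false;
            int tailLen = (int) Math.min(TAIL_BYTES, len);
            byte[] tail = new byte[tailLen];
            raf.seek(len - tailLen);
            raf.readFully(tail);
            return new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF");
        } catch (IOException e) {
            return false; // unreadable or locked: treat as not ready
        }
    }

    /** Polls until the size is unchanged between two polls and the file looks complete. */
    static boolean waitUntilReady(Path file, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long lastSize = -1;
        while (true) {
            long size = sizeOrMinusOne(file);
            if (size >= 0 && size == lastSize && looksComplete(file)) return true;
            lastSize = size;
            if (System.currentTimeMillis() >= deadline) return false;
            Thread.sleep(pollMillis);
        }
    }

    private static long sizeOrMinusOne(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            return -1;
        }
    }
}
```

If the trailer error disappears once every source passes |waitUntilReady| before |addSource| is called, that points to the converter hand-off rather than to the merger itself.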
*Current Code Implementation:* We have attempted to fix this by
implementing a *Targeted Healing Pass*. We re-open the merged PDF, scan
for corrupted subsets, and re-embed a full English *Century Gothic* font
using |PDResources.put()| and |PDPageContentStream.AppendMode.APPEND|.
Despite this, the inconsistency persists. This raises the following
questions:
1. Is there a built-in mechanism in |PDFMergerUtility| to "flatten" or
deduplicate subsetted fonts during the merge to prevent CMap clashing?
2. Given that Format 14/12 warnings are logged but don't throw
exceptions, is there a recommended way to programmatically detect
this "data loss" state before the file is saved?
3. Are there known issues with |setupTempFileOnly()| vs
|setupMainMemoryOnly()| when dealing with large, complex vector
drawings that might contribute to trailer parsing failures?
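On the second question, one possible way to notice the Format 12/14 warnings from code is to listen on the logger that emits them. The sketch below assumes that the logging used by PDFBox/FontBox is routed to |java.util.logging| in the running JVM, which may not hold inside Windchill if another logging backend is configured, and that the logger name matches the class shown in the log lines (|org.apache.fontbox.ttf.CmapSubtable|). The class |FontWarningCollector| is a hypothetical helper, not a PDFBox API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Collects WARNING-or-higher records from one named logger while it is open. */
final class FontWarningCollector extends Handler implements AutoCloseable {
    private final Logger logger; // strong reference so the logger is not garbage collected
    private final List<String> messages = Collections.synchronizedList(new ArrayList<>());

    FontWarningCollector(String loggerName) {
        this.logger = Logger.getLogger(loggerName);
        setLevel(Level.WARNING);
        logger.addHandler(this);
    }

    @Override
    public void publish(LogRecord record) {
        if (isLoggable(record)) {
            messages.add(record.getMessage());
        }
    }

    @Override
    public void flush() {
        // nothing is buffered
    }

    /** Detaches the handler from the logger. */
    @Override
    public void close() {
        logger.removeHandler(this);
    }

    List<String> messages() {
        synchronized (messages) {
            return new ArrayList<>(messages);
        }
    }

    boolean sawWarnings() {
        return !messages.isEmpty();
    }
}
```

A collector opened in a try-with-resources block around the step that loads the fonts could then be queried with |sawWarnings()| before saving. Note that such warnings may only be emitted when the embedded font program is actually parsed, so they could originate from a later font inspection step rather than from the merge call.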
=============== Code used for the initial merge ===============
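import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

private void mergeUsingPDFBox(List<String> pdfFiles, String outputFile)
        throws IOException {
    PDFMergerUtility merger = new PDFMergerUtility();
    merger.setDestinationFileName(outputFile);
    // Queue every source PDF in the given order
    for (String file : pdfFiles) {
        merger.addSource(new File(file));
    }
    // Buffer intermediate data in temp files rather than on the heap
    merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
}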
=====================================================
=============== Second attempt: re-embedding the missing fonts after the
merge (the issue still occurs) ===============
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

// Note: getFontFileFor(...) and isFontCorrupted(...) are our own helpers
// and are not shown in this report.
private void mergeUsingPDFBox(List<String> pdfFiles, String outputPath)
        throws IOException {
    org.apache.pdfbox.multipdf.PDFMergerUtility merger =
            new org.apache.pdfbox.multipdf.PDFMergerUtility();
    merger.setDestinationFileName(outputPath);
    System.out.println("\n[PHASE 1] Initial PDF Merging...");
    for (String filePath : pdfFiles) {
        File sourceFile = new File(filePath);
        // LOGIC: Prevent "Missing root object specification in trailer" error.
        // This happens if we try to merge a file that is still 0 bytes
        // (locked by converter).
        if (sourceFile.exists() && sourceFile.length() > 100) {
            merger.addSource(sourceFile);
            System.out.println(" --> Added to merge queue: "
                    + sourceFile.getName() + " [" + sourceFile.length() + " bytes]");
        } else {
            System.out.println(" WARN: Skipping empty/invalid file"
                    + " (might be locked by conversion): " + filePath);
        }
    }
    // Execute the merge buffering in main memory only. Note: this setting
    // keeps the data on the JVM heap; setupTempFileOnly() is the variant
    // that buffers on disk.
    merger.mergeDocuments(
            org.apache.pdfbox.io.MemoryUsageSetting.setupMainMemoryOnly());
    System.out.println("[PHASE 2] Merge Saved to Disk. Starting CMap Corruption Scan...");
    // RE-OPEN the result to identify and heal Japanese encoding issues
    try (PDDocument mergedDoc = PDDocument.load(new File(outputPath))) {
        // LOGIC: Remove security. Modified content streams are blocked
        // if an owner password exists.
        if (mergedDoc.isEncrypted()) {
            System.out.println("DEBUG: Removing encryption to allow font re-embedding...");
            mergedDoc.setAllSecurityToBeRemoved(true);
        }
        // Load the replacement font ONCE per document for memory efficiency
        String fontPath = getFontFileFor("GOTHIC");
        try (InputStream fontStream = new FileInputStream(new File(fontPath))) {
            // Load the FULL font (no subsetting) to ensure all English
            // character indices exist
            PDType0Font englishFont = PDType0Font.load(mergedDoc, fontStream, false);
            int repairCount = 0;
            for (int i = 0; i < mergedDoc.getNumberOfPages(); i++) {
                PDPage page = mergedDoc.getPage(i);
                PDResources res = page.getResources();
                if (res == null) continue;
                boolean pageHasError = false;
                // STEP A: Detect CMap corruption (Format 12/14 warnings)
                for (COSName fontAlias : res.getFontNames()) {
                    PDFont font = res.getFont(fontAlias);
                    // Use our helper to force a check of the internal font mapping
                    if (isFontCorrupted(font)) {
                        System.out.println(" ALERT: Page " + (i + 1)
                                + " contains corrupted Japanese CMap/Subsets. Repairing...");
                        pageHasError = true;
                        break;
                    }
                }
                // STEP B: Heal the problematic page
                if (pageHasError) {
                    for (COSName fontAlias : res.getFontNames()) {
                        String name = res.getFont(fontAlias).getName().toUpperCase();
                        // LOGIC: Target Gothic subsets (+) that failed the
                        // validation check
                        if (name.contains("GOTHIC") || name.contains("+")
                                || name.contains("MS-")) {
                            res.put(fontAlias, englishFont);
                            repairCount++;
                        }
                    }
                    // LOGIC: Re-render the operational stream.
                    // Adding a space is intended to force the PDF viewer to
                    // reload the character map using our new font.
                    try (PDPageContentStream cs = new PDPageContentStream(mergedDoc, page,
                            PDPageContentStream.AppendMode.APPEND, true, true)) {
                        cs.beginText();
                        cs.setFont(englishFont, 1);
                        cs.newLineAtOffset(0, 0);
                        cs.showText(" ");
                        cs.endText();
                    }
                }
            }
            System.out.println("INFO : Total font mappings repaired during final pass: "
                    + repairCount);
        }
        // Final Save: Overwrite the merged file with the high-fidelity
        // English version
        mergedDoc.save(outputPath);
        System.out.println("[PHASE 3] Final healing pass complete. Output verified.");
    } catch (Exception e) {
        System.err.println("CRITICAL ERROR: Failed to heal the merged PDF: "
                + e.getMessage());
        e.printStackTrace();
    }
    // Note: this line is reached (and printed) even when the catch block
    // above has reported a failure.
    System.out.println(">>> SUCCESS! High-Fidelity PDF saved to: " + outputPath + "\n");
}
=================================================================
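The final save in the code above writes to the same path the document was loaded from. As a general precaution (not a confirmed cause of the failures described here), the output can instead be written to a temporary file in the same directory and then moved into place, so that no other process ever sees a half-written result. The sketch below uses only the JDK; |AtomicPublish| and its |Writer| interface are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AtomicPublish {
    /** Callback that writes the complete output to the given temporary path. */
    interface Writer {
        void writeTo(Path target) throws IOException;
    }

    /** Writes to a sibling temp file first, then moves it over finalPath. */
    static void publish(Path finalPath, Writer writer) throws IOException {
        Path dir = finalPath.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "merge-", ".tmp");
        try {
            writer.writeTo(tmp);
            try {
                Files.move(tmp, finalPath, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Fall back to a regular replace when the file system cannot move atomically
                Files.move(tmp, finalPath, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            // No-op after a successful move; cleans up the temp file on failure
            Files.deleteIfExists(tmp);
        }
    }
}
```

In the healing pass this could be called with a callback that saves the document to the temporary path, for example |tmp -> mergedDoc.save(tmp.toFile())|.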
*Logs Captured during Merge:* Our internal diagnostic tools show the
following warnings from FontBox during the failing merges:
* |org.apache.fontbox.ttf.CmapSubtable: Format 14 cmap table is not
supported and will be ignored|
* |org.apache.fontbox.ttf.CmapSubtable: Format 12 cmap contains an
invalid glyph index|
*Questions for the Community:*
1. Why would the merger intermittently corrupt the CMap of a subsetted
font that is already valid in the source document?
2. Is there a way to force |PDFMergerUtility| to *not* rename font
subsets during merging, as we suspect alias clashing is causing the
80% failure rate?
3. Is there a more reliable way to "flatten" these Japanese fonts
during the merge process to ensure 100% rendering success?
*How can we resolve this issue? Any help would be appreciated.*
Thanks and Regards
Mruthyunjaya S
Sumedhas Tech Solutions Pvt. Ltd. <https://www.sumedhastech.com>