Hi,

We want to add a structure tree to existing PDF documents. The documents
come from a scanner and contain images but no structure.
We want to implement a program that creates the structure so that we can add
alternate descriptions to the images based on Tesseract OCR recognition.

Our first approach creates a structure, but the structure seems to be
incomplete when we check it with Adobe Acrobat. We can't find any hints in the
PDFBox examples or documentation on how to do this.

Our code snippet:

    try {
        PDDocument document = PDDocument.load("test.pdf");
        PDDocumentCatalog documentCatalog = document.getDocumentCatalog();

        PDStructureTreeRoot treeRoot = document.getDocumentCatalog().getStructureTreeRoot();

        if (treeRoot == null) {
            COSDictionary cosDictionary = documentCatalog.getCOSDictionary();
            PDStructureTreeRoot newTreeRoot = new PDStructureTreeRoot();

            // iterate over pages
            List<?> pages = documentCatalog.getAllPages();
            for (Object object : pages) {
                PDPage page = (PDPage) object;
                Map<String, PDXObject> mapObjects = page.getResources().getXObjects();
                for (PDXObject pdxObject : mapObjects.values()) {
                    if (pdxObject instanceof PDXObjectImage) {
                        PDXObjectImage objectImage = (PDXObjectImage) pdxObject;
                        // new structure element for the image
                        PDStructureElement structureElement =
                                new PDStructureElement(StandardStructureTypes.Figure, newTreeRoot);
                        PDMarkedContent markedContent =
                                new PDMarkedContent(COSName.IMAGE, new COSDictionary());
                        markedContent.addXObject(objectImage);
                        structureElement.appendKid(markedContent);
                        structureElement.setAlternateDescription("NEW ALTERNATE DESCRIPTION");
                        newTreeRoot.appendKid(structureElement);
                    }
                }
            }

            documentCatalog.setStructureTreeRoot(newTreeRoot);
            treeRoot = documentCatalog.getStructureTreeRoot();
        }

        document.save("testWithTree.pdf");
        document.close();
    }
    catch (IOException e) {
        e.printStackTrace();
    }
    catch (COSVisitorException e) {
        e.printStackTrace();
    }
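
As far as we understand the PDF specification, a structure element refers to
page content through a marked-content ID (MCID), so the page content stream
itself would also need a BDC/EMC sequence around the image. Our code above
never writes one. The following plain-Java sketch makes no PDFBox calls; the
class and method names are ours, and the image name Im0 and the page size are
assumed values. It only shows the operators we believe would be needed, and we
don't know whether this is the only missing piece.

```java
import java.util.Locale;

class MarkedImageSketch {

    /**
     * Builds the content-stream operators that draw one image XObject
     * inside a marked-content sequence carrying an MCID.
     * All parameter values are supplied by the caller; nothing is read
     * from a real PDF here.
     */
    static String markedImageOperators(String tag, int mcid, String xObjectName,
                                       double width, double height) {
        StringBuilder sb = new StringBuilder();
        // Begin the marked-content sequence with a property list holding the MCID.
        sb.append('/').append(tag).append(" <</MCID ").append(mcid).append(">> BDC\n");
        // Save the graphics state, scale the unit square to the image size, draw.
        sb.append("q\n");
        sb.append(String.format(Locale.ROOT, "%.2f 0 0 %.2f 0 0 cm\n", width, height));
        sb.append('/').append(xObjectName).append(" Do\n");
        sb.append("Q\n");
        // End the marked-content sequence.
        sb.append("EMC\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed values: one full-page A4 scan registered as /Im0.
        System.out.print(markedImageOperators("Figure", 0, "Im0", 595.0, 842.0));
    }
}
```

The MCID written here (0 in the example) would presumably have to match the
one referenced by the Figure structure element for that page.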

Can someone help us here?

Best regards,

Klaus Henning

