Hi Olaf,
I want to create structure in the sense of a tagged PDF, so that I can add or modify
the alternate description of every image in a PDF document.
The only way I have found to add or modify the alternate description of an image
is via its structure element:
structureElement.setAlternateDescription("NEW ALTERNATE DESCRIPTION");
As described in my previous mail, I want to recognize the text in the image via
OCR and write the returned text to the alternate description of the image.
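
Raw OCR output usually contains line breaks and stray whitespace, so it may be worth normalizing it before it is passed to setAlternateDescription. The following is only a minimal sketch using the Java standard library. The class name AltTextSanitizer, the method toAltText, the length limit, and the fallback text are all hypothetical and are not part of PDFBox or Tesseract.

```java
/** Hypothetical helper: turns raw OCR output into a single-line alternate description. */
public final class AltTextSanitizer {

    private AltTextSanitizer() {
    }

    /**
     * @param ocrText   raw text returned by the OCR engine (may be null)
     * @param maxLength assumed upper bound for the description length, at least 4
     * @param fallback  text to use when the OCR result contains nothing usable
     * @return a trimmed, single-line description of at most maxLength characters
     */
    public static String toAltText(String ocrText, int maxLength, String fallback) {
        if (maxLength < 4) {
            throw new IllegalArgumentException("maxLength must be at least 4");
        }
        if (ocrText == null) {
            return fallback;
        }
        // Collapse every run of whitespace (spaces, tabs, line breaks, form feeds)
        // into one space, then drop leading and trailing spaces.
        String cleaned = ocrText.replaceAll("\\s+", " ").trim();
        if (cleaned.isEmpty()) {
            return fallback;
        }
        if (cleaned.length() <= maxLength) {
            return cleaned;
        }
        // Cut long results and mark the cut with three dots.
        return cleaned.substring(0, maxLength - 3).trim() + "...";
    }
}
```

Assuming the OCR result is held in a variable such as ocrResult (also hypothetical), the call would then look like this:
structureElement.setAlternateDescription(AltTextSanitizer.toAltText(ocrResult, 250, "Scanned image"));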
Klaus
-----Original Message-----
From: Olaf Drümmer [mailto:[email protected]]
Sent: Thursday, 12 March 2015 12:33
To: [email protected]
Cc: Olaf Drümmer
Subject: Re: how to create structure for an existing PDF document
Hi Klaus,
what kind of structure do you wish to create? Structure in the sense of tagged
PDF, or just some logical structure, and if so, for what purposes?
Olaf
On 12 Mar 2015, at 11:54, "Henning, Klaus" <[email protected]> wrote:
> Hi,
>
> we want to add structure to an existing PDF document. We have PDF
> documents from a scanner which contain images but no structure.
> We want to implement a program that creates the structure so we can add
> alternate descriptions to the images based on Tesseract OCR recognition.
>
> Our first approach creates a structure, but the structure seems to be
> incomplete when checked with Adobe Acrobat. We can't find any hints in
> the PDFBox examples or documentation on how to do this.
>
> Our code snippet:
>
> try {
>     PDDocument document = PDDocument.load("test.pdf");
>     PDDocumentCatalog documentCatalog = document.getDocumentCatalog();
>
>     PDStructureTreeRoot treeRoot = documentCatalog.getStructureTreeRoot();
>
>     // only build a structure tree if the document does not have one yet
>     if (treeRoot == null) {
>         COSDictionary cosDictionary = documentCatalog.getCOSDictionary();
>         PDStructureTreeRoot newTreeRoot = new PDStructureTreeRoot();
>
>         // iterate over pages
>         List<?> pages = documentCatalog.getAllPages();
>         for (Object object : pages) {
>             PDPage page = (PDPage) object;
>             Map<String, PDXObject> mapObjects = page.getResources().getXObjects();
>             for (PDXObject pdxObject : mapObjects.values()) {
>                 if (pdxObject instanceof PDXObjectImage) {
>                     PDXObjectImage objectImage = (PDXObjectImage) pdxObject;
>                     // new structure element for the image
>                     PDStructureElement structureElement =
>                             new PDStructureElement(StandardStructureTypes.Figure, newTreeRoot);
>                     PDMarkedContent markedContent =
>                             new PDMarkedContent(COSName.IMAGE, new COSDictionary());
>                     markedContent.addXObject(objectImage);
>                     structureElement.appendKid(markedContent);
>                     structureElement.setAlternateDescription("NEW ALTERNATE DESCRIPTION");
>                     newTreeRoot.appendKid(structureElement);
>                 }
>             }
>         }
>
>         documentCatalog.setStructureTreeRoot(newTreeRoot);
>         treeRoot = documentCatalog.getStructureTreeRoot();
>     }
>
>     document.save("testWithTree.pdf");
>     document.close();
> }
> catch (IOException e) {
>     e.printStackTrace();
> }
> catch (COSVisitorException e) {
>     e.printStackTrace();
> }
>
> Can someone help us here?
>
> Best regards,
>
> Klaus Henning
>
>
> ______________________________________________________________________
> This email has been scanned by the Symantec Email Security.cloud service.
> For more information please visit http://www.symanteccloud.com
> ______________________________________________________________________
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]