Re: Accessing "alternate text" for an image via PDFBox?

Leleu Eric Sat, 22 Sep 2012 07:09:01 -0700

Hi,

For your test file, here is a way to access the "/Alt" entry:


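        import org.apache.pdfbox.pdmodel.PDDocument;
        import org.apache.pdfbox.pdmodel.PDPage;
        import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureElement;
        import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureTreeRoot;

        PDDocument document = PDDocument.load("image_test_pass.pdf");
        PDStructureTreeRoot treeRoot =
                document.getDocumentCatalog().getStructureTreeRoot();

        // Print the /Alt entry of each top-level structure element,
        // then look up the page that element belongs to.
        for (Object o : treeRoot.getKids()) {
            if (o instanceof PDStructureElement) {
                PDStructureElement structElement = (PDStructureElement) o;
                System.out.println(structElement.getAlternateDescription());
                PDPage page = structElement.getPage();
                if (page != null) {
                    // Images available on that page (result unused here)
                    page.getResources().getImages();
                }
            }
        }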

Please refer to the PDF specification [1], in particular §14.6, §14.7,
§14.9.3 and §14.9.4, for the full set of rules on where the "/Alt"
entry can be found. There seem to be several ways to define this information.
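
Note that the loop above only looks at the direct kids of the structure
tree root, while the element carrying "/Alt" (typically a Figure) may sit
deeper in the tree. Below is a minimal sketch of a depth-first walk that
collects every alternate description. To keep it runnable without PDFBox,
it uses a hypothetical Node class as a stand-in for PDStructureElement;
in real code the children would presumably come from getKids() and the
text from getAlternateDescription(). The names AltTextWalk, Node and
collectAlt are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class AltTextWalk {

    // Hypothetical stand-in for a structure element; this is NOT a PDFBox class.
    static final class Node {
        final String type;   // structure type, e.g. "Figure"
        final String alt;    // value of /Alt, or null when the entry is absent
        final List<Node> kids = new ArrayList<Node>();

        Node(String type, String alt) {
            this.type = type;
            this.alt = alt;
        }

        // Appends a child and returns this node so calls can be chained.
        Node add(Node kid) {
            kids.add(kid);
            return this;
        }
    }

    // Collects "type: alt" for every node that has an /Alt value,
    // visiting each node before its children (pre-order).
    static List<String> collectAlt(Node node) {
        List<String> found = new ArrayList<String>();
        collect(node, found);
        return found;
    }

    private static void collect(Node node, List<String> found) {
        if (node.alt != null) {
            found.add(node.type + ": " + node.alt);
        }
        for (Node kid : node.kids) {
            collect(kid, found);
        }
    }

    public static void main(String[] args) {
        // Document > (P, Sect > Figure): the /Alt is two levels below the root.
        Node root = new Node("Document", null)
                .add(new Node("P", null))
                .add(new Node("Sect", null)
                        .add(new Node("Figure", "Company logo")));
        for (String line : collectAlt(root)) {
            System.out.println(line);
        }
    }
}
```

This only covers "/Alt" set on structure elements; the other places the
specification allows are not handled by this sketch.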

BR,
Eric


[1] www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf


2012/9/21 Matthew Sheppard <[email protected]>

> Is there some way to extract "alternate text" for a specific image using
> PDFBox?
>
> I have a PDF file which, as described at
> http://www.w3.org/WAI/GL/2011/WD-WCAG20-TECHS-20110621/pdf.html#PDF1,
> has had alternate text added to an image. Using PDFBox I can find my
> way through the object model to the image itself (a PDXObjectImage)
> through PDDocument.getDocumentCatalog().getAllPages() [iterator]
> .getResources().getImages() but I cannot see any way to get from the
> image itself to the alternate text for it.
>
> A small sample PDF (with a single image which has some alternate text
> specified) can be found at
> http://dl.dropbox.com/u/12253279/image_test_pass.pdf
>
> Many thanks in advance to anyone who is able to point me in the right
> direction,
> Matt Sheppard
>
