Hi,
For your test file, here is a way to access the "/Alt" entry:
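import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureElement;
import org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.PDStructureTreeRoot;

PDDocument document = PDDocument.load("image_test_pass.pdf");
try {
    PDStructureTreeRoot treeRoot =
        document.getDocumentCatalog().getStructureTreeRoot();
    // An untagged document has no structure tree (and so no "/Alt")
    if (treeRoot != null) {
        // Only the top-level structure elements are visited here
        for (Object o : treeRoot.getKids()) {
            if (o instanceof PDStructureElement) {
                PDStructureElement structElement = (PDStructureElement) o;
                // "/Alt" entry of the element, or null if it has none
                System.out.println(structElement.getAlternateDescription());
                // Page ("/Pg") the element belongs to, if specified
                PDPage page = structElement.getPage();
                if (page != null) {
                    // Names of the image XObjects on that page
                    System.out.println(page.getResources().getImages().keySet());
                }
            }
        }
    }
} finally {
    document.close();
}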
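The loop above only looks at the direct kids of the structure tree root. In many tagged files the element carrying "/Alt" (typically a Figure) is nested deeper, e.g. Document > Sect > Figure, so a recursive walk is needed. Here is a minimal sketch of such a depth-first walk. To keep it self-contained it uses a hypothetical StructNode class standing in for a structure element; it is not a PDFBox type. With PDFBox, the same idea would presumably recurse through each PDStructureElement's getKids() and call getAlternateDescription() on every element found.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AltTextWalk {

    // Hypothetical, minimal stand-in for a structure element (NOT a PDFBox class)
    static final class StructNode {
        final String type;           // structure type, e.g. "Document", "Figure"
        final String alt;            // value of the /Alt entry, or null if absent
        final List<StructNode> kids; // child structure elements

        StructNode(String type, String alt, StructNode... kids) {
            this.type = type;
            this.alt = alt;
            this.kids = Arrays.asList(kids);
        }
    }

    // Depth-first (pre-order) walk collecting "type: alt" for every element
    // that has an /Alt entry, however deeply it is nested below the root.
    static List<String> findAltTexts(StructNode node) {
        List<String> found = new ArrayList<>();
        collect(node, found);
        return found;
    }

    private static void collect(StructNode node, List<String> found) {
        if (node.alt != null) {
            found.add(node.type + ": " + node.alt);
        }
        for (StructNode kid : node.kids) {
            collect(kid, found);
        }
    }

    public static void main(String[] args) {
        // Document > Sect > Figure: the Figure is not a direct kid of the root
        StructNode root = new StructNode("Document", null,
                new StructNode("Sect", null,
                        new StructNode("P", null),
                        new StructNode("Figure", "Company logo")));
        System.out.println(findAltTexts(root)); // prints [Figure: Company logo]
    }
}
```

A top-level-only loop applied to this example tree would see just the Document element (no "/Alt") and miss the Figure entirely.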
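Please refer to the PDF specification [1], in particular §14.6 (Marked
Content), §14.7 (Logical Structure), §14.9.3 (Alternate Descriptions) and
§14.9.4 (Replacement Text), for all the rules that determine where the
"/Alt" entry can be found. There seem to be several ways to define this
information: for example, "/Alt" can be set on a structure element
dictionary or in the property list of a marked-content sequence.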
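To get from a specific image back to its alternate text, the link described in §14.6 and §14.7 goes through marked content: the page's content stream draws the image (a "Do" operator) inside a marked-content sequence tagged with an MCID, and the structure element whose kids include that MCID is the one holding "/Alt". Below is a simplified sketch of the first half of that lookup, mapping image XObject names to the MCID that encloses them. It is a hypothetical helper (not PDFBox API) and assumes inline property dictionaries such as `/Figure <</MCID 1>> BDC`, no string operands containing whitespace or delimiters, and no named property lists from the page's /Properties resources.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of PDFBox: a naive token scan of a content stream.
class McidScanner {

    // Maps each XObject name drawn with "Do" to the MCID of the innermost
    // enclosing marked-content sequence that has one.
    static Map<String, Integer> imageMcids(String contentStream) {
        // Pad dictionary delimiters so they become separate tokens
        String padded = contentStream.replace("<<", " << ").replace(">>", " >> ");
        String[] tokens = padded.trim().split("\\s+");

        Map<String, Integer> result = new LinkedHashMap<>();
        Deque<Integer> open = new ArrayDeque<>(); // -1 marks a sequence without an MCID
        Integer pendingMcid = null;               // MCID seen in the dict before BDC

        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (t.equals("/MCID") && i + 1 < tokens.length) {
                pendingMcid = Integer.valueOf(tokens[i + 1]);
            } else if (t.equals("BDC")) {
                open.push(pendingMcid != null ? pendingMcid : -1);
                pendingMcid = null;
            } else if (t.equals("BMC")) {
                open.push(-1);
            } else if (t.equals("EMC") && !open.isEmpty()) {
                open.pop();
            } else if (t.equals("Do") && i > 0 && tokens[i - 1].startsWith("/")) {
                // Deque iteration starts at the most recently pushed (innermost) sequence
                for (int mcid : open) {
                    if (mcid >= 0) {
                        result.put(tokens[i - 1].substring(1), mcid);
                        break;
                    }
                }
            }
        }
        return result;
    }
}
```

For a stream such as `/Figure <</MCID 1>> BDC q 100 0 0 50 72 700 cm /Im1 Do Q EMC`, this yields `Im1` mapped to MCID 1. That MCID would then have to be matched against the integer kids (or marked-content references) of the structure elements attached to the same page; the matching element's "/Alt" is the image's alternate text.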
BR,
Eric
[1] www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf
2012/9/21 Matthew Sheppard
> Is there some way to extract "alternate text" for a specific image using
> PDFBox?
>
> I have a PDF file which, as described at
> http://www.w3.org/WAI/GL/2011/WD-WCAG20-TECHS-20110621/pdf.html#PDF1,
> has had alternate text added to an image. Using PDFBox I can find my
> way through the object model to the image itself (a PDXObjectImage)
> through PDDocument.getDocumentCatalog().getAllPages() [iterator]
> .getResources().getImages(), but I cannot see any way to get from the
> image itself to its alternate text.
>
> A small sample PDF (with a single image which has some alternate text
> specified) can be found at
> http://dl.dropbox.com/u/12253279/image_test_pass.pdf
>
> Many thanks in advance to anyone who is able to point me in the right
> direction,
> Matt Sheppard
>