Tilman, Thanks. That works perfectly. Now I need to go through it in detail to figure out how it extracts the image and metadata.
Dave Patterson

On Fri, Apr 7, 2017 at 5:32 PM, Tilman Hausherr <[email protected]> wrote:

> On 07.04.2017 at 22:59, David Patterson wrote:
>
>> Tilman,
>>
>> The ExtractImages sample code is a 1.8 artifact (I believe). It has a lot
>> of errors when compiled with 2.0.5 libraries.
>
> Please try this one:
> https://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/ExtractImages.java?view=markup
>
> Tilman
>
>> 1) two imports are no longer in the 2.0.5 library
>>    import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectForm;
>>    import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;
>>
>> 2) missing methods or methods with different signatures:
>>    PDDocument.loadNonSeq(   ** method not defined
>>    PDDocument.load(   ** load now requires a File, not a String
>>    document.openProtection (
>>    document.getDocumentCatalog().getAllPages()   ** getAllPages is missing from the PDDocumentCatalog
>>    resources.getXObjects()   ** where resources is a PDResources object
>>    if (xobject instanceof PDXObjectImage)   ** PDXObjectImage is not defined
>>    else if (xobject instanceof PDXObjectForm)   ** same with PDXObjectForm
>>
>> Maybe a new ExtractImages2 program needs to be developed for the PDFBox 2
>> era.
>>
>> Dave Patterson
>>
>> On Thu, Apr 6, 2017 at 5:02 PM, Tilman Hausherr <[email protected]> wrote:
>>
>>> On 06.04.2017 at 21:22, David Patterson wrote:
>>>
>>>> I've got some PDFs to try to read. Many of them have images in them. I'd
>>>> like to be able to iterate over the images and determine their encoding
>>>> (png vs. jpeg vs. ?) and size.
>>>>
>>>> I've found a sample that lets me iterate over the PDXObject entities, but
>>>> I'm missing a key piece to determine the size and format of the objects.
>>>>
>>>> a) Is a PDXObject always an image, or could it be something else?
>>>
>>> Yes it could be a form. That's why all examples (e.g. ExtractImages.java)
>>> always check the type, and then cast to the image xobject type. That one
>>> will give the size and the filters.
>>>
>>> Tilman
>>>
>>>> Here is the code I've got so far.
>>>>
>>>> for ( PDPage aPage : pdfDocument.getPages() ) {
>>>>     PDResources pdResources = aPage.getResources();
>>>>     for ( COSName cosObject : pdResources.getXObjectNames() ) {
>>>>         PDXObject xObj = pdResources.getXObject( cosObject);
>>>>         System.out.println( "got an image maybe" );
>>>>
>>>> This is where I've gotten stumped. I've looked at lots of lists of
>>>> COS-whatever things, but it has not led me to "the answer."
>>>>
>>>> Thanks for any guidance you can provide.
>>>>
>>>> Dave Patterson
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
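
On the "png vs. jpeg vs. ?" part of the original question: Tilman's reply says the image xobject gives the size and the filters, and the filters are what indicate the encoding. The following is only a rough sketch of how the standard PDF stream filter names could be mapped to a format label. It deliberately has no PDFBox dependency and takes the filter names as plain strings; the class ImageFilterFormats and the method describe are hypothetical names made up for this illustration, not part of PDFBox. Note that a PDF does not store PNG files as such: FlateDecode, LZWDecode and RunLengthDecode only compress the raw samples, which tools commonly export as PNG.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of PDFBox): guesses an image format
 * from the filter names attached to a PDF image stream.
 */
final class ImageFilterFormats {

    private ImageFilterFormats() {
    }

    /**
     * Returns a human-readable format label for a list of PDF filter names.
     * Filters can be chained (for example ASCII85Decode followed by
     * DCTDecode), so the check is "contains" rather than "equals".
     */
    static String describe(List<String> filters) {
        if (filters == null || filters.isEmpty()) {
            return "raw samples (no filter)";
        }
        if (filters.contains("DCTDecode")) {
            return "JPEG";
        }
        if (filters.contains("JPXDecode")) {
            return "JPEG 2000";
        }
        if (filters.contains("CCITTFaxDecode")) {
            return "CCITT fax (often exported as TIFF)";
        }
        if (filters.contains("JBIG2Decode")) {
            return "JBIG2";
        }
        if (filters.contains("FlateDecode")
                || filters.contains("LZWDecode")
                || filters.contains("RunLengthDecode")) {
            return "losslessly compressed samples (often exported as PNG)";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // A plain JPEG image stream.
        System.out.println(describe(Arrays.asList("DCTDecode")));
        // A chained filter list: ASCII wrapper around JPEG data.
        System.out.println(describe(Arrays.asList("ASCII85Decode", "DCTDecode")));
        // Deflate-compressed samples.
        System.out.println(describe(Arrays.asList("FlateDecode")));
        // No filter at all.
        System.out.println(describe(Collections.<String>emptyList()));
    }
}
```

In the loop quoted above, the filter names would come from the image xobject after the type check and cast that Tilman describes; how they are read from the object is left out here on purpose, since it depends on the PDFBox version in use.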

