All,
Over on Tika, it looks like we copied
org.apache.pdfbox.examples.pdmodel.ExtractEmbeddedFiles to extract embedded
files. Looking at the source code for PDComplexFileSpecification, I notice
that getEmbeddedFile() does not behave like getFilename(); that is, it doesn't
iterate through the various formats and return the first non-null value.
When we try to get the PDEmbeddedFile, should we try each of these instead of
just getEmbeddedFile()?
getEmbeddedFile()
getEmbeddedFileDos()
getEmbeddedFileUnix()
getEmbeddedFileMac()
Will getEmbeddedFile() alone potentially miss embedded files?
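For illustration, here is a rough sketch of the fallback I have in mind. The FirstNonNull class and its firstNonNull helper are hypothetical names of my own, not part of PDFBox, and the order in which the accessors are tried is an assumption rather than anything PDFBox prescribes. The helper is kept generic so it runs without PDFBox on the classpath; the comment shows how it might be wired to the four accessors listed above, assuming none of them declares a checked exception.

```java
import java.util.function.Supplier;

public class FirstNonNull {

    /**
     * Calls each supplier in order and returns the first non-null result,
     * or null if every supplier returns null.
     */
    @SafeVarargs
    public static <T> T firstNonNull(Supplier<? extends T>... candidates) {
        for (Supplier<? extends T> candidate : candidates) {
            T value = candidate.get();
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Hypothetical use against PDFBox (untested, order is an assumption):
    //
    //   PDEmbeddedFile file = FirstNonNull.firstNonNull(
    //           spec::getEmbeddedFile,
    //           spec::getEmbeddedFileDos,
    //           spec::getEmbeddedFileUnix,
    //           spec::getEmbeddedFileMac);
    //
    // where spec is a PDComplexFileSpecification.
}
```

Because the suppliers are evaluated lazily, the later accessors are only called when the earlier ones return null.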
Thank you.
Best,
Tim