All,

Over on Tika, it looks like we copied org.apache.pdfbox.examples.pdmodel.ExtractEmbeddedFiles to extract embedded files. Looking at the source code for PDComplexFileSpecification, I notice that getEmbeddedFile() does not behave like getFilename(): it doesn't iterate through the various formats and return the first non-null value.
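To make the "first non-null" behavior concrete, here is a minimal sketch of the pattern in plain Java. The class name EmbeddedFileFallback and the helper firstNonNull are hypothetical names chosen for illustration and are not part of PDFBox or Tika. The string values in main are stand-ins for whatever the real getters would return.

```java
import java.util.function.Supplier;

public class EmbeddedFileFallback {

    // Hypothetical helper (not a PDFBox API): calls each candidate in order
    // and returns the first non-null result, or null if every candidate is null.
    @SafeVarargs
    public static <T> T firstNonNull(Supplier<T>... candidates) {
        for (Supplier<T> candidate : candidates) {
            T value = candidate.get();
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in values: the first two "getters" return null, the third has data.
        String found = firstNonNull(() -> null, () -> null, () -> "unix-embedded-file");
        System.out.println(found);
    }
}
```

With a PDComplexFileSpecification named spec, the call would presumably look like firstNonNull(spec::getEmbeddedFile, spec::getEmbeddedFileDos, spec::getEmbeddedFileUnix, spec::getEmbeddedFileMac). That ordering is my assumption, not something the PDFBox source prescribes.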

When we try to get the PDEmbeddedFile, should we try each of these instead of just getEmbeddedFile()?

getEmbeddedFile()
getEmbeddedFileDos()
getEmbeddedFileUnix()
getEmbeddedFileMac()

Will getEmbeddedFile() alone potentially miss embedded files?

Thank you.

Best,

Tim