All,

  Over on Tika, it looks like we copied
org.apache.pdfbox.examples.pdmodel.ExtractEmbeddedFiles to extract embedded
files. Looking at the source code for PDComplexFileSpecification, I notice
that getEmbeddedFile() does not behave like getFilename(). Unlike getFilename(),
it doesn't iterate through the various formats and return the first non-null
value.

  When we try to get the PDEmbeddedFile, should we try each of these in turn
instead of calling just getEmbeddedFile()?

    getEmbeddedFile()
    getEmbeddedFileDos()
    getEmbeddedFileUnix()
    getEmbeddedFileMac()

  Could calling getEmbeddedFile() alone miss some embedded files?
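
A rough Java sketch of what "try each in turn" could look like, returning the first non-null result the way getFilename() does for names. To keep it compilable without PDFBox on the classpath, FakeFileSpec is a hypothetical stand-in for PDComplexFileSpecification and String stands in for PDEmbeddedFile. Only the four getter names come from the list above. The order used here is simply the order of that list, not necessarily the order PDFBox itself would prefer.

```java
import java.util.function.Supplier;

public class EmbeddedFileFallback {

    /**
     * Returns the first non-null value produced by the suppliers, in order,
     * or null if every supplier returns null. Suppliers after the first
     * non-null result are never invoked.
     */
    @SafeVarargs
    public static <T> T firstNonNull(Supplier<? extends T>... candidates) {
        for (Supplier<? extends T> candidate : candidates) {
            T value = candidate.get();
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Hypothetical stand-in for PDComplexFileSpecification so this sketch
    // compiles on its own; String plays the role of PDEmbeddedFile.
    static final class FakeFileSpec {
        String ef, dos, unix, mac;
        String getEmbeddedFile() { return ef; }
        String getEmbeddedFileDos() { return dos; }
        String getEmbeddedFileUnix() { return unix; }
        String getEmbeddedFileMac() { return mac; }
    }

    // Tries the getters in the same order as the list in the question.
    static String resolve(FakeFileSpec spec) {
        return firstNonNull(
                spec::getEmbeddedFile,
                spec::getEmbeddedFileDos,
                spec::getEmbeddedFileUnix,
                spec::getEmbeddedFileMac);
    }

    public static void main(String[] args) {
        FakeFileSpec spec = new FakeFileSpec();
        spec.mac = "mac-stream"; // only the Mac variant is present
        System.out.println(resolve(spec)); // prints "mac-stream"
    }
}
```

Against the real class, the same helper could presumably be called as firstNonNull(spec::getEmbeddedFile, spec::getEmbeddedFileDos, spec::getEmbeddedFileUnix, spec::getEmbeddedFileMac) with spec a PDComplexFileSpecification. This assumes those getters take no arguments and return PDEmbeddedFile. Passing method references keeps the lookups lazy, so later variants are only read when earlier ones are missing.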

  Thank you.

  Best,
  Tim