Hi,
> "Allison, Timothy B." <[email protected]> wrote on 23 July 2014 at 20:21:
>
> All,
>
> Over on Tika, it looks like we copied
> org.apache.pdfbox.examples.pdmodel.ExtractEmbeddedFiles to extract embedded
> files. As I look at the source code for PDComplexFileSpecification, I notice
> that getEmbeddedFile() does not behave like getFilename(); that is, it doesn't
> iterate through the various formats and return the first non-null.
>
> When we try to get the PDEmbeddedFile, should we try each of these instead
> of just getEmbeddedFile()?

Yes.

> getEmbeddedFile()
> getEmbeddedFileDos()
> getEmbeddedFileUnix()
> getEmbeddedFileMac()
>
> Will getEmbeddedFile() alone potentially miss embedded files?

Yes. "getFilename()" was created for convenience. There is no such method for
the embedded file, so you have to check each variant yourself.

BTW: According to the spec, the Dos, Unix and Mac variants shouldn't be used
anymore, so we should rearrange the order in "getFilename".

BTW2: Analogous to "get/setFileXXX", we should add the missing
"get/setEmbeddedFileUnicode".

BTW3: We should rename getUnicodeFile to getFileUnicode and add a setter for
that value as well.

I'll take care of that, see PDFBOX-2239.

> Thank you.
>
> Best,
>
> Tim

BR
Andreas Lehmkühler
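
PS: A minimal sketch of the manual lookup described above, in case it helps. It is not PDFBox code: FileSpec and EmbeddedFile are hypothetical stand-ins that only mirror the four getter names from this thread, so the example compiles without the library. The helper name firstEmbeddedFile and the order (generic entry first, then Dos, Unix, Mac) are my assumptions, chosen because the platform-specific entries shouldn't be used anymore.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, NOT the PDFBox classes: they only mirror the four
// getter names discussed in this thread so the sketch is self-contained.
final class EmbeddedFileLookup {

    /** Stand-in for PDEmbeddedFile; it only carries a label here. */
    static final class EmbeddedFile {
        final String label;

        EmbeddedFile(String label) {
            this.label = label;
        }
    }

    /** Stand-in for PDComplexFileSpecification, reduced to the four getters. */
    interface FileSpec {
        EmbeddedFile getEmbeddedFile();
        EmbeddedFile getEmbeddedFileDos();
        EmbeddedFile getEmbeddedFileUnix();
        EmbeddedFile getEmbeddedFileMac();
    }

    /**
     * Returns the first non-null embedded file, trying the generic entry
     * before the platform-specific ones, or null if none is set.
     */
    static EmbeddedFile firstEmbeddedFile(FileSpec spec) {
        List<EmbeddedFile> candidates = Arrays.asList(
                spec.getEmbeddedFile(),
                spec.getEmbeddedFileDos(),
                spec.getEmbeddedFileUnix(),
                spec.getEmbeddedFileMac());
        for (EmbeddedFile candidate : candidates) {
            if (candidate != null) {
                return candidate;
            }
        }
        return null;
    }

    /** Small factory for a fixed-value FileSpec, used only for illustration. */
    static FileSpec specOf(final EmbeddedFile generic, final EmbeddedFile dos,
                           final EmbeddedFile unix, final EmbeddedFile mac) {
        return new FileSpec() {
            public EmbeddedFile getEmbeddedFile() { return generic; }
            public EmbeddedFile getEmbeddedFileDos() { return dos; }
            public EmbeddedFile getEmbeddedFileUnix() { return unix; }
            public EmbeddedFile getEmbeddedFileMac() { return mac; }
        };
    }
}
```

With the real classes, the same loop would call the four getters on PDComplexFileSpecification directly; a file that is only stored under, say, the Unix entry is then still found.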

