Ah, thanks Tim!
I forgot all about the /rmeta endpoint; it contains all the good stuff about 
embedded files.
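
For anyone finding this thread later, here is a minimal sketch of how the embedded file names could be pulled out of the /rmeta response. It is not an official recipe. It assumes a tika-server running locally on its default port 9998, that /rmeta returns a JSON list whose first entry describes the container PDF and whose remaining entries describe embedded documents, and that those entries carry the keys "resourceName" and/or "X-TIKA:embedded_resource_path" (key names can differ between Tika versions, so check your own output). The function names are made up for illustration.

```python
import json
import urllib.request

# Assumption: tika-server is running locally on its default port.
TIKA_RMETA_URL = "http://localhost:9998/rmeta"


def fetch_rmeta(pdf_path, url=TIKA_RMETA_URL):
    """PUT a file to the /rmeta endpoint and return the parsed JSON list."""
    with open(pdf_path, "rb") as fh:
        body = fh.read()
    req = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def embedded_file_names(rmeta):
    """Return the names of embedded documents from an /rmeta result.

    Assumption: entry 0 is the container document and every later entry
    is an embedded document (attachments, but possibly also inline
    images, depending on the parser configuration).
    """
    names = []
    for entry in rmeta[1:]:
        # Prefer the plain file name; fall back to the embedded path.
        name = entry.get("resourceName") or entry.get(
            "X-TIKA:embedded_resource_path"
        )
        if name:
            names.append(name)
    return names
```

Usage would be something like embedded_file_names(fetch_rmeta("signed.pdf")), where "signed.pdf" is a placeholder path. An empty list means Tika reported no embedded documents for that file.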

Regards,
Willy T. Koch


On Tue, 13 Jun 2023, at 12:29, Tim Allison wrote:
> Not sure how you're using Tika, but if you use the /rmeta endpoint in tika 
> server, or the -J option in tika-app or the RecursiveParserWrapper in code, 
> you should be able to get what you need.
> 
> On Tue, Jun 13, 2023 at 5:33 AM Willy T. Koch <[email protected]> wrote:
>> Hi,
>> Does Tika support detecting whether a PDF has embedded files and, even 
>> better, returning an array of the file names?
>> 
>> I was forwarded a "signed" PDF from a vendor that apparently makes their 
>> own signing solution. The PDF doesn't contain any standard PAdES properties 
>> that trigger the signature panel in Acrobat, or hasSignature:true or any of 
>> the other signature properties in Tika.
>> 
>> It consisted of embedding six html files with various technical info inside 
>> the PDF, like here, from the raw content:
>> 
>> obj
>> <</Names[(Appendix 1 Evidence Quality Framework.html) 99 0 R (Appendix 2 
>> Service Description.html) 101 0 R (Appendix 3 Evidence Log.html) 105 0 R 
>> (Appendix 4 Evidence of Time.html) 107 0 R (Appendix 5 Evidence of 
>> Intent.html) 109 0 R (Appendix 6 Digital Signature Documentation.html) 103 0 
>> R (Evidence Quality of xxxxx E-signed Documents.html) 97 0 R]>>
>> endobj
>> 112 0 obj
>> 
>> From a security perspective this would also be very useful when using Tika 
>> as a secure file gateway for file analysis and detecting malicious files.
>> 
>> Thanks,
>> Willy T. Koch
>> Norway
>> 
