We have a unit test for an xlsx file with the default password, and
it shows that the content type is updated to
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet". In
short, that file works as I'd expect it to. It might be nice to include
in the metadata that the file was initially encrypted, but that result
is good enough for me for now.

However, when I just now tried to open an xlsx file with an actual
password, I got an exception. This is messy, and probably a reason
to respin 1.21-rc1. Ugh.

Fellow devs, see TIKA-2873.
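
For anyone following along, here is a minimal sketch of the container
point discussed in the quoted thread: an encrypted OOXML file is an OLE2
compound document on disk, so its first bytes are the OLE2 signature and
not the zip signature that a plain xlsx starts with. This is plain Java
with no Tika or POI calls. The class and method names (ContainerSniffer,
sniff) are made up for illustration, and it only looks at the leading
bytes; it is not how Tika's detectors are implemented.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only: not part of Tika or POI.
final class ContainerSniffer {

    // Signature at the start of every OLE2 compound document.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Zip local file header signature ("PK" 0x03 0x04), which an
    // unencrypted xlsx normally starts with.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    private ContainerSniffer() {
    }

    // True if data begins with the given magic bytes.
    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    // Classifies the outer container from the leading bytes of a file.
    // Returns "ole2", "zip", or "unknown".
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "ole2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "zip";
        }
        return "unknown";
    }
}
```

Feeding it the first eight or so bytes of a password-protected xlsx
should give "ole2", which is consistent with detection reporting
"application/x-tika-ooxml-protected" until the package is decrypted.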

On Tue, May 14, 2019 at 2:05 PM Tucker B <[email protected]> wrote:
>
> On Tue, 14 May 2019, 13:52 Tim Allison, <[email protected]> wrote:
>>
>> Hi Tucker,
>>   I know only a little about this area, but I think password protected
>> xlsx files (and ooxml generally) are encrypted inside an OLE package
>> so you can't even get to the underlying ooxml/zip file until you've
>> decrypted the file.
>
>
> That is my understanding as well, and I can confirm it based on the
> OfficeParser code paths for x-tika-ooxml-protected.
>
>> Do you have the passwords to these files?
>
>
> In most cases they are the default password. So I might need to write a
> custom mime type detector and add it to the composite detector to handle
> the case where the default password will work.
>
>> On Tue, May 14, 2019 at 1:00 PM Tucker B <[email protected]> wrote:
>> >
>> > I have a password protected xlsx file. The default mime type detection
>> > returns a mime type of "application/x-tika-ooxml-protected". Is it
>> > possible to configure the mime type detection to return the underlying
>> > content type, e.g.
>> > "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"? I
>> > didn't see any configuration options available to override in
>> > custom-mimetypes.xml.
