Great.  In the meantime, if you could open a JIRA issue and attach some example 
files (including the different versions), it might be helpful for the community 
to take a look.

Thank you!

-----Original Message-----
From: Mungeol Heo [mailto:[email protected]] 
Sent: Tuesday, September 01, 2015 9:02 PM
To: [email protected]
Subject: Re: Does tika support "HWP"?

Thank you for your reply.
I will try to write a custom parser for HWP files.
If my code turns out "pretty enough", I will consider contributing it.
Again, thank you.

On Tue, Sep 1, 2015 at 7:58 PM, Nick Burch <[email protected]> wrote:
> On Tue, 1 Sep 2015, Mungeol Heo wrote:
>>>
>>> java -jar tika-app-1.10.jar --list-supported-types | grep hwp 
>>> application/x-hwp
>
>
> That means the mime type has been defined in some way
>
>>> java -jar tika-app-1.10.jar --detect sample.hwp 
>>> application/x-tika-msoffice
>
>
> That means that the HWP file is based on the OLE2 file format, but 
> that no-one has told Tika about that, so detection isn't working 
> properly. If you could create a new bug in JIRA for this, and upload a 
> very small HWP file (ideally just a few KB), we can get that fixed
>
>> Another thing: 'application/x-hwp' does not appear in the supported 
>> formats list on the 'http://tika.apache.org/1.10/formats.html' page.
>
>
> That means there is no parser available for HWP, and you'd need to 
> write + contribute one
>
>> So, does tika support "HWP"?
>
>
> Depends on your definition of "supports"!
>
> Nick
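
To illustrate the detection point above: an OLE2-based file starts with a fixed 8-byte signature, which is why a generic OLE2 result can come back even when the specific format is unknown. The following is only a minimal sketch using the Java standard library, not Tika's actual detector; the class and method names (Ole2Sniffer, looksLikeOle2) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration only; it is not part of Tika.
final class Ole2Sniffer {
    // Signature found at the start of every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private Ole2Sniffer() {}

    // True if the given bytes begin with the OLE2 signature.
    static boolean looksLikeOle2(byte[] header) {
        if (header.length < OLE2_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Reads only the first 8 bytes of the file and checks them (Java 11+).
    static boolean looksLikeOle2(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikeOle2(in.readNBytes(OLE2_MAGIC.length));
        }
    }
}
```

Calling Ole2Sniffer.looksLikeOle2(Path.of("sample.hwp")) would only say that the container is OLE2; telling HWP apart from other OLE2-based formats needs a further check inside the container, which this sketch does not attempt.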
