>> That means that the HWP file is based on the OLE2 file format, but that 
>> no-one has told Tika about that, so detection isn't working properly. If you 
>> could create a new bug in JIRA for this, and upload a very small HWP file 
>> (ideally just a few KB), we can get that fixed

I have created a JIRA issue for this: https://issues.apache.org/jira/browse/TIKA-1728
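
For anyone following along, the OLE2 point quoted above can be checked by hand. OLE2 (Compound File Binary) containers begin with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1, which would explain why a file that has no more specific rule falls back to application/x-tika-msoffice. The sketch below is not Tika's own detection code; it is a plain-JDK check (Java 11+), and the class and method names (Ole2Sniffer, startsWithOle2Magic, isOle2) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: reports whether a file starts
// with the OLE2 / Compound File Binary signature.
public class Ole2Sniffer {

    // The 8-byte OLE2 header signature.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the given leading bytes begin with the OLE2 signature.
    static boolean startsWithOle2Magic(byte[] head) {
        return head.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Reads only the first 8 bytes of the file and checks them.
    static boolean isOle2(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return startsWithOle2Magic(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            String verdict = isOle2(Path.of(arg)) ? "OLE2 container" : "not OLE2";
            System.out.println(arg + ": " + verdict);
        }
    }
}
```

Running it as "java Ole2Sniffer.java sample.hwp" should show whether a given HWP file is an OLE2 container. It only looks at the outer container, so it cannot tell HWP apart from other OLE2-based formats such as .doc or .xls; that distinction needs a rule specific to HWP.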

On Wed, Sep 2, 2015 at 7:57 PM, Allison, Timothy B. <[email protected]> wrote:
> Great.  In the meantime, if you could open a JIRA issue and attach some 
> example files (including the different versions), it might be helpful for the 
> community to take a look.
>
> Thank you!
>
> -----Original Message-----
> From: Mungeol Heo [mailto:[email protected]]
> Sent: Tuesday, September 01, 2015 9:02 PM
> To: [email protected]
> Subject: Re: Does tika support "HWP"?
>
> Thank you for your reply.
> I will try to write a custom parser for HWP files.
> If my code is "pretty enough", I will consider contributing it.
> Again, thank you.
>
> On Tue, Sep 1, 2015 at 7:58 PM, Nick Burch <[email protected]> wrote:
>> On Tue, 1 Sep 2015, Mungeol Heo wrote:
>>>>
>>>> java -jar tika-app-1.10.jar --list-supported-types | grep hwp
>>>> application/x-hwp
>>
>>
>> That means the mime type has been defined in some way
>>
>>>> java -jar tika-app-1.10.jar --detect sample.hwp
>>>> application/x-tika-msoffice
>>
>>
>> That means that the HWP file is based on the OLE2 file format, but
>> that no-one has told Tika about that, so detection isn't working
>> properly. If you could create a new bug in JIRA for this, and upload a
>> very small HWP file (ideally just a few KB), we can get that fixed
>>
>>> Another thing: 'application/x-hwp' does not appear in the
>>> supported formats list on the
>>> 'http://tika.apache.org/1.10/formats.html' page.
>>
>>
>> That means there is no parser available for HWP, and you'd need to
>> write + contribute one
>>
>>> So, does tika support "HWP"?
>>
>>
>> Depends on your definition of "supports"!
>>
>> Nick
