>> That means that the HWP file is based on the OLE2 file format, but that
>> no-one has told Tika about that, so detection isn't working properly. If you
>> could create a new bug in JIRA for this, and upload a very small HWP file
>> (ideally just a few KB), we can get that fixed

I created a bug in JIRA which is
https://issues.apache.org/jira/browse/TIKA-1728

On Wed, Sep 2, 2015 at 7:57 PM, Allison, Timothy B. <[email protected]> wrote:
> Great. In the meantime, if you could open a JIRA issue and attach some
> example files (including the different versions), it might be helpful for the
> community to take a look.
>
> Thank you!
>
> -----Original Message-----
> From: Mungeol Heo [mailto:[email protected]]
> Sent: Tuesday, September 01, 2015 9:02 PM
> To: [email protected]
> Subject: Re: Does tika support "HWP"?
>
> Thank you for your reply.
> I will try to write a customized parser for HWP file.
> And if my code is "pretty enough", I will consider to contribute it.
> Again, thank you.
>
> On Tue, Sep 1, 2015 at 7:58 PM, Nick Burch <[email protected]> wrote:
>> On Tue, 1 Sep 2015, Mungeol Heo wrote:
>>>>
>>>> java -jar tika-app-1.10.jar --list-supported-types | grep hwp
>>>> application/x-hwp
>>
>>
>> That means the mime type has been defined in some way
>>
>>>> java -jar tika-app-1.10.jar --detect sample.hwp
>>>> application/x-tika-msoffice
>>
>>
>> That means that the HWP file is based on the OLE2 file format, but
>> that no-one has told Tika about that, so detection isn't working
>> properly. If you could create a new bug in JIRA for this, and upload a
>> very small HWP file (ideally just a few KB), we can get that fixed
>>
>>> And another thing is, there is no 'application/x-hwp' in the
>>> supported formats list which are mentioned at
>>> 'http://tika.apache.org/1.10/formats.html' page.
>>
>>
>> That means there is no parser available for HWP, and you'd need to
>> write + contribute one
>>
>>> So, does tika support "HWP"?
>>
>>
>> Depends on your definition of "supports"!
>>
>> Nick
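
For anyone following the thread: the detection result quoted above (application/x-tika-msoffice instead of application/x-hwp) fits a file that starts with the generic OLE2 container signature. The sketch below is not Tika's detector. It is a standalone check, using only the JDK, of whether a file begins with the standard 8-byte OLE2 (Compound File Binary) magic. The class name Ole2Sniffer and its method are made up for illustration. Telling HWP apart from other OLE2 documents would need a look inside the container, which this sketch does not attempt.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: checks only the outer container signature.
final class Ole2Sniffer {

    // The 8-byte signature found at the start of OLE2 (Compound File Binary) files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if the given bytes begin with the OLE2 signature.
    static boolean looksLikeOle2(byte[] header) {
        if (header == null || header.length < OLE2_MAGIC.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(header, OLE2_MAGIC.length);
        return Arrays.equals(prefix, OLE2_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Ole2Sniffer <file>");
            return;
        }
        byte[] header = new byte[OLE2_MAGIC.length];
        int read;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // readNBytes requires Java 9 or later; it reads up to 8 bytes here.
            read = in.readNBytes(header, 0, header.length);
        }
        boolean ole2 = read == header.length && looksLikeOle2(header);
        System.out.println(args[0] + (ole2 ? ": OLE2 container" : ": not OLE2"));
    }
}
```

Running it as "java Ole2Sniffer sample.hwp" (sample.hwp being the file name used earlier in the thread) only reports whether the outer container is OLE2. Anything that shares that container format gives the same answer, so the signature alone cannot identify the file as HWP.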
