Great. In the meantime, if you could open a JIRA issue and attach some example files (including the different versions), it might be helpful for the community to take a look.
Thank you!

-----Original Message-----
From: Mungeol Heo [mailto:[email protected]]
Sent: Tuesday, September 01, 2015 9:02 PM
To: [email protected]
Subject: Re: Does tika support "HWP"?

Thank you for your reply.
I will try to write a customized parser for HWP files.
And if my code is "pretty enough", I will consider contributing it.
Again, thank you.

On Tue, Sep 1, 2015 at 7:58 PM, Nick Burch <[email protected]> wrote:
> On Tue, 1 Sep 2015, Mungeol Heo wrote:
>>>
>>> java -jar tika-app-1.10.jar --list-supported-types | grep hwp
>>> application/x-hwp
>
>
> That means the mime type has been defined in some way
>
>>> java -jar tika-app-1.10.jar --detect sample.hwp
>>> application/x-tika-msoffice
>
>
> That means that the HWP file is based on the OLE2 file format, but
> that no-one has told Tika about that, so detection isn't working
> properly. If you could create a new bug in JIRA for this, and upload a
> very small HWP file (ideally just a few KB), we can get that fixed.
>
>> And another thing is, there is no 'application/x-hwp' in the
>> supported formats list which are mentioned at
>> 'http://tika.apache.org/1.10/formats.html' page.
>
>
> That means there is no parser available for HWP, and you'd need to
> write + contribute one
>
>> So, does tika support "HWP"?
>
>
> Depends on your definition of "supports"!
>
> Nick
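Nick's explanation of the `--detect` result (a generic `application/x-tika-msoffice` because the HWP file is an OLE2 container that Tika has not been taught to look inside) can be checked without Tika. A minimal sketch, assuming only the standard 8-byte OLE2 (Compound File Binary) signature `D0 CF 11 E0 A1 B1 1A E1` at offset 0. The class name `Ole2Sniffer` is hypothetical and not part of Tika. It deliberately looks only at the container header, so, just like the detection result above, it cannot tell an HWP file apart from a Word or Excel file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/**
 * Hypothetical helper (not a Tika API): reports whether a file starts
 * with the OLE2 compound-file signature. It does not inspect the streams
 * inside the container, so it only identifies the container format.
 */
public class Ole2Sniffer {
    // The 8-byte magic number found at offset 0 of every OLE2 compound file.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    /** Returns true if the given header bytes begin with the OLE2 magic. */
    static boolean isOle2(byte[] header) {
        if (header.length < OLE2_MAGIC.length) {
            return false; // too short to hold the signature
        }
        // Compare only the first 8 bytes; anything after them is ignored.
        return Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    /** Reads up to the first 8 bytes of a file and checks the signature. */
    static boolean isOle2(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[OLE2_MAGIC.length];
            int read = in.readNBytes(header, 0, header.length); // Java 9+
            return isOle2(Arrays.copyOf(header, read));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean ole2 = isOle2(Paths.get(name));
            System.out.println(name + ": " + (ole2 ? "OLE2 container" : "not OLE2"));
        }
    }
}
```

With Java 11 or later it can be run directly as `java Ole2Sniffer.java sample.hwp`. Telling HWP apart from other OLE2 formats would mean reading the container's directory entries, which is the kind of container-aware detection Nick suggests adding to Tika via the JIRA issue.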
