On Tue, 1 Sep 2015, Mungeol Heo wrote:
> java -jar tika-app-1.10.jar --list-supported-types | grep hwp
> application/x-hwp
That means the MIME type has been defined in some way.
> java -jar tika-app-1.10.jar --detect sample.hwp
> application/x-tika-msoffice
That means that the HWP file is based on the OLE2 file format, but no one has told Tika how to recognise HWP specifically, so detection isn't working properly. If you could create a new bug in JIRA for this and upload a very small HWP file (ideally just a few KB), we can get that fixed.
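To illustrate what "based on OLE2" means here, below is a minimal Java sketch that checks a file for the 8-byte OLE2 compound document signature. It is only an illustration and not how Tika's detector is implemented; the class name Ole2Sniffer and its methods are hypothetical and use only the Java standard library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: reports whether a file starts
// with the OLE2 compound document signature.
public class Ole2Sniffer {
    // The 8-byte signature found at the start of every OLE2 container.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the given bytes begin with the OLE2 signature.
    public static boolean isOle2(byte[] header) {
        return header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Reads up to the first 8 bytes of the file and checks them.
    public static boolean isOle2(Path file) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (filled < header.length) {
                int n = in.read(header, filled, header.length - filled);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                filled += n;
            }
        }
        return isOle2(Arrays.copyOf(header, filled));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Ole2Sniffer <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + " starts with OLE2 signature: " + isOle2(file));
    }
}
```

A check like this can only say that the file is some kind of OLE2 container. Telling HWP apart from other OLE2-based formats needs extra information about what is inside the container, which is what the JIRA issue and sample file would help with.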
> And another thing is, there is no 'application/x-hwp' in the list of
> supported formats on the 'http://tika.apache.org/1.10/formats.html' page.
That means there is no parser available for HWP, and you'd need to write and contribute one.
> So, does Tika support "HWP"?
Depends on your definition of "supports"!

Nick
