On Tue, 1 Sep 2015, Mungeol Heo wrote:
java -jar tika-app-1.10.jar --list-supported-types | grep hwp
application/x-hwp

That means the MIME type has been defined in some way.

java -jar tika-app-1.10.jar --detect sample.hwp
application/x-tika-msoffice

That means the HWP file is based on the OLE2 file format, but no-one has told Tika about that, so detection isn't working properly. If you could create a new bug in JIRA for this and upload a very small HWP file (ideally just a few KB), we can get that fixed.
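application/x-tika-msoffice is the generic type Tika falls back to for an OLE2 container it can't narrow down any further. If you want to confirm that a particular HWP sample really is an OLE2 (Compound File Binary) file before attaching it, checking the standard 8-byte OLE2 signature at offset 0 is a quick sanity test. This is a minimal sketch under that assumption: the class name Ole2Check is hypothetical, it needs Java 11+ for readNBytes, and it does not reproduce Tika's own detection logic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: checks whether a file starts with the OLE2 signature. */
public final class Ole2Check {
    // The 8-byte signature found at offset 0 of every OLE2 / Compound File Binary file.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    /** Returns true if the given header bytes begin with the OLE2 signature. */
    static boolean isOle2(byte[] header) {
        if (header.length < OLE2_MAGIC.length) {
            return false; // too short to hold the signature
        }
        return Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    /** Reads up to the first 8 bytes of a file and checks them against the signature. */
    static boolean isOle2(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return isOle2(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to sample.hwp in the current directory when no path is given.
        Path file = Paths.get(args.length > 0 ? args[0] : "sample.hwp");
        System.out.println(file + (isOle2(file) ? " is" : " is not") + " an OLE2 container");
    }
}
```

If this reports an OLE2 container while tika-app still prints application/x-tika-msoffice, that matches the situation above: the container is recognised, but nothing tells Tika which OLE2-based format it is.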

Another thing: there is no 'application/x-hwp' in the supported
formats list on the
'http://tika.apache.org/1.10/formats.html' page.

That means there is no parser available for HWP, and you'd need to write and contribute one.

So, does tika support "HWP"?

Depends on your definition of "supports"!

Nick
