On Sat, 8 Mar 2014, Benson Margulies wrote:
> Given a large pile of HWP files,
>
> find . -name "*.hwp" -exec java -jar ~/Downloads/tika-app-1.5.jar -v -t {} \;
>
> does not result in any text.
>
> Is there a detector and not a parser?

I'm not sure what an HWP file is, so I can't say for certain.

You can ask tika-app which mimetype it detects for a given file, and then whether it has a parser for that mimetype, with something like:

$ java -jar tika-app.jar --detect test.world
hello/world
$ java -jar tika-app.jar --list-parser-details | grep hello/world
$ # No supported parser

$ java -jar tika-app.jar --detect test.xls
application/vnd.ms-excel
$ java -jar tika-app.jar --list-parser-details | grep application/vnd.ms-excel
application/vnd.ms-excel
$ # Has a parser

(Skip the first step if you already know the mimetype!)
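
If you need to do this for a lot of files, the two steps above could be wrapped in a small shell function. This is only a rough sketch: TIKA_JAR, tika and check_parser are names made up for illustration, and it assumes tika-app.jar sits in the current directory unless TIKA_JAR says otherwise.

```shell
# Assumed location of the Tika app jar; override by setting TIKA_JAR.
TIKA_JAR="${TIKA_JAR:-tika-app.jar}"

# Thin wrapper (hypothetical name) so the jar path lives in one place.
tika() {
    java -jar "$TIKA_JAR" "$@"
}

# check_parser FILE (hypothetical helper)
# Step 1: detect the mimetype. Step 2: look for it in the parser list.
check_parser() {
    mimetype=$(tika --detect "$1") || return 2
    # An empty pattern would match every line, so bail out early.
    [ -n "$mimetype" ] || return 2
    # -F treats the mimetype as a fixed string, not a regex.
    if tika --list-parser-details | grep -qF -- "$mimetype"; then
        echo "$1: $mimetype (has a parser)"
    else
        echo "$1: $mimetype (no supported parser)"
    fi
}
```

Call it as "check_parser test.xls". Note that grep -F does a substring match, so a mimetype that is a prefix of a longer supported one would also be reported as having a parser.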

Nick