On Sat, 8 Mar 2014, Benson Margulies wrote:
> Given a large pile of HWP files,
> find . -name "*.hwp" -exec java -jar ~/Downloads/tika-app-1.5.jar -v -t {} \;
> does not result in any text.
> Is there a detector and not a parser?
I'm not sure what an HWP file is, so I can't say for certain.
You can ask tika-app whether it has a parser for the mimetype of any
given file, with something like:
$ java -jar tika-app.jar --detect test.world
hello/world
$ java -jar tika-app.jar --list-parser-details | grep hello/world
$ # No supported parser
$ java -jar tika-app.jar --detect test.xls
application/vnd.ms-excel
$ java -jar tika-app.jar --list-parser-details | grep -F application/vnd.ms-excel
application/vnd.ms-excel
$ # Has a parser
(Skip the first step if you already know the mimetype!)
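For a whole pile of files, the two steps above could be wrapped in a small shell helper. This is only a rough sketch, not anything shipped with Tika: the function names mime_listed and check_file and the TIKA_JAR variable are made up for illustration, and it assumes the --list-parser-details output puts each supported type on its own whitespace-separated token, as in the transcript above.

```shell
# Path to the Tika app jar; an assumption, adjust to wherever yours lives.
TIKA_JAR="${TIKA_JAR:-tika-app.jar}"

# mime_listed MIMETYPE (hypothetical helper): read a parser listing on stdin
# and succeed only if MIMETYPE appears as a whole token. The match is a fixed
# string, so the dots in "vnd.ms-excel" are not treated as regex wildcards.
mime_listed() {
    tr -s ' \t' '\n\n' | grep -qxF -- "$1"
}

# check_file FILE (hypothetical helper): detect FILE's type with tika-app,
# then report whether --list-parser-details mentions that type.
check_file() {
    mime=$(java -jar "$TIKA_JAR" --detect "$1") || return 2
    if java -jar "$TIKA_JAR" --list-parser-details | mime_listed "$mime"; then
        echo "$1: $mime (has a parser)"
    else
        echo "$1: $mime (no parser)"
    fi
}
```

Something like `find . -name "*.hwp" | while IFS= read -r f; do check_file "$f"; done` would then report on each file. It starts a JVM twice per file, so it will be slow on a large pile; saving the parser listing to a file once and feeding that to mime_listed would avoid most of that.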
Nick