Hi, all. Long time no talk. I had been working part time, on a kind of sabbatical, during which I abandoned Java in favor of studying Ruby and Clojure, and attending and organizing BarCamps.
About three months ago, I started a new job, working with Java again. The need arose to extract structured data from Excel spreadsheets, and I wrote a JRuby script that called Tika to handle the parsing. In the process, I think I identified some possible improvements to Tika.

It would be nice to simplify one of the simplest use cases, where you want Tika to parse a document using the default configuration and specify where its output goes. There is a very general mechanism for parsing in CLI, but it does not let you override the default output stream (stdout), and it is awkward to call from a program rather than from the command line.

I have two suggestions:

1) Make the output destination a configuration option (a command line parameter, perhaps "-o") that defaults to stdout. Although it's easy to redirect output on the command line, it's not quite so simple when the command is called within a script whose own output may be redirected. There may also be issues when the command is executed from within another program.

2) Move the methods that do the work to ParseUtils, and leave only a thin command line wrapper around them in CLI. It would be helpful for scripts and Java programs to have these easy-to-use methods available too. It seems wasteful to force the caller to construct a command line to do this.

What do you think?

Cheers,
Keith

--
View this message in context: http://www.nabble.com/Moving-Functionality-from-CLI-to-ParseUtils-tp24337541p24337541.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
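
P.S. To make the two suggestions a bit more concrete, here is a rough sketch of the shape I have in mind. Everything in it is hypothetical: ParseHelper, parseTo, and ThinCli are names made up for illustration and are not Tika's actual API, and the body of parseTo just copies bytes as a placeholder for the real parsing. The point is only the split: a reusable method that takes an OutputStream, and a wrapper that does nothing but handle "-o" and fall back to stdout.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical reusable helper; a stand-in, NOT Tika's real ParseUtils. */
final class ParseHelper {
    private ParseHelper() {}

    /**
     * Placeholder for "parse with the default configuration and write the
     * result to out". A real version would delegate to Tika; this one only
     * copies the input bytes so that the sketch stays self-contained.
     */
    static void parseTo(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
    }
}

/** Thin command line wrapper: only argument handling lives here. */
final class ThinCli {
    private ThinCli() {}

    public static void main(String[] args) throws IOException {
        String outputPath = null; // null means "use stdout"
        String inputPath = null;
        for (int i = 0; i < args.length; i++) {
            if ("-o".equals(args[i]) && i + 1 < args.length) {
                outputPath = args[++i]; // hypothetical "-o <file>" option
            } else {
                inputPath = args[i];
            }
        }
        if (inputPath == null) {
            System.err.println("usage: ThinCli [-o output] input");
            return;
        }
        try (InputStream in = new FileInputStream(inputPath)) {
            if (outputPath == null) {
                // Default destination; System.out is deliberately not closed.
                ParseHelper.parseTo(in, System.out);
            } else {
                try (OutputStream out = new FileOutputStream(outputPath)) {
                    ParseHelper.parseTo(in, out);
                }
            }
        }
    }
}
```

With something like this, a script or Java program could call ParseHelper.parseTo(in, out) directly with any stream it likes, while command line users would still get the same behavior through the wrapper.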