All - I'm looking at implementing a way to specify an input document on the command line and write its parsed full text to stdout and its Metadata.toString() to stderr. This could be helpful for both automated and non-automated unit testing. One could then do something like this:

    java org.apache.tika.parser.DebugParser myDocumentFileOrUrl \
        > myDocumentFileOrUrl.fulltext 2> myDocumentFileOrUrl.metadata

Using a scripting language (or even simple bash), one could iterate over a set of files and create a unique pair of output files for each one this way.

One challenge would be specifying metadata property names. (By the way, do we have parsers adding metadata properties yet, or do they only fill in values for properties that already exist? This is a case where not having to specify them up front would be helpful.)

I just wanted to let you know in case you want to offer any ideas or help.

Regards,
Keith

--
View this message in context: http://www.nabble.com/Stdout-Stderr-Debug-Parser-tf4642634.html#a13260802
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
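
The batch idea mentioned above (iterating over a set of files and writing one .fulltext and one .metadata file per input) could also be driven from Java instead of bash. What follows is only a minimal sketch under stated assumptions: org.apache.tika.parser.DebugParser is the class proposed in this message and does not exist yet, and the names DebugParserBatch, fulltextFile, metadataFile, command, and run are hypothetical helpers invented for illustration. It also assumes that `java` is on the PATH and that the child JVM can find Tika through the CLASSPATH environment variable.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch driver: runs the proposed DebugParser once per input
// and captures stdout and stderr in separate files next to the input.
public class DebugParserBatch {

    // File that receives the child's stdout (the parsed full text).
    static File fulltextFile(String input) {
        return new File(input + ".fulltext");
    }

    // File that receives the child's stderr (the Metadata.toString() output).
    static File metadataFile(String input) {
        return new File(input + ".metadata");
    }

    // Command line for a single input. DebugParser is the class proposed
    // in this message; it is an assumption, not an existing Tika class.
    static List<String> command(String input) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("org.apache.tika.parser.DebugParser");
        cmd.add(input);
        return cmd;
    }

    // Runs the parser for one input and returns the child's exit code.
    static int run(String input) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(input));
        pb.redirectOutput(fulltextFile(input)); // same effect as "> file"
        pb.redirectError(metadataFile(input));  // same effect as "2> file"
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        for (String input : args) {
            int exit = run(input);
            System.out.println(input + " -> exit " + exit);
        }
    }
}
```

Deriving the output names from the input name keeps each pair of files unique as long as the inputs are distinct file paths. A URL input would need to be mapped to a safe file name first, since characters such as "/" and ":" cannot be used directly in a file name on most systems.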
