Happy New Year, everyone,
I have a small program for simple text and metadata extraction. In Scala, it is
essentially just this:
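import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.{AutoDetectParser, ParseContext}
import org.apache.tika.sax.WriteOutContentHandler

val fileParser: AutoDetectParser = new AutoDetectParser()
// -1 disables the write limit, so the complete text is collected
val handler: WriteOutContentHandler = new WriteOutContentHandler(-1)
val metadata: Metadata = new Metadata()
val context: ParseContext = new ParseContext()
try {
  // stream is the java.io.InputStream of the document being parsed
  fileParser.parse(stream, handler, metadata, context)
} catch ...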
When I look at the metadata, it always contains this entry:
X-Parsed-By: org.apache.tika.parser.DefaultParser
Question 1) Shouldn't this be more specific, e.g. PDFParser, OpenDocumentParser,
and so on?
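One possibility I have not verified: X-Parsed-By may be multi-valued, with the
outer DefaultParser listed first and the format-specific parser after it. In
that case metadata.get("X-Parsed-By") would only return the first entry, while
metadata.getValues("X-Parsed-By") would return all of them. Assuming that, a
small helper (hypothetical, not part of Tika) could drop the dispatching parsers
and keep only the specific ones:

```scala
object ParsedBy {
  // Class names of parsers that, as far as I understand, only dispatch to a
  // format-specific parser. This set is my own assumption, not taken from Tika.
  private val dispatchers: Set[String] = Set(
    "org.apache.tika.parser.AutoDetectParser",
    "org.apache.tika.parser.CompositeParser",
    "org.apache.tika.parser.DefaultParser"
  )

  /** Drops the dispatching parsers from the X-Parsed-By values,
    * keeping the remaining entries in their original order. */
  def specificParsers(parsedBy: Array[String]): Seq[String] =
    parsedBy.toSeq.filterNot(dispatchers.contains)
}
```

After parse returns, it would be called as
ParsedBy.specificParsers(metadata.getValues("X-Parsed-By")). If only
DefaultParser comes back even then, my assumption about the multi-valued
property is wrong.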
Question 2) I understand that DigestingParser can add MD5 and SHA-1 hashes to
the metadata. But how can I "combine" AutoDetectParser and DigestingParser?
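As far as I understand (not verified), DigestingParser is a decorator around
another parser. If I read the Tika 1.x Javadoc correctly, its constructor takes
a Parser and a Digester, so the combination would presumably be wrapping,
roughly new DigestingParser(new AutoDetectParser(), digester), where the
Digester implementation to pass differs between Tika versions. To check my
understanding of the pattern, here is a Tika-free sketch. SimpleParser and
DigestingSketch are hypothetical names: the decorator reads the input, records
MD5 and SHA-1 via java.security.MessageDigest, then hands a fresh stream to the
wrapped parser.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.security.MessageDigest
import scala.collection.mutable

// Hypothetical stand-in for a parser interface (not Tika's Parser).
trait SimpleParser {
  def parse(stream: InputStream, metadata: mutable.Map[String, String]): Unit
}

// Hypothetical decorator: records MD5 and SHA-1 of the input, then delegates
// to the wrapped parser. It buffers the whole input in memory for simplicity.
final class DigestingSketch(inner: SimpleParser) extends SimpleParser {
  private def hexDigest(algorithm: String, bytes: Array[Byte]): String =
    MessageDigest.getInstance(algorithm).digest(bytes).map(b => "%02x".format(b & 0xff)).mkString

  override def parse(stream: InputStream, metadata: mutable.Map[String, String]): Unit = {
    val bytes = stream.readAllBytes() // requires Java 9 or later
    metadata("md5") = hexDigest("MD5", bytes)
    metadata("sha1") = hexDigest("SHA-1", bytes)
    // The wrapped parser gets a fresh stream over the same bytes.
    inner.parse(new ByteArrayInputStream(bytes), metadata)
  }
}
```

Buffering the complete document keeps the sketch short, but a real
implementation would need to avoid that for large files.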
Thanks in advance,
Kind regards