Happy New Year everyone,
I have a small program for simple text and metadata extraction. It is really
nothing more than this (in Scala):

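        import org.apache.tika.metadata.Metadata
        import org.apache.tika.parser.{AutoDetectParser, ParseContext}
        import org.apache.tika.sax.WriteOutContentHandler

        val fileParser : AutoDetectParser = new AutoDetectParser()
        // -1 disables the write limit, so all extracted text is kept
        val handler : WriteOutContentHandler = new WriteOutContentHandler(-1)
        val metadata : Metadata = new Metadata()
        val context : ParseContext = new ParseContext()

        try {
            // stream: the document's java.io.InputStream (opened elsewhere)
            fileParser.parse(stream, handler, metadata, context)
        } catch ...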

When I look at the metadata, I always see this entry: X-Parsed-By:
org.apache.tika.parser.DefaultParser
Question 1) Shouldn't this be more specific, e.g. PDFParser, OpenDocumentParser
and so on?

Question 2) I understand that there is a DigestingParser that adds MD5 and SHA-1
hashes to the metadata. But how can I "combine" the AutoDetectParser and the
DigestingParser?
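For comparison, here is one way to get the hashes in the same pass as the parse without DigestingParser, sketched with only the JDK (a different technique, not Tika's DigestingParser API): wrap the input in java.security.DigestInputStream so every byte the parser reads is also fed to MD5 and SHA-1. The object name DigestWhileParsing and the `consume` callback are hypothetical; in the program above, `consume` would wrap the `fileParser.parse(...)` call. The sketch assumes the parser does not close the stream it is given, and it drains any unread bytes afterwards so the digests cover the whole input.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.security.{DigestInputStream, MessageDigest}

object DigestWhileParsing {

  // Hex-encode a digest result, e.g. for storing alongside other metadata.
  private def toHex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString

  /** Runs `consume` (hypothetical stand-in for the real parse call) on a stream
    * that hashes every byte it reads, then drains whatever `consume` left
    * unread so the digests cover the whole input. Returns (md5Hex, sha1Hex).
    * Assumes `consume` does not close the stream. */
  def digestWhile(in: InputStream)(consume: InputStream => Unit): (String, String) = {
    val md5 = MessageDigest.getInstance("MD5")
    val sha1 = MessageDigest.getInstance("SHA-1")
    // Nested wrappers: each byte read passes through both digests.
    val hashing = new DigestInputStream(new DigestInputStream(in, md5), sha1)
    consume(hashing)
    // Read (and thereby hash) any bytes the consumer did not read.
    val buf = new Array[Byte](8192)
    while (hashing.read(buf) != -1) {}
    (toHex(md5.digest()), toHex(sha1.digest()))
  }

  def main(args: Array[String]): Unit = {
    val data = "abc".getBytes("UTF-8")
    // Stand-in consumer that reads only the first byte; draining covers the rest.
    val (md5Hex, sha1Hex) = digestWhile(new ByteArrayInputStream(data)) { s => s.read(); () }
    println(s"MD5:  $md5Hex")
    println(s"SHA1: $sha1Hex")
  }
}
```

Hashing inside the stream avoids reading the input twice; the resulting hex strings could then be added to the Metadata object by hand under key names of your choosing.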

Thanks in advance
Kind regards
