>>Question1) Shouldn't this be more specific? Like PdfParser,
>>OpenDocumentParser and so on.
Yes. Make sure to call metadata.getValues("X-Parsed-By"), which returns an
array of values, and then iterate through that array to see the parsers that
actually processed your doc. If you call metadata.get(Property p), you only
get the first value in the array.
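A rough sketch of that iteration, assuming Tika 1.x where the key is the plain string "X-Parsed-By". The class name ParsedByExample and the helper parsersUsed are made up for illustration, and the Metadata object is filled by hand here as a stand-in for one populated by a real parse() call:

```java
import org.apache.tika.metadata.Metadata;

import java.util.Arrays;
import java.util.List;

// Illustrative helper class, not part of Tika.
public class ParsedByExample {

    // Returns every value stored under X-Parsed-By, in the order recorded.
    static List<String> parsersUsed(Metadata metadata) {
        return Arrays.asList(metadata.getValues("X-Parsed-By"));
    }

    public static void main(String[] args) {
        // Stand-in for the Metadata object filled by a real parse() call.
        Metadata metadata = new Metadata();
        metadata.add("X-Parsed-By", "org.apache.tika.parser.DefaultParser");
        metadata.add("X-Parsed-By", "org.apache.tika.parser.pdf.PDFParser");

        // get(String) returns only the first value for the key.
        System.out.println("first: " + metadata.get("X-Parsed-By"));

        // getValues(String) exposes all of them.
        for (String parser : parsersUsed(metadata)) {
            System.out.println(parser);
        }
    }
}
```

With a real document, the wrapper parser would normally show up first and the format-specific parser after it.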
>> Question2) I understand that there is the DigestingParser to add Md5 and
>> Sha1 hashes to the metadata. But how can I "combine" the AutoDetectParser
>> and the DigestingParser?
See DigestingParserTest [0] for exact code, but basically something like this:
Metadata m = new Metadata();
CommonsDigester.DigestAlgorithm[] algos = CommonsDigester.parse("md5,sha512");
Parser d = new DigestingParser(new AutoDetectParser(),
        new CommonsDigester(1000000, algos));
// pass m to parse(); the digests are added to it
d.parse(InputStream....)
[0] http://svn.apache.org/viewvc/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/DigestingParserTest.java?view=markup
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Tuesday, January 05, 2016 3:33 AM
To: [email protected]
Subject: Questions about using AutoDetect and DigestParser
Happy New Year everyone,
I have a small program for simple text and metadata extraction. It is really
not more than this (in Scala):
val fileParser : AutoDetectParser = new AutoDetectParser()
val handler : WriteOutContentHandler = new WriteOutContentHandler(-1)
val metadata : Metadata = new Metadata()
val context : ParseContext = new ParseContext()
try {
fileParser.parse(stream, handler, metadata, context)
} catch ...
When I look at the metadata I always have this line: X-Parsed-By:
org.apache.tika.parser.DefaultParser
Question1) Shouldn't this be more specific? Like PdfParser, OpenDocumentParser
and so on.
Question2) I understand that there is the DigestingParser to add Md5 and Sha1
hashes to the metadata. But how can I "combine" the AutoDetectParser and the
DigestingParser?
Thanks so far
Kind regards