Thanks for your help so far.
You're absolutely right about Q1; I was just a bit confused because tika-app
also shows only the first parser.
Q2 is unfortunately still not solved for me. I changed my code to:
val handler: WriteOutContentHandler = new WriteOutContentHandler(-1)
val digester: CommonsDigester = new CommonsDigester(Int.MaxValue,
  CommonsDigester.DigestAlgorithm.MD5, CommonsDigester.DigestAlgorithm.SHA1)
val metadata: Metadata = new Metadata()
val context: ParseContext = new ParseContext()
val fileParser: DigestingParser = new DigestingParser(new AutoDetectParser(),
  digester)
try {
  fileParser.parse(stream, handler, metadata, context)
  ...
(This is Scala, but the Java version shouldn't be much different.) It compiles
fine, but when I run it, detection fails: Content-Type is always set to
application/octet-stream, and only the EmptyParser is called. If I use a plain
AutoDetectParser instead, everything works fine (tested with a bunch of PDFs).
The examples from the digester tests didn't help me either. Any hints?
Kind regards
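One possible explanation for the octet-stream/EmptyParser symptom (an assumption, not something confirmed in this thread) is that the parser ends up reading a stream the digester has already consumed. Digesting and detection both need the same bytes, so a digester has to mark the stream, hash it, and reset it before the parser starts. The stdlib-only sketch below shows that pattern; `digestAndRewind` and the `"%PDF"` check in `looksLikePdf` are hypothetical stand-ins, not Tika's `CommonsDigester` or its detector.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

class DigestThenParse {
    // Hashes the whole stream into md, then rewinds it so the same bytes can be
    // handed to a parser. Returns the stream to use afterwards, which is wrapped
    // in a BufferedInputStream if the original does not support mark/reset.
    static InputStream digestAndRewind(InputStream raw, MessageDigest md, int markLimit)
            throws IOException {
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        in.mark(markLimit);
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        // May throw IOException if reading past markLimit invalidated the mark.
        in.reset();
        return in;
    }

    // Hypothetical stand-in for magic-byte detection: does the stream start with "%PDF"?
    static boolean looksLikePdf(InputStream in) throws IOException {
        byte[] head = new byte[4];
        int read = in.readNBytes(head, 0, head.length);
        return read == 4 && "%PDF".equals(new String(head, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "%PDF-1.4 placeholder bytes".getBytes(StandardCharsets.US_ASCII);

        // Without a rewind, the "detector" sees an exhausted stream.
        InputStream consumed = new ByteArrayInputStream(doc);
        MessageDigest.getInstance("MD5").digest(consumed.readAllBytes());
        System.out.println("after consuming:  " + looksLikePdf(consumed));

        // With mark/reset, detection still sees the leading bytes.
        InputStream rewound = digestAndRewind(new ByteArrayInputStream(doc),
                MessageDigest.getInstance("MD5"), 1 << 20);
        System.out.println("after rewinding: " + looksLikePdf(rewound));
    }
}
```

Run as-is, it prints `false` for the consumed stream and `true` for the rewound one.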
-----Original message-----
Sent: Tuesday, 05 January 2016 at 14:37:02
From: "Allison, Timothy B." <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: Questions about using AutoDetect and DigestParser
>>Question1) Shouldn't this be more specific? Like PdfParser,
>>OpenDocumentParser and so on.
Yes. Make sure to call metadata.getValues("X-Parsed-By"), which returns an array
of values, and then iterate through that array to see which parsers actually
processed your document. If you call metadata.get(Property p), you only get the
first value in the array.
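A minimal, stdlib-only sketch of the get-versus-getValues distinction described above. `MultiValuedMetadata` is a hypothetical stand-in that only mimics multi-valued behaviour; it is not Tika's `Metadata` class, and the parser class names are illustrative values. With real Tika the call is `metadata.getValues("X-Parsed-By")`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata store; NOT Tika's Metadata.
class MultiValuedMetadata {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Appends a value under a key, keeping insertion order.
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Single-value getter: returns only the first value, or null if absent.
    String get(String name) {
        List<String> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    // Returns every value recorded under the key, in insertion order.
    String[] getValues(String name) {
        List<String> list = values.get(name);
        return list == null ? new String[0] : list.toArray(new String[0]);
    }
}

class ParsedByDemo {
    public static void main(String[] args) {
        MultiValuedMetadata m = new MultiValuedMetadata();
        // Illustrative values: the outer parser first, the delegate after it.
        m.add("X-Parsed-By", "org.apache.tika.parser.DefaultParser");
        m.add("X-Parsed-By", "org.apache.tika.parser.pdf.PDFParser");

        System.out.println(m.get("X-Parsed-By"));        // only the first value
        for (String p : m.getValues("X-Parsed-By")) {   // every parser recorded
            System.out.println(p);
        }
    }
}
```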
>> Question2) I understand that there is the DigestingParser to add Md5 and
>> Sha1 hashes to the metadata. But how can I "combine" the AutoDetectParser
>> and the DigestingParser?
See DigestingParserTest [0] for exact code, but basically something like this:
Metadata m = new Metadata();
CommonsDigester.DigestAlgorithm[] algos = CommonsDigester.parse("md5,sha512");
Parser d = new DigestingParser(new AutoDetectParser(),
        new CommonsDigester(1000000, algos));
d.parse(stream, handler, m, new ParseContext());
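A stdlib-only sketch of what a spec string like "md5,sha512" amounts to: a comma-separated list mapped to JDK `MessageDigest` algorithms, each fed the same bytes and reported as lowercase hex. `AlgorithmSpec` and its name table are hypothetical; this is not how `CommonsDigester.parse` is implemented, and the set of names Tika accepts may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class AlgorithmSpec {
    // Hypothetical mapping from short lowercase names to JDK MessageDigest names.
    private static final Map<String, String> JDK_NAMES = Map.of(
            "md5", "MD5", "sha1", "SHA-1", "sha256", "SHA-256",
            "sha384", "SHA-384", "sha512", "SHA-512");

    // Parses "md5,sha512"-style specs into digests, preserving the given order;
    // unknown names are rejected with IllegalArgumentException.
    static Map<String, MessageDigest> parse(String spec) throws NoSuchAlgorithmException {
        Map<String, MessageDigest> out = new LinkedHashMap<>();
        for (String raw : spec.split(",")) {
            String key = raw.trim().toLowerCase(Locale.ROOT);
            String jdk = JDK_NAMES.get(key);
            if (jdk == null) {
                throw new IllegalArgumentException("unknown algorithm: " + raw);
            }
            out.put(key, MessageDigest.getInstance(jdk));
        }
        return out;
    }

    // Feeds the same bytes to every digest and returns lowercase hex by name.
    static Map<String, String> digestAll(String spec, byte[] data)
            throws NoSuchAlgorithmException {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, MessageDigest> e : parse(spec).entrySet()) {
            StringBuilder sb = new StringBuilder();
            for (byte b : e.getValue().digest(data)) {
                sb.append(String.format("%02x", b));
            }
            result.put(e.getKey(), sb.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        digestAll("md5,sha512", data).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```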
[0] http://svn.apache.org/viewvc/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/DigestingParserTest.java?view=markup