Thanks so far.
You're absolutely right about Q1; I was just a bit confused because tika-app 
also shows only the first parser.

Q2 is unfortunately still not solved for me. I changed my code to:

val handler: WriteOutContentHandler = new WriteOutContentHandler(-1)
val digester: CommonsDigester = new CommonsDigester(Int.MaxValue,
    CommonsDigester.DigestAlgorithm.MD5, CommonsDigester.DigestAlgorithm.SHA1)
val metadata: Metadata = new Metadata()
val context: ParseContext = new ParseContext()
val fileParser: DigestingParser =
    new DigestingParser(new AutoDetectParser(), digester)
try {
    fileParser.parse(stream, handler, metadata, context)
    ...

The Java version should not be much different. It compiles fine, but when I 
run it the detection fails: Content-Type is always set to octet-stream, and 
only the EmptyParser is called. If I use a plain AutoDetectParser instead, 
everything works fine (tested with a bunch of PDFs). The examples from the 
digester tests didn't help me either. Any hints?

Kind regards


-----Original message-----
Sent: Tuesday, 05 January 2016 at 14:37:02
From: "Allison, Timothy B." <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: Questions about using AutoDetect and DigestParser
>>Question1) Shouldn't this be more specific? Like PdfParser, 
>>OpenDocumentParser and so on.

Yes, make sure to call metadata.getValues("X-Parsed-By"), which returns an 
array of values, and then iterate through that array to see the parsers that 
actually processed your doc.  If you call metadata.get(Property p), you only 
get the first value in the array.

>> Question2) I understand that there is the DigestingParser to add Md5 and 
>> Sha1 hashes to the metadata. But how can I "combine" the AutoDetectParser 
>> and the DigestingParser?

See DigestingParserTest [0] for exact code, but basically something like this:

Metadata m = new Metadata();
CommonsDigester.DigestAlgorithm[] algos = CommonsDigester.parse("md5,sha512");
Parser d = new DigestingParser(new AutoDetectParser(),
    new CommonsDigester(1000000, algos));

d.parse(InputStream....)

[0] http://svn.apache.org/viewvc/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/DigestingParserTest.java?view=markup
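
To make the role of the first CommonsDigester argument (1000000 above, Int.MaxValue in the Scala code) a bit more concrete, here is a rough sketch of the general idea behind digesting a stream before handing it to a parser: mark the stream, read bytes into the hash functions, then reset so the next reader starts from the beginning. This uses only the JDK and is not Tika's actual implementation; the class DigestSketch and the method digestAndRewind are hypothetical names made up for illustration, and the assumption that a mark limit works roughly like this is mine.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Tika.
class DigestSketch {

    // Hashes at most markLimit bytes with each named algorithm, then rewinds
    // the stream so a later reader (e.g. a parser) sees it from the start.
    // Simplification: bytes beyond markLimit are not hashed at all.
    static Map<String, String> digestAndRewind(InputStream in, int markLimit,
                                               String... algorithms)
            throws IOException, NoSuchAlgorithmException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        MessageDigest[] digests = new MessageDigest[algorithms.length];
        for (int i = 0; i < algorithms.length; i++) {
            digests[i] = MessageDigest.getInstance(algorithms[i]);
        }

        in.mark(markLimit);
        byte[] buf = new byte[4096];
        int total = 0;
        int n;
        while (total < markLimit
                && (n = in.read(buf, 0, Math.min(buf.length, markLimit - total))) != -1) {
            for (MessageDigest d : digests) {
                d.update(buf, 0, n);
            }
            total += n;
        }
        in.reset(); // rewind to the marked position

        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < algorithms.length; i++) {
            result.put(algorithms[i], toHex(digests[i].digest()));
        }
        return result;
    }

    // Lower-case hex encoding that keeps leading zeros.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(
                "hello".getBytes(StandardCharsets.UTF_8));
        Map<String, String> hashes = digestAndRewind(in, 1000000, "MD5", "SHA-1");
        hashes.forEach((algo, hex) -> System.out.println(algo + " = " + hex));
        // The stream is back at its first byte here.
        System.out.println("first byte after rewind: " + (char) in.read());
    }
}
```

The point of the sketch is only that the bytes have to be consumed once for hashing and then made available again for type detection and parsing; if the rewind does not happen, whatever reads the stream next sees no content.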
