Sorry I couldn't help. Please do let us know if you figure out what's going on.
Best,
Tim
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Friday, January 08, 2016 3:43 AM
To: [email protected]
Subject: Re: Questions about using AutoDetect and DigestParser
Actually, I think the test code is quite good for understanding how the
DigestingParser works. I tried every combination I could think of, but I
couldn't make it work. My code mirrors the unit test as closely as possible
(only the input stream is different), so the problem seems to be related to my
use of Scala. If I find the time, I will try again with Java to pinpoint the
problem further. In the meantime I think I'll stick to
java.security.MessageDigest.
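For reference, a minimal sketch of that MessageDigest fallback, written in Java rather than Scala and assuming the stream can be read once purely for hashing. The class name StreamDigestSketch and the helper hexDigest are made up for illustration; only the java.security and java.io calls are standard library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of Tika.
final class StreamDigestSketch {

    /** Reads the stream to its end and returns the digest as lowercase hex. */
    static String hexDigest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            byte[] buffer = new byte[8192];
            while (dis.read(buffer) != -1) {
                // Each read feeds the bytes into the digest as a side effect.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        // JCA algorithm names: "MD5", "SHA-256", "SHA-384", "SHA-512".
        System.out.println("md5:    " + hexDigest(new ByteArrayInputStream(data), "MD5"));
        System.out.println("sha256: " + hexDigest(new ByteArrayInputStream(data), "SHA-256"));
    }
}
```

Because the helper consumes the stream, parsing the same document afterwards would need a second stream or a buffered copy.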
Kind regards
-----Original Message-----
Sent: Thursday, 07 January 2016 at 18:49:09
From: "Allison, Timothy B." <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: Questions about using AutoDetect and DigestParser
As for 1, yes, sorry, that's a bug I've been meaning to fix...
As for 2, you're right, the test code is fairly opaque. Sorry. The code below
works when I put it in DigestingParserTest.
The behavior you're seeing with AutoDetectParser() happens when the
AutoDetectParser fails to load parsers either via the config file or via SPI,
which reads the parsers to load from the Parser class's service file. Is there
any reason to think you're getting different SPI behavior with, say (I don't
know Scala, and I'm guessing, sorry)
val fileParser : Parser = new AutoDetectParser()
vs.
val fileParser : Parser = new DigestingParser(new AutoDetectParser(), digester)
I'm sure you've tried the following for kicks...(again, apologies for guessing)
val autoParser : AutoDetectParser = new AutoDetectParser()
val fileParser : DigestingParser = new DigestingParser(autoParser, digester)
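One way to check the SPI guess above, independent of Scala syntax, is to ask the class loader whether it can see a service file for the Parser interface at all. This is only a sketch under assumptions: the class name ServiceFileCheck and the helper findServiceFiles are hypothetical, the lookup uses the standard META-INF/services/<interface name> location, and it assumes the thread context class loader is the one that matters in your setup.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic, not part of Tika.
final class ServiceFileCheck {

    /** Lists every service file for the given interface visible to the loader. */
    static List<URL> findServiceFiles(ClassLoader loader, String serviceInterface)
            throws IOException {
        return Collections.list(
                loader.getResources("META-INF/services/" + serviceInterface));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader if no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        List<URL> files = findServiceFiles(loader, "org.apache.tika.parser.Parser");
        if (files.isEmpty()) {
            System.out.println("No Parser service file visible to this class loader.");
        } else {
            for (URL url : files) {
                System.out.println("Found service file: " + url);
            }
        }
    }
}
```

If this prints nothing but the "No Parser service file" line when run from the same place as the failing code, the parsers jar (or its service file) is probably not visible there, which would match the empty AutoDetectParser behavior.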
Java unit test that works within DigestingParserTest:
@Test
public void testSimple() throws Exception {
    CommonsDigester.DigestAlgorithm[] algos =
            CommonsDigester.parse("md5,sha256,sha384,sha512");
    Metadata metadata = new Metadata();
    Parser d = new DigestingParser(new AutoDetectParser(),
            new CommonsDigester(UNLIMITED, algos));
    ContentHandler handler = new WriteOutContentHandler(-1);
    try (InputStream input = DigestingParserTest.class
            .getResourceAsStream("/test-documents/testPDF.pdf")) {
        d.parse(input, handler, metadata, new ParseContext());
    }
    String[] parsedBy = metadata.getValues("X-Parsed-By");
    for (String v : parsedBy) {
        System.out.println("Parsed by: " + v);
    }
    assertEquals("org.apache.tika.parser.DefaultParser", parsedBy[0]);
    assertEquals("org.apache.tika.parser.pdf.PDFParser", parsedBy[1]);
}