Actually, I think the test code is quite good for understanding how the DigestingParser works. I tried every combination I could think of, but I couldn't make it work. My code mirrors the unit test as closely as possible (only the input stream is different), so the problem seems to be related to my use of Scala. If I find the time, I will try it again in Java to pinpoint the problem further. In the meantime, I think I'll stick with java.security.MessageDigest.
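For reference, here is a minimal sketch of the java.security.MessageDigest fallback. The class name StreamDigest and the method hexDigest are made up for illustration, and the code bypasses Tika entirely, so the digest is returned as a string rather than recorded by a parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Tika: digests a stream with the plain JDK API.
public class StreamDigest {

    // Reads the stream to the end and returns the digest as lowercase hex.
    // The algorithm name is passed straight to MessageDigest.getInstance,
    // for example "MD5" or "SHA-256".
    public static String hexDigest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            md.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        // A MessageDigest instance handles one algorithm, so the data is read once per algorithm.
        for (String algorithm : new String[] {"MD5", "SHA-256"}) {
            try (InputStream in = new ByteArrayInputStream(data)) {
                System.out.println(algorithm + ": " + hexDigest(in, algorithm));
            }
        }
    }
}
```

From Scala the same method can be called directly, e.g. StreamDigest.hexDigest(stream, "SHA-256"). Note that the stream is consumed by the digest, so a file would have to be reopened before handing it to a parser.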
Kind regards

-----Original Message-----
Sent: Thursday, 07 January 2016 at 18:49:09
From: "Allison, Timothy B." <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: Questions about using AutoDetect and DigestParser

As for 1, yes, sorry, that's a bug I've been meaning to fix...

As for 2, you're right, the test code is fairly opaque. Sorry. The code below works when I put it in DigestingParserTest.

The behavior you're seeing with AutoDetectParser() happens when the AutoDetectParser fails to load parsers either via the config file or via SPI, which reads the parsers to load from the Parser class' service file.

Is there any reason to think you're getting different SPI behavior with, say (I don't know Scala, and I'm guessing... sorry)

    val fileParser : Parser = new AutoDetectParser()

vs.

    val fileParser : Parser = new DigestingParser(new AutoDetectParser(), digester)

I'm sure you've tried the following for kicks... (again, apologies for guessing)

    val autoParser : AutoDetectParser = new AutoDetectParser()
    val fileParser : DigestingParser = new DigestingParser(autoParser, digester)

Java unit test that works within DigestingParserTest:

    @Test
    public void testSimple() throws Exception {
        CommonsDigester.DigestAlgorithm[] algos =
                CommonsDigester.parse("md5,sha256,sha384,sha512");
        Metadata metadata = new Metadata();
        Parser d = new DigestingParser(new AutoDetectParser(),
                new CommonsDigester(UNLIMITED, algos));
        ContentHandler handler = new WriteOutContentHandler(-1);
        try (InputStream input = DigestingParserTest.class
                .getResourceAsStream("/test-documents/testPDF.pdf")) {
            d.parse(input, handler, metadata, new ParseContext());
        }
        String[] parsedBy = metadata.getValues("X-Parsed-By");
        for (String v : parsedBy) {
            System.out.println("Parsed by: " + v);
        }
        assertEquals("org.apache.tika.parser.DefaultParser", parsedBy[0]);
        assertEquals("org.apache.tika.parser.pdf.PDFParser", parsedBy[1]);
    }
