Actually, I think the test code is quite good for understanding how the
DigestingParser works. I tried every combination I could think of, but I
couldn't make it work. The code mirrors the unit test as closely as possible
(only the input stream is different). It seems the problem is related to my use
of Scala. If I find the time, I will try again with Java to pinpoint the problem
further. In the meantime, I think I'll stick to java.security.MessageDigest.
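For reference, here is a minimal sketch of that MessageDigest fallback. It computes the same four algorithms the unit test requests (md5, sha256, sha384, sha512) in a single pass over a stream. The class and method names (DigestFallback, digestAll, toHex) are hypothetical; only java.security.MessageDigest and its standard JCA algorithm names ("MD5", "SHA-256", ...) come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: plain JDK digesting.
public class DigestFallback {

    // Reads the stream once and feeds every chunk to each requested digest.
    // Returns algorithm name -> lower-case hex digest, in request order.
    static Map<String, String> digestAll(InputStream in, String... algorithms)
            throws IOException, NoSuchAlgorithmException {
        Map<String, MessageDigest> digests = new LinkedHashMap<>();
        for (String algo : algorithms) {
            digests.put(algo, MessageDigest.getInstance(algo));
        }
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            for (MessageDigest md : digests.values()) {
                md.update(buffer, 0, n);
            }
        }
        Map<String, String> hex = new LinkedHashMap<>();
        for (Map.Entry<String, MessageDigest> e : digests.entrySet()) {
            hex.put(e.getKey(), toHex(e.getValue().digest()));
        }
        return hex;
    }

    // Lower-case hex encoding; & 0xff treats each byte as unsigned.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        Map<String, String> result = digestAll(new ByteArrayInputStream(data),
                "MD5", "SHA-256", "SHA-384", "SHA-512");
        result.forEach((algo, value) -> System.out.println(algo + ": " + value));
    }
}
```

Reading the stream once and updating all digests in the same loop avoids buffering or re-reading the input, which matters for large documents.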

Kind regards

-----Original Message-----
Sent: Thursday, 07 January 2016 at 18:49:09
From: "Allison, Timothy B." <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: Questions about using AutoDetect and DigestParser
As for 1, yes, sorry, that's a bug I've been meaning to fix...

As for 2, you're right, the test code is fairly opaque. Sorry. The code below
works when I put it in DigestingParserTest.

The behavior you're seeing with AutoDetectParser() happens when the
AutoDetectParser fails to load parsers, either via the config file or via SPI,
which reads the list of parsers to load from the Parser class's service file. Is
there any reason to think you're getting different SPI behavior with, say (I
don't know Scala, so I'm guessing... sorry):

val fileParser : Parser = new AutoDetectParser()

vs.

val fileParser : Parser = new DigestingParser(new AutoDetectParser(), digester)
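One way to test the SPI explanation is to ask the class loader which copies of the Parser service file it can actually see. The sketch below assumes only the standard Java SPI convention (a META-INF/services file named after the fully qualified interface, here org.apache.tika.parser.Parser); the class and method names are hypothetical. If the entries come back empty under the Scala setup but not in a plain Java run, that would suggest a classpath or class-loader difference rather than a DigestingParser problem.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic, not part of Tika.
public class SpiCheck {

    // Standard SPI location: META-INF/services/<fully qualified interface name>.
    static final String SERVICE_FILE = "META-INF/services/org.apache.tika.parser.Parser";

    // Collects every non-blank, non-comment entry from all copies of the
    // service file visible to the given class loader.
    static List<String> listServiceEntries(ClassLoader loader, String resource)
            throws IOException {
        List<String> entries = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources(resource);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    int hash = line.indexOf('#');  // '#' starts a comment in service files
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        entries.add(line);
                    }
                }
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader own = SpiCheck.class.getClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context == null) {
            context = own;  // the context loader may be unset in some environments
        }
        // Compare what each loader sees; a mismatch hints at a loader problem.
        System.out.println("context loader: " + listServiceEntries(context, SERVICE_FILE));
        System.out.println("class loader:   " + listServiceEntries(own, SERVICE_FILE));
    }
}
```

Running it from the same Scala entry point that builds the AutoDetectParser would show whether the Tika parsers module is visible there at all.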


I'm sure you've already tried the following, just for kicks (again, apologies
for guessing):
        val autoParser : AutoDetectParser = new AutoDetectParser()
        val fileParser : DigestingParser = new DigestingParser(autoParser, digester)


Java unit test that works within DigestingParserTest:

    @Test
    public void testSimple() throws Exception {
        // UNLIMITED is the mark-limit constant assumed to be defined in
        // DigestingParserTest.
        CommonsDigester.DigestAlgorithm[] algos =
                CommonsDigester.parse("md5,sha256,sha384,sha512");
        Metadata metadata = new Metadata();
        Parser d = new DigestingParser(new AutoDetectParser(),
                new CommonsDigester(UNLIMITED, algos));
        ContentHandler handler = new WriteOutContentHandler(-1);  // -1: no write limit
        try (InputStream input = DigestingParserTest.class
                .getResourceAsStream("/test-documents/testPDF.pdf")) {
            d.parse(input, handler, metadata, new ParseContext());
        }

        // Print every parser recorded in X-Parsed-By.
        String[] parsedBy = metadata.getValues("X-Parsed-By");
        for (String v : parsedBy) {
            System.out.println("Parsed by: " + v);
        }

        assertEquals("org.apache.tika.parser.DefaultParser", parsedBy[0]);
        assertEquals("org.apache.tika.parser.pdf.PDFParser", parsedBy[1]);
    }
