Hi,

Sent: Tue, 10 Nov 2009
From: Aaron Kaplan <lists2...@aaronkaplan.info>
> I checked out pdfbox 0.8.0, built it with ant, and ran the tests. Six
> of them are failing:
>
> Failed tests:
>   testExtract(org.apache.pdfbox.util.TestTextStripper)
>   testRenderImage(org.apache.pdfbox.util.TestPDFToImage)
>
> Tests in error:
>   testProtectionError(org.apache.pdfbox.encryption.TestPublicKeyEncryption)
>   testProtection(org.apache.pdfbox.encryption.TestPublicKeyEncryption)
>   testMultipleRecipients(org.apache.pdfbox.encryption.TestPublicKeyEncryption)
>   testParsingTroublePDFs(org.apache.pdfbox.pdfparser.TestPDFParser)
>
> I looked at the output of TestTextStripper, and most of the differences
> involve the glyph names circlecopyrt, angbracketleft, and
> angbracketright, which were removed from glyphlist.txt in this commit:
>
> http://svn.apache.org/viewvc?view=revision&revision=793058
>
> So my first question: how should these glyphs be getting resolved now
> that they're not in glyphlist.txt; or do the tests need to be updated?

We have to add the missing mappings. I've filed an issue in JIRA [1].

> The remaining errors in TestTextStripper are all in the file
> solidconvertor.pdf . The expected output file appears to be in UTF-16,
> but the actual output file is a strange mixture of UTF-8 and corrupt
> UTF-16. Second question: any idea why a corrupt output file is being
> generated?
>
> I also looked into TestPDFParser and the problem was a missing input
> file. I gather from an old mailing list post that it was removed
> because of copyright problems.

There are some "unofficial" test files. I guess it's one of them.

> By this point I was getting the impression that these tests weren't
> intended for me to run, so I didn't bother trying to figure out what was
> going wrong in the other cases. My third question: is it expected that
> the tests I listed above fail, or are there any that I should look into
> as potential indicators of bugs?

These tests exist to help us find bugs after changes, so ultimately we expect them not to fail.
We are trying to increase the number of test PDFs so that they cover as many test cases as possible, but that's not easy because of the known licensing and confidentiality issues with some otherwise suitable PDFs.

> Thanks
> -Aaron

Thanks for the report.

BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-557
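
For readers following the glyph-name question above, here is a minimal sketch of how a glyph list in the Adobe "name;hex code point(s)" format can be parsed and queried, and why a name that is missing from the list ends up unresolved. This is not PDFBox's actual code: the class `GlyphListSketch` and its methods are hypothetical, and the `circlecopyrt;00A9` entry is only an illustrative assumption (U+00A9 is the copyright sign). The real mappings are whatever gets decided in PDFBOX-557.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox.
class GlyphListSketch {

    // Parses lines of the form "name;XXXX" or "name;XXXX YYYY" (hex code points).
    // Blank lines and lines starting with '#' are skipped.
    static Map<String, String> parse(String text) {
        Map<String, String> map = new HashMap<String, String>();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int semi = line.indexOf(';');
            if (semi < 0) {
                continue; // not a mapping line
            }
            String name = line.substring(0, semi).trim();
            StringBuilder unicode = new StringBuilder();
            for (String hex : line.substring(semi + 1).trim().split("\\s+")) {
                unicode.appendCodePoint(Integer.parseInt(hex, 16));
            }
            map.put(name, unicode.toString());
        }
        return map;
    }

    // Returns the mapped string, or null when the glyph name is not in the list.
    static String resolve(Map<String, String> map, String glyphName) {
        return map.get(glyphName);
    }

    public static void main(String[] args) {
        // Illustrative entries only; the second one is an assumed mapping.
        String list = "# sample glyph list\n"
                + "A;0041\n"
                + "circlecopyrt;00A9\n";
        Map<String, String> map = parse(list);
        System.out.println("circlecopyrt -> " + resolve(map, "circlecopyrt"));
        System.out.println("angbracketleft -> " + resolve(map, "angbracketleft"));
    }
}
```

In this sketch a name such as angbracketleft that has no entry simply resolves to null, which is the kind of gap that shows up as a difference in the extracted text until the mapping is added back.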