I checked out PDFBox 0.8.0, built it with Ant, and ran the tests. Six of them fail:

Failed tests:
  testExtract(org.apache.pdfbox.util.TestTextStripper)
  testRenderImage(org.apache.pdfbox.util.TestPDFToImage)

Tests in error:
  testProtectionError(org.apache.pdfbox.encryption.TestPublicKeyEncryption)
  testProtection(org.apache.pdfbox.encryption.TestPublicKeyEncryption)
  testMultipleRecipients(org.apache.pdfbox.encryption.TestPublicKeyEncryption)
  testParsingTroublePDFs(org.apache.pdfbox.pdfparser.TestPDFParser)


I looked at the output of TestTextStripper, and most of the differences involve the glyph names circlecopyrt, angbracketleft, and angbracketright, which were removed from glyphlist.txt in this commit:

http://svn.apache.org/viewvc?view=revision&revision=793058

So, my first question: how should these glyphs be resolved now that they're no longer in glyphlist.txt, or do the tests need to be updated?
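To confirm which of the names are actually gone from a given copy of glyphlist.txt, a small standalone check is enough. The sketch below assumes the file uses the Adobe Glyph List layout (one "name;XXXX" entry per line, several hex code points separated by spaces, "#" starting a comment). GlyphListCheck, parse and missing are illustrative names, not PDFBox API, and the sample text in main is a made-up excerpt; to check the real file, read its contents into a String and pass that to parse instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: checks glyph names against
// text assumed to be in the Adobe Glyph List "name;XXXX" format.
public class GlyphListCheck {

    // Maps each glyph name to the Unicode string built from its hex code points.
    static Map<String, String> parse(String glyphListText) {
        Map<String, String> map = new HashMap<>();
        for (String raw : glyphListText.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int semi = line.indexOf(';');
            if (semi < 0) {
                continue; // not a name;codepoints entry
            }
            String codes = line.substring(semi + 1).trim();
            if (codes.isEmpty()) {
                continue; // entry without code points
            }
            StringBuilder unicode = new StringBuilder();
            for (String hex : codes.split("\\s+")) {
                unicode.appendCodePoint(Integer.parseInt(hex, 16));
            }
            map.put(line.substring(0, semi), unicode.toString());
        }
        return map;
    }

    // Returns the requested names that have no entry in the parsed list.
    static List<String> missing(Map<String, String> glyphs, String... wanted) {
        List<String> result = new ArrayList<>();
        for (String name : wanted) {
            if (!glyphs.containsKey(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up two-entry excerpt standing in for the real glyphlist.txt.
        String sample = "# comment\nA;0041\nspace;0020\n";
        Map<String, String> glyphs = parse(sample);
        // prints [circlecopyrt, angbracketleft, angbracketright]
        System.out.println(missing(glyphs, "A", "circlecopyrt", "angbracketleft", "angbracketright"));
    }
}
```

Any name this reports as missing has to be resolved some other way (for example, from a font-specific encoding) if the expected test output is to stay unchanged.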

The remaining errors in TestTextStripper all come from the file solidconvertor.pdf. The expected output file appears to be UTF-16, but the actual output file is a strange mixture of UTF-8 and corrupt UTF-16. Second question: any idea why a corrupt output file is being generated?
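One way to pin down how the actual output is broken is to test its raw bytes against each candidate encoding. This is a minimal sketch, not PDFBox code: EncodingCheck, decodesStrictly and bom are hypothetical names. Strict decoding only shows that the bytes are well-formed in an encoding, not that they were written in it; plain ASCII of even length, for instance, also decodes cleanly as UTF-16.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic, not part of PDFBox.
public class EncodingCheck {

    // True if 'data' decodes under 'cs' with no malformed or unmappable input.
    static boolean decodesStrictly(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Describes the byte-order mark at the start of 'data', if there is one.
    static String bom(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8 BOM";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return "UTF-16BE BOM";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return "UTF-16LE BOM";
        }
        return "no BOM";
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java EncodingCheck <output file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(bom(data));
        System.out.println("UTF-8:  " + decodesStrictly(data, StandardCharsets.UTF_8));
        System.out.println("UTF-16: " + decodesStrictly(data, StandardCharsets.UTF_16));
    }
}
```

Running it on the expected and the actual output files side by side would show whether the actual file fails both decodings, which fits the "mixture" description better than a plain wrong-encoding problem.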

I also looked into TestPDFParser, and the problem there was a missing input file. I gather from an old mailing list post that it was removed because of copyright problems.

By this point I was getting the impression that these tests weren't meant for me to run, so I didn't try to figure out what was going wrong in the other cases. My third question: is it expected that the tests listed above fail, or should I look into any of them as potential indicators of bugs?

Thanks
-Aaron
