Re: FOP JUnit test
Jeremias Maerki schrieb:
> On 02.07.2003 01:07:08 J.Pietschmann wrote:
> > I tried to produce a concept for some automated JUnit test...
> [..]
> BTW, I've started a few JUnit tests myself (tests for basic
> functionality of the API and for the IO-related classes).

I also thought a bit about testing lately, and I'd like to see a new
test target consisting of the following sub-targets:

  test-publicapi
  test-renderapi
  test-areatree (current tests)
  test-pdf
  test-xx (actual unit tests)

... which the dist target should depend on (and which Gump should run
too). The current area tree tests would need some changes, such as
storing the reference area trees in CVS (but this would also remove the
need for a reference jar, which has caused me some trouble in the past
with incompatible jars).

Christian

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]
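Christian's proposed layout could be sketched in Ant roughly as follows. This is a hypothetical fragment, not FOP's actual build.xml: only the target names come from the mail; the dependency wiring and descriptions are assumptions.

```xml
<!-- Hypothetical sketch: one umbrella test target, wired into dist. -->
<target name="test"
        depends="test-publicapi,test-renderapi,test-areatree,test-pdf"
        description="Runs all automated test suites"/>
<!-- test-xx above stands for further per-component unit tests. -->
<target name="dist" depends="test">
  <!-- build the distribution only after all tests pass;
       Gump would invoke the test target as well -->
</target>
```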
Re: FOP JUnit test
On 02.07.2003 21:54:59 J.Pietschmann wrote:
> Life sucks.

-1 (veto). Life shouldn't suck. Let's fix it.

> I considered doing the checksum in a SAX filter for a short time.
> This had the added benefit of making XML comments and some other
> issues which don't influence the rendered result having no
> influence on the checksum. The drawback: attributes need to be
> sorted before feeding them into the digester. Any other pitfalls
> to watch?

Good idea. I can see no other pitfalls ATM.

> > I'd still favor the PDF2TIFF with visual diffing approach using
> > GhostScript.
>
> Yeah, but what if something invisible was screwed, or the difference
> got lost due to pixelation during rendering? Think of off-by-one
> errors in border placement calculations.

Point. So we need both.

Jeremias Maerki
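The SAX-filter idea discussed above can be sketched as follows. This is a hypothetical, minimal sketch, not FOP code: the class name and the exact canonical form fed to the digest are assumptions. The point it demonstrates is the one from the mail: digest the parsed event stream with attributes sorted by name, so that comments and attribute order cannot change the checksum.

```java
import java.io.StringReader;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: digest SAX events instead of raw bytes.
// Comments are not reported to a DefaultHandler, so they never
// reach the digest; attributes are sorted before digesting.
public class SaxDigest extends DefaultHandler {
    private final MessageDigest md;

    public SaxDigest() throws Exception {
        md = MessageDigest.getInstance("MD5");
    }

    private void update(String s) {
        md.update(s.getBytes(java.nio.charset.StandardCharsets.UTF_8));
    }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        update("<" + qName);
        // Sort attributes by qualified name so attribute order
        // in the source has no influence on the checksum.
        Map<String, String> sorted = new TreeMap<>();
        for (int i = 0; i < atts.getLength(); i++) {
            sorted.put(atts.getQName(i), atts.getValue(i));
        }
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            update(" " + e.getKey() + "=" + e.getValue());
        }
        update(">");
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        update("</" + qName + ">");
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        // Incremental update: chunked character events digest the same.
        update(new String(ch, start, len));
    }

    public static String digestOf(String xml) throws Exception {
        SaxDigest handler = new SaxDigest();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        StringBuilder sb = new StringBuilder();
        for (byte b : handler.md.digest()) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Same document, different attribute order plus a comment:
        String a = "<fo a=\"1\" b=\"2\">text</fo>";
        String b = "<!-- noise --><fo b=\"2\" a=\"1\">text</fo>";
        System.out.println(digestOf(a).equals(digestOf(b))); // true
    }
}
```

One remaining pitfall the sketch glosses over: character events may still need whitespace normalization if the layout engine ignores it, which is exactly the kind of detail a real filter would have to pin down.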
Re: FOP JUnit test
Jeremias Maerki wrote:
> > - PDFInfo unconditionally puts the creation time
>
> My preference is the second option.

Will do.

> > - Source FO line endings
>
> How about running it through the Canonicalizer from
> org.apache.xml.security?

Hmhm. There is already a java.io.LineNumberInputStream, which
normalizes line endings as a side effect of counting them. The problem:
it is deprecated and replaced by LineNumberReader (or such). More
specifically: digesting is done with raw bytes, on an InputStream. Line
endings are characters and can only be reliably dealt with in a
character stream, which means some subclass of Reader. To make it
really difficult: it is the XML parser which ultimately decides the
source stream encoding. Life sucks.

I considered doing the checksum in a SAX filter for a short time. This
had the added benefit that XML comments and some other issues which
don't influence the rendered result have no influence on the checksum
either. The drawback: attributes need to be sorted before feeding them
into the digester. Any other pitfalls to watch?

> I'd still favor the PDF2TIFF with visual diffing approach using
> GhostScript.

Yeah, but what if something invisible was screwed, or the difference
got lost due to pixelation during rendering? Think of off-by-one errors
in border placement calculations.

> BTW, I've started a few JUnit tests myself (tests for basic
> functionality of the API and for the IO-related classes).

That's the bottom-up approach. URL resolving, image loading and whatnot
could use this too (the current URL utils have bugs).

J.Pietschmann
Re: FOP JUnit test
On 02.07.2003 01:07:08 J.Pietschmann wrote:
> I tried to produce a concept for some automated JUnit test...

Great stuff.

> Problems:
> - PDFInfo unconditionally puts the creation time into the
>   PDF. This thwarts the whole thing. On my machine I can
>   disable it temporarily, but there should be a more
>   sustainable solution. Ideas:
>   o pass a flag to the renderer which inhibits creation time
>     creation
>   o pass a creation date value (can be abused, but abusers can
>     implement it anyway)
>   o patch it in the result array before digesting (hack alert)

My preference is the second option.

> - Source FO line endings: both CVS and ZIP may alter them,
>   making the source MD5 invalid. I'm not sure whether FixCRLF
>   can be of use here. Either way, running the tests from Eclipse
>   unprepared could be a bad idea. Possible fixes:
>   o have two MD5s in the control file, one for the source with
>     CRLF, one with LF only. Makes updating more inconvenient.
>   o use another FilterStream to transform CRLF->LF before
>     digesting. Adds unwanted complexity, but probably the way
>     to go...

How about running it through the Canonicalizer from
org.apache.xml.security?

> - Hidden regressions: a checksum mismatch does not necessarily
>   cause a visible problem; let's say the author string gets spaces
>   appended or such. For proper inspection of failures we probably
>   need a more sophisticated tool than simply displaying two PDFs
>   side by side. For a starter, a sort of PDF diff which extracts
>   the streams, uncompresses them and displays mismatches with a
>   bit of context would certainly be valuable. Any takers?

I'd still favor the PDF2TIFF with visual diffing approach using
GhostScript. I hope I'll have some time to do that once I've finished
the IO stuff (it finally made it into the Commons-IO project). BTW,
I've started a few JUnit tests myself (tests for basic functionality
of the API and for the IO-related classes).

Jeremias Maerki
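The "FilterStream to transform CRLF->LF before digesting" option quoted above could look roughly like this. A minimal sketch with a made-up class name, assuming an encoding in which byte 0x0D is always a carriage return (e.g. US-ASCII or UTF-8); as the follow-up mail notes, the XML parser ultimately decides the encoding, so a real solution has to be more careful.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

// Hypothetical sketch: drop CR bytes before they reach the digest,
// so CRLF and LF versions of the same source yield the same MD5.
public class LfNormalizingInputStream extends FilterInputStream {

    public LfNormalizingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = super.read();
        } while (b == '\r'); // skip every CR byte
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int b, n = 0;
        while (n < len && (b = read()) != -1) {
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    // Helper for the demo: hex MD5 of everything readable from the stream.
    public static String md5(InputStream in) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] crlf = "line one\r\nline two\r\n".getBytes("US-ASCII");
        byte[] lf = "line one\nline two\n".getBytes("US-ASCII");
        String d1 = md5(new LfNormalizingInputStream(
                new ByteArrayInputStream(crlf)));
        String d2 = md5(new LfNormalizingInputStream(
                new ByteArrayInputStream(lf)));
        System.out.println(d1.equals(d2)); // true
    }
}
```

This keeps the digest on the byte level (so the control-file MD5 stays an MD5 of bytes) at the cost of the complexity the mail warns about.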
RE: FOP JUnit test
J.Pietschmann wrote:
> I tried to produce a concept for some automated JUnit test, with
> the intent to quickly uncover regressions during wholesale
> refactoring. I came up with
>   http://cvs.apache.org/~pietsch/FopTest.java
> sample control file at
>   http://cvs.apache.org/~pietsch/regression.xml
>
> Overview: the control file holds a source, an MD5 for it (so you
> can more easily detect test failures caused by a changed source
> rather than by a regression) and an MD5 for the result file. If the
> test runs through, all is well. If a test fails, you can investigate
> the file and see whether the change was a regression (fix it) or an
> improvement (update the MD5 for the result).

Awesome.

> Problems:
> - PDFInfo unconditionally puts the creation time into the
>   PDF. This thwarts the whole thing. On my machine I can
>   disable it temporarily, but there should be a more
>   sustainable solution. Ideas:
>   o pass a flag to the renderer which inhibits creation time
>     creation
>   o pass a creation date value (can be abused, but abusers can
>     implement it anyway)
>   o patch it in the result array before digesting (hack alert)

The second choice makes the most sense to me. There are other
non-abusive uses for an artificial creation date -- for example,
creating a collection of user documentation files that all have the
same date/time stamp as part of a release.

> - Source FO line endings: both CVS and ZIP may alter them,
>   making the source MD5 invalid. I'm not sure whether FixCRLF
>   can be of use here. Either way, running the tests from Eclipse
>   unprepared could be a bad idea. Possible fixes:
>   o have two MD5s in the control file, one for the source with
>     CRLF, one with LF only. Makes updating more inconvenient.
>   o use another FilterStream to transform CRLF->LF before
>     digesting. Adds unwanted complexity, but probably the way
>     to go...

I agree that the second choice is better. I haven't had time to
explore the line-ending issue with Eclipse. I'm guessing that most of
the Eclipse users on this list are using it with Linux? If so, then it
is not an issue? Otherwise, they must already have some scheme for
conversion to LF -- otherwise, how would they get code checked in?

> - Hidden regressions: a checksum mismatch does not necessarily
>   cause a visible problem; let's say the author string gets spaces
>   appended or such. For proper inspection of failures we probably
>   need a more sophisticated tool than simply displaying two PDFs
>   side by side. For a starter, a sort of PDF diff which extracts
>   the streams, uncompresses them and displays mismatches with a
>   bit of context would certainly be valuable. Any takers?

Interesting idea. Maybe it is (almost) as good (though not as much
fun) to let the developer convert each PDF file to PostScript and diff
the PostScript files. Or perhaps to use the PostScript output option
in the first place when you have a hidden difference. It might
actually be a better use of resources to beef up the PostScript output
and add pdfmarks to it, i.e. to make sure that FO --> PostScript (with
pdfmarks) --> PDF (using Distiller) produces results identical to
FO --> PDF. That might be tricky or even impossible, but if it worked,
then PostScript with pdfmarks could be used as the input to diff, with
the added benefit that our PostScript output would be forced to keep
pace with our PDF output.

You are definitely on a useful track here.

Victor Mote
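The control-file logic described in the quoted overview can be sketched as follows. Hypothetical names throughout; FopTest.java itself may be structured differently. The point is the two-digest scheme: check the source MD5 first, so a changed source is reported separately from a genuine regression in the result.

```java
import java.security.MessageDigest;

// Hypothetical sketch of the two-MD5 control-file check.
public class RegressionCheck {

    enum Status { OK, SOURCE_CHANGED, REGRESSION }

    static String md5(byte[] data) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // srcMd5 and resMd5 are the hex digests recorded in the control file.
    static Status check(byte[] source, String srcMd5,
                        byte[] result, String resMd5) throws Exception {
        if (!md5(source).equals(srcMd5)) {
            // The test input itself changed; the control file needs updating.
            return Status.SOURCE_CHANGED;
        }
        if (!md5(result).equals(resMd5)) {
            // Same source, different output: regression or improvement.
            return Status.REGRESSION;
        }
        return Status.OK;
    }

    public static void main(String[] args) throws Exception {
        byte[] src = "abc".getBytes("US-ASCII");
        byte[] out = "rendered output".getBytes("US-ASCII");
        String srcMd5 = "900150983cd24fb0d6963f7d28e17f72"; // MD5("abc")
        System.out.println(check(src, srcMd5, out, md5(out)));     // OK
        System.out.println(check(src, "deadbeef", out, md5(out))); // SOURCE_CHANGED
        System.out.println(check(src, srcMd5, out, "deadbeef"));   // REGRESSION
    }
}
```

On a REGRESSION result, the developer inspects the output and either fixes the code or updates the recorded result MD5, exactly as the overview describes.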