Re: FOP JUnit test

2003-07-03 Thread Christian Geisert
Jeremias Maerki wrote:
> On 02.07.2003 01:07:08 J.Pietschmann wrote:
> > I tried to produce a concept for some automated JUnit test...
> [..]
> BTW, I've started a few JUnit tests myself (tests for basic
> functionality of the API and for the IO-related classes).
I also thought a bit about testing lately and I'd like to see
a new test-target consisting of the following sub-targets:
test-publicapi
test-renderapi
test-areatree (current tests)
test-pdf
test-xx  (actual unit tests)
..
which the dist-target should depend on (and which Gump should run too).
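A minimal sketch of how such an umbrella target could be wired into build.xml (the target names follow the list above; the exact depends wiring is an assumption, and the sub-target bodies are elided):

```xml
<!-- Illustrative only: one umbrella "test" target that the dist
     target depends on, so a release always runs the whole suite. -->
<target name="test"
        depends="test-publicapi, test-renderapi, test-areatree, test-pdf"/>

<target name="dist" depends="test">
  ...
</target>
```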

The current area tree tests would need some changes, like storing
the reference area trees in CVS (but this would also remove the
need for a reference jar, which has caused me some trouble in the
past with incompatible jars).
Christian

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]


Re: FOP JUnit test

2003-07-02 Thread Jeremias Maerki

On 02.07.2003 21:54:59 J.Pietschmann wrote:
> Life sucks.

-1 (veto). Life shouldn't suck. Let's fix it.

> I considered doing the checksum in a SAX filter for a short time.
> This has the added benefit that XML comments and other things
> which don't influence the rendered result have no influence on
> the checksum either. The drawback: attributes need to be sorted
> before feeding them into the digester. Any other pitfalls to
> watch for?

Good idea. I can see no other pitfalls ATM.
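A minimal sketch of such a digesting SAX filter (the event serialization scheme and class name here are assumptions, not an agreed format): it hashes the logical event stream, sorting attributes by name first so source ordering cannot affect the checksum. Comments are ignored for free, since SAX routes them to the LexicalHandler, not the ContentHandler.

```java
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Digests the logical SAX event stream instead of the raw bytes, so
 * comments, whitespace inside tags and attribute order cannot change
 * the checksum.
 */
public class DigestFilter extends XMLFilterImpl {

    private final MessageDigest digest;

    public DigestFilter() {
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("MD5 not available", e);
        }
    }

    public void startElement(String uri, String localName,
            String qName, Attributes atts) throws SAXException {
        update("<" + uri + "|" + localName);
        // Sort attributes by namespace/name so their order in the
        // source document cannot affect the hash.
        String[] pairs = new String[atts.getLength()];
        for (int i = 0; i < pairs.length; i++) {
            pairs[i] = atts.getURI(i) + "|" + atts.getLocalName(i)
                    + "=" + atts.getValue(i);
        }
        Arrays.sort(pairs);
        for (int i = 0; i < pairs.length; i++) {
            update(" " + pairs[i]);
        }
        update(">");
        super.startElement(uri, localName, qName, atts);
    }

    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        update("</" + uri + "|" + localName + ">");
        super.endElement(uri, localName, qName);
    }

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        update(new String(ch, start, length));
        super.characters(ch, start, length);
    }

    /** Returns the digest and resets it. */
    public byte[] getDigest() {
        return digest.digest();
    }

    private void update(String s) {
        try {
            digest.update(s.getBytes("UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 always exists
        }
    }
}
```

In a test, the filter would be chained between the parser and FOP; two documents differing only in attribute order then produce identical digests.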

> > I'd still favor the PDF2TIFF with visual diffing approach using
> > GhostScript.
> 
> Yeah, but what if something invisible was screwed, or the difference
> got lost due to pixelation during rendering? Think of off-by-one
> errors in border placement calculations.

Point. So we need both.


Jeremias Maerki





Re: FOP JUnit test

2003-07-02 Thread J.Pietschmann
Jeremias Maerki wrote:
> > - PDFInfo unconditionally puts the creation time
> My preference is the second option.

Will do.

> > - Source FO line endings
> How about running it through the Canonicalizer from
> org.apache.xml.security?

Hmhm. There is already a java.io.LineNumberInputStream which
normalizes line endings as a side effect of counting them. The
problem: it is deprecated, replaced by LineNumberReader (or some
such). More specifically: digesting is done with raw bytes, on an
InputStream. Line endings are characters and can only be reliably
dealt with in a character stream, i.e. some subclass of Reader. To
make it really difficult: it is the XML parser which ultimately
decides the source stream's encoding. Life sucks.

I considered doing the checksum in a SAX filter for a short time.
This has the added benefit that XML comments and other things
which don't influence the rendered result have no influence on
the checksum either. The drawback: attributes need to be sorted
before feeding them into the digester. Any other pitfalls to
watch for?

> I'd still favor the PDF2TIFF with visual diffing approach using
> GhostScript.

Yeah, but what if something invisible was screwed, or the difference
got lost due to pixelation during rendering? Think of off-by-one
errors in border placement calculations.

> BTW, I've started a few JUnit tests myself (tests for basic
> functionality of the API and for the IO-related classes).

That's the bottom-up approach. URL resolving, image loading and
whatnot could use this too (the current URL utils have bugs).
J.Pietschmann





Re: FOP JUnit test

2003-07-02 Thread Jeremias Maerki

On 02.07.2003 01:07:08 J.Pietschmann wrote:
> I tried to produce a concept for some automated JUnit test...

Great stuff.

> Problems:
> - PDFInfo unconditionally puts the creation time into the
>PDF. This thwarts the whole thing. On my machine I can
>disable it temporarily, but there should be a more
>sustainable solution. Ideas:
>o pass a flag to the renderer which suppresses the creation time
>o pass a creation date value (can be abused, but abusers can
>  implement it anyway)
>o patch it in the result array before digesting (hack alert)

My preference is the second option.

> - Source FO line endings: both CVS and ZIP may alter them,
>making the source MD5 invalid. I'm not sure whether FixCRLF
>can be of use here. Either way, running the tests from Eclipse
>unprepared could be a bad idea. Possible fixes:
>o have two MD5s in the control file, one for the source with
>  CRLF, one with LF only. Makes updating more inconvenient.
>o use another FilterStream to transform CRLF->LF before
>  digesting. Adds unwanted complexity, but probably the way
>  to go...

How about running it through the Canonicalizer from
org.apache.xml.security?
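Such a normalizing FilterStream could look roughly like this (a sketch; the class name is invented, and byte-level normalization is only safe for ASCII-compatible encodings, so the parser-decides-the-encoding wrinkle raised elsewhere in the thread remains):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

/**
 * Normalizes CRLF (and lone CR) to LF on the byte level, so the
 * source MD5 no longer depends on what CVS, ZIP or the local
 * editor did to the line endings.
 */
public class LineEndingNormalizer extends FilterInputStream {

    public LineEndingNormalizer(InputStream in) {
        // one byte of pushback is enough to peek past a CR
        super(new PushbackInputStream(in, 1));
    }

    public int read() throws IOException {
        int b = super.read();
        if (b == '\r') {
            int next = super.read();
            if (next != '\n' && next != -1) {
                ((PushbackInputStream) this.in).unread(next);
            }
            return '\n';        // CRLF and bare CR both become LF
        }
        return b;
    }

    public int read(byte[] buf, int off, int len) throws IOException {
        // byte-at-a-time keeps the sketch simple
        int count = 0;
        while (count < len) {
            int b = read();
            if (b == -1) {
                return count == 0 ? -1 : count;
            }
            buf[off + count++] = (byte) b;
        }
        return count;
    }
}
```

Wrapping the source stream in this before the digesting stream would make the source MD5 independent of checkout platform.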

> - Hidden regressions: a checksum mismatch does not necessarily
>   cause a visible problem, let's say the author string gets spaces
>   appended or such. For proper inspection of failures we probably
>   need a more sophisticated tool than simply displaying two PDFs
>   side-by-side. For a start, a sort of PDF diff which extracts
>   the streams, uncompresses them and displays mismatches with a
>   bit of context would certainly be valuable. Any takers?

I'd still favor the PDF2TIFF with visual diffing approach using
GhostScript. I hope I'll have some time to do that once I've finished
the IO stuff (Finally made it into the Commons-IO project).
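The PDF-to-TIFF step could be shelled out to GhostScript roughly like this (a sketch; it assumes a "gs" binary on the PATH, though the device and flag names are standard Ghostscript options, with tiff24nc producing 24-bit RGB TIFF):

```java
import java.io.IOException;

/** Sketch of the PDF->TIFF step via an external GhostScript. */
public class Pdf2Tiff {

    /** Builds the command line, e.g. for tiffPattern "page-%03d.tif". */
    public static String[] buildCommand(String pdf, String tiffPattern,
            int dpi) {
        return new String[] {
            "gs", "-dBATCH", "-dNOPAUSE", "-dSAFER", "-q",
            "-sDEVICE=tiff24nc", "-r" + dpi,
            "-sOutputFile=" + tiffPattern,
            pdf
        };
    }

    /** Runs GhostScript and returns its exit code (0 on success). */
    public static int run(String pdf, String tiffPattern, int dpi)
            throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec(
                buildCommand(pdf, tiffPattern, dpi));
        return p.waitFor();
    }
}
```

Rendering both the reference PDF and the new PDF this way at the same resolution reduces the visual diff to a page-by-page pixel comparison of the TIFFs.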

BTW, I've started a few JUnit tests myself (tests for basic
functionality of the API and for the IO-related classes).

Jeremias Maerki





RE: FOP JUnit test

2003-07-01 Thread Victor Mote
J.Pietschmann wrote:

> I tried to produce a concept for some automated JUnit test, with
> the intent to quickly uncover regressions during wholesale
> refactoring.
> I came up with
>   http://cvs.apache.org/~pietsch/FopTest.java
> sample control file at
>   http://cvs.apache.org/~pietsch/regression.xml
>
> Overview: the control file holds a source, an MD5 for it (so you
> can more easily tell test failures caused by a changed source
> from real regressions), and an MD5 for the result file. If the
> test runs through, all is well. If a test fails, you can
> investigate the file and see whether the change was a regression
> (fix it) or an improvement (update the MD5 for the result).

Awesome.
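For reference, the MD5 such a control file stores can be computed with the JDK alone; a sketch (the class name is illustrative):

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MD5Util {

    /** MD5 of everything in the stream, as a lowercase hex string. */
    public static String md5Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("MD5 not available", e);
        }
        DigestInputStream din = new DigestInputStream(in, md);
        byte[] buf = new byte[4096];
        while (din.read(buf) != -1) {
            // reading drives the digest; nothing else to do
        }
        byte[] hash = md.digest();
        StringBuffer sb = new StringBuffer(2 * hash.length);
        for (int i = 0; i < hash.length; i++) {
            String hex = Integer.toHexString(hash[i] & 0xff);
            if (hex.length() == 1) {
                sb.append('0');
            }
            sb.append(hex);
        }
        return sb.toString();
    }
}
```

The same helper serves both sides of the check: once over the source FO, once over the rendered result.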

> Problems:
> - PDFInfo unconditionally puts the creation time into the
>PDF. This thwarts the whole thing. On my machine I can
>disable it temporarily, but there should be a more
>sustainable solution. Ideas:
>o pass a flag to the renderer which suppresses the creation time
>o pass a creation date value (can be abused, but abusers can
>  implement it anyway)
>o patch it in the result array before digesting (hack alert)

Second choice makes the most sense to me. There are other non-abusive uses
for an artificial creation date -- for example, creating a collection of
user documentation files that all have the same date/time stamp as part of a
release.
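As a sketch of that second option (class and method names are illustrative only, not FOP's actual PDFInfo API): the info object uses an injected creation date when one has been set, and falls back to "now" otherwise.

```java
import java.util.Date;

/**
 * Sketch of the "pass a creation date value" option: a test harness
 * (or a batch doc build that wants identical timestamps) pins the
 * date; normal rendering is unaffected.
 */
public class PDFInfoSketch {

    private Date creationDate;  // null means "use the current time"

    public void setCreationDate(Date date) {
        this.creationDate = date;
    }

    public Date getCreationDate() {
        return (creationDate != null) ? creationDate : new Date();
    }
}
```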

> - Source FO line endings: both CVS and ZIP may alter them,
>making the source MD5 invalid. I'm not sure whether FixCRLF
>can be of use here. Either way, running the tests from Eclipse
>unprepared could be a bad idea. Possible fixes:
>o have two MD5s in the control file, one for the source with
>  CRLF, one with LF only. Makes updating more inconvenient.
>o use another FilterStream to transform CRLF->LF before
>  digesting. Adds unwanted complexity, but probably the way
>  to go...

I agree that the second choice is better. I haven't had time to explore the
line-ending issue with Eclipse. I'm guessing that most of the Eclipse users
on this list are using it with Linux? If so, then it is not an issue?
Otherwise, they must have some scheme for conversion to LF already --
otherwise how do they get code checked in?

> - Hidden regressions: a checksum mismatch does not necessarily
>   cause a visible problem, let's say the author string gets spaces
>   appended or such. For proper inspection of failures we probably
>   need a more sophisticated tool than simply displaying two PDFs
>   side-by-side. For a start, a sort of PDF diff which extracts
>   the streams, uncompresses them and displays mismatches with a
>   bit of context would certainly be valuable. Any takers?

Interesting idea. Maybe it is (almost) as good (but not as much fun) to
let the developer convert each PDF file to PostScript and diff the
PostScript files. Or perhaps to use the PostScript output option in the
first place if you have a hidden difference. It might actually be a
better use of resources to beef up the PostScript output and add
pdfmarks to it, i.e. to make sure that FO --> PostScript (with
pdfmarks) --> PDF (using Distiller) produces the same results as
FO --> PDF. That might be tricky or even impossible, but if it worked,
then PostScript with pdfmarks could be used as the input to diff, with
the added benefit that our PostScript output would in effect be forced
to keep pace with our PDF output.

You are definitely on a useful track here.

Victor Mote

