PM wrote: The biggest problem with this approach is that "diff" is much too coarse-grained to help us locate problems. For example, if we simply change a skin feature, an HTML header, or even just the amount of whitespace in certain elements, then the above would report that every test has failed, because the output is no longer _exactly_ the same as our standards in html.d/. In other words, the "failures" it reports aren't really failures in PmWiki. Each of the pages is still semantically correct (it still renders properly in a browser), but the diff command falsely reports it as having failed.
Given that wget produces an HTML file (and particularly if it is an XHTML file), it is worth saying that we can be more subtle than a plain diff. Of course, we can use diff options that ignore whitespace (for example diff -b or diff -w). We can also pretty-print the HTML into a canonical form and diff that, which ignores cosmetic line breaks in the original HTML. More usefully, we can extract only the specific sub-trees, elements and/or attributes of interest from the new and reference HTML files (for example, with a simple XSLT report), and then compare only those features. Of course, the more thorough this kind of automated testing is, the more it costs to set up. But even the simplistic diff is valuable, and gradually adopting more sophisticated comparisons can significantly reduce the number of false positives while remaining resilient over time.

Regards
Nigel Thomas
http://www.preferisco.com
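A minimal sketch of the "extract only what matters, then compare" idea. It uses Python's standard-library ElementTree and difflib in place of the XSLT report mentioned above. It assumes the fetched pages are well-formed XHTML (no undeclared entities such as &nbsp;). The id "wikitext" used to locate the sub-tree of interest and the choice to keep only href and src attributes are illustrative assumptions, not anything PmWiki's test setup defines. Whitespace inside text is collapsed, so reflowed markup, a different skin wrapper or a changed <head> are not reported as failures.

```python
import difflib
import xml.etree.ElementTree as ET


def _local(tag):
    # ElementTree spells namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def _norm(text):
    # Collapse every run of whitespace to one space, so cosmetic line breaks
    # and indentation in the generated HTML are not reported as changes.
    return " ".join(text.split())


def _walk(el, attrs, out):
    # Emit a flat, whitespace-insensitive token stream for el's sub-tree,
    # keeping only the attributes named in attrs.
    kept = tuple((name, el.get(name)) for name in attrs if el.get(name) is not None)
    out.append(("start", _local(el.tag), kept))
    if el.text and el.text.strip():
        out.append(("text", _norm(el.text)))
    for child in el:
        _walk(child, attrs, out)
        if child.tail and child.tail.strip():
            out.append(("text", _norm(child.tail)))
    out.append(("end", _local(el.tag)))


def extract_features(xhtml, root_id="wikitext", attrs=("href", "src")):
    # Parse well-formed XHTML and keep only the sub-tree whose id is root_id
    # (both default arguments are assumptions chosen for illustration).
    root = ET.fromstring(xhtml)
    target = next((el for el in root.iter() if el.get("id") == root_id), None)
    if target is None:
        raise ValueError(f"no element with id={root_id!r}")
    out = []
    _walk(target, attrs, out)
    return out


def compare(new_xhtml, ref_xhtml, **options):
    # Unified diff of the extracted features; an empty list means the two
    # pages agree on everything we chose to check.
    ref = [repr(t) for t in extract_features(ref_xhtml, **options)]
    new = [repr(t) for t in extract_features(new_xhtml, **options)]
    return list(difflib.unified_diff(ref, new, "reference", "new", lineterm=""))


# Hypothetical reference page and a re-skinned, re-indented new rendering.
REF = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Old skin</title></head>'
       '<body><div id="wikitext"><p>Hello <a href="Main.Page">world</a></p></div></body></html>')
NEW = '''<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>New skin</title></head>
<body><div class="skin"><div id="wikitext">
  <p>Hello
     <a href="Main.Page">world</a>
  </p>
</div></div></body></html>'''

print(compare(NEW, REF))  # [] : only the skin, header and whitespace changed
print("\n".join(compare(NEW.replace("Main.Page", "Main.Other"), REF)))
```

The first comparison reports nothing, while changing the link target produces a real diff. Widening root_id or attrs makes the check stricter; narrowing them trades coverage for fewer false positives, which is the cost/benefit balance described above.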
_______________________________________________
pmwiki-users mailing list
[email protected]
http://www.pmichaud.com/mailman/listinfo/pmwiki-users
