(Re-send; I think the first send failed.)

At 04:58 PM 2/12/02 +1100, you wrote:
>I have been using HTML tidy from W3C but the Debian package is way out
>of date.
>I was wondering if people could recommend other HTML tidiers and
>validators that they use.

Dear Simon,

Firstly, I use the W3C validator at least once a week and think it's
the ant's pants. I also send the W3C URL to people who have really
sucky web pages full of coding errors. It's amazing the number of
people who have so-called "state of the art" web pages. Then you find
they don't sit properly, or won't print properly, or were designed for
1600 wide screens and I only have 800 wide on the net. Guess what - bad
pages are full of coding errors.

Moving right along, I have downloaded and used HTML Tidy. Actually I
have not found it very satisfactory at all, as it does not fold lines
at the correct place for my needs. It is not very "intelligent", does
not greatly improve readability and does not remove leading rubbish.

I guess there are lots of html-tidy programmes out there, but possibly
my requirements are at one end of the spectrum. Basically my spec is as
follows:

* Fold all text at around column 70 or 75.
* If possible, fold text at a full stop or at a comma.
* Fold all tags, but more leniently than text.
* Tags not containing spaces must not be folded, as this could break
  long URLs that fetch counters etc.
* All leading tabs and spaces are removed. I do not support the concept
  of indenting at all; it just fills the already congested net with
  zillions of spaces and tabs.
* Next, all common <tags> are classified as either "must be moved to be
  at the start of a line", "must come at the end of a line", "must be
  on a line by itself" or (the default) "can be moved if desired".
* All internal CR/LF characters are then removed and replaced by those
  inserted by the folding process. Some html files are only a single
  line; these get folded and become readable.
* Excess blank lines are removed.
* Some intelligent folding algorithms are needed to fold lines sensibly
  where a line is part tag and part text, or where a line contains
  several tags. The first choice is to fold, when needed, at the ><
  point between two tags.

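
As a rough illustration of the text-folding points in the spec above,
here is a minimal Python sketch. It is not the programme described in
this message: the function name fold_text, the default width of 72 and
the rule that a full stop or comma is only preferred when it falls in
the right-hand half of the line are all assumptions made for the
example. It handles plain text only and does not attempt any tag
classification.

```python
import re


def fold_text(text, width=72):
    """Fold text into lines of at most `width` characters where possible.

    Illustrative sketch only; the name, the default width and the
    "right-hand half" preference for punctuation are assumptions.
    """
    # Drop existing CR/LF, tabs and leading spaces by collapsing every
    # run of whitespace into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    lines = []
    while len(text) > width:
        # One extra character so a space sitting exactly at `width` counts.
        window = text[:width + 1]
        cut = -1
        # Search from the right: prefer a full stop, then a comma, then
        # any space. Punctuation is only preferred in the right-hand half
        # so that lines do not become needlessly short.
        for mark, earliest in ((". ", width // 2), (", ", width // 2), (" ", 1)):
            pos = window.rfind(mark)
            if pos >= earliest:
                cut = pos + len(mark) - 1  # index of the space to break on
                break
        if cut == -1:
            # No space in the window: an unbreakable token such as a long
            # URL. Let the line run long rather than split the token.
            cut = text.find(" ", width)
            if cut == -1:
                break
        lines.append(text[:cut])
        text = text[cut + 1:]
    if text:
        lines.append(text)
    return lines
```

For example, fold_text("aaa bbb. ccc ddd, eee fff", width=10) returns
["aaa bbb.", "ccc ddd,", "eee fff"]. Each break consumes exactly one
space, so joining the result with single spaces gives back the
whitespace-normalised input, which makes it easy to check that no
characters were lost or added.
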
Generally, processing from the right side of a long line is the way to
figure out how to fold it.

I have written a first cut of all this in VB3 (of all things), but to
date it only does the first 6000 characters (don't ask, it's part of
something larger). So far so good; I am much happier with the results
than with HTML Tidy. I have three web sites waiting to be "tidied", but
it is critical to test thoroughly so that the folding is correct, that
characters are not being lost, and that spurious characters are not
being added in error.

The critical part is defining the category for each tag. For example, I
want all <table> and </table> tags to be on a line by themselves. The
aim is readability and obviousness of what is going on. I think
de-mystifying the table structures is critical to seeing what is
happening on a page.

I have a reasonable collection of html test files collected from all
over, and the programme will have to do a reasonable job on all of
them. The aim is to have a programme which will do an OK job on most
files, not a programme with a million options waiting to be set. Such a
programme could obviously process anything perfectly, but only after
the options are set correctly.

I hope the ideas in the spec above help. Further contributions and
comments are most welcome.

Lastly, when "tidying" files one becomes aware of the poor standard of
coding out there, and also of the weird code and weird source code
layouts produced by well known html editors.

Brian

--
SLUG - Sydney Linux User's Group - http://slug.org.au/
More Info: http://lists.slug.org.au/listinfo/slug
