Bill, I've seen this kind of stuff happen to files on and off over the years. The solution we usually use is just writing something on the spot for the job at hand. That, or just live with it. That may sound like a bad idea, but unless it's a high-traffic document it may not be worth the time invested...
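For illustration, here is a minimal, hypothetical Python sketch of that kind of on-the-spot job, aimed at the consecutive `<font>` runs Bill describes below. It makes one pass over the raw text without building a DOM. It assumes two elements are mergeable only when their opening tags are textually identical and nothing but whitespace separates them. `merge_font_runs` is a name made up for this example.

```python
import re

# One opening <font ...> tag or one closing </font> tag (case-insensitive).
# The capturing group makes re.split() keep the tags: odd indices are tags.
FONT_TAG = re.compile(r"(<font\b[^>]*>|</font\s*>)", re.IGNORECASE)


def merge_font_runs(html: str) -> str:
    """Merge adjacent <font> elements whose opening tags are identical.

    Assumption: tags must match character for character, and only
    whitespace may sit between the '</font>' and the next '<font ...>'.
    Comments, <script> blocks, etc. are not special-cased.
    """
    out = []      # output pieces
    stack = []    # opening tags of the <font> elements currently open
    held = None   # (close_tag, open_tag) for a '</font>' not yet emitted
    gap = ""      # whitespace seen since the held '</font>'

    def flush():
        # Emit the held '</font>' (if any) plus any whitespace after it.
        nonlocal held, gap
        if held is not None:
            out.append(held[0])
            held = None
        out.append(gap)
        gap = ""

    for i, token in enumerate(FONT_TAG.split(html)):
        if not token:
            continue
        is_tag = i % 2 == 1
        is_close = is_tag and token.lower().startswith("</")
        is_open = is_tag and not is_close

        if held is not None:
            if is_open and token == held[1]:
                # Same tag reopens right away: drop '</font><font ...>',
                # keep the whitespace between the two runs.
                stack.append(token)
                out.append(gap)
                held, gap = None, ""
                continue
            if not is_tag and token.isspace():
                gap += token
                continue
            flush()

        if is_open:
            stack.append(token)
            out.append(token)
        elif is_close and stack:
            # Hold the close tag until we see what follows it.
            held = (token, stack.pop())
        else:
            out.append(token)

    flush()
    return "".join(out)


if __name__ == "__main__":
    print(merge_font_runs('<font face="Arial">A</font><font face="Arial">n</font><font face="Arial">d</font>'))
    # prints <font face="Arial">And</font>
```

Because it only looks at what immediately follows each `</font>`, this is the sort of stream-level pass Bill mentions rather than a structural rewrite. Tags that differ only in case or attribute order (`FONT` vs `font`) would need a normalization step first.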
-- 
Dave Spencer, PageWeavers

--- Bill wrote ---

I've come across some documents that are formatted in such a way that, when converted to HTML, they come out something like this:

  <font face="Arial">And</font> <font face="Arial">then</font> <font face="Arial">they</font> <font face="Arial">looked</font>

or even worse:

  <font face="Arial">A</font><font face="Arial">n</font><font face="Arial">d</font> ...

I've come up with a way, using PHP's DOMDocument system, to scrape a file clear of these, but it's very slow, and it's basically something that can be done on a stream of text (rather than having to worry about the document's structure).

I'm thinking of writing something in PHP or C to clean stuff like this up, but am wondering if anyone else has any experience and suggestions?

(And yes, I've used "htmltidy", but while that can merge _nested_ styles, e.g., a "<font face="Arial"><font size=+1>" gets combined into its own CSS style, e.g., "<span class="c123">", it doesn't seem to be able to merge _consecutive_ styles, as shown in the examples above. :^/ )

-- 
-bill!

_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech