The '&' character is a magic character, and must be 
encoded if it is to show up "raw" in the website content itself.
    

Actually, normally, HTML defines that if it doesn't understand something
it should leave it alone: unknown tags are removed and ignored, unknown
entities (of which this is one) should be ignored, unknown attributes
should be removed and ignored, etc.
In the case of the AT&T site, the ampersand, &, is NOT an unknown tag and it is NOT an unknown entity.  Remember that entities are of the form &xxx; where xxx is a keyword, or &#xxx; where xxx is a number.  A bare & is strictly forbidden as a "raw" character in HTML code, and should be encoded as &amp; or &#38;.  See www.w3.org/TR/html401/sgml/entities.html.

JPluck uses a Java version of the famous HTML Tidy to clean up HTML code.  It can correct many, but not all, HTML errors.  I agree with your frustration, however.  Just take a look at my.yahoo.com sometime if you want to see VERY broken HTML code.
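
To make the encoding rule concrete, here is a small Python sketch.  It is only an illustration, NOT what JPluck or HTML Tidy does internally.  The function name escape_bare_ampersands and the regular expression are my own assumptions; html.escape is the standard library call.  The idea is to encode only those ampersands that do not already start something of the form &xxx; or &#xxx;.

```python
import html
import re

# An '&' that is NOT followed by a well-formed entity body and a ';'
# (a keyword like amp, a decimal number like #38, or a hex number like #x26).
# This pattern is an illustrative assumption, not taken from any real tool.
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Replace each raw '&' with '&amp;', leaving existing entities alone."""
    return BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    # A raw ampersand gets encoded.
    print(escape_bare_ampersands("AT&T"))
    # Already-encoded text is left untouched.
    print(escape_bare_ampersands("AT&amp;T and AT&#38;T"))
    # The standard library encodes every '&', including ones in entities.
    print(html.escape("AT&T"))
```

Note that html.escape is the right tool for plain text that contains no entities yet; the regular expression version is for text that is already partly encoded, where escaping everything would turn &amp; into &amp;amp;.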

Ed




