Fredrik Lundh wrote:
> Chris Withers wrote:
>
>>> That's how escaping works, be it in XML, encodings, compression, whatever.
>> Well yes and no. I'd expect escaping to work such that whatever we're
>> dealing with can be round tripped, ie: parsed, serialized, parsed
>> again, etc.
>
> that's exactly how it works in ET, of course.

I didn't say it didn't ;-)

> cdata is character data; see
>
> http://www.w3.org/TR/html401/types.html#h-6.2
>
> that's not the same thing as a "CDATA section" (which is just one of
> several ways to store character data in an XML file).

Ug. How confusing :-(

> how things are
> stored doesn't matter; that's just a serialization detail:
>
> http://www.w3.org/TR/xml-infoset/#omitted
>
> What is not in the Information Set
>
> 6. Whether characters are represented by character references.
> 19. The boundaries of CDATA marked sections.
> ...

I'm not sure I follow what you're trying to say...

>> I and many others do not ;-) When writing content into an html template,
>> that content often comes from other sources that spit out lumps of html.
>> Being able to insert them without escaping is a common use case.
>
> HTML might be similar to XML, but an XML parser cannot parse HTML, so
> you cannot insert HTML fragments into an XML document without either
> escaping it, or pre-processing it to make sure it's well-formed.

What about xhtml?

> if you want to embed HTML fragments in an ET tree, use ElementTidy or
> ElementSoup (or equivalent) to turn the fragment into properly nested
> and properly namespaced XHTML.

Fair enough...

> if you want to do unstructured string handling, use a template library

I'm using/building a templating library, it just happens that ET is an
implementation detail of that template library ;-)

>> That's true, sometimes. That inserted lump may have come from a process
>> which can only spit out perfect html fragments, in which case you're
>> fine, or it may come from user input, in which case you're doomed but
>> will likely have happy customers ;-)
>
> the hackers will be happy, at least:
>
> http://en.wikipedia.org/wiki/Cross_site_scripting

user -> content author in this case.
Since they usually own and run the system to which they're adding
content, a much more effective attack would just be to turn the box off :-P

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
           - http://www.simplistix.co.uk
_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
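
For anyone following the thread, the points about round-tripping, CDATA sections and well-formed fragments can be sketched roughly as below. This is only an illustration using the standard library's xml.etree.ElementTree, not code from either poster; the sample strings and the names fragment, holder and so on are made up for the example, and it assumes the fragment being embedded is already well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical character data containing markup-significant characters.
fragment = "<b>bold</b> & more"

# Round trip: set the text, serialize (ET escapes it), then parse again.
root = ET.Element("p")
root.text = fragment
serialized = ET.tostring(root, encoding="unicode")
reparsed = ET.fromstring(serialized)

# The same character data stored two ways: as a CDATA section and as
# escaped text. After parsing, the tree holds identical text for both.
from_cdata = ET.fromstring("<p><![CDATA[<b>bold</b> & more]]></p>")
from_escaped = ET.fromstring("<p>&lt;b&gt;bold&lt;/b&gt; &amp; more</p>")

# A well-formed fragment can instead be parsed and attached as structure,
# so it is serialized as markup rather than as escaped text.
holder = ET.Element("p")
holder.append(ET.fromstring("<b>bold</b>"))

print(serialized)
print(reparsed.text == fragment)
print(from_cdata.text == from_escaped.text == fragment)
print(ET.tostring(holder, encoding="unicode"))
```

The first three prints show that escaping is only a serialization detail: the text survives the round trip, and the parser hands back the same string whichever storage form was used. The last one shows the alternative to escaping, which only works when the fragment parses as XML in the first place.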