On Thursday 10 February 2005 at 23:42 +0100, Erik Bruchez wrote:
>
> > BTW, speaking of text/plain, I think that this case is much more
> > troublesome than it appears.
> >
> > First, a text can contain characters that are not allowed in XML
> > (such as the characters between 0 and 8).
>
> I didn't know about this.
>
> > Second, to serialize the text, you need to change its encoding and
> > that means that you have no roundtrip and are currently not able to
> > give back the original text and that can be a problem for many
> > applications.
> >
> > For instance, in HTML, the description of the encoding in the meta
> > tag can become incoherent with the actual encoding.
>
> > To work around these issues, I think that you should add an
> > attribute to store the initial encoding, so that you can restore it
> > when you deliver the text back as text and also find something to
> > "escape" the characters that are forbidden (maybe empty elements?).
>
> For this, no problem: the "content-type" attribute may specify the
> original encoding, e.g.:
>
> content-type="text/plain; charset=iso-8859-1"
>
> Which I think is a good reason to call the attribute "content-type".

Hmmm... I tend to avoid structured content hidden in text nodes or
attributes, especially when it mixes information of different
natures...

media-type="text/plain" charset="iso-8859-1" is so much easier to
process in XML!

media-type="text" media-subtype="plain" charset="iso-8859-1" would be
excessive IMO, since most of the time we do not separate the type from
the subtype...
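
To make the difference concrete, here is a rough Python sketch. The
"document" element and the helper names are made up for illustration,
and the UTF-8 fallback is my own assumption. With the combined
attribute you need a header parser before you can restore the original
encoding; with separate attributes a plain attribute lookup is enough.

```python
from email.message import Message
import xml.etree.ElementTree as ET

def parse_content_type(value):
    """Split a combined value such as 'text/plain; charset=iso-8859-1'."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()

def original_bytes(element):
    """Re-encode the text with the charset recorded on the element."""
    combined_value = element.get("content-type")
    if combined_value is not None:
        # Structured value hidden in one attribute: needs extra parsing.
        _, charset = parse_content_type(combined_value)
    else:
        # Separate attributes: a direct lookup.
        charset = element.get("charset")
    # Assumption: fall back to UTF-8 when no charset was recorded.
    return (element.text or "").encode(charset or "utf-8")

# Hypothetical wrapper elements around the same text.
combined = ET.fromstring(
    '<document content-type="text/plain; charset=iso-8859-1">caf\u00e9</document>')
split = ET.fromstring(
    '<document media-type="text/plain" charset="iso-8859-1">caf\u00e9</document>')
```

Both variants give back the same ISO-8859-1 bytes. The point is only
that the second one needs no parsing, which matters even more in XSLT
than in Python.
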
> So there would just be the issue of those special characters. How can
> they be escaped?

They can't really be escaped, in that they are forbidden even in CDATA
sections or as character references. One could probably use unparsed
entities, but these are relics from SGML that are deprecated in
practice.

We need to think more about them, but the two solutions that come to
mind are:

* empty elements: <char value="0"/>
* PIs: <?char value="0"?>

Both are rather ugly hacks, but I don't see any other way to treat
them.
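
A minimal Python sketch of the empty-element variant, just to show
that it round-trips. The "document" wrapper and the function names are
invented for this example, and only the C0 control characters are
handled (surrogates, U+FFFE and U+FFFF are ignored).

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def text_to_element(text, tag="document"):
    """Wrap text in an element, replacing forbidden characters
    with <char value="N"/> children."""
    root = ET.Element(tag)
    last = None  # most recently added <char/> child, if any
    pos = 0
    for match in FORBIDDEN.finditer(text):
        chunk = text[pos:match.start()]
        if last is None:
            root.text = chunk
        else:
            last.tail = chunk
        last = ET.SubElement(root, "char", value=str(ord(match.group())))
        pos = match.end()
    rest = text[pos:]
    if last is None:
        root.text = rest
    else:
        last.tail = rest
    return root

def element_to_text(root):
    """Inverse of text_to_element: rebuild the original string."""
    parts = [root.text or ""]
    for child in root:
        if child.tag == "char":
            parts.append(chr(int(child.get("value"))))
        parts.append(child.tail or "")
    return "".join(parts)
```

Serializing text_to_element("a\x00b") with ET.tostring() gives
well-formed XML, and parsing that back through element_to_text()
returns the original string, NUL included. The PI variant would work
the same way, with a processing instruction in place of the empty
element.
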
> > With all these troubles, I wonder if that's not better to serialize
> > plain text as base64 :-( !
>
> I don't think so. At least not yet. It's just so convenient to
> manipulate text as text in XSLT, for example.
Yes.

Eric (who recently reported a bug to Subversion that had the same
kind of issues :)
--
Did you know it? Python has now a Relax NG (partial) implementation.
http://advogato.org/proj/xvif/
------------------------------------------------------------------------
Eric van der Vlist http://xmlfr.org http://dyomedea.com
(ISO) RELAX NG ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------