Hi Ryan,

At 09:21 13-7-2001 -0500, you wrote:

>Anyway, my favorite kludge to this problem (although it might not be the
>most elegant) is to simply convert these bastard characters into html
>entities or normal characters:

<cut content="translation-table"/>

>Characters 160-255 are safe, but should probably be converted into HTML
>entities to be safer. These can be converted directly:
>
>160 = xA0 -> &#160; or &#xA0;
>161 = xA1 -> &#161; or &#xA1;
>etc...
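
A minimal Python sketch of the direct conversion described in the quote. It assumes the text is already decoded to Unicode and contains no raw & or < that still need escaping. The function names are hypothetical. The decimal form uses Python's standard "xmlcharrefreplace" codec error handler, and the hexadecimal form uses a small loop:

```python
def to_decimal_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference, e.g. &#160;."""
    # 'xmlcharrefreplace' is a standard codec error handler: each character
    # the target codec (ASCII here) cannot encode becomes &#N;.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_hex_refs(text: str) -> str:
    """Replace every non-ASCII character with a hex reference, e.g. &#xA0;."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


if __name__ == "__main__":
    sample = "\u00a0\u00a1 caf\u00e9"  # NBSP, inverted !, "café"
    print(to_decimal_refs(sample))     # &#160;&#161; caf&#233;
    print(to_hex_refs(sample))         # &#xA0;&#xA1; caf&#xE9;
```

Both forms cover the whole 160-255 range and anything above it. ASCII passes through untouched.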

The problem with this is that Sablotron escapes the entities when you've
declared them in the DTD. It might of course also be an expat thing; I'm not
really sure at this point.

I tried the following (if these get mangled in the mail transfer, I can
attach them):
------------------------------------------------------------
test.xml:
------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE test SYSTEM "file://test.dtd">

<test>
         <characters>
                 middot: &amp;middot;
                 iuml: &#239;
                 sup1: &sup1;
                 quot: &quot;
         </characters>
</test>
------------------------------------------------------------
test.dtd:
------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!-- The full HTMLlat1, HTMLsymbol and HTMLspecial entity sets from
     http://www.w3.org/TR/html4/*.ent were included here (omitted in this mail) -->
<!ELEMENT test (characters*)>
<!ELEMENT characters (#PCDATA)>

------------------------------------------------------------
test.xsl:
------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
         <xsl:output doctype-public="-//W3C//DTD HTML 4.01//EN"
                 doctype-system="http://www.w3.org/TR/html4/strict.dtd"
                 method="xml" encoding="ISO-8859-1"/>
         <xsl:template match="/test/characters">
                 <html>
                         <body>
                                 <p>
                                         <xsl:value-of select="." disable-output-escaping="yes"/>
                                 </p>
                         </body>
                 </html>
         </xsl:template>
</xsl:stylesheet>
------------------------------------------------------------
Then I ran: sabcmd test.xsl test.xml > ./output.html

------------------------------------------------------------
output.html
------------------------------------------------------------
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><body><p>
                 middot: &middot;
                 iuml: ï
                 sup1:
                 quot: "
         </p></body></html>
------------------------------------------------------------

I did switch output methods, but that doesn't make a difference. As you can
see, an actual &sup1; is simply ignored. A numeric reference outputs the
character (enabling output escaping does not change this), and the &quot;
and &amp; entities (and, I suppose, &gt; and &lt;) are substituted. When
output escaping is on, the middot is output as &amp;middot;, the sup1 is
still ignored, and the iuml and quot are still output as characters.
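
The same pattern can be reproduced with Python's expat-based ElementTree. This is a stand-in for illustration, not Sablotron itself. The parser resolves every reference before any stylesheet sees the text, so an escaping serializer turns the literal text &middot; back into &amp;middot;, while &#239; and &quot; come out as plain characters. SOURCE is a hypothetical cut-down test.xml without the DOCTYPE:

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down test.xml: no DOCTYPE and therefore no &sup1;, since
# ElementTree does not load external DTDs and would reject the undefined entity.
SOURCE = "<characters>middot: &amp;middot; iuml: &#239; quot: &quot;</characters>"

element = ET.fromstring(SOURCE)

# Every reference is already resolved: the tree holds plain characters,
# and &amp;middot; has become the literal text "&middot;".
print(repr(element.text))

# An escaping serializer must escape that "&" again, while the resolved
# "ï" and '"' are written out as ordinary characters.
escaped = ET.tostring(element, encoding="unicode")
print(escaped)

# Encoded as ISO-8859-1, the "ï" is the single byte 0xEF.
latin1 = escaped.encode("iso-8859-1")
print(latin1)
```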

The XSLT spec is not explicit on the subject, but if I read it correctly,
disabling output escaping should simply leave the &sup1; entity intact.

http://www.w3.org/TR/xslt.html#disable-output-escaping :

<quote>"However, it is sometimes convenient to be able to produce output 
that is almost, but not quite well-formed XML; for example, the output may 
include ill-formed sections which are intended to be transformed into 
well-formed XML by a subsequent non-XML aware process."</quote>

An undefined entity, in my opinion, is not well-formed XML, and thus should
be output as such. The paradox here is that the entity has to be, and is,
defined in the source document, but since the XSLT processor doesn't know
about the output document's DTD, the entity should be considered undefined.

However, since the input document IS well-formed, and thus the &sup1; entity
is defined, for the sake of consistency it should be output as the character
defined by the DTD, in this case character 185 (&#185;), which is part of
ISO-8859-1.
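
A quick check of that claim using Python's html.entities module, whose name2codepoint table maps the HTML 4 entity names to Unicode code points. This only confirms the character; it says nothing about how Sablotron should behave:

```python
from html.entities import name2codepoint

# sup1 is one of the HTMLlat1 entities; look up its code point.
code_point = name2codepoint["sup1"]
char = chr(code_point)

print(code_point, hex(code_point))  # 185 0xb9
print(char.encode("iso-8859-1"))    # b'\xb9' (a single Latin-1 byte)
print(f"&#{code_point};")           # &#185;
```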

Is this a discussion for the W3C XSL list, or is it something Sablotron
should, can, or is willing to change?

For now I replace the MS-added characters with &amp;entity; and let the
browser worry about how to display them, or use numeric character
references, depending on the environment.
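
A rough Python sketch of that workaround. It assumes the "MS-added" characters arrive as windows-1252 bytes. The function name escape_for_xslt is hypothetical. With named=True it writes &amp;name; so the XML parser yields the text &name;, which disable-output-escaping then copies through for the browser. With named=False it writes numeric references instead. Unlike the workaround described, it converts every non-ASCII character, not only the 0x80-0x9F range:

```python
from html.entities import codepoint2name
from xml.sax.saxutils import escape


def escape_for_xslt(raw: bytes, named: bool = True) -> str:
    """Turn windows-1252 bytes into XML text (hypothetical helper).

    named=True:  non-ASCII characters with an HTML 4 name become '&amp;name;'
                 (e.g. '&amp;ldquo;'); the parser turns that into '&ldquo;'.
    named=False: non-ASCII characters become numeric references ('&#8220;').
    """
    # Strict decoding: the bytes cp1252 leaves undefined
    # (0x81, 0x8D, 0x8F, 0x90, 0x9D) raise UnicodeDecodeError.
    text = raw.decode("cp1252")
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII: escape only &, < and > for XML text.
            out.append(escape(ch))
        elif named and cp in codepoint2name:
            # HTML 4.01 entity name, e.g. 0x201C -> 'ldquo'.
            out.append(f"&amp;{codepoint2name[cp]};")
        else:
            # No HTML 4 name, or numeric output was requested.
            out.append(f"&#{cp};")
    return "".join(out)


if __name__ == "__main__":
    word_bytes = b"\x93Smart quotes\x94 \x85 \x80 5"
    print(escape_for_xslt(word_bytes))
    print(escape_for_xslt(word_bytes, named=False))
```

The named form only comes out as a literal entity in text nodes that the stylesheet copies with disable-output-escaping="yes". Anywhere else it reappears as &amp;name;.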

In any case, many thanks for the info, Ryan!

_______________________________________________________________
Met vriendelijke groeten / with kind regards,

IDG.nl
Melvyn Sopacua
WebMaster
_______________________________________________________________
