RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

Philippe Verdy Sun, 07 Dec 2003 18:54:56 -0800

Peter Kirk wrote:
> On 07/12/2003 15:40, Philippe Verdy wrote:
> > Peter Kirk wrote:
> > > Of course there is an even simpler way to provide the glue I 
> > > was talking about. W3C simply needs to relax the rule forbidding 
> > > combining marks at the start of a string (and interpret the one 
> > > precomposed character with ">" as base as if it were decomposed, 
> > > as I suggested before), and, remembering that use of NFC is a 
> > > strong recommendation rather than a requirement, not insist on 
> > > NFC in such cases. Then nothing needs to be added to Unicode.
> >
> > There's little chance that this will be relaxed by the W3C, because 
> > now HTML is XML (since XHTML is the current recommanded standard, 
> > and HTML 4.01 is just kept as is, and all other extensions are being 
> > developped since XHTML 1.1 as modules with DTDs or XML schemas), and 
> > because XML text elements are independant. What you propose would 
> > break the XML containment model (could it be implemented however in 
> > XSLT transforms from XHTML? I doubt because the output of XSLT is 
> > also XML, even if it does not always produce a XML syntax, but only 
> > a DOM-parsable tree or InfoSet...)
>
> Well, this is W3C's problem. They seem to have backed themselves into a 
> corner which they need to get out of but have no easy way of doing so. 
> Unicode is of course very familiar with this kind of situation e.g. with 
> character name errors, combining class errors, 11000+ redundant Korean 
> characters without decompositions, etc etc. So no doubt it can extend 
> its sympathy; and possibly even offer to help by encoding the kind of 
> character I was suggesting early (perhaps in exchange for some W3C 
> readiness to accept correction of errors in the normalisation data?). 
> But really this is not a Unicode issue.


I don't agree with you there: going to XML was a good decision for the
evolution, stabilisation and interoperability of HTML. Extensions are now
in modules, described by DTDs or schemas, and this offers a good framework
for interoperability of documents, even if they don't implement the same
set of optional modules.

If you want something better, it will not come from modifying XML (so HTML
will stick with XML now), but from the way the DOM tree or InfoSet
generated from a parsed XHTML document is rendered. With CSS and XSLT, you
have the tools to define precisely, in a compilable language, how this data
tree can be transformed to prepare the rendering of documents.

Nothing forbids the standard XHTML modules from defining standard
transformations in relation to style, as an XSLT application. This applies
to the transformation of the plain text contained in the XHTML document
into another XML document containing all the associated glyph, layout and
style information. Some of this information may be used to control the
behavior of font renderers, enabling or disabling features, since the
augmented data now contains more than just plain text.
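
To make the idea of "augmented data" concrete, here is a minimal sketch in
Python rather than XSLT. It is only an illustration under stated
assumptions: the element names "run", "cluster", "base" and "mark" are
hypothetical and come from no XHTML module, and a combining mark is
approximated as any character whose Unicode general category starts
with "M".

```python
import unicodedata
import xml.etree.ElementTree as ET


def is_mark(ch):
    # Assumption: treat every character in general category M* (Mn, Mc, Me)
    # as a combining mark. This also covers spacing marks such as Tamil
    # vowel signs, whose canonical combining class is 0.
    return unicodedata.category(ch).startswith("M")


def text_to_clusters(text):
    """Turn plain text into a tree of <cluster> elements.

    Each cluster holds one <base> followed by zero or more <mark> children,
    so that a later styling step could address marks separately.
    The element names are hypothetical.
    """
    root = ET.Element("run")
    cluster = None
    for ch in text:
        if is_mark(ch) and cluster is not None:
            ET.SubElement(cluster, "mark").text = ch
        else:
            # A mark with no preceding base ends up here and is kept
            # as the base of its own cluster.
            cluster = ET.SubElement(root, "cluster")
            ET.SubElement(cluster, "base").text = ch
    return root


# "e" + COMBINING ACUTE ACCENT, then "a"
tree = text_to_clusters("e\u0301a")
print(ET.tostring(tree, encoding="unicode"))
```

A real transformation would also have to carry the style information
attached to each run; this sketch only shows the separation of base
characters from their marks.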

So this stylesheet processor will be able to position diacritics clearly
above letters, or to create Korean syllabic clusters, or even Han
ideographic clusters, or to alter the relative positions of the diacritic
and its base letter to take into account differences of styles (for
example, if the stylesheet instructs the HTML processor to render dots
above "i" with a custom start bitmap or SVG graphic, or in bold style
from another font...)

The initial problem of transcoding Tamil with markup is not a problem
for Unicode or even for HTML: the author has created separate runs of text
in their document without specifying clearly how these separate runs should
be rendered in a coherent layout. For Unicode or for HTML, there is a
default layout, which is the HTML "box model"; attempts to break it
require relative positioning (specified in CSS) and possibly a
transformation of the initial text into other text or markup (this is a
job for XSLT, and could be specified in a further revision of CSS, to
specify such complex rendering outside the default "box model").
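
As an illustration of what "separate runs of text" means here, the
following Python sketch flags runs that begin with a combining mark, which
is the situation that arises when markup is placed between a base letter
and its diacritic or vowel sign. The function names are hypothetical, and
"combining mark" is again approximated as general category M*.

```python
import unicodedata


def starts_with_combining_mark(run):
    # Assumption: a mark is any character in general category Mn, Mc or Me.
    return bool(run) and unicodedata.category(run[0]).startswith("M")


def orphaned_marks(runs):
    """Return the indexes of runs whose first character is a combining mark."""
    return [i for i, run in enumerate(runs) if starts_with_combining_mark(run)]


# TAMIL LETTER KA (U+0B95) and TAMIL VOWEL SIGN I (U+0BBF),
# split into two runs as if markup had been placed between them.
runs = ["\u0B95", "\u0BBF"]
print(orphaned_marks(runs))  # [1]
```

Such a check only detects the split; deciding how the two runs should be
laid out together is still the job of the stylesheet.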


