On Monday, July 07, 2003 2:04 PM, Peter Kirk <[EMAIL PROTECTED]> wrote:

> On 07/07/2003 04:15, Philippe Verdy wrote:
> > The list separator in French is preferably the semicolon, rather
> > than a comma (which must then have a space):
> > => "123<thin space>;<standard space>456"
> > The <thin space> is here also encoded according to the character
> > encoding constraints and fonts (here also less wide than a digit,
> > unbreakable and not justified).
> Earlier he wrote:
> 
> > In strict historic English typography, the unbreakable spaces
> > before punctuation are often narrower (a sixth of a cadratin), and
> > that's why they are often missing in ASCII-only text.
> 
> I wonder if here we are confusing character encoding with adjustments
> which should be made during rendering and typesetting - and which
> perhaps in the days of hot metal were made by including thin spacers.
> Are you really suggesting that the huge quantities of text in English,
> French and other languages, in ASCII and Unicode, are actually
> wrongly encoded, because there is almost invariably no character code
> for a thin space before punctuation? Surely it would be much more
> sensible to accept that this text is correctly encoded, and leave it
> to the text rendering or typesetting process to adjust the position
> of punctuation marks as appropriate.

No, I did not suggest such things. In fact I wrote just the opposite, by saying
that there is a lot of variation in the actual space character used before
punctuation in strict typographic typesetting.

"Correctly encoded French text" means nothing with respect to Unicode standardization:
this is not a Unicode issue but an internationalization and localization issue, as well
as a rendering decision made by the document author.

Whatever space is used, and given several other constraints which may limit the choice
of space, this does not change the French cultural convention of using a space rather
than a dot as the thousands grouping separator.
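As a rough illustration of that convention, here is a minimal Python sketch. The helper name group_thousands is hypothetical (not from any localization library), and the choice of U+00A0 as the default separator, with U+202F NARROW NO-BREAK SPACE as an optional narrower one, is an assumption for the example:

```python
# Hypothetical helper, not from any real localization library.
NBSP = "\u00a0"   # NO-BREAK SPACE, the usual choice in source text
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, a narrower alternative

def group_thousands(n: int, sep: str = NBSP) -> str:
    """Format an integer with digits grouped by three, French style."""
    # Python's "," format spec groups by three with commas: 1234567 -> "1,234,567".
    # Each comma is then swapped for the requested space-like separator.
    return format(n, ",").replace(",", sep)

print(group_thousands(1234567).replace(NBSP, "<NBSP>"))
print(group_thousands(1234567, NNBSP).replace(NNBSP, "<NNBSP>"))
```

The separator is a parameter because, as said above, which space an author actually uses depends on encoding and font constraints; the grouping itself does not change.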

The situation is less clear for phone numbers, however: some people use thin unbreakable
spaces interchangeably with dots. Phone numbers are generally grouped in units of 2
digits (for the standard 10-digit national format), or of 3 digits (for special numbers,
where that grouping is easier to remember, like toll-free numbers "0 800 xxx xxx"), with
no separator at all for national short numbers of 3 or 4 digits like "112" (the European
emergency phone number, toll-free from both landlines and mobile phones). For phone
numbers in France, we never use hyphens.
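The grouping rules above could be sketched as follows. This is only an illustration: the function name and the THREE_DIGIT_PREFIXES list are hypothetical, and only the "0800" toll-free case comes from the description above. It is not a statement of the official French numbering plan:

```python
NBSP = "\u00a0"

# Hypothetical prefix list for numbers conventionally grouped 1-3-3-3.
# Only "0800" (toll-free, "0 800 xxx xxx") is taken from the text above.
THREE_DIGIT_PREFIXES = ("0800",)

def format_fr_phone(digits: str, sep: str = NBSP) -> str:
    """Group a French phone number for display; never inserts hyphens."""
    if not digits.isdigit():
        raise ValueError("expected digits only")
    if len(digits) in (3, 4):          # short numbers such as "112": no separator
        return digits
    if len(digits) != 10:
        raise ValueError("expected a 3, 4 or 10-digit national number")
    if digits.startswith(THREE_DIGIT_PREFIXES):
        parts = [digits[0], digits[1:4], digits[4:7], digits[7:]]
    else:                              # standard national format: pairs of digits
        parts = [digits[i:i + 2] for i in range(0, 10, 2)]
    return sep.join(parts)
```

Passing "." as sep gives the dotted variant some writers use instead of thin spaces.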

As I said, I described the *ideal* encoding and rendering of the group separator, not
any single encoding (which is chosen by each author). With all respect to what Tex
said, dots are never used as a thousands group separator by actual French writers;
you'll find them only in software that is incorrectly localized for French.

The default Windows setting for this grouping character is the non-breaking space
U+00A0, which is available in Windows codepage 1252, the default codepage of Western
European localizations of Windows. Few users feel the need to change it in their
regional settings.
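One practical consequence can be checked with Python's standard cp1252 codec: U+00A0 has a byte (0xA0) in that codepage, while U+2009 THIN SPACE and U+202F NARROW NO-BREAK SPACE have none. The helper name encodable is just for this illustration:

```python
def encodable(ch: str, codec: str) -> bool:
    """Return True if `ch` can be encoded in the given (legacy) codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

for name, ch in [("NO-BREAK SPACE", "\u00a0"),
                 ("THIN SPACE", "\u2009"),
                 ("NARROW NO-BREAK SPACE", "\u202f")]:
    # Only U+00A0 should report True for cp1252.
    print(f"U+{ord(ch):04X} {name}: cp1252={encodable(ch, 'cp1252')}")
```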

In Linux/Unix documentation translation projects, this NBSP is the de facto preferred
encoding agreed on by the translation community, and its translation guidelines require
this NBSP before any two-part closing or ending punctuation sign (colon, semicolon,
exclamation mark, question mark, closing double angle quotation mark "»"), and after any
two-part opening punctuation sign (opening double angle quotation mark "«"). So I do
think that NBSP is the most interoperable encoding for source text, but this says
nothing about the actual typesetting of the documents, which may implicitly replace
NBSP occurrences (in NBSP+punctuation or digit+NBSP+digit) with a narrower unbreakable,
non-justified space (but certainly not with a dot).
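The source-text convention and the optional typesetting substitution might be sketched as two regex passes. This is a simplified illustration with hypothetical function names, not the tooling of any actual translation project: a real tool would also have to skip URLs (the colon in "http://"), code and similar cases, and typographers differ on exactly which signs get a narrow space.

```python
import re

NBSP = "\u00a0"   # NO-BREAK SPACE (source-text convention)
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE (typesetting substitute)

# Before ; : ! ? and », turn "word[spaces]" or "[spaces]" into a single NBSP.
# Requiring a word character or at least one space leaves "?!" untouched.
_BEFORE = re.compile(r"(?:(?<=\w)[ \u00a0]*|[ \u00a0]+)([;:!?»])")
# After «, turn any run of spaces (or none) into a single NBSP.
_AFTER = re.compile(r"«[ \u00a0]*")
# NBSP before closing punctuation, after «, or between two digits.
_NARROWABLE = re.compile(r"\u00a0(?=[;:!?»])|(?<=«)\u00a0|(?<=\d)\u00a0(?=\d)")

def add_fr_nbsp(text: str) -> str:
    """Insert NBSP around French two-part punctuation (idempotent)."""
    text = _BEFORE.sub(NBSP + r"\1", text)
    return _AFTER.sub("«" + NBSP, text)

def narrow_for_typesetting(text: str) -> str:
    """Swap NBSP for the narrower U+202F in the contexts named above."""
    return _NARROWABLE.sub(NNBSP, text)
```

Keeping the two passes separate mirrors the distinction above: the first decides the encoding of the source text, the second is a rendering choice that can be skipped entirely.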

Another convention used in English is the double space after a sentence-ending period.
This convention does not exist in French, and I think it exists in English as a way to
represent a wide space (at least one cadratin, i.e. an em) after the period. In French
the minimum width for this space is just a half-cadratin (so it matches the standard
space), and this space can be stretched by justification or removed when lines wrap at
the right margin.

This is unlike the thin spaces used for digit grouping, or for binding a punctuation
sign to the adjacent character, which are unbreakable and not stretched by
justification; they may still be enlarged, however, when justification would create
overly wide spaces and letter-spacing (widening or narrowing) must be applied to get
more evenly colored text, notably for text set in standard narrow columns (such as
newspapers, classified ads, or phone-book white pages, where each line is roughly 53
characters or signs on average).
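A toy model of these rules, assuming a monospace measure where every character, NBSP included, counts as one column. A real typesetter measures glyph widths and may also slightly enlarge the no-break spaces as described above, which this sketch does not model. Only U+0020 is treated as a break opportunity and as stretchable; the function names are hypothetical:

```python
from typing import List

def wrap(text: str, width: int) -> List[str]:
    """Greedy line breaking that only breaks at U+0020 SPACE."""
    words = text.split(" ")  # NBSP / U+202F stay inside the "words"
    lines: List[str] = []
    current = ""
    for word in words:
        if not word:
            continue
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate  # fits, or an overlong word alone on its line
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

def justify(line: str, width: int) -> str:
    """Pad to `width` by widening only U+0020 gaps; no-break spaces keep width 1."""
    words = line.split(" ")
    gaps = len(words) - 1
    extra = width - len(line)
    if gaps == 0 or extra <= 0:
        return line
    base, rem = divmod(extra, gaps)
    parts = [w + " " * (1 + base + (1 if i < rem else 0))
             for i, w in enumerate(words[:-1])]
    return "".join(parts) + words[-1]

for line in wrap("Prix\u00a0: 1\u00a0234\u00a0567 euros, payable\u00a0!", 18):
    print(repr(justify(line, 18)))
```

With a width of 18, the number "1 234 567" and the colon bound to "Prix" never end up split across lines, and only the ordinary space between them is widened.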
