Re: Translating the standard

2018-03-14 Thread Andre Schappo via Unicode


On 13 Mar 2018, at 02:49, Yifán Wáng via Unicode wrote:

Somewhat digressing from the topic, but I'd like to comment on this
part, as I sense a persistent myth among some (hopefully a small number
of) software engineers in the Anglosphere.

First, the fact that computer languages are written using English
words doesn't mean that programmers are expected to have a
correspondingly strong command of English. Take the word of Matz, the
creator of the Ruby language: "The English skill is a super-powerful
rare card (in the career path of a Japanese engineer)!" He then
continues that you should keep up with the most up-to-date overseas
information and trends in order to be a high-tier engineer, and so
on. It's far from a "requirement".
http://eikaiwa.dmm.com/blog/3826/

I've also read somewhere the memoir of a middle-aged programmer who
was already into BASIC in childhood. One day he thought he'd written a
"great" program and printed it on paper, but to his surprise, an
auntie who took a look at it immediately decoded the program and
roughly understood what it was meant to do: she knew English, and he
didn't.

Programming as such is just like a Chinese room with English in place
of Chinese: you sit inside a cramped room night after night,
communicating with a computer by typing in the English words the bulky
reference guide teaches you. Most East Asian countries are blessed
with a tremendous number of translated technical publications
(e.g. O'Reilly) each year, not to mention firsthand writings in their
own languages. So documentation is easily available even if you don't
speak English.

Second, the fact that English is the lingua franca doesn't necessarily
mean that English as spoken in the wild is. The aviation industry is
another field that uses English as its common language, but it makes
the utmost effort to keep the system working. Namely, it has a
controlled vocabulary whose semantics are disambiguated as far as
possible, called ASD-STE100, for technical documentation such as
maintenance manuals, to minimize errors caused by limited English
knowledge. Unicode, on the other hand, is written in the free style
that English speakers who (almost) graduated from college use when
writing to other English speakers who (almost) graduated from college.
Reaching that level of proficiency as a non-native speaker is not
trivial unless one is constantly in contact with the English-speaking
community. (And the programming community is not contained inside the
English-speaking community at all.)

That said, I agree with almost everything Alastair said afterwards. If
I may add one more thing: monolingual writing is usually more tightly
coupled to its language than engineers may believe, even if the writer
carefully chose their words to be context-neutral. So it's a hard job
to say no more and no less than the original text in another language,
especially when exactness matters. That is one of the problems that
keep fully automated translation from being a thing, I guess.

Best regards,

Yifan


When it comes to program identifiers, languages such as Chinese have a huge 
advantage, as they are much more compact than English: one can write 
meaningful identifier names with a small number of Chinese characters. In an 
all-Chinese development team producing software for the Chinese market, why 
not write the program identifiers in Chinese? Or maybe this does happen?
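To make the identifier question concrete, here is a minimal sketch in Python 3, chosen only for illustration. Python has accepted non-ASCII identifiers since PEP 3131, so Chinese names are legal as long as each character is allowed in identifiers by the Unicode identifier rules (XID_Start / XID_Continue). The function and variable names below are hypothetical.

```python
# Python 3 accepts Unicode identifiers (PEP 3131), so Chinese names are legal.
# The names here are hypothetical examples, not taken from any real project.

def 计算总价(单价: float, 数量: int) -> float:
    """Total price = unit price x quantity."""
    return 单价 * 数量


总价 = 计算总价(2.5, 4)
print(总价)

# str.isidentifier() applies the same lexical rules the parser uses.
for name in ["计算总价", "total_price", "2价格", "价格-总"]:
    print(name, name.isidentifier())
```

The last two names are rejected because an identifier cannot start with a digit or contain a hyphen, whatever script it is written in.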

Over the years I have talked with many Chinese students about this, and usually 
they tell me something like: "Our lecturers in China tell us to always use 
English for program identifiers".

I make use of several languages for my program identifiers ➜
jsfiddle.net/user/coas/fiddles
My use of non-English languages for program identifiers is somewhat random.

André Schappo




Re: Translating the standard

2018-03-13 Thread Marcel Schneider via Unicode
On Tue, 13 Mar 2018 16:48:51 -0700, Asmus Freytag (c) via Unicode wrote:

On 3/13/2018 12:55 PM, Philippe Verdy wrote:

It is then a version of the matching standards from Canadian and French 
standard bodies. This does not make a big difference, except that those 
national standards (last editions in 2003) are not kept in sync with evolutions 
of the ISO/IEC standard. So it can be said that this was a version for the 2003 
version of the ISO/IEC standard, supported and sponsored by some of their 
national members.


There is a way to transpose international standards to national standards, but 
they then pick up a new designation, e.g. ANSI for the US, DIN for Germany or EN 
for European Norm.

A./



2018-03-13 19:38 GMT+01:00 Asmus Freytag via Unicode:


On 3/13/2018 11:20 AM, Marcel Schneider via Unicode wrote:

On Mon, 12 Mar 2018 14:55:28 +, Michel Suignard wrote:


Time to correct some facts.
The French version of ISO/IEC 10646 (2003 version) was done in a separate 
effort by Canada and France NBs and not within SC2 proper. 
...


Then it can be referred to as “French version of ISO/IEC 10646” but I’ve got 
Andrew’s point, too.


Correction: if a project is not carried out by SC2 (the proper ISO/IEC 
subcommittee) then it is not a "version" of the ISO/IEC standard.

A./
 





Thanks for the correction. And I confess and apologize: on Patrick’s French 
Unicode 5.0 Code Charts page (http://hapax.qc.ca/Tableaux-5.0.htm), there is 
no instance of "version", although the item is referred to as "ISO 
10646:2003 (F)", from which it can ordinarily be inferred that "ISO" backed 
the project and that it is considered the French version of the standard.
 
I wasn’t aware that this way of interpreting the facts is somewhat informal and 
shouldn’t be used on mailing lists without a caveat. 
 
That said, the French transposition of ISO/IEC 10646 was not carried out as 
just a sort of joint venture of Canada and France (France, by the way, has since 
stepped out, leaving Québec alone to support the cost of future editions! Really 
ugly), given that it got feedback from numerous countries, part of which was 
written in French, and went through a heavy ballot process. Thus, getting it 
changed is not easy now that it has been approved: any change request must be 
documented and is viewed first and foremost as a threat to stability. Name 
changes affecting rare characters have proved feasible, whereas syncing the 
French name of U+202F with common practice and TUS is obviously more 
complicated, which in turn compromises usability in UIs, where we are therefore 
likely to use descriptors, i.e. altered names, for roughly half of the 
characters bearing a specific name. This is somewhat the same rationale as for 
UTN #24, though less apposite, given that the French transposition is not 
constrained by stability policies.
 
Best regards,
 
Marcel
 

Re: Translating the standard

2018-03-13 Thread Asmus Freytag (c) via Unicode

On 3/13/2018 12:55 PM, Philippe Verdy wrote:
It is then a version of the matching standards from Canadian and 
French standard bodies. This does not make a big difference, except 
that those national standards (last editions in 2003) are not kept in 
sync with evolutions of the ISO/IEC standard. So it can be said that 
this was a version for the 2003 version of the ISO/IEC standard, 
supported and sponsored by some of their national members.


There is a way to transpose international standards to national 
standards, but they then pick up a new designation, e.g. ANSI for the US, 
DIN for Germany or EN for European Norm.


A./


2018-03-13 19:38 GMT+01:00 Asmus Freytag via Unicode:


On 3/13/2018 11:20 AM, Marcel Schneider via Unicode wrote:

On Mon, 12 Mar 2018 14:55:28 +, Michel Suignard wrote:

Time to correct some facts.
The French version of ISO/IEC 10646 (2003 version) was done in a separate 
effort by Canada and France NBs and not within SC2 proper.
...

Then it can be referred to as “French version of ISO/IEC 10646” but I’ve 
got Andrew’s point, too.

Correction: if a project is not carried out by SC2 (the proper
ISO/IEC subcommittee) then it is not a "version" of the ISO/IEC
standard.

A./







Re: Translating the standard

2018-03-13 Thread Philippe Verdy via Unicode
It is then a version of the matching standards from Canadian and French
standard bodies. This does not make a big difference, except that those
national standards (last editions in 2003) are not kept in sync with
evolutions of the ISO/IEC standard. So it can be said that this was a
version for the 2003 version of the ISO/IEC standard, supported and
sponsored by some of their national members.

2018-03-13 19:38 GMT+01:00 Asmus Freytag via Unicode:

> On 3/13/2018 11:20 AM, Marcel Schneider via Unicode wrote:
>
> On Mon, 12 Mar 2018 14:55:28 +, Michel Suignard wrote:
>
> Time to correct some facts.
> The French version of ISO/IEC 10646 (2003 version) was done in a separate 
> effort by Canada and France NBs and not within SC2 proper.
> ...
>
> Then it can be referred to as “French version of ISO/IEC 10646” but I’ve got 
> Andrew’s point, too.
>
> Correction: if a project is not carried out by SC2 (the proper ISO/IEC
> subcommittee) then it is not a "version" of the ISO/IEC standard.
>
> A./
>
>
>


Re: Translating the standard

2018-03-13 Thread Asmus Freytag via Unicode

  
  
On 3/13/2018 11:20 AM, Marcel Schneider via Unicode wrote:

On Mon, 12 Mar 2018 14:55:28 +, Michel Suignard wrote:

Time to correct some facts.
The French version of ISO/IEC 10646 (2003 version) was done in a separate effort by Canada and France NBs and not within SC2 proper. 
...

Then it can be referred to as “French version of ISO/IEC 10646” but I’ve got Andrew’s point, too.


Correction: if a project is not carried out by SC2 (the proper
ISO/IEC subcommittee) then it is not a "version" of the ISO/IEC
standard.

A./




  



RE: Translating the standard

2018-03-13 Thread Marcel Schneider via Unicode
On Mon, 12 Mar 2018 14:55:28 +, Michel Suignard wrote:
> 
> Time to correct some facts.
> The French version of ISO/IEC 10646 (2003 version) was done in a separate 
> effort by Canada and France NBs and not within SC2 proper. 
> National bodies are always welcome to try to transpose and translate an ISO 
> standard. But unless this is done by the ISO Sub-committee
> (SC2 here) itself, this is not a long-term solution. This was almost 15 years 
> ago. I should know, I have been project editor for 10646 since 
> October 2000 (I started as project editor in 1997 for part-2, and have been 
> involved in both Unicode and SC2 since 1990).

Then it can be referred to as “French version of ISO/IEC 10646” but I’ve got 
Andrew’s point, too.

> 
> Now to some alternative facts:
> >Since ISO has made a business of standards, all prior versions are removed
> >from the internet, so that they donʼt show up even in that list (which Iʼd
> >used to grab a free copy, just to check the differences). If they had public
> >archives of the free standards, not having any for the paid standards would
> >stand out even more.
> >This is why, if you need an older version for reference, you need to find a
> >good soul in the organization who will be so kind as to make a copy for you
> >from the archives at the headquarters.
> 
> OK, yes, the old versions are removed from the ISO site. Andrew probably has
> easier access to older versions through BSI than you do.
> He has been involved directly in SC2 work for many years. The 2003 version is
> completely irrelevant now anyway and, again, was not done by the SC; there was
> never a project editor for a French version of 10646.

Call him whatever, how can a project thrive without a head?

I think relevance is not the only criterion in evaluating a translation. The 
most important would probably be usefulness. Older versions are an appropriate 
means to get in touch with Unicode, as discussed when some old core specs were 
proposed on this list.

> 
> >The last published French version of ISO/IEC 10646 — to which you 
> >contributed — is still available on
> > Patrickʼs site:
> >
> >http://hapax.qc.ca/Tableaux-5.0.htm
> 
> The only live part of that page is the code chart and does not correspond to 
> the 10646:2003 itself (they are in fact Unicode 5.0 charts,
> however close to 10646:2003 and its first 2 amendments), I am not sure the 
> original 10646:2003 (F), and the 2 translated amendments
> (1 and 2) are available anywhere and are totally obsolete today anyway. Only 
> Canada and/or Afnor may still have archived versions.

Every time some volunteers have their NamesList translation ready for print, 
they have to pay for the tool and the fonts: just plainly disgusting. 

No wonder that, once such a localized Code Charts edition has been printed out 
to PDF, it has everlasting value!

> 
> >(Iʼd noticed that the contributorsʼ list has slightly shrunk without being
> >able to find out why.) The Code Charts have not been produced, however
> >(because there is actually no redactor‐in‐chief, as already stated, and also
> >because, due to budget cuts, the government is not in a position to pay the
> >non‐trivial amount of money asked for by Unicode for use of the fonts and/or
> >[just trying to be as precise as I can this time] the owner of the tooling
> >needed).
> 
> A bunch of speculation here: there never was a 'redactor-in-chief' for the
> French version, and Unicode never asked for money because, first of all, it
> does not own the tool (it is licensed by the tool owner, who btw does this
> work as a giant goodwill gesture, given the money received and the amount of
> work required to get this to work).

Shame! Unicode should manage to get the funding (no problem for Apple, though 
it would be one for Microsoft, which had to fire many employees) so that the 
developer is fully paid and rewarded. Why doesn’t Unicode have an unlimited 
license? Because of the stinginess of those corporate members that have plenty 
of money to waste. I’ll spare you that off‐topic rant, but I won’t stop 
insisting that he must be paid, fully paid, paid back and paid in the future, 
all the more as the Code Charts are now printed annually and grow bigger and 
bigger. It’s really up to the Consortium to gather the full license fee from its 
corporate members for the English version and any other interested locale. 
Unicode’s stated mission logically encompasses making as many localized Code 
Charts (and whatever else) available for free as volunteers translate the 
sources. 

Shouldn’t that have been clear from the beginning?

> In a previous message you also made some speculation about Apple's role or 
> possibilities that has no relationship with reality.
> 
> >Having said that, I still believe that all ISO standards should have a 
> >French version, shouldnʼt they? 
> 
> You are welcome to contribute to that. Good luck though.
> 
> On a side note, I have been working with the 

Re: Translating the standard

2018-03-12 Thread Yifán Wáng via Unicode
 there are two versions 
> declared, for political reasons, to both be canonical, which is obviously 
> risky as there’s a chance they might differ subtly on some point, perhaps 
> even because of punctuation).
>
> In terms of widespread understanding of the standard, which is where I think 
> translation is perhaps more important, I’m not sure translating the actual 
> standard itself is really the way forward.  It’d be better to ensure that 
> there are reliable translations of books like Unicode Demystified or Unicode 
> Explained - or, quite possibly, other books aimed more at the general public 
> rather than the software community per se.
>
> Kind regards,
>
> Alastair.
>
> --
> http://alastairs-place.net
>
>



RE: Translating the standard

2018-03-12 Thread Michel Suignard via Unicode
Time to correct some facts.
The French version of ISO/IEC 10646 (2003 version) was done in a separate 
effort by Canada and France NBs and not within SC2 proper. National bodies are 
always welcome to try to transpose and translate an ISO standard. But unless 
this is done by the ISO Sub-committee (SC2 here) itself, this is not a 
long-term solution. This was almost 15 years ago. I should know: I have been 
project editor for 10646 since October 2000 (I started as project editor in 
1997 for part-2, and have been involved in both Unicode and SC2 since 1990).

Now to some alternative facts:
>Since ISO has made a business of standards, all prior versions are removed
>from the internet, so that they donʼt show up even in that list (which Iʼd
>used to grab a free copy, just to check the differences). If they had public
>archives of the free standards, not having any for the paid standards would
>stand out even more.
>This is why, if you need an older version for reference, you need to find a
>good soul in the organization who will be so kind as to make a copy for you
>from the archives at the headquarters.

OK, yes, the old versions are removed from the ISO site. Andrew probably has 
easier access to older versions through BSI than you do. He has been involved 
directly in SC2 work for many years. The 2003 version is completely irrelevant 
now anyway and, again, was not done by the SC; there was never a project editor 
for a French version of 10646.

>The last published French version of ISO/IEC 10646 — to which you contributed 
>— is still available on
> Patrickʼs site:
>
>http://hapax.qc.ca/Tableaux-5.0.htm

The only live part of that page is the code chart, and it does not correspond 
to the 10646:2003 itself (they are in fact Unicode 5.0 charts, however close to 
10646:2003 and its first 2 amendments). I am not sure the original 10646:2003 
(F) and the 2 translated amendments (1 and 2) are available anywhere, and they 
are totally obsolete today anyway. Only Canada and/or Afnor may still have 
archived versions.

>(Iʼd noticed that the contributorsʼ list has slightly shrunk without being
>able to find out why.) The Code Charts have not been produced, however
>(because there is actually no redactor‐in‐chief, as already stated, and also
>because, due to budget cuts, the government is not in a position to pay the
>non‐trivial amount of money asked for by Unicode for use of the fonts and/or
>[just trying to be as precise as I can this time] the owner of the tooling
>needed).

A bunch of speculation here: there never was a 'redactor-in-chief' for the 
French version, and Unicode never asked for money because, first of all, it 
does not own the tool (it is licensed by the tool owner, who btw does this work 
as a giant goodwill gesture, given the money received and the amount of work 
required to get this to work). In a previous message you also made some 
speculation about Apple's role or possibilities that has no relationship with 
reality.

>Having said that, I still believe that all ISO standards should have a French 
>version, shouldnʼt they? 

You are welcome to contribute to that. Good luck though.

On a side note, I have been working with the same team of French volunteers to 
revive the French names list. So, this may reappear on the Unicode web site at 
some point. Because I also produce the original code charts (in cooperation with 
Rick McGowan) for both ISO and Unicode, it is a bit easier for me (although 
non-trivial). It also helps that I can read the French list :-). But the names 
list is probably as far as you want to go, and even that requires a serious 
amount of work in terms of terminology definition and production.

Michel



Re: Translating the standard

2018-03-12 Thread Marcel Schneider via Unicode
On Mon, 12 Mar 2018 10:00:16 +, Andrew West wrote:
> 
> On 12 March 2018 at 07:59, Marcel Schneider via Unicode wrote:
> >
> > Likewise ISO/IEC 10646 is available in a French version
> 
> No it is not, and never has been.
> 
> Why don't you check your facts before making misleading statements to this 
> list?
> 
> > or at least, it should have an official French version like all ISO 
> > standards.
> 
> That is also blatantly untrue.
> 
> Only six of the publicly available ISO standards listed at
> http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
> have French versions, and one has a Russian version. You will notice
> that there is no French version of ISO/IEC 10646.
> 
> Andrew

Since ISO has made a business of standards, all prior versions are removed from 
the internet, so that they donʼt show up even in that list (which Iʼd used to 
grab a free copy, just to check the differences). If they had public archives 
of the free standards, not having any for the paid standards would stand out 
even more.
This is why, if you need an older version for reference, you need to find a good 
soul in the organization who will be so kind as to make a copy for you from the 
archives at the headquarters.

The last published French version of ISO/IEC 10646 — to which you contributed — 
is still available on Patrickʼs site:

http://hapax.qc.ca/Tableaux-5.0.htm

Actually, the French version has no chief redactor, and for a while the French 
version of the NamesList was maintained only to the extent of adding the new 
names (for use in ISO 14651). For Unicode 10.0.0, the French translation has 
again been fully updated to Code Charts production level:

http://hapax.qc.ca/ListeNoms-10.0.0.txt

(Iʼd noticed that the contributorsʼ list has slightly shrunk without being able 
to find out why.) The Code Charts have not been produced, however (because there 
is actually no redactor‐in‐chief, as already stated, and also because, due to 
budget cuts, the government is not in a position to pay the non‐trivial amount 
of money asked for by Unicode for use of the fonts and/or [just trying to be as 
precise as I can this time] the owner of the tooling needed).

Having said that, I still believe that all ISO standards should have a French 
version, shouldnʼt they? :)

Best regards,

Marcel



Re: Translating the standard

2018-03-12 Thread Andrew West via Unicode
On 12 March 2018 at 07:59, Marcel Schneider via Unicode wrote:
>
> Likewise ISO/IEC 10646 is available in a French version

No it is not, and never has been.

Why don't you check your facts before making misleading statements to this list?

> or at least, it should have an official French version like all ISO standards.

That is also blatantly untrue.

Only six of the publicly available ISO standards listed at
http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
have French versions, and one has a Russian version. You will notice
that there is no French version of ISO/IEC 10646.

Andrew


Re: Translating the standard

2018-03-12 Thread Marcel Schneider via Unicode
On Mon, 12 Mar 2018 07:39:53 +, Alastair Houghton wrote:
> 
> On 11 Mar 2018, at 21:14, Marcel Schneider via Unicode  wrote:
> > 
> > Indeed, to be fair. And for implementers, reading up in English is scarcely
> > ever much of a problem, whatever the locale.
> 
> Agreed. Implementers will already understand English; you can’t write
> computer software without it, since almost all documentation is in English,
> almost all computer languages are based on English, and, to be frank, a large
> proportion of the software market is itself English speaking. I have yet to
> meet a software developer who didn’t speak English.
> 
> That’s not to say that people wouldn’t appreciate a translation of the
> standard, but there are, as others have pointed out, obvious maintenance
> problems, not to mention the issue that plagues some international
> institutions, namely the fact that translations are necessarily non-canonical
> and so those who really care about the details of the rules usually have to
> refer to a version in a particular language (sometimes that language might be
> French rather than English; very occasionally there are two versions declared,
> for political reasons, to both be canonical, which is obviously risky as
> there’s a chance they might differ subtly on some point, perhaps even because
> of punctuation).

It has sometimes happened in the EU that the French version was so sloppy it 
turned the issue into an entirely different one, but at the Unicode‐ISO/IEC 
merger the bad will was clearly on the other side.

> 
> In terms of widespread understanding of the standard, which is where I think
> translation is perhaps more important, I’m not sure translating the actual
> standard itself is really the way forward. It’d be better to ensure that
> there are reliable translations of books like Unicode Demystified or Unicode
> Explained - or, quite possibly, other books aimed more at the general public
> rather than the software community per se.

Good point. What we need most of all is a complete terminology, as well as full 
ranges of character names in every language, to enable people to talk about 
Unicode after reading about it in English. 

Best regards,

Marcel



Re: Translating the standard

2018-03-12 Thread Marcel Schneider via Unicode
On Fri, 9 Mar 2018 08:41:35 -0800, Ken Whistler wrote:
> 
> 
> On 3/9/2018 6:58 AM, Marcel Schneider via Unicode wrote:
> > As for translating the Core spec as a whole, why did two recent attempts
> > crash even before the maintenance stage, while the 3.1 project succeeded?
> 
> Essentially because both the Japanese and the Chinese attempts were 
> conceived of as commercial projects, which ultimately did not cost out 
> for the publishers, I think. Both projects attempted limiting the scope 
> of their translation to a subset of the core spec that would focus on 
> East Asian topics, but the core spec is complex enough that it does not 
> abridge well. And I think both projects ran into difficulties in trying 
> to figure out how to deal with fonts and figures.

This is normally taken care of by Unicode, whose fonts are donated and 
licensed for the sole purpose of documenting the Standard. See the FAQ.

Templates of any material to be translated are sent by Unicode, arenʼt 
they? The Unicode home page reads: “An essential part of our mission 
is to educate and engage academic and scientific communities, and 
the general public.” Therefore, translators should just have to translate 
e.g. the NamesList following Kenʼs sample localization (TN #24) — 
which is already a hard piece of work — and send the file to Unicode, 
to get a localized version of the Code Charts. Likewise ISO/IEC 10646 
is available in a French version or, at least, it should have an official 
French version like all ISO standards.
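The NamesList workflow described above can be pictured with a minimal Python sketch. It assumes the simplified line layout of NamesList.txt, in which a character entry is a hexadecimal code point, a TAB, and the name, while header lines start with "@" and annotation lines (aliases, comments, cross-references) start with a TAB. The helpers `parse_names` and `pair_names` and the sample data are hypothetical and illustrative only; this is not the Unibook tooling, just a way to see which code points a translated list still lacks.

```python
from typing import Dict, Optional, Tuple


def parse_names(text: str) -> Dict[int, str]:
    """Map code points to names from NamesList-style text (simplified format)."""
    names: Dict[int, str] = {}
    for line in text.splitlines():
        # Skip blank lines, headers (@...), comment lines (;...) and
        # annotation lines, which start with a TAB.
        if not line or line[0] in "@;\t":
            continue
        code, sep, name = line.partition("\t")
        if not sep:
            continue  # not a "CODE<TAB>NAME" entry
        try:
            names[int(code, 16)] = name.strip()
        except ValueError:
            continue  # first field is not a hexadecimal code point
    return names


def pair_names(english: str, localized: str) -> Dict[int, Tuple[str, Optional[str]]]:
    """Pair each English name with its translation (None if still missing)."""
    en = parse_names(english)
    loc = parse_names(localized)
    return {cp: (name, loc.get(cp)) for cp, name in en.items()}


# Illustrative sample data only.
english = ("@@\t0040\tSample block\t0042\n"
           "0041\tLATIN CAPITAL LETTER A\n"
           "\t= first letter\n"
           "0042\tLATIN CAPITAL LETTER B\n")
french = "0041\tLETTRE MAJUSCULE LATINE A\n"

for cp, (en_name, fr_name) in pair_names(english, french).items():
    print(f"U+{cp:04X}", en_name, "->", fr_name or "(untranslated)")
```

Keying both lists by code point rather than by line position keeps the pairing robust when the translated file omits annotations or lags behind the English one.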

If Unicode doesnʼt own the tooling yet, Apple should be happy to donate the 
funding to put Unicode in a position to fulfill its mission thoroughly, 
just as Apple (supposedly) donates non‐trivial amounts to many vendors to 
get them to remove old software from the internet.

Using such localized NamesLists with Unibook to browse the Code Charts 
locally is another question, since that presupposes handing the fonts out to 
the general public, so that is clearly a non‐starter. But browsing localized
Code Charts in Adobe Reader would be a nice facility.

Best regards,

Marcel



Re: Translating the standard

2018-03-12 Thread Alastair Houghton via Unicode
On 11 Mar 2018, at 21:14, Marcel Schneider via Unicode <unicode@unicode.org> wrote:
> 
> Indeed, to be fair. And for implementers, reading up in English is scarcely 
> ever much of a problem, whatever the locale.

Agreed.  Implementers will already understand English; you can’t write computer 
software without it, since almost all documentation is in English, almost all 
computer languages are based on English, and, to be frank, a large proportion 
of the software market is itself English speaking.  I have yet to meet a 
software developer who didn’t speak English.

That’s not to say that people wouldn’t appreciate a translation of the 
standard, but there are, as others have pointed out, obvious maintenance 
problems, not to mention the issue that plagues some international 
institutions, namely the fact that translations are necessarily non-canonical 
and so those who really care about the details of the rules usually have to 
refer to a version in a particular language (sometimes that language might be 
French rather than English; very occasionally there are two versions declared, 
for political reasons, to both be canonical, which is obviously risky as 
there’s a chance they might differ subtly on some point, perhaps even because 
of punctuation).

In terms of widespread understanding of the standard, which is where I think 
translation is perhaps more important, I’m not sure translating the actual 
standard itself is really the way forward.  It’d be better to ensure that there 
are reliable translations of books like Unicode Demystified or Unicode 
Explained - or, quite possibly, other books aimed more at the general public 
rather than the software community per se.

Kind regards,

Alastair.

--
http://alastairs-place.net




Re: Translating the standard

2018-03-11 Thread Marcel Schneider via Unicode
On 11/03/18 21:05, Arthur Reutenauer wrote:
> 
> On Sun, Mar 11, 2018 at 07:35:11PM +0100, Marcel Schneider via Unicode wrote:
> > I fail to understand why increasing complexity decreases the need to be 
> > widely understood.
> 
> I’m pretty sure that everybody will agree that the need gets all the
> greater as Unicode and connected technologies get more complex. But you
> can hopefully see that the cost also increases, and that’s incentive
> enough to refrain from doing it – as it already was very costly fifteen
> years ago, it’s likely to be prohibitive today.
> 
> > Recurrent threads show how slowly Unicode education 
> > is spreading among English native speakers; others incidentally complained 
> > about Unicode‐educational issues in African countries. *Not* translating 
> > the Standard — in whatever way — wonʼt help steepen the curve.
> 
> Nobody is saying “let’s not translate the Unicode Standard”; what
> several people here have pointed out is that it pays to have more modest
> and manageable goals. Besides, you’re hinting yourself that the
> problems are not only with translation, since they also affect native
> English speakers.

Indeed, to be fair. And for implementers, reading up in English is scarcely 
ever much of a problem, whatever the locale.

Todayʼs policy is that we are welcome to browse Wikipedia:

http://www.unicode.org/standard/WhatIsUnicode.html

Fundamentally thatʼs true (although the wording could use some fixes as regards 
the difference between *using* Unicode and *documenting* Unicode), and
itʼs consistent with actual trends.

As for the cost, it still seems to me that weʼre far from the last word…

Best regards,

Marcel



Re: Translating the standard

2018-03-11 Thread Marcel Schneider via Unicode
On Fri, 9 Mar 2018 08:41:35 -0800, Ken Whistler wrote:
> 
> 
> On 3/9/2018 6:58 AM, Marcel Schneider via Unicode wrote:
> > As for translating the Core spec as a whole, why did two recent attempts
> > crash even before the maintenance stage, while the 3.1 project succeeded?
> 
> Essentially because both the Japanese and the Chinese attempts were 
> conceived of as commercial projects, which ultimately did not cost out 
> for the publishers, I think.

I immediately thought of these projects as government‐funded initiatives, 
which would be most consistent with the importance of Unicodeʼs work for these 
nations, given that, as I figure, the unified CJK repertoire has always consumed 
most of the Consortiumʼs resources. However, looking into the early 
translations on the Unicode site, only governments that are close to 
the United Kingdom are revealed (or not) to have helped promote Unicode 
education.

And from the one among the three terminological vocabularies that Iʼm able
to read, as well as from the 60+ What‐is‐Unicode translations, we gain the 
chilling impression that once the early enthusiasm had passed, the level of 
effort dropped to zero. To such an extent that even the link to the 
translation guidelines has been removed from its original place:

http://www.unicode.org/help/translation.html
|
| Although its working language is English, the Unicode Consortium strives to
| reach as many people and organizations in as many countries as possible
| around the world. One way of doing that is by encouraging the translation
| of Unicode material into languages other than English.
|
| This page guides volunteers who wish to contribute a translation of any
| Unicode material they deem interesting to their local audiences.

I fail to understand why increasing complexity should decrease the need to be
widely understood. Recurrent threads show how slowly Unicode education is
spreading among native English speakers; others have complained in passing
about Unicode education issues in African countries. *Not* translating the
Standard, in whatever form, wonʼt help steepen the curve.

Best regards,

Marcel

[To be continued; sorry for the delay.]



Re: Translating the standard

2018-03-09 Thread Ken Whistler via Unicode



On 3/9/2018 6:58 AM, Marcel Schneider via Unicode wrote:

As for translating the Core spec as a whole, why did two recent attempts crash
even before the maintenance stage, while the 3.1 project succeeded?


Essentially because both the Japanese and the Chinese attempts were 
conceived of as commercial projects, which ultimately did not cost out 
for the publishers, I think. Both projects attempted limiting the scope 
of their translation to a subset of the core spec that would focus on 
East Asian topics, but the core spec is complex enough that it does not 
abridge well. And I think both projects ran into difficulties in trying 
to figure out how to deal with fonts and figures.


The Unicode 3.0 translation (and the 3.1 update) by Patrick Andries was 
a labor of love. In this arena, a labor of love is far more likely to 
succeed than a commercial translation project, because it doesn't have 
to make financial sense.


By the way, as a kind of annotation to an annotated translation, people 
should know that the 3.1 translation on Patrick's site is not a straight 
translation of 3.1, but a kind of interpreted adaptation. In particular, 
it incorporated a translation of UAX #15, Unicode Normalization Forms, 
Version 3.1.0, as a Chapter 6 of the translation, which is not the 
actual structure of Unicode 3.1. And there are other abridgements and 
alterations, where they make sense -- compare the resources section of 
the Preface, for example. This is not a knock on Patrick's excellent 
translation work, but it does illustrate the inherent difficulties of 
trying to approach a complete translation project for *any* version of 
the Unicode Standard.


--Ken



Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-09 Thread Marcel Schneider via Unicode
On 08/03/18 19:33, Arthur Reutenauer  wrote:
> 
> On Thu, Mar 08, 2018 at 07:05:06PM +0100, Marcel Schneider via Unicode wrote:
> > https://www.amazon.fr/Unicode-5-0-pratique-Patrick-Andries/dp/2100511408/ref=pd_bbs_sr_1?ie=UTF8=books=1206989878=8-1
> 
> You’re linking to the wrong one of Patrick’s books :-) The
> translation he made of version 3.1 (not 5.0) of the core specification
> is available in full at http://hapax.qc.ca/ (“Unicode et ISO 10646 en
> français”, middle of page), as well as a few free sample chapters from
> his other book.
> 
> Best,
> 
> Arthur
> 

Indeed, thank you very much for the correction, and thanks for the link.

I can at least say that the free online chapters of Patrick Andriesʼ translation
of the Unicode Standard were my first introduction to it, more precisely ch. 7
(Punctuation), which I even printed out to get acquainted with the various
dashes and spaces and to learn more about quotation marks. [I didnʼt have
internet access and took the copy home from a library.] Based on this
experience, I donʼt think it is too much of an extrapolation to suppose that
millions of newcomers in all countries could use such a translation. Although
the latest version of TUS is obviously more up‐to‐date, version 3.1 isnʼt
wrong at all. Hence I warmly recommend translating at least v3.1, or those
chapters of v10.0 that are already in v3.1, while prompting the reader to seek
further information on the Unicode website.

We note too that Patrickʼs translation is annotated (footnotes in gray print)
with additional information of interest to the target locale. (Here one could
mention that the Latin script requires preformatted superscript letters for an
interoperable representation of everyday text in some languages.)

Some Unicode terminology like “bidi‐mirroring” may be hard to adapt, but that
isnʼt more of a challenge than what any tech/science writer faces when handling
content that was originally produced in the United States and/or, more
generally, in English. E.g. in French we may choose from a range of forms, from
more conservative to less usual, among which: “réflexion bidi”, “réflexion
bidirectionnelle”, “bidi‐reflexion” (hyphenated or not), “réflexible” or,
simply, “miroir”. In any case, every locale is expected to localize the full
range of Unicode terminology, unless people agree to switch to English whenever
the topic is Unicode, even while discussing every other topic in Chinese or in
Japanese as a matter of course; doing so is not a problem, itʼs just ethically
weird.

So we look forward to seeing the concept of a “Unicode in Practice” textbook
implemented in Chinese, in Japanese, and in any other non‐English and
non‐French locale where it doesnʼt exist yet.

As for translating the Core spec as a whole, why did two recent attempts crash
even before the maintenance stage, while the 3.1 project succeeded?

Some pieces of the puzzle seem to be still missing.

Best regards,

Marcel



Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-08 Thread Marcel Schneider via Unicode
On Thu, 08 Mar 2018 04:25:53 -0500, Elsebeth Flarup via Unicode wrote:
> 
> For a number of reasons I think translating the standard is a really bad idea.
> 
[…]
> 
> There are other reasons to not do this.

I assume that the reasons you are thinking of are congruent with those that
Ken already explained in detail in:

http://www.unicode.org/mail-arch/unicode-ml/y2018-m03/0025.html

And I agree with Ken that the idea in itself isnʼt bad as such, but that it is
no longer feasible. Everybody (supposedly) knows that the Core Spec has actually
been translated, published in a print edition, scanned into Google Books, and
is still for sale:

https://www.amazon.fr/Unicode-5-0-pratique-Patrick-Andries/dp/2100511408/ref=pd_bbs_sr_1?ie=UTF8=books=1206989878=8-1

https://books.google.fr/books?id=GgbWZNTRncsC=frontcover=Andries+Patrick=fr=X=0ahUKEwis59Cwp93ZAhUF6RQKHZ1GBlIQ6AEIKjAA#v=onepage=Andries%20Patrick=false

OK, the version number was only half the actual one.

Best regards,

Marcel



Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-08 Thread Elsebeth Flarup via Unicode
For a number of reasons I think translating the standard is a really bad idea.

As long as there are people interested in maintaining the translation, 
identifying deltas and easily translating just the deltas would NOT be 
difficult, however. Modern computer aided translation tools all use translation 
memories that automatically translate already translated segments and present 
only new/changed segments to the translator. No need for change bars etc. 

This assumes that somebody would have stewardship of the translation memory, 
that the people doing the translation would be willing to/capable of using the 
CAT tools, etc., but the technical translation technology is available to make 
this part of the equation not much of an issue.
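To make the translation-memory idea concrete, here is a minimal Python sketch. It is an illustration under assumptions, not how any particular CAT tool works: segments are naively split at sentence-ending punctuation, and only exact matches are reused (real tools also offer fuzzy matches). `TranslationMemory`, `split_segments` and `pretranslate` are hypothetical names.

```python
from __future__ import annotations

import re


def split_segments(text: str) -> list[str]:
    # Naive segmentation (assumption): a segment ends with ., ! or ?
    # followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


class TranslationMemory:
    """Hypothetical exact-match memory mapping source segments to translations."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}

    def add(self, source: str, target: str) -> None:
        self.entries[source] = target

    def pretranslate(self, text: str) -> tuple[list[str | None], list[str]]:
        # Reuse stored translations and collect only new or changed segments;
        # None marks a slot the translator still has to fill.
        slots: list[str | None] = []
        todo: list[str] = []
        for seg in split_segments(text):
            hit = self.entries.get(seg)
            slots.append(hit)
            if hit is None:
                todo.append(seg)
        return slots, todo


tm = TranslationMemory()
tm.add("Unicode is a character encoding.", "Unicode est un codage de caractères.")
slots, todo = tm.pretranslate("Unicode is a character encoding. It is updated every year.")
print(todo)  # ['It is updated every year.']
```

A revised sentence no longer matches its stored source exactly, so it lands in `todo`; that is the sense in which only the deltas reach the translator.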

There are other reasons to not do this.

Elsebeth



‐‐‐ Original Message ‐‐‐

On March 8, 2018 10:03 AM, Richard Wordingham via Unicode <unicode@unicode.org> wrote:

>
> On Thu, 8 Mar 2018 02:27:06 +0100 (CET)
> Marcel Schneider via Unicode unicode@unicode.org wrote:
>
> > Yes the biggest issue over time, as Ken wrote, is to maintain a
> > translation, be it only the Nameslist.
>
> For which accurately determined change bars can work wonders. An
> alternative would be paragraph identification and a list of changed
> paragraphs. The section number in TUS is too coarse for giving text
> locations, and page numbers are inherently changeable.
>
> Richard.





Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-08 Thread Marcel Schneider via Unicode
On Thu, 8 Mar 2018 09:03:28 +, Richard Wordingham via Unicode wrote:
> 
> > Yes the biggest issue over time, as Ken wrote, is to *maintain* a
> > translation, be it only the Nameslist.
> 
> For which accurately determined change bars can work wonders. An
> alternative would be paragraph identification and a list of changed
> paragraphs. The section number in TUS is too coarse for giving text
> locations, and page numbers are inherently changeable.

Adobe Illustrator doesnʼt seem to support purple numbers, and Adobe Reader
seems unable to jump to a bookmark typed in by the user (that is presumably
reserved for Acrobat). Word is reported not to add lasting change bars
automatically. But all of that can be done in HTML, which is not the format of
The Unicode Standard, whose web bookmarks are fortunately published in separate
collections. When UAXes are updated, an intermediate revision has all changes
highlighted and remains available online. We can see delta charts with all
changes highlighted, in PDF. Why has the Core Specification not benefited from
these facilities?

Has this already been submitted as formal feedback?
(The UTC is known for not considering feedback that has not been submitted via
the Contact form or docsub...@unicode.org, and the mailing lists carry explicit
caveats.)

Best regards,

Marcel



Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-08 Thread Richard Wordingham via Unicode
On Thu, 8 Mar 2018 02:27:06 +0100 (CET)
Marcel Schneider via Unicode  wrote:

> Yes the biggest issue over time, as Ken wrote, is to *maintain* a
> translation, be it only the Nameslist.

For which accurately determined change bars can work wonders.  An
alternative would be paragraph identification and a list of changed
paragraphs.  The section number in TUS is too coarse for giving text
locations, and page numbers are inherently changeable.

Richard.
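Richard's "list of changed paragraphs" can be approximated mechanically. The following Python sketch is only an illustration, assuming plain-text versions in which paragraphs are separated by blank lines. It normalizes whitespace so that re-wrapping is not reported as a change, then uses the standard library's `difflib.SequenceMatcher` to list replaced, inserted and deleted paragraphs by number. `paragraphs` and `changed_paragraphs` are hypothetical helper names.

```python
import difflib


def paragraphs(text: str) -> list[str]:
    # Split on blank lines; collapse internal whitespace so that
    # re-wrapping a paragraph does not count as a change.
    return [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]


def changed_paragraphs(old_text: str, new_text: str):
    """Return (kind, old paragraph numbers, new paragraph numbers), 1-based,
    for every paragraph that was replaced, inserted or deleted."""
    old, new = paragraphs(old_text), paragraphs(new_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, list(range(i1 + 1, i2 + 1)), list(range(j1 + 1, j2 + 1))))
    return changes


old = "Para one.\n\nPara two.\n\nPara three."
new = "Para one.\n\nPara two,\nrevised.\n\nPara three.\n\nPara four."
print(changed_paragraphs(old, new))
# [('replace', [2], [2]), ('insert', [], [4])]
```

A list like this, keyed to stable paragraph numbers or IDs, is what a translator would need in place of change bars; it does not say *why* a paragraph changed.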




Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-08 Thread Marcel Schneider via Unicode
On Mon, 5 Mar 2018 20:19:47 +0100, Philippe Verdy via Unicode wrote:
[…]
> * the core text of the standard (section 3 about conformance and requirements
> is the first thing to adapt).
> There's absolutely no need however to do that as a pure translation, it can
> be rewritten and presented with the goals wanted by users. Here again
> Wikipedia has done significant efforts there, in various languages

I donʼt think there is much potential in rewriting the core specs if the goal is
to make an abstract, given that the original authors already made efforts to
keep the language simple. Whenever the goal is to add information, by contrast,
e.g. about (as yet) non‐standard use of superscripts in Latin text, then the
added value, clearly tagged as such, will reward the effort.

A big part of the core spec is made of script‐specific introductions designed
to be balanced and handy. Hence part of the information is provided only in the
code charts, and some in the annexes. Compiling it all and writing up more
detailed articles is indeed much more interesting for readers focusing on a
script.

Best regards,

Marcel



Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-07 Thread Marcel Schneider via Unicode
On Mon, 5 Mar 2018 20:19:47 +0100, Philippe Verdy via Unicode wrote:
 
> There's been significant efforts to "translate" or more precisely "adapt" 
> significant parts of the standard with good presentations in Wikipedia and 
> various sites for scoped topics. So there are alternate charts, and instead 
> of translating all, the concepts are summarized, reexplained, but still 
> give links to the original version in English everytime more info is needed. 

Indeed one of the best uses we can make of efforts in Unicode education is
in extending and improving the Wikipedia coverage, because this is the first
place almost everybody goes to. So if a government is considering an
investment, donating to Wikimedia and motivating a vast community seems
a really good plan. And hiring staffers for this purpose will increase the
reliability of the data (given that some corporations misuse the
infrastructure for PR).

> All UCD files don't need to be translated, they can also be automatically 
> processed to generate alternate presentations or datatables in other 
> formats. There's no value in taking efforts to translate them manually, 
> it's better to develop a tool that will process them in the format users 
> can read. 

The only UCD file Iʼd advise translating in full is the NamesList, as it is the
source code of the Code Charts. These are indeed indispensable because of the
glyphic information they convey, which can be found nowhere else; hence all good
secondary sources like Wikipedia link to the Unicode Charts. The NamesList per se
is also useful in that it provides a minimal amount of information about the
characters. But it lacks important hints about bidi‐mirroring, which have to be
compiled from yet another UCD file. The downside of generating a holistic view
is that it generally ends up as an atomic, per‐character view. In any case, itʼs
up to users to gather an overview tailored to their needs. This is catered for
by Chinese and Japanese versions of sites such as www.fileformat.info.
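As an illustration of compiling bidi-mirroring hints from a second UCD file, here is a minimal Python sketch. It assumes the `BidiMirroring.txt` line format (`0028; 0029 # LEFT PARENTHESIS`, i.e. a code point, a semicolon, its mirroring glyph, and an optional comment) and a dictionary of localized names obtained elsewhere; the French names below are illustrative input only, and `parse_bidi_mirroring` and `annotate` are hypothetical names. Note that `BidiMirroring.txt` lists only characters that have a mirrored counterpart; the Bidi_Mirrored property itself is field 9 of `UnicodeData.txt`. The result is exactly the kind of per-character view described above.

```python
def parse_bidi_mirroring(text: str) -> dict[int, int]:
    """Map each code point to its mirroring glyph, from lines such as
    '0028; 0029 # LEFT PARENTHESIS'."""
    mapping: dict[int, int] = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        source, target = (field.strip() for field in data.split(";"))
        mapping[int(source, 16)] = int(target, 16)
    return mapping


def annotate(names: dict[int, str], mirroring: dict[int, int]) -> list[str]:
    """Merge localized names with mirroring data, one line per character."""
    rows = []
    for cp in sorted(names):
        row = f"U+{cp:04X}\t{names[cp]}"
        if cp in mirroring:
            row += f"\tmirror: U+{mirroring[cp]:04X}"
        rows.append(row)
    return rows


sample = """# excerpt in BidiMirroring.txt format
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
"""
# Illustrative localized names (assumed input, not read from any UCD file).
names_fr = {0x0028: "PARENTHÈSE GAUCHE", 0x0029: "PARENTHÈSE DROITE", 0x0041: "LETTRE MAJUSCULE LATINE A"}
for row in annotate(names_fr, parse_bidi_mirroring(sample)):
    print(row)
```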

[…]
> The only efforts is in: 
> * naming characters (Wikipedia is great to distribute the effort and have 
> articles showing relevant collections of characters and document alternate 
> names or disambiguate synonyms). 

Naming characters is a real challenge and often runs into multiple issues.
First we need to make clear whom the localization is intended for: technical
people or UIs. It has happened that a literal translation tuned to specialistsʼ
usage was then handed out to the industry to show up on everyoneʼs computer,
while some core characters of the intended locale are named differently in real
life, so that students donʼt find what they learned at school. And the worst
thing is that once a translation is released, image considerations lead to
seeking stability even where no Unicode (ISO) policy prevents updates.

> * the core text of the standard (section 3 about conformance and 
> requirements is the first thing to adapt). There's absolutely no need 
> however to do that as a pure translation, it can be rewritten and presented 
> with the goals wanted by users. Here again Wikipedia has done significant
> efforts there, in various languages 
> * keeping the tools developed in the previous paragraph in sync and 
> conformity with the standard (sync the UCD files they use).  

Yes the biggest issue over time, as Ken wrote, is to *maintain* a translation, 
be it only the Nameslist.
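One way to keep a NamesList translation in step is to compare successive versions entry by entry and send only new or altered entries to the translator. The Python sketch below is a simplification under stated assumptions: it reads only character entry lines (`hhhh<TAB>NAME`) and the tab-indented annotation lines that follow them, skips `@` header lines and `;` comments, and ignores everything else the NamesList format offers. `parse_names_list` and `needs_retranslation` are hypothetical names, and the annotation in the sample is made up for illustration.

```python
def parse_names_list(text: str) -> dict[int, tuple[str, tuple[str, ...]]]:
    """Map code point -> (name, annotation lines) for a simplified NamesList."""
    entries: dict[int, tuple[str, tuple[str, ...]]] = {}
    current = None  # code point whose annotations are being read, if any
    for line in text.splitlines():
        if line.startswith("\t"):
            if current is not None:  # annotation belonging to a character
                name, notes = entries[current]
                entries[current] = (name, notes + (line.strip(),))
            continue  # annotations under a header are ignored here
        if not line or line[0] in "@;":
            current = None  # header, comment or blank line ends the entry
            continue
        code, _, name = line.partition("\t")
        current = int(code, 16)
        entries[current] = (name, ())
    return entries


def needs_retranslation(old: str, new: str) -> list[int]:
    """Code points whose entry is new, or whose name or annotations changed."""
    before, after = parse_names_list(old), parse_names_list(new)
    return sorted(cp for cp, entry in after.items() if before.get(cp) != entry)


old_list = "@@\t0000\tBasic Latin\t007F\n0041\tLATIN CAPITAL LETTER A\n0042\tLATIN CAPITAL LETTER B\n"
new_list = (
    "@@\t0000\tBasic Latin\t007F\n"
    "0041\tLATIN CAPITAL LETTER A\n"
    "\t* illustrative new annotation\n"
    "0042\tLATIN CAPITAL LETTER B\n"
    "0043\tLATIN CAPITAL LETTER C\n"
)
print([f"U+{cp:04X}" for cp in needs_retranslation(old_list, new_list)])
# ['U+0041', 'U+0043']
```

Because character names are stable once published, in practice such a delta would mostly consist of new characters and revised annotations.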


Marcel



Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-05 Thread Philippe Verdy via Unicode
There have been significant efforts to "translate", or more precisely "adapt",
significant parts of the standard, with good presentations in Wikipedia and
various sites for scoped topics. So there are alternate charts, and instead
of translating everything, the concepts are summarized and re-explained, while
still giving links to the original version in English every time more
information is needed.
The UCD files don't need to be translated; they can also be automatically
processed to generate alternate presentations or data tables in other
formats. There's no value in making efforts to translate them manually;
it's better to develop a tool that will process them into a format users
can read.
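As a sketch of such automatic processing, the Python below turns `UnicodeData.txt`-style lines (15 semicolon-separated fields) into a tab-separated table whose column headers are localized, so that only the headers need translating, not the data. It keeps fields 0 (code point), 1 (name), 2 (General_Category), 4 (Bidi_Class) and 9 (Bidi_Mirrored). The French headers and the helper names `unicode_data_rows` and `to_tsv` are assumptions for illustration, and ranges such as CJK ideographs (stored as `<..., First>`/`<..., Last>` pairs) are not expanded.

```python
import csv
import io


def unicode_data_rows(text: str):
    """Yield (code point, name, gc, bidi class, mirrored) per UnicodeData.txt line."""
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        yield (f"U+{fields[0]}", fields[1], fields[2], fields[4], fields[9])


def to_tsv(header: list[str], rows) -> str:
    # Write a tab-separated table: one localized header row, then the data.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


data = (
    "0028;LEFT PARENTHESIS;Ps;0;ON;;;;;Y;OPENING PARENTHESIS;;;;\n"
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"
)
header_fr = ["Point de code", "Nom", "Catégorie générale", "Classe bidi", "Miroir"]
print(to_tsv(header_fr, unicode_data_rows(data)))
```

The character names stay in English here; localizing them is the separate naming effort discussed below.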

So remove the UCD files and the tables from the count, as well as the sample
code (which is just demonstrative and uses a simplified, non-optimal
implementation to keep the code clear). We can now have separate tools or
websites presenting them and offering commented code that also performs
better. We have large collections of i18n libraries that were developed
for various development platforms, with usage documentation in various
languages.

The only efforts needed are in:
* naming characters (Wikipedia is great for distributing the effort and having
articles that show relevant collections of characters and document alternate
names or disambiguate synonyms).
* the core text of the standard (section 3 about conformance and
requirements is the first thing to adapt). There's absolutely no need,
however, to do that as a pure translation; it can be rewritten and presented
with the goals users want. Here again Wikipedia has made significant
efforts, in various languages.
* keeping the tools developed in the previous paragraph in sync and
in conformity with the standard (syncing the UCD files they use).

2018-03-05 19:21 GMT+01:00 Ken Whistler via Unicode :

>
> On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote:
>
> I have a question; if some people try to make a
> translated version of Unicode
>
>
> And to add to Asmus' response, folks on the list should understand that
> even with the best of effort, the concept of a "translated version of
> Unicode" is a near impossibility. In fairly recent times, two serious
> efforts to translate *just* the core specification -- one in Japanese,
> and a somewhat later attempt for Chinese -- crashed and burned, for a
> variety of reasons. The core specification is huge, contains a lot of very
> specific technical terminology that is difficult to translate, along with a
> large collection of script- and language-specific detail, also hard to
> translate. Worse, it keeps changing, with updates now coming out once every
> year. Some large parts are stable, but it is impossible to predict what
> sections might be impacted by the next year's encoding decisions.
>
> That is not including the fact that "the Unicode Standard" now also
> includes 14 separate HTML (or XHTML) annexes, all of which are also moving
> targets, along with the UCD data files, which often contain important
> information in their headers that would also require translation. And then,
> of course, there are the 2000+ pages of the formatted code charts, which
> require highly specific and very complicated custom tooling and font usage
> to produce.
>
> It would require a dedicated (and expensive) small army of translators,
> terminologists, editors, programmers, font designers, and project managers
> to replicate all of this into another language publication -- and then they
> would have to do it again the next year, and again the next year, in
> perpetuity. Basically, given the current situation, it would be a fool's
> errand, more likely to introduce errors and inconsistencies than to help
> anybody with actual implementation.
>
> People who want accessibility to the Unicode Standard in other languages
> need to scale down their expectations considerably, and focus on preparing
> reasonably short and succinct introductions to the terminology and
> complexity involved in the full standard. Such projects are feasible. But a
> full translation of "the Unicode Standard" simply is not.
>
> --Ken
>


Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-05 Thread Ken Whistler via Unicode


On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote:

I have a question; if some people try to make a
translated version of Unicode


And to add to Asmus' response, folks on the list should understand that 
even with the best of effort, the concept of a "translated version of 
Unicode" is a near impossibility. In fairly recent times, two serious 
efforts to translate *just* the core specification -- one in Japanese,
and a somewhat later attempt for Chinese -- crashed and burned, for a 
variety of reasons. The core specification is huge, contains a lot of 
very specific technical terminology that is difficult to translate, 
along with a large collection of script- and language-specific detail, 
also hard to translate. Worse, it keeps changing, with updates now 
coming out once every year. Some large parts are stable, but it is 
impossible to predict what sections might be impacted by the next year's 
encoding decisions.


That is not including the fact that "the Unicode Standard" now also
includes 14 separate HTML (or XHTML) annexes, all of which are also 
moving targets, along with the UCD data files, which often contain 
important information in their headers that would also require 
translation. And then, of course, there are the 2000+ pages of the 
formatted code charts, which require highly specific and very 
complicated custom tooling and font usage to produce.


It would require a dedicated (and expensive) small army of translators, 
terminologists, editors, programmers, font designers, and project 
managers to replicate all of this into another language publication -- 
and then they would have to do it again the next year, and again the 
next year, in perpetuity. Basically, given the current situation, it 
would be a fool's errand, more likely to introduce errors and 
inconsistencies than to help anybody with actual implementation.


People who want accessibility to the Unicode Standard in other languages 
need to scale down their expectations considerably, and focus on 
preparing reasonably short and succinct introductions to the terminology 
and complexity involved in the full standard. Such projects are 
feasible. But a full translation of "the Unicode Standard" simply is not.


--Ken