Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-09 Thread Marcel Schneider via Unicode
On 08/03/18 19:33, Arthur Reutenauer  wrote:
> 
> On Thu, Mar 08, 2018 at 07:05:06PM +0100, Marcel Schneider via Unicode wrote:
> > https://www.amazon.fr/Unicode-5-0-pratique-Patrick-Andries/dp/2100511408/ref=pd_bbs_sr_1?ie=UTF8=books=1206989878=8-1
> 
> You’re linking to the wrong one of Patrick’s books :-) The
> translation he made of version 3.1 (not 5.0) of the core specification
> is available in full at http://hapax.qc.ca/ (“Unicode et ISO 10646 en
> français”, middle of page), as well as a few free sample chapters from
> his other book.
> 
> Best,
> 
> Arthur
> 

Indeed, thank you very much for the correction, and thanks for the link.

I can say that the free online chapters of Patrick Andriesʼ translation of the
Unicode Standard were my first introduction to it, more precisely ch. 7
(Punctuation), which I even printed out to get acquainted with the various
dashes and spaces and to learn more about quotation marks. [I didnʼt have
internet and took the copy home from a library.] Based on this experience, I
think it isnʼt much of an extrapolation to suppose that millions of newcomers
in all countries could use such a translation. Although the latest version of
TUS is obviously more up‐to‐date, version 3.1 isnʼt plain wrong at all. Hence
I warmly recommend translating at least v3.1 (or those chapters of v10.0 that
are already in v3.1) while prompting the reader to seek further information
on the Unicode website.

We note too that Patrickʼs translation is annotated (footnotes in gray print)
with additional information of interest for the target locale. (Here one could
mention that Latin script requires preformatted superscript letters for an
interoperable representation of current text in some languages.)

Some Unicode terminology like “bidi‐mirroring” may be hard to adapt, but that
is no more of a challenge than what any tech/science writer faces when
handling content that was originally produced in the United States and/or,
more generally, in English. E.g. in French we may choose from a range of
forms, from the more conservative to the less usual, among which: “réflexion
bidi”, “réflexion bidirectionnelle”, “bidi‐reflexion” (hyphenated or not),
“réflexible” or, simply, “miroir”. Anyway, every locale is expected to
localize the full range of Unicode terminology, unless people agree to switch
to English whenever the topic is Unicode, even in the middle of a discussion
held in Chinese or in Japanese. Doing so is not a problem in practice; itʼs
just ethically weird.

So we look forward to the concept of a “Unicode in Practice” textbook
implemented in Chinese and in Japanese and in any other non‐English and
non‐French locale, if it isnʼt already.

As for translating the Core spec as a whole, why did two recent attempts crash
even before the maintenance stage, while the 3.1 project succeeded?

Some pieces of the puzzle seem to be still missing.

Best regards,

Marcel



Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-08 Thread Marcel Schneider via Unicode
On Thu, 08 Mar 2018 04:25:53 -0500, Elsebeth Flarup via Unicode wrote:
> 
> For a number of reasons I think translating the standard is a really bad idea.
> 
[…]
> 
> There are other reasons to not do this.

I assume that the reasons you are thinking of are congruent with those that
Ken already explained in detail in:

http://www.unicode.org/mail-arch/unicode-ml/y2018-m03/0025.html

And I agree with Ken that the idea isnʼt bad as such, but that it is no longer
feasible. Everybody (supposedly) knows that the Core Spec has really been
translated, published in a print edition, scanned into Google Books, and is
still for sale:

https://www.amazon.fr/Unicode-5-0-pratique-Patrick-Andries/dp/2100511408/ref=pd_bbs_sr_1?ie=UTF8=books=1206989878=8-1

https://books.google.fr/books?id=GgbWZNTRncsC=frontcover=Andries+Patrick=fr=X=0ahUKEwis59Cwp93ZAhUF6RQKHZ1GBlIQ6AEIKjAA#v=onepage=Andries%20Patrick=false

OK, the version number was only half the actual one.

Best regards,

Marcel



Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-08 Thread Elsebeth Flarup via Unicode
For a number of reasons I think translating the standard is a really bad idea.

As long as there are people interested in maintaining the translation, 
identifying deltas and easily translating just the deltas would NOT be 
difficult, however. Modern computer aided translation tools all use translation 
memories that automatically translate already translated segments and present 
only new/changed segments to the translator. No need for change bars etc. 

This assumes that somebody would have stewardship of the translation memory, 
that the people doing the translation would be willing to/capable of using the 
CAT tools, etc., but the technical translation technology is available to make 
this part of the equation not much of an issue.
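
The translation-memory mechanism described above can be sketched roughly as
follows. This is only a minimal illustration, not how any particular CAT tool
works: the function names, the naive sentence splitting and the sample memory
entry are all assumptions made for the example.

```python
import re


def split_segments(text):
    """Split text into sentence-like segments.

    Naive rule (split after ., ! or ?); real CAT tools use richer
    segmentation rules. Assumption made for this sketch only.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def pretranslate(source_text, memory):
    """Reuse stored translations and collect segments that are new or changed.

    Returns (translated, pending): translated is a list of
    (source, target) pairs found in the memory, pending is the list of
    source segments a translator still has to handle.
    """
    translated, pending = [], []
    for segment in split_segments(source_text):
        if segment in memory:
            translated.append((segment, memory[segment]))
        else:
            pending.append(segment)
    return translated, pending


# Hypothetical memory built while translating an earlier version of a text.
memory = {
    "Unicode encodes characters.": "Unicode code des caractères.",
}

new_version = "Unicode encodes characters. It is updated every year."
done, todo = pretranslate(new_version, memory)
print(todo)
```

Only the second sentence is presented to the translator; the first one is
filled in from the memory, which is the "delta" behaviour referred to above.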

There are other reasons to not do this.

Elsebeth



‐‐‐ Original Message ‐‐‐

On March 8, 2018 10:03 AM, Richard Wordingham via Unicode wrote:

> On Thu, 8 Mar 2018 02:27:06 +0100 (CET)
> Marcel Schneider via Unicode unicode@unicode.org wrote:
>
> > Yes the biggest issue over time, as Ken wrote, is to maintain a
> > translation, be it only the Nameslist.
>
> For which accurately determined change bars can work wonders. An
> alternative would be paragraph identification and a list of changed
> paragraphs. The section number in TUS is too coarse for giving text
> locations, and page numbers are inherently changeable.
>
> Richard.





Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-08 Thread Marcel Schneider via Unicode
On Thu, 8 Mar 2018 09:03:28 +, Richard Wordingham via Unicode wrote:
> 
> > Yes the biggest issue over time, as Ken wrote, is to *maintain* a
> > translation, be it only the Nameslist.
> 
> For which accurately determined change bars can work wonders. An
> alternative would be paragraph identification and a list of changed
> paragraphs. The section number in TUS is too coarse for giving text
> locations, and page numbers are inherently changeable.

Adobe Illustrator doesnʼt seem to support purple numbers, and Adobe Reader
seems unable to accept bookmarks as input for a go‐to feature (that is
presumably specific to Acrobat). Word is reported not to add lasting change
bars in an automated way. But all that can be done in HTML, which is not the
format of The Unicode Standard, whose web bookmarks are fortunately published
in separate collections. When UAXes are updated, an intermediate revision has
all changes highlighted and remains available online. We can see delta charts
with all changes highlighted, in PDF. Why has the Core Specification not
benefited from these facilities?

Has this already been submitted as formal feedback?
(UTC is known for not considering feedback that has not been submitted via
the Contact form or docsub...@unicode.org, and the mailing lists carry
explicit caveats.)

Best regards,

Marcel



Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-08 Thread Richard Wordingham via Unicode
On Thu, 8 Mar 2018 02:27:06 +0100 (CET)
Marcel Schneider via Unicode  wrote:

> Yes the biggest issue over time, as Ken wrote, is to *maintain* a
> translation, be it only the Nameslist.

For which accurately determined change bars can work wonders.  An
alternative would be paragraph identification and a list of changed
paragraphs.  The section number in TUS is too coarse for giving text
locations, and page numbers are inherently changeable.
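
A list of changed paragraphs could be produced along these lines. This is a
rough sketch under stated assumptions: it swaps in content hashes for
editorially assigned paragraph identifiers, it treats blank lines as
paragraph boundaries, and the function names are hypothetical.

```python
import hashlib


def paragraph_ids(text):
    """Map a short content hash to each paragraph of the text.

    Paragraphs are assumed to be separated by blank lines; whitespace
    is normalized so that re-wrapping alone does not count as a change.
    """
    ids = {}
    for para in text.split("\n\n"):
        normalized = " ".join(para.split())
        if normalized:
            digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:8]
            ids[digest] = normalized
    return ids


def changed_paragraphs(old_text, new_text):
    """Return (added, removed) paragraphs between two versions."""
    old_ids = paragraph_ids(old_text)
    new_ids = paragraph_ids(new_text)
    added = [p for h, p in new_ids.items() if h not in old_ids]
    removed = [p for h, p in old_ids.items() if h not in new_ids]
    return added, removed


old = "First paragraph.\n\nSecond paragraph."
new = "First\nparagraph.\n\nSecond paragraph, revised.\n\nThird paragraph."
added, removed = changed_paragraphs(old, new)
print(added)
print(removed)
```

A modified paragraph shows up as one removal plus one addition; pairing the
two would need stable identifiers of the kind suggested above.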

Richard.




Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-08 Thread Marcel Schneider via Unicode
On Mon, 5 Mar 2018 20:19:47 +0100, Philippe Verdy via Unicode wrote:
[…]
> * the core text of the standard (section 3 about conformance and requirements
> is the first thing to adapt).
> There's absolutely no need however to do that as a pure translation, it can
> be rewritten and presented with the goals wanted by users. Here again
> Wikipedia has done significant efforts there, in various languages

I donʼt think there is potential for rewriting the core spec if the goal is to
produce an abstract, given that the original authors already made efforts to
keep the language simple. Whenever the goal is to add information, by
contrast, e.g. about (as yet) non‐standard use of superscripts in Latin text,
then the added value, clearly tagged as such, will reward the effort.

A big part of the core spec is made of script‐specific introductions designed
to be balanced and handy. Hence part of the information is provided only in
the code charts, some in the annexes. Compiling it all and writing up more
detailed articles is indeed much more interesting for readers focussing on a
script.

Best regards,

Marcel



Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-07 Thread Marcel Schneider via Unicode
On Mon, 5 Mar 2018 20:19:47 +0100, Philippe Verdy via Unicode wrote:
 
> There have been significant efforts to "translate" or more precisely "adapt"
> significant parts of the standard with good presentations in Wikipedia and
> various sites for scoped topics. So there are alternate charts, and instead
> of translating all, the concepts are summarized, reexplained, but still
> give links to the original version in English every time more info is needed.

Indeed one of the best uses we can make of efforts in Unicode education is
extending and improving the Wikipedia coverage, because this is the first
place almost everybody goes to. So if a government is considering an
investment, donating to Wikimedia and motivating a vast community seems
a really good plan. And hiring staffers for this purpose will increase the
reliability of the data (given that some corporations misuse the
infrastructure for PR).

> All UCD files don't need to be translated, they can also be automatically 
> processed to generate alternate presentations or datatables in other 
> formats. There's no value in taking efforts to translate them manually, 
> it's better to develop a tool that will process them in the format users 
> can read. 

The only UCD file Iʼd advise to fully translate is the NamesList, as it is the
source code of the Code Charts. These are indeed indispensable because of
the glyphic information they convey, which can be found nowhere else. Hence
all good secondary sources like Wikipedia link to the Unicode Charts.
The NamesList per se is also useful in that it provides a minimal amount of
information about the characters. But it lacks important hints about
bidi‐mirroring, which have to be compiled from yet another UCD file. The
downside of generating a holistic view is that it generally ends up as an
atomic view, on a per‐character basis. Anyway, itʼs up to users to gather an
overview tailored to their needs. This is catered for by Chinese and Japanese
versions of sites such as www.fileformat.info.
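
As a rough illustration of compiling names together with the mirroring hint,
here is a small sketch. It assumes Pythonʼs standard `unicodedata` module as
the data source (it bundles the UCD, so the names are not read from the
NamesList file itself), and the function name is hypothetical.

```python
import unicodedata


def names_with_mirroring(first, last):
    """List (code point, name, mirrored) for the named characters in a range.

    The mirrored flag is the Bidi_Mirrored property as exposed by
    unicodedata.mirrored(); unnamed code points are skipped.
    """
    entries = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, None)
        if name is None:
            continue
        entries.append((cp, name, bool(unicodedata.mirrored(ch))))
    return entries


for cp, name, mirrored in names_with_mirroring(0x28, 0x2B):
    flag = "mirrored" if mirrored else ""
    print("U+%04X\t%s\t%s" % (cp, name, flag))
```

The result is still a per‐character listing; grouping it into an overview is
left to the user, as said above.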

[…]
> The only efforts is in: 
> * naming characters (Wikipedia is great to distribute the effort and have 
> articles showing relevant collections of characters and document alternate 
> names or disambiguate synonyms). 

Naming characters is a real challenge and often runs into multiple issues.
First we need to make clear for whom the localization is intended: technical
people or UIs. It has happened that a literal translation, tuned in accordance
with specialists, was then handed out to the industry to show up on everyoneʼs
computer, while some core characters of the intended locale are named
differently in real life, so that students donʼt encounter what they have
learned at school. And the worst thing is that once a translation is
released, considerations of image lead people to seek stability even where no
Unicode (ISO) policy prevents updates.

> * the core text of the standard (section 3 about conformance and 
> requirements is the first thing to adapt). There's absolutely no need 
> however to do that as a pure translation, it can be rewritten and presented 
> with the goals wanted by users. Here again Wikipedia has done significant 
> efforts there, in various languages 
> * keeping the tools developed in the previous paragraph in sync and 
> conformity with the standard (sync the UCD files they use).  

Yes the biggest issue over time, as Ken wrote, is to *maintain* a translation, 
be it only the Nameslist.


Marcel



Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-05 Thread Philippe Verdy via Unicode
There have been significant efforts to "translate" or more precisely "adapt"
significant parts of the standard with good presentations in Wikipedia and
various sites for scoped topics. So there are alternate charts, and instead
of translating all, the concepts are summarized, reexplained, but still
give links to the original version in English every time more info is needed.
All UCD files don't need to be translated, they can also be automatically
processed to generate alternate presentations or datatables in other
formats. There's no value in taking efforts to translate them manually,
it's better to develop a tool that will process them in the format users
can read.
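
A tool of that kind could start out like the following minimal sketch. It
assumes the semicolon-separated, 15-field record layout of UnicodeData.txt;
the two sample records are copied from that file, while the function names
and the French category labels are hypothetical and only serve the example.

```python
# Two records in UnicodeData.txt format (fields separated by semicolons).
SAMPLE_LINES = [
    "0028;LEFT PARENTHESIS;Ps;0;ON;;;;;Y;OPENING PARENTHESIS;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
]

# Hypothetical localized labels for General_Category values (illustration only).
CATEGORY_LABELS_FR = {
    "Lu": "lettre majuscule",
    "Ps": "ponctuation ouvrante",
}


def parse_unicode_data(lines):
    """Extract code point, name and general category from each record."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) != 15:
            continue  # skip anything that is not a 15-field record
        records.append({
            "code_point": int(fields[0], 16),
            "name": fields[1],
            "category": fields[2],
        })
    return records


def render_rows(records, labels):
    """Render tab-separated rows, replacing category codes with labels."""
    rows = []
    for rec in records:
        label = labels.get(rec["category"], rec["category"])
        rows.append("U+%04X\t%s\t%s\t%s" % (
            rec["code_point"], chr(rec["code_point"]), rec["name"], label))
    return rows


rows = render_rows(parse_unicode_data(SAMPLE_LINES), CATEGORY_LABELS_FR)
for row in rows:
    print(row)
```

Since the data file itself stays untouched, re-running the script on a newer
UCD release is all the maintenance such a presentation needs.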

So remove the UCD files and the tables from the count, as well as sample
code (which is just demonstrative and uses simplified non optimal
implementation to keep this code clear). We can now have separate tools or
websites presenting them and proposing commented code which is also better
performing. We have large collections of i18n libraries that were developed
for various development platforms and usage documentation in various
languages.

The only effort needed is in:
* naming characters (Wikipedia is great to distribute the effort and have
articles showing relevant collections of characters and document alternate
names or disambiguate synonyms).
* the core text of the standard (section 3 about conformance and
requirements is the first thing to adapt). There's absolutely no need
however to do that as a pure translation, it can be rewritten and presented
with the goals wanted by users. Here again Wikipedia has done significant
efforts there, in various languages
* keeping the tools developed in the previous paragraph in sync and
conformity with the standard (sync the UCD files they use).

2018-03-05 19:21 GMT+01:00 Ken Whistler via Unicode :

>
> On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote:
>
> I have a question; if some people try to make a
> translated version of Unicode
>
>
> And to add to Asmus' response, folks on the list should understand that
> even with the best of effort, the concept of a "translated version of
> Unicode" is a near impossibility. In fairly recent times, two serious
> efforts to translate *just* the core specification -- one in Japanese,
> and a somewhat later attempt for Chinese -- crashed and burned, for a
> variety of reasons. The core specification is huge, contains a lot of very
> specific technical terminology that is difficult to translate, along with a
> large collection of script- and language-specific detail, also hard to
> translate. Worse, it keeps changing, with updates now coming out once every
> year. Some large parts are stable, but it is impossible to predict what
> sections might be impacted by the next year's encoding decisions.
>
> That is not including the fact that "the Unicode Standard" now also
> includes 14 separate HTML (or XHTML) annexes, all of which are also moving
> targets, along with the UCD data files, which often contain important
> information in their headers that would also require translation. And then,
> of course, there are the 2000+ pages of the formatted code charts, which
> require highly specific and very complicated custom tooling and font usage
> to produce.
>
> It would require a dedicated (and expensive) small army of translators,
> terminologists, editors, programmers, font designers, and project managers
> to replicate all of this into another language publication -- and then they
> would have to do it again the next year, and again the next year, in
> perpetuity. Basically, given the current situation, it would be a fool's
> errand, more likely to introduce errors and inconsistencies than to help
> anybody with actual implementation.
>
> People who want accessibility to the Unicode Standard in other languages
> need to scale down their expectations considerably, and focus on preparing
> reasonably short and succinct introductions to the terminology and
> complexity involved in the full standard. Such projects are feasible. But a
> full translation of "the Unicode Standard" simply is not.
>
> --Ken
>