Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
On 08/03/18 19:33, Arthur Reutenauer wrote:
> On Thu, Mar 08, 2018 at 07:05:06PM +0100, Marcel Schneider via Unicode wrote:
> > https://www.amazon.fr/Unicode-5-0-pratique-Patrick-Andries/dp/2100511408/ref=pd_bbs_sr_1?ie=UTF8=books=1206989878=8-1
>
> You’re linking to the wrong one of Patrick’s books :-) The translation he made of version 3.1 (not 5.0) of the core specification is available in full at http://hapax.qc.ca/ (“Unicode et ISO 10646 en français”, middle of page), as well as a few free sample chapters from his other book.
>
> Best,
>
> Arthur

Indeed, thank you very much for the correction, and thanks for the link. I can say this much: the free online chapters of Patrick Andriesʼ translation of the Unicode Standard were my first introduction, more precisely ch. 7 (Punctuation), which I even printed out to get acquainted with the various dashes and spaces and to learn more about quotation marks. [I didnʼt have internet access and took the copy home from a library.]

Based on this experience, I think it isnʼt too much of an extrapolation to suppose that millions of newcomers in all countries could use such a translation. Although the latest version of TUS is obviously more up‐to‐date, version 3.1 isnʼt plain wrong at all. Hence I warmly recommend translating at least v3.1 — or those chapters of v10.0 that are already in v3.1 — while prompting the reader to seek further information on the Unicode website. We note too that Patrickʼs translation is annotated (footnotes in gray print) with additional information of interest for the target locale. (Here one could mention that the Latin script requires preformatted superscript letters for an interoperable representation of current text in some languages.)

Some Unicode terminology like “bidi‐mirroring” may be hard to adapt, but that isnʼt more of a challenge than what any technical or science writer faces when handling content originally produced in the United States and/or, more generally, in English. E.g. in French we may choose from a panel of more conservative through less usual grammatical forms, among which: “réflexion bidi”, “réflexion bidirectionnelle”, “bidi‐réflexion” (hyphenated or not), “réflexible” or, simply, “miroir”. Anyway, every locale is expected to localize the full range of Unicode terminology — unless people agree to switch to English whenever the topic is Unicode, even within a conversation otherwise held in Chinese or in Japanese; doing so is not a problem, itʼs just ethically weird. So we look forward to the concept of a “Unicode in Practice” textbook being implemented in Chinese and in Japanese and in any other non‐English and non‐French locale, if it isnʼt already.

As for translating the Core Specification as a whole: why did two recent attempts crash even before the maintenance stage, while the 3.1 project succeeded? Some pieces of the puzzle seem to be still missing.

Best regards,

Marcel
Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
On Thu, 08 Mar 2018 04:25:53 -0500, Elsebeth Flarup via Unicode wrote:
> For a number of reasons I think translating the standard is a really bad idea.
> […]
> There are other reasons to not do this.

I assume that the reasons you are thinking of are congruent with those that Ken already explained in detail in:

http://www.unicode.org/mail-arch/unicode-ml/y2018-m03/0025.html

And I agree with Ken that the idea in itself isnʼt bad as such, but that it is no longer feasible. Everybody (supposedly) knows that the Core Specification really has been translated, published in a print edition, scanned into Google Books, and is still for sale:

https://www.amazon.fr/Unicode-5-0-pratique-Patrick-Andries/dp/2100511408/ref=pd_bbs_sr_1?ie=UTF8=books=1206989878=8-1
https://books.google.fr/books?id=GgbWZNTRncsC=frontcover=Andries+Patrick=fr=X=0ahUKEwis59Cwp93ZAhUF6RQKHZ1GBlIQ6AEIKjAA#v=onepage=Andries%20Patrick=false

OK, the version number was only half the actual one.

Best regards,

Marcel
Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
For a number of reasons I think translating the standard is a really bad idea.

As long as there are people interested in maintaining the translation, however, identifying deltas and translating just the deltas would NOT be difficult. Modern computer-aided translation tools all use translation memories that automatically translate already-translated segments and present only new/changed segments to the translator. No need for change bars etc. This assumes that somebody would have stewardship of the translation memory, that the people doing the translation would be willing to/capable of using the CAT tools, etc., but the technical translation technology is available to make this part of the equation not much of an issue.

There are other reasons to not do this.

Elsebeth

‐‐‐ Original Message ‐‐‐
On March 8, 2018 10:03 AM, Richard Wordingham via Unicode wrote:
> On Thu, 8 Mar 2018 02:27:06 +0100 (CET)
> Marcel Schneider via Unicode unicode@unicode.org wrote:
> > Yes the biggest issue over time, as Ken wrote, is to maintain a
> > translation, be it only the Nameslist.
>
> For which accurately determined change bars can work wonders. An alternative would be paragraph identification and a list of changed paragraphs. The section number in TUS is too coarse for giving text locations, and page numbers are inherently changeable.
>
> Richard.
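[To illustrate the point about translation memories: a minimal sketch of the exact-match lookup at the core of any TM. This is not modeled on any particular CAT tool; real tools add fuzzy matching, TMX interchange, etc. The segment texts and translations below are invented examples.]

```python
# Minimal translation-memory sketch: a memory maps source segments to
# stored translations; only unmatched segments go to the translator.

def segments_to_translate(segments, memory):
    """Return only the segments with no exact match in the memory."""
    return [s for s in segments if s not in memory]

def pretranslate(segments, memory):
    """Reuse stored translations; leave new segments untranslated."""
    return [memory.get(s, s) for s in segments]

memory = {
    "The Unicode Standard": "Le standard Unicode",
    "Chapter 7: Punctuation": "Chapitre 7 : Ponctuation",
}
new_version = ["The Unicode Standard", "Chapter 8: Symbols"]

print(segments_to_translate(new_version, memory))  # only the new segment
print(pretranslate(new_version, memory))
```

Translating an updated edition then costs roughly in proportion to the delta, not to the whole text, which is Elsebethʼs point.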
Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
On Thu, 8 Mar 2018 09:03:28 +, Richard Wordingham via Unicode wrote:
> > Yes the biggest issue over time, as Ken wrote, is to *maintain* a
> > translation, be it only the Nameslist.
>
> For which accurately determined change bars can work wonders. An alternative would be paragraph identification and a list of changed paragraphs. The section number in TUS is too coarse for giving text locations, and page numbers are inherently changeable.

Adobe Illustrator doesnʼt seem to support purple numbers, and Adobe Reader seems unable to accept input of bookmarks as a go‐to feature (that must be proper to Acrobat). Word is reported not to add lasting change bars in an automated way. But all of that can be done in HTML — which is not the format of The Unicode Standard, whose web bookmarks are fortunately published in separate collections.

When UAXes are updated, an intermediate revision has all changes highlighted and remains available online. We can see delta charts with all changes highlighted, in PDF. Why did the Core Specification not benefit from these facilities? Has this already been submitted as formal feedback? (UTC is known for not considering feedback that has not been submitted via the Contact form or docsub...@unicode.org, and the mailing lists carry explicit caveats to that effect.)

Best regards,

Marcel
Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
On Thu, 8 Mar 2018 02:27:06 +0100 (CET) Marcel Schneider via Unicode wrote:
> Yes the biggest issue over time, as Ken wrote, is to *maintain* a
> translation, be it only the Nameslist.

For which accurately determined change bars can work wonders. An alternative would be paragraph identification and a list of changed paragraphs. The section number in TUS is too coarse for giving text locations, and page numbers are inherently changeable.

Richard.
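[A minimal sketch of the "paragraph identification plus list of changed paragraphs" idea: give each paragraph a stable identifier, fingerprint its text, and compare two versions. The identifiers and paragraph texts below are invented for illustration.]

```python
# Detect changed paragraphs between two versions of a document whose
# paragraphs carry stable ids (e.g. "3.1-p1" for section 3.1, para 1).
import hashlib

def fingerprint(paragraphs):
    """Map each paragraph id to a hash of its text."""
    return {pid: hashlib.sha256(text.encode("utf-8")).hexdigest()
            for pid, text in paragraphs.items()}

def changed_paragraphs(old, new):
    """List ids of paragraphs that are new or whose text changed."""
    old_fp, new_fp = fingerprint(old), fingerprint(new)
    return sorted(pid for pid in new_fp if old_fp.get(pid) != new_fp[pid])

v_old = {"3.1-p1": "Conformance text.", "3.1-p2": "Stable text."}
v_new = {"3.1-p1": "Conformance text, revised.", "3.1-p2": "Stable text."}
print(changed_paragraphs(v_old, v_new))  # ['3.1-p1']
```

Unlike section numbers (too coarse) or page numbers (unstable), such ids survive reflowing, and a translator need only revisit the listed paragraphs.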
Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
On Mon, 5 Mar 2018 20:19:47 +0100, Philippe Verdy via Unicode wrote:
[…]
> * the core text of the standard (section 3 about conformance and requirements is the first thing to adapt). There's absolutely no need however to do that as a pure translation, it can be rewritten and presented with the goals wanted by users. Here again Wikipedia has done significant efforts there, in various languages

I donʼt think there is potential in rewriting the core spec if the goal is making an abstract, given that the original authors already made efforts to keep the language simple. Whenever the goal is to add information, by contrast, e.g. about the (yet) non‐standard use of superscripts in Latin text, then the added value — clearly tagged as such — will reward the effort.

A big part of the core spec is made of script‐specific introductions designed to be balanced and handy. Hence part of the information is provided only in the code charts, some in the annexes. Compiling it all and writing up more detailed articles is indeed much more interesting for readers focussing on a script.

Best regards,

Marcel
Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
On Mon, 5 Mar 2018 20:19:47 +0100, Philippe Verdy via Unicode wrote:
> There's been significant efforts to "translate" or more precisely "adapt" significant parts of the standard with good presentations in Wikipedia and various sites for scoped topics. So there are alternate charts, and instead of translating all, the concepts are summarized, reexplained, but still give links to the original version in English every time more info is needed.

Indeed one of the best uses we can make of efforts in Unicode education is extending and improving the Wikipedia coverage, because that is the first place almost everybody goes to. So if a government is considering an investment, donating to Wikimedia and motivating a vast community seems a really good plan. And hiring staffers for this purpose will increase the reliability of the data (given that some corporations misuse the infrastructure for PR).

> All UCD files don't need to be translated, they can also be automatically processed to generate alternate presentations or datatables in other formats. There's no value in taking efforts to translate them manually, it's better to develop a tool that will process them in the format users can read.

The only UCD file Iʼd advise to translate fully is the NamesList, as it is the source code of the Code Charts. The Charts are indeed indispensable because of the glyphic information they convey, which can be found nowhere else; hence all good secondary sources like Wikipedia link to the Unicode Charts. The NamesList per se is also useful in that it provides a minimal amount of information about the characters. But it lacks important hints about bidi‐mirroring, which should be compiled from yet another UCD file. The downside of generating a holistic view is that it generally ends up as an atomic, per‐character view. Though anyway itʼs up to the user to gather an overview tailored to his/her needs. This is catered for by Chinese and Japanese versions of sites such as www.fileformat.info.

[…]

> The only effort is in:
> * naming characters (Wikipedia is great to distribute the effort and have articles showing relevant collections of characters and document alternate names or disambiguate synonyms).

Naming characters is a real challenge and often runs into multiple issues. First we need to make clear who the localization is intended for: technical people or UIs. It has happened that a literal translation tuned in accordance with specialists was then handed out to the industry for showing up on everyoneʼs computer, while some core characters of the intended locale are named differently in real life, so that students donʼt encounter what they have learned at school. And the worst thing is that once a translation is released, image considerations lead to seeking stability even where no Unicode (ISO) policy prevents updates.

> * the core text of the standard (section 3 about conformance and requirements is the first thing to adapt). There's absolutely no need however to do that as a pure translation, it can be rewritten and presented with the goals wanted by users. Here again Wikipedia has done significant efforts there, in various languages
> * keeping the tools developed in the previous paragraph in sync and conformity with the standard (sync the UCD files they use).

Yes the biggest issue over time, as Ken wrote, is to *maintain* a translation, be it only the Nameslist.

Marcel
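[As a small aside on compiling per-character information from several UCD sources: Pythonʼs standard unicodedata module already exposes both character names and the Bidi_Mirrored property, so a chart row combining the two can be generated rather than translated. A minimal sketch; the row layout is invented for illustration.]

```python
# Combine a character's name with its Bidi_Mirrored property (the
# property comes from UnicodeData.txt field 9; the mirrored-glyph
# pairs live in the separate BidiMirroring.txt file).
import unicodedata

def chart_row(ch):
    """Build a minimal names-list row for one character."""
    return {
        "code": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "mirrored": bool(unicodedata.mirrored(ch)),
    }

for ch in "(A":
    print(chart_row(ch))
```

Only the "name" field would need localizing; the rest regenerates automatically from the UCD at every version.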
Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
There have been significant efforts to "translate", or more precisely "adapt", significant parts of the standard with good presentations in Wikipedia and various sites for scoped topics. So there are alternate charts, and instead of translating everything, the concepts are summarized and re-explained, but still with links to the original version in English every time more info is needed.

The UCD files don't all need to be translated; they can also be automatically processed to generate alternate presentations or datatables in other formats. There's no value in making the effort to translate them manually; it's better to develop a tool that will process them into a format users can read. So remove the UCD files and the tables from the count, as well as the sample code (which is just demonstrative and uses simplified, non-optimal implementations to keep the code clear). We can now have separate tools or websites presenting them and proposing commented code which also performs better. We have large collections of i18n libraries that were developed for various development platforms, with usage documentation in various languages.

The only effort is in:
* naming characters (Wikipedia is great to distribute the effort and have articles showing relevant collections of characters and documenting alternate names or disambiguating synonyms);
* the core text of the standard (section 3 about conformance and requirements is the first thing to adapt). There's absolutely no need however to do that as a pure translation; it can be rewritten and presented with the goals wanted by users. Here again Wikipedia has done significant efforts there, in various languages;
* keeping the tools developed in the previous paragraph in sync and in conformity with the standard (syncing the UCD files they use).
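[A minimal sketch of the "process the UCD files by machine" idea: parse UnicodeData.txt records into a datatable that any presentation layer could render. Two real records are embedded here so the sketch is self-contained; a real tool would read the file from the UCD itself.]

```python
# Parse UnicodeData.txt-style records (semicolon-separated, 15 fields
# per line) into a simple table of dicts.
SAMPLE = """\
0028;LEFT PARENTHESIS;Ps;0;ON;;;;;Y;OPENING PARENTHESIS;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
"""

def parse_ucd(text):
    """Extract code point, name, general category, and mirrored flag."""
    rows = []
    for line in text.splitlines():
        fields = line.split(";")
        rows.append({
            "code": fields[0],
            "name": fields[1],
            "category": fields[2],
            "mirrored": fields[9] == "Y",  # field 9 is Bidi_Mirrored
        })
    return rows

for row in parse_ucd(SAMPLE):
    print(row)
```

Rerunning such a tool against each new UCD release keeps the generated presentation in sync with the standard with no manual retranslation, which is the maintenance point above.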
2018-03-05 19:21 GMT+01:00 Ken Whistler via Unicode:
> On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote:
> > I have a question; if some people try to make a translated version of Unicode
>
> And to add to Asmus' response, folks on the list should understand that even with the best of effort, the concept of a "translated version of Unicode" is a near impossibility. In fairly recent times, two serious efforts to translate *just* the core specification -- one in Japanese, and a somewhat later attempt for Chinese -- crashed and burned, for a variety of reasons. The core specification is huge, contains a lot of very specific technical terminology that is difficult to translate, along with a large collection of script- and language-specific detail, also hard to translate. Worse, it keeps changing, with updates now coming out once every year. Some large parts are stable, but it is impossible to predict what sections might be impacted by the next year's encoding decisions.
>
> That is not counting the fact that "the Unicode Standard" now also includes 14 separate HTML (or XHTML) annexes, all of which are also moving targets, along with the UCD data files, which often contain important information in their headers that would also require translation. And then, of course, there are the 2000+ pages of the formatted code charts, which require highly specific and very complicated custom tooling and font usage to produce.
>
> It would require a dedicated (and expensive) small army of translators, terminologists, editors, programmers, font designers, and project managers to replicate all of this into another language publication -- and then they would have to do it again the next year, and again the next year, in perpetuity. Basically, given the current situation, it would be a fool's errand, more likely to introduce errors and inconsistencies than to help anybody with actual implementation.
>
> People who want accessibility to the Unicode Standard in other languages need to scale down their expectations considerably, and focus on preparing reasonably short and succinct introductions to the terminology and complexity involved in the full standard. Such projects are feasible. But a full translation of "the Unicode Standard" simply is not.
>
> --Ken