Re: Is the binaryness/textness of a data format a property?

2020-03-22 Thread Martin J. Dürst via Unicode
On 23/03/2020 03:56, Markus Scherer via Unicode wrote: > On Sat, Mar 21, 2020 at 12:35 PM Doug Ewell via Unicode > wrote: > >> I thought the whole premise of GB18030 was that it was Unicode mapped into >> a GB2312 framework. What characters exist in GB18030 that don't exist in >> Unicode, and

Re: Is the binaryness/textness of a data format a property?

2020-03-20 Thread Martin J. Dürst via Unicode
On 20/03/2020 23:41, Adam Borowski via Unicode wrote: > Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF or > U+11000..U+7FFF (or possibly even up to 2³⁶ or 2⁴²), which has its uses > but is not well-formed Unicode. This would definitely no longer be UTF-8! Martin.
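One quick way to see Martin's point is to give such byte sequences to a strict decoder. The sketch below uses Python's built-in `utf-8` codec as a stand-in for a conforming UTF-8 decoder. The sample byte strings are our own illustrations and are not taken from the thread.

```python
# Byte strings that follow the UTF-8 bit patterns but encode values that
# are not Unicode scalar values, plus one well-formed control sample.
SAMPLES = {
    "U+D800 (surrogate)": b"\xed\xa0\x80",
    "U+110000 (above U+10FFFF)": b"\xf4\x90\x80\x80",
    "U+1F600 (well-formed)": b"\xf0\x9f\x98\x80",
}


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 codec accepts `data`."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


for label, data in SAMPLES.items():
    print(f"{label}: {is_well_formed_utf8(data)}")
```

Python's own `surrogatepass` error handler will decode the first sample to `'\ud800'`. That is exactly the kind of "UTF-8 that carries more than Unicode" which, as Martin says, is no longer UTF-8.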

Call for Papers: G21C Grapholinguistics in the 21st century, Paris June 2020

2020-01-06 Thread Martin J. Dürst via Unicode
Happy New Year to everybody on this list! Except for the Internationalization and Unicode Conference (see https://www.unicodeconference.org/; submission deadline March 6, 2020), this list very rarely sees calls for papers, but this one should definitely be of interest at least to a subset of

Re: Grapheme clusters and backspace (was Re: Coding for Emoji: how to modify programs to work with emoji)

2019-10-22 Thread Martin J. Dürst via Unicode
Hello Richard, others, On 2019/10/23 07:32, Richard Wordingham via Unicode wrote: > On Tue, 22 Oct 2019 23:27:27 +0200 > Daniel Bünzli via Unicode wrote: >> Just to make things clear. When you say character in your message, >> you consistently mean scalar value right ? > > Yes. > > I find it

Re: Unicode website glitches. (was The Most Frequent Emoji)

2019-10-11 Thread Martin J. Dürst via Unicode
Martin. > Mark > > > On Thu, Oct 10, 2019 at 11:50 PM Martin J. Dürst via Unicode < > unicode@unicode.org> wrote: > >> I had a look at the page with the frequencies. Many emoji didn't >> display, but that's my browser's problem. What was worse was that the >> sid

Fwd: The Most Frequent Emoji

2019-10-11 Thread Martin J. Dürst via Unicode
I had a look at the page with the frequencies. Many emoji didn't display, but that's my browser's problem. What was worse was that the sidebar and the stuff at the bottom was all looking weird. I hope this can be fixed. Regards, Martin. Forwarded Message Subject: The Most

Re: Manipuri/Meitei customary writing system

2019-10-04 Thread Martin J. Dürst via Unicode
On 2019/10/04 15:35, Martin J. Dürst via Unicode wrote: > Hello Markus, > > On 2019/10/04 01:53, Markus Scherer via Unicode wrote: >> Dear Unicoders, >> >> Is Manipuri/Meitei customarily written in Bangla/Bengali script or >> in Meitei script? >> >>

Re: Manipuri/Meitei customary writing system

2019-10-04 Thread Martin J. Dürst via Unicode
Hello Markus, On 2019/10/04 01:53, Markus Scherer via Unicode wrote: > Dear Unicoders, > > Is Manipuri/Meitei customarily written in Bangla/Bengali script or > in Meitei script? > > I am looking at > https://en.wikipedia.org/wiki/Meitei_language#Writing_systems which seems > to describe writing

Re: Emoji Haggadah

2019-04-16 Thread Martin J. Dürst via Unicode
Hello Mark, others, On 2019/04/16 12:18, Mark E. Shoulson via Unicode wrote: > Yes.  But the sentences aren't just symbolic representations of the > concepts or something.  They are frequently direct > transcriptions—usually by puns—for *English* sentences, so left-to-right > makes sense.  So

Re: Encoding italic

2019-02-09 Thread Martin J. Dürst via Unicode
On 2019/02/09 19:58, Richard Wordingham via Unicode wrote: > On Fri, 8 Feb 2019 18:08:34 -0800 > Asmus Freytag via Unicode wrote: >> Under the implicit assumptions bandied about here, the VS approach >> thus reveals itself as a true rich-text solution (font switching) >> albeit realized with

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Martin J. Dürst via Unicode
On 2019/01/31 07:02, Richard Wordingham via Unicode wrote: > On Wed, 30 Jan 2019 15:33:38 +0100 > Frédéric Grosshans via Unicode wrote: > >> Le 30/01/2019 à 14:36, Egmont Koblinger via Unicode a écrit : >>> - It doesn't do Arabic shaping. In my recommendation I'm arguing >>> that in this mode,

Re: Encoding italic

2019-01-29 Thread Martin J. Dürst via Unicode
On 2019/01/28 05:03, James Kass via Unicode wrote: > > A new beta of BabelPad has been released which enables input, storing, > and display of italics, bold, strikethrough, and underline in plain-text > using the tag characters method described earlier in this thread.  This > enhancement is

Re: Encoding italic

2019-01-29 Thread Martin J. Dürst via Unicode
On 2019/01/24 23:49, Andrew West via Unicode wrote: > On Thu, 24 Jan 2019 at 13:59, James Kass via Unicode > wrote: > We were told time and time again when emoji were first proposed that > they were required for encoding for interoperability with Japanese > telecoms whose usage had spilled over

Re: Encoding italic

2019-01-17 Thread Martin J. Dürst via Unicode
On 2019/01/17 17:51, James Kass via Unicode wrote: > > On 2019-01-17 6:27 AM, Martin J. Dürst replied: > > ... > > Based by these data points, and knowing many of the people involved, my > > description would be that decisions about what to encode as characters

Re: Encoding italic

2019-01-16 Thread Martin J. Dürst via Unicode
On 2019/01/17 12:38, James Kass via Unicode wrote: > ( http://www.unicode.org/versions/Unicode11.0.0/ch02.pdf ) > > "Plain text must contain enough information to permit the text to be > rendered legibly, and nothing more." > > The argument is that italic information can be stripped yet

Re: A last missing link for interoperable representation

2019-01-14 Thread Martin J. Dürst via Unicode
On 2019/01/15 07:58, David Starner via Unicode wrote: > On Mon, Jan 14, 2019 at 2:09 AM Tex via Unicode wrote: >> ·Plain text still has tremendous utility and rich text is not always >> an option. > > Where? Twitter has the option of doing rich text, as does any closed > system. In

Re: A last missing link for interoperable representation

2019-01-14 Thread Martin J. Dürst via Unicode
On 2019/01/15 10:48, Mark E. Shoulson via Unicode wrote: > On 1/14/19 4:21 PM, Asmus Freytag via Unicode wrote: >> Short of that, I'm extremely leery of "leading" standardization; that >> is, encoding things that "might" be used. >> > It is certainly true that Unicode should not be (and

Re: A last missing link for interoperable representation

2019-01-14 Thread Martin J. Dürst via Unicode
Hello James, others, On 2019/01/14 15:24, James Kass via Unicode wrote: > > Martin J. Dürst wrote, > > > I'd say it should be conservative. As the meaning of that word > > (similar to others such as progressive and regressive) may be > > interpreted in vari

Re: A last missing link for interoperable representation

2019-01-14 Thread Martin J. Dürst via Unicode
Hello James, others, From the examples below, it looks like a feature request for Twitter (and/or Facebook). Blaming the problem on Unicode doesn't seem to be appropriate. Regards, Martin. On 2019/01/14 18:06, James Kass via Unicode wrote: > > Not a twitter user, don't know how popular

Re: A last missing link for interoperable representation

2019-01-13 Thread Martin J. Dürst via Unicode
On 2019/01/14 01:46, Julian Bradfield via Unicode wrote: > On 2019-01-12, Richard Wordingham via Unicode wrote: >> On Sat, 12 Jan 2019 10:57:26 + (GMT) >> And what happens when you capitalise a word for emphasis or to begin a >> sentence? Is it no longer the same word? > > Indeed. As has

Re: A last missing link for interoperable representation

2019-01-13 Thread Martin J. Dürst via Unicode
On 2019/01/13 13:24, James Kass via Unicode wrote: > > Mark E. Shoulson wrote, > > > This discussion has been very interesting, really.  I've heard what I > > thought were very good points and relevant arguments from both/all > > sides, and I confess to not being sure which I actually prefer.

Re: A last missing link for interoperable representation

2019-01-11 Thread Martin J. Dürst via Unicode
On 2019/01/11 16:13, James Kass via Unicode wrote: > Styled Latin text is being simulated with math alphanumerics now, which > means that data is being interchanged and archived.  That's the user > demand illustrated. Almost by definition, styled text isn't plain text, even if it's simulated

Re: A last missing link for interoperable representation

2019-01-10 Thread Martin J. Dürst via Unicode
On 2019/01/11 10:48, James Kass via Unicode wrote: > Is it true that many of the CJK variants now covered were previously > considered by the Consortium to be merely stylistic variants? What is a stylistic variant or not is quite a bit more complicated for CJK than for scripts such as Latin.

Re: A sign/abbreviation for "magister"

2018-10-31 Thread Martin J. Dürst via Unicode
On 2018/11/01 03:10, Marcel Schneider via Unicode wrote: > On 31/10/2018 at 17:27, Julian Bradfield via Unicode wrote: >> When one does question the Académie about the fact, this is their >> reply: >> >> Le fait de placer en exposant ces mentions est de convention >> typographique ; il convient

Re: A sign/abbreviation for "magister"

2018-10-31 Thread Martin J. Dürst via Unicode
On 2018/10/31 03:51, Marcel Schneider via Unicode wrote: > On 30/10/2018 at 18:59, Doug Ewell via Unicode wrote: >> >> Marcel Schneider wrote: >> >>> This use case is different from the use case that led to submit >>> the L2/18-206 proposal, cited by Dr Ewell on 29/10/2018 at 20:29: >> >> I guess

Re: A sign/abbreviation for "magister"

2018-10-29 Thread Martin J. Dürst via Unicode
On 2018/10/29 05:42, Michael Everson via Unicode wrote: > This is no different the Irish name McCoy which can be written MᶜCoy where > the raising of the c is actually just decorative, though perhaps it was once > an abbreviation for Mac. In some styles you can see a line or a dot under the >

Re: Fallback for Sinhala Consonant Clusters

2018-10-14 Thread Martin J. Dürst via Unicode
script, the Sinhala script uses its virama character as a vowel length indicator. Missing touching consonants are being rendered almost as though there were no ZWJ, but the combination of consonant and al-lakuna is being rendered badly. Richard. . -- Prof. Dr.sc. Martin J. Dürst Department

Re: Dealing with Georgian capitalization in programming languages

2018-10-09 Thread Martin J. Dürst via Unicode
Hello Ken, others, On 2018/10/03 06:43, Ken Whistler wrote: But it seems to me that the problem you are citing can be avoided if you simply rethink what your "capitalize" means. It really should be conceived of as first lowercasing the *entire* string, and then titlecasing the *eligible*
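Ken's suggestion can be sketched as follows. The quoted text is cut off at "*eligible*", so this sketch simply assumes that the first code point is the eligible one. That is a hypothetical simplification, not Ken's full proposal. Python is used only for illustration. Python 3.8 and later already implement `str.capitalize()` roughly along these lines, with titlecase rather than uppercase for the first character.

```python
def capitalize(s: str) -> str:
    """Lowercase the whole string, then titlecase its first code point.

    Treating only the first code point as "eligible" is a simplification;
    real text may start with punctuation or a multi-code-point cluster.
    """
    lowered = s.lower()
    if not lowered:
        return lowered
    # title() on a single code point applies its titlecase mapping, which is
    # not always the uppercase one: U+01C6 titlecases to U+01C5, not U+01C4.
    return lowered[0].title() + lowered[1:]


print(capitalize("hELLO"))         # Hello
print(capitalize("\u01c6ungla"))   # U+01C5 followed by "ungla"
print(capitalize("\u1c90\u1c91"))  # Mtavruli in, Mkhedruli out
```

With Unicode 11 data, Mkhedruli letters titlecase to themselves. An all-Mtavruli word therefore comes back entirely in Mkhedruli rather than with a leading Mtavruli letter.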

Re: Dealing with Georgian capitalization in programming languages

2018-10-04 Thread Martin J. Dürst via Unicode
Ken, Markus, Many thanks for your ideas, which I noted at https://bugs.ruby-lang.org/issues/14839. Regards, Martin. On 2018/10/03 06:43, Ken Whistler wrote: On 10/2/2018 12:45 AM, Martin J. Dürst via Unicode wrote: My questions here are: - Has this been considered when Georgian Mtavruli

Dealing with Georgian capitalization in programming languages

2018-10-02 Thread Martin J. Dürst via Unicode
Since the last discussion on Georgian (Mtavruli) on this mailing list, I have been looking into how to implement it in the Programming language Ruby. Ruby has four case-conversion operations for its class String: upcase: convert all characters to upper case downcase: convert all characters

Re: Shortcuts question

2018-09-16 Thread Martin J. Dürst via Unicode
On 2018/09/16 21:08, Marcel Schneider via Unicode wrote: An additional level of complexity is induced by ergonomics. so that most non-Latin layouts may wish to stick with QWERTY, and even ergonomic layouts in the footprints of August Dvorak rather than Shai Coleman are likely to offer

Re: UCD in XML or in CSV? (is: UCD in YAML)

2018-09-07 Thread Martin J. Dürst via Unicode
On 2018/09/08 04:47, Rebecca Bettencourt via Unicode wrote: On Fri, Sep 7, 2018 at 11:20 AM Philippe Verdy via Unicode < unicode@unicode.org> wrote: That version has been announced in the Windows 10 Hub several weeks ago. And it only took them 33 years. :) I used to joke that Notepad

Re: Diacritic marks in parentheses

2018-07-27 Thread Martin J. Dürst via Unicode
On 2018/07/27 01:27, Markus Scherer via Unicode wrote: I would not expect for Ä+combining () above = Ä᪻ to look right except with specialized fonts. http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%84%5Cu1ABB==0 Even if it worked widely, I think it would be confusing. Yes, for the moment.

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-05 Thread Martin J. Dürst via Unicode
Hello Rebecca, On 2018/06/05 12:43, Rebecca T via Unicode wrote: Something I’d love to see is translated keywords; shouldn’t be hard with a line in the cargo.toml for a ruidmentary lookup. Again, I’m of the opinion that an imperfect implementation is better than no attempt. I remember reading

Re: Hyphenation Markup

2018-06-02 Thread Martin J. Dürst via Unicode
Hello Richard, On 2018/06/02 20:37, Richard Wordingham via Unicode wrote: Am 2018-06-02 um 06:44 schrieb Richard Wordingham via Unicode: In Latin text, one can indicate permissible line break opportunities between grapheme clusters by inserting U+00AD SOFT HYPHEN. What low-end schemes, if
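As a minimal illustration of the mechanism Richard describes, and not of any particular renderer, the sketch below treats U+00AD as an invisible break opportunity. The soft hyphen becomes a visible hyphen only when a line is actually broken there. The function name `render` and its `break_after` parameter are hypothetical.

```python
from typing import List, Optional

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def render(word: str, break_after: Optional[int] = None) -> List[str]:
    """Lay out `word`, which may contain soft hyphens, as one or two lines.

    With break_after=None the word fits on one line and every soft hyphen
    stays invisible. Otherwise the line is broken at the soft hyphen that
    follows fragment number `break_after`, and a visible hyphen is shown.
    """
    fragments = word.split(SHY)
    if break_after is None:
        return ["".join(fragments)]
    if not 0 <= break_after < len(fragments) - 1:
        raise ValueError("no soft hyphen at that position")
    head = "".join(fragments[: break_after + 1]) + "-"
    tail = "".join(fragments[break_after + 1 :])
    return [head, tail]


word = "hy\u00adphen\u00adation"
print(render(word))     # ['hyphenation']
print(render(word, 1))  # ['hyphen-', 'ation']
```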

Re: Uppercase ß

2018-05-29 Thread Martin J. Dürst via Unicode
On 2018/05/29 17:15, Hans Åberg via Unicode wrote: On 29 May 2018, at 07:30, Asmus Freytag via Unicode wrote: An uppercase exists and it has formally been ruled as acceptable way to write this letter (mostly an issue for ALL CAPS as ß does not occur in word-initial position). A./ Duden

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-28 Thread Martin J. Dürst via Unicode
Hello Sundar, On 2018/05/28 04:27, SundaraRaman R via Unicode wrote: Hi, In languages like Ruby or Java (https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isAlphabetic(int)), functions to check if a character is alphabetic do that by looking for the 'Alphabetic' property
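Python's standard library does not expose the Alphabetic property, so this sketch cannot reproduce Java's `isAlphabetic`. Using `unicodedata`, it shows that the pulli (U+0BCD) is a nonspacing mark. It also shows that a check based on `str.isalpha()`, which accepts only general categories L*, rejects an ordinary Tamil word. The permissive helper `letters_and_marks_only` is hypothetical.

```python
import unicodedata

TAMIL_WORD = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # "தமிழ்", ends in pulli U+0BCD


def letters_and_marks_only(s: str) -> bool:
    """Hypothetical check: every code point is a letter (L*) or mark (M*)."""
    return bool(s) and all(unicodedata.category(ch)[0] in "LM" for ch in s)


print(unicodedata.name("\u0bcd"), unicodedata.category("\u0bcd"))
print(TAMIL_WORD.isalpha())                # False
print(letters_and_marks_only(TAMIL_WORD))  # True
```

`isalpha()` already fails on U+0BBF TAMIL VOWEL SIGN I (category Mc). Unlike the pulli, that vowel sign does have the Alphabetic property, through Other_Alphabetic. So general category alone is not a substitute for the property the thread is about.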

Re: Major vendors changing U+1F52B PISTOL  depiction from firearm to squirt gun

2018-05-23 Thread Martin J. Dürst via Unicode
On 2018/05/24 03:00, Michael Everson via Unicode wrote: I consider it a significant semantic shift from the intended meaning of the character in the source Japanese character set. Yes and no. I'd consider the semantic shift from a real pistol in a Japanese message to a real pistol in a

Re: Is the Editor's Draft public?

2018-04-20 Thread Martin J. Dürst via Unicode
On 2018/04/20 18:12, Martin J. Dürst wrote: There was an announcement for a public review period just recently. The review period is up to the 23rd of April. I'm not sure whether the announcement is up somewhere on the Web, but I'll forward it to you directly. Sorry, found the Web address

Re: Is the Editor's Draft public?

2018-04-20 Thread Martin J. Dürst via Unicode
Hello Henri, On 2018/04/20 17:15, Henri Sivonen via Unicode wrote: Is the Editor's Draft of the Unicode Standard visible publicly? Use case: Checking if things that I might send feedback about have already been addressed since the publication of Unicode 10.0. There was an announcement for a

Re: Fwd: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode

2018-04-02 Thread Martin J. Dürst via Unicode
On 2018/04/03 10:56, Mark E. Shoulson via Unicode wrote: Whew!  Thanks for explaining the joke! Everyone here really thought they were serious.  Maybe you should write to the authors of the RFC and explain to them that their growth-function is incorrect.  I'm sure they'd be glad of the

Fwd: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode

2018-04-01 Thread Martin J. Dürst via Unicode
Please enjoy. Sorry for being late with forwarding, at least in some parts of the world. Regards, Martin. Forwarded Message Subject: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode Date: Sun, 1 Apr 2018 08:29:00 -0700 (PDT) From: rfc-edi...@rfc-editor.org

Re: A sketch with the best-known Swiss tongue twister

2018-03-13 Thread Martin J. Dürst via Unicode
On 2018/03/09 21:24, Mark Davis ☕️ wrote: There are definitely many dialects across Switzerland. I think that for *this* phrase it would be roughly the same for most of the population, with minor differences (eg 'het' vs 'hät'). But a native speaker like Martin would be able to say for sure.

Re: A sketch with the best-known Swiss tongue twister

2018-03-13 Thread Martin J. Dürst via Unicode
On 2018/03/10 20:26, philip chastney via Unicode wrote: I would make the following observations on terminology in practice: -- the newspapers in Zurich advertised courses in "Hoch Deutsch", for those who needed to deal with foreigners This should probably be written 'the newspapers in

Re: base1024 encoding using Unicode emojis

2018-03-12 Thread Martin J. Dürst via Unicode
On 2018/03/12 02:07, Keith Turner via Unicode wrote: Yeah, it certainly results in larger utf8 strings. For example a sha256 hash is 112 bytes when encoded as Ecoji utf8. For base64, sha256 is 44 bytes. Even though its more bytes, Ecoji has less visible characters than base64 for sha256.
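Keith's figures can be reproduced with a little arithmetic. The sketch relies on three assumptions of ours, not statements taken from Ecoji's specification. First, Ecoji packs each 5-byte group into four 10-bit emoji. Second, a partial final group is padded to four emoji. Third, every emoji it uses is a supplementary-plane character that takes 4 bytes in UTF-8.

```python
import base64
import hashlib
import math
from typing import Tuple


def base64_length(n_bytes: int) -> int:
    """Padded base64: every 3-byte group becomes 4 ASCII characters."""
    return 4 * math.ceil(n_bytes / 3)


def ecoji_style_length(n_bytes: int) -> Tuple[int, int]:
    """(emoji count, UTF-8 bytes) under the assumptions stated above."""
    emoji = 4 * math.ceil(n_bytes / 5)  # 5 bytes = 40 bits = 4 x 10 bits
    return emoji, 4 * emoji             # assumed: 4 UTF-8 bytes per emoji


digest = hashlib.sha256(b"example").digest()  # 32 bytes
print(len(base64.b64encode(digest)), base64_length(len(digest)))  # 44 44
print(ecoji_style_length(len(digest)))                            # (28, 112)
```

The "fewer visible characters" comparison is therefore 28 emoji against 44 base64 characters. The byte count goes the other way, 112 against 44.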

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-09 Thread Martin J. Dürst via Unicode
On 2018/03/09 10:22, Philippe Verdy via Unicode wrote: As well how Chinese/Japanese post offices handle addresses written with sinograms for personal names ? Is the expanded IDS form acceptable for them, or do they require using Romanized addresses, or phonetic approximations (Bopomofo in China,

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-09 Thread Martin J. Dürst via Unicode
On 2018/03/09 10:17, Philippe Verdy via Unicode wrote: This still leaves the question about how to write personal names ! IDS alone cannot represent them without enabling some "reasonable" ligaturing (they don't have to match the exact strokes variants for optimal placement, or with all possible

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-04 Thread Martin J. Dürst via Unicode
Hello John, On 2018/03/01 12:31, via Unicode wrote: Pen, or brush and paper is much more flexible. With thousands of names of people and places still not encoded I am not sure if I would describe hans (simplified Chinese characters) as well supported. nor with current policy which limits

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Martin J. Dürst via Unicode
On 2018/02/28 19:38, Janusz S. Bień via Unicode wrote: On Tue, Feb 27 2018 at 13:45 -0800, announceme...@unicode.org writes: The 157 new Emoji are now available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages. I'm quite curious what it the relation

Re: 0027, 02BC, 2019, or a new character?

2018-02-22 Thread Martin J. Dürst via Unicode
On 2018/02/21 12:15, Michael Everson via Unicode wrote: I absolutely disagree. There’s a whole lot of related languages out there, and the speakers share some things in common. Orthographic harmonization between these languages can ONLY help any speaker of one to access information in any of

Re: IDC's versus Egyptian format controls

2018-02-21 Thread Martin J. Dürst via Unicode
On 2018/02/17 08:25, James Kass via Unicode wrote: Some people studying Han characters use the IDCs to illustrate the ideographs and their components for various purposes. Well, as far as I understand, this was their original (and is still their main) purpose. For example: U-0002A8B8 ꢸ

Re: Why so much emoji nonsense?

2018-02-14 Thread Martin J. Dürst via Unicode
On 2018/02/15 10:49, James Kass via Unicode wrote: Yes, except that Unicode "supported" all manner of things being interchanged by setting aside a range of code points for private use. Which enabled certain cell phone companies to save some bandwidth by assigning various popular in-line

Re: Keyboard layouts and CLDR

2018-01-30 Thread Martin J. Dürst via Unicode
On 2018/01/30 16:18, Philippe Verdy via Unicode wrote: - Adding Y to the list of allowed letters after the dieresis deadkey to produce "Ÿ" : the most frequent case is L'HAŸE-LÈS-ROSES, the official name of a French municipality when written with full capitalisation, almost all spell checkers
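A layout tool might compute the output of a diaeresis dead key by appending U+0308 COMBINING DIAERESIS and normalizing to NFC. It would accept the result only if NFC yields a single precomposed character. This is a hypothetical sketch, not how CLDR or any particular keyboard driver defines dead keys. It does show that Y plus diaeresis composes to U+0178 Ÿ.

```python
import unicodedata
from typing import Optional

COMBINING_DIAERESIS = "\u0308"


def diaeresis_dead_key(base: str) -> Optional[str]:
    """Return the precomposed character for base + diaeresis, or None."""
    composed = unicodedata.normalize("NFC", base + COMBINING_DIAERESIS)
    return composed if len(composed) == 1 else None


for base in "AEIOUYaeiouy":
    print(base, diaeresis_dead_key(base))
```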

Re: 0027, 02BC, 2019, or a new character?

2018-01-22 Thread Martin J. Dürst via Unicode
On 2018/01/23 09:55, James Kass via Unicode wrote: Any Kazakh/Qazaq student ambitious enough to study a foreign language such as English is already sophisticated enough to easily distinguish differing digraph values between the two languages. English speakers face distinctions such as the

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-21 Thread Martin J. Dürst via Unicode
On 2017/12/15 07:40, Richard Wordingham via Unicode wrote: On Mon, 11 Dec 2017 21:45:23 + Cibu Johny (സിബു) wrote: Malayalam could be a similar story. In case of Malayalam, it can be font specific because of the existence of traditional and reformed writing styles. A

Re: Word_Break for Hieroglyphs

2017-12-20 Thread Martin J. Dürst via Unicode
On 2017/12/20 17:46, Richard Wordingham via Unicode wrote: In an implementation that offered genuine whole word selection, and thus tackled with the challenges of Chinese, Japanese, Korean and Vietnamese (both scripts, not just CJKV) as well as Thai, I would expect the selections to be bounded

Interesting UTF-8 decoder

2017-10-09 Thread Martin J. Dürst via Unicode
A friend of mine sent me a pointer to http://nullprogram.com/blog/2017/10/06/, a branchless UTF-8 decoder. Regards, Martin.
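The idea behind such decoders is to compute a sequence's length from a small table indexed by the lead byte, and to accumulate all error conditions into one integer instead of branching on each one. The Python sketch below imitates that data flow. The tables are our own reconstruction of the idea and may not match the post's code, and Python itself gives no branch-free guarantee. Unlike the "maximal subpart" practice, this sketch replaces a whole bad sequence with a single U+FFFD.

```python
# Sequence length indexed by the top 5 bits of the lead byte
# (0 marks a byte that cannot start a sequence).
LENGTHS = [1] * 16 + [0] * 8 + [2, 2, 2, 2, 3, 3, 4, 0]
MASKS = [0x00, 0x7F, 0x1F, 0x0F, 0x07]      # payload bits of the lead byte
MINS = [0x400000, 0, 0x80, 0x800, 0x10000]  # smallest non-overlong value
SHIFTC = [0, 18, 12, 6, 0]                  # right shift to align the value
SHIFTE = [0, 6, 4, 2, 0]                    # drop checks for unused bytes


def decode_one(buf: bytes, i: int):
    """Decode one sequence at buf[i]; return (code point, advance, error bits)."""
    s = buf[i:i + 4].ljust(4, b"\0")  # pad so four bytes can always be read
    n = LENGTHS[s[0] >> 3]
    c = (s[0] & MASKS[n]) << 18
    c |= (s[1] & 0x3F) << 12
    c |= (s[2] & 0x3F) << 6
    c |= s[3] & 0x3F
    c >>= SHIFTC[n]
    e = int(c < MINS[n]) << 6         # overlong, or invalid lead byte
    e |= int((c >> 11) == 0x1B) << 7  # surrogate U+D800..U+DFFF
    e |= int(c > 0x10FFFF) << 8       # beyond the Unicode code space
    e |= (s[1] & 0xC0) >> 2           # top two bits of each continuation
    e |= (s[2] & 0xC0) >> 4           # byte, which must be 0b10 ...
    e |= s[3] >> 6
    e ^= 0x2A                         # ... so XOR clears the valid case
    e >>= SHIFTE[n]
    return c, n + int(n == 0), e


def decode(buf: bytes) -> str:
    out = []
    i = 0
    while i < len(buf):
        c, advance, err = decode_one(buf, i)
        out.append("\ufffd" if err else chr(c))
        i += advance
    return "".join(out)


print(decode(b"A\xc0\x80B\xe2\x82\xac"))  # A, U+FFFD, B, U+20AC
```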

Re: IBM 1620 invalid character symbol

2017-09-26 Thread Martin J. Dürst via Unicode
On 2017/09/26 22:03, John W Kennedy via Unicode wrote: I don’t know what your snippet is from, but the normally authoritative IBM manual, A26-5706-3, IBM 1620 CPU Model 1 (July, 1965) displays what is clearly the Cyrillic letter. Whether it should be regarded as that, or as a distinct

Re: Assamese and Unicode.

2017-09-05 Thread Martin J. Dürst via Unicode
Sorry for the long delay of this answer. On 2017/08/24 07:35, David Faulks via Unicode wrote: It appears that the Indian government will submit an 'Assamese' proposal. http://silchar.com/unicode-standard-for-assamese-in-the-offing/ Since everything I know about Assamese Script indicates that

Inadvertent copies of test data in L2/17-197 ?

2017-08-07 Thread Martin J. Dürst via Unicode
Hello Henry, I just had a look at http://www.unicode.org/L2/L2017/17197-utf8-retract.pdf to use the test data in there for Ruby. I was under the impression from previous looks at it that it contained a lot of test data. However, when I looked at the test data more carefully (I had read the

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-08-05 Thread Martin J. Dürst via Unicode
Hello Mark, On 2017/08/04 09:34, Mark Davis ☕️ wrote: FYI, the UTC retracted the following. Thanks for letting us know! Regards, Martin. *[151-C19 ] Consensus:* Modify the section on "Best Practices for Using FFFD" in section "3.9

Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-04 Thread Martin J. Dürst via Unicode
On 2017/06/02 04:54, Doug Ewell via Unicode wrote: Richard Wordingham wrote: even supporting 6-byte patterns just in case 20.1 bits eventually turn out not to be enough, Sorry to be late with this, but if 20.1 bits turn out to not be enough, what about 21 bits? That would still limit
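The "20.1 bits" and "21 bits" figures are easy to check. The helper below computes how many payload bits an n-byte sequence carries in the original UTF-8 bit pattern, which allowed up to 6 bytes before RFC 3629 restricted it to 4. The names are ours.

```python
import math

CODE_SPACE = 0x110000  # U+0000..U+10FFFF


def payload_bits(n: int) -> int:
    """Payload bits of an n-byte UTF-8 pattern (n = 1..6, RFC 2279 style)."""
    if not 1 <= n <= 6:
        raise ValueError("n must be between 1 and 6")
    if n == 1:
        return 7  # 0xxxxxxx
    # Lead byte: n one-bits, a zero, then (7 - n) payload bits;
    # each continuation byte 10xxxxxx adds 6 more.
    return (7 - n) + 6 * (n - 1)


print(f"{math.log2(CODE_SPACE):.2f}")          # 20.09
print([payload_bits(n) for n in range(1, 7)])  # [7, 11, 16, 21, 26, 31]
print(2 ** 21 - CODE_SPACE)                    # 983040 extra code points
```

A full 21-bit space would therefore still fit in today's 4-byte sequences.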

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Martin J. Dürst via Unicode
Hello Karl, others, On 2017/05/27 06:15, Karl Williamson via Unicode wrote: On 05/26/2017 12:22 PM, Ken Whistler wrote: On 5/26/2017 10:28 AM, Karl Williamson via Unicode wrote: The link provided about the PRI doesn't lead to the comments. PRI #121 (August, 2008) pre-dated the practice of

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Martin J. Dürst via Unicode
Hello Markus, others, On 2017/05/27 00:41, Markus Scherer wrote: On Fri, May 26, 2017 at 3:28 AM, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote: But there's plenty in the text that makes it absolutely clear that some things cannot be included. In particular, it says The term “m

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-26 Thread Martin J. Dürst via Unicode
On 2017/05/25 09:22, Markus Scherer wrote: On Wed, May 24, 2017 at 3:56 PM, Karl Williamson <pub...@khwilliamson.com> wrote: On 05/24/2017 12:46 AM, Martin J. Dürst wrote: That's wrong. There was a public review issue with various options and with feedback, and the recommendation ha

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-24 Thread Martin J. Dürst via Unicode
On 2017/05/24 05:57, Karl Williamson via Unicode wrote: On 05/23/2017 12:20 PM, Asmus Freytag (c) via Unicode wrote: Adding a "recommendation" this late in the game is just bad standards policy. Unless I misunderstand, you are missing the point. There is already a recommendation listed in

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-23 Thread Martin J. Dürst via Unicode
f. Dr.sc. Martin J. Dürst Department of Intelligent Information Technology College of Science and Engineering Aoyama Gakuin University Fuchinobe 5-1-10, Chuo-ku, Sagamihara 252-5258 Japan

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Martin J. Dürst via Unicode
Hello everybody, [using this mail to in effect reply to different mails in the thread] On 2017/05/16 17:31, Henri Sivonen via Unicode wrote: On Tue, May 16, 2017 at 10:22 AM, Asmus Freytag wrote: Under what circumstance would it matter how many U+FFFDs you see?
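Counts matter, for example, when test suites compare decoder output byte for byte. The sketch below counts the U+FFFDs that CPython's UTF-8 decoder emits. As far as we know, that decoder follows the "maximal subpart" practice currently recommended in the Unicode Standard. Decoders following other policies can report different numbers for the same input. The helper name and samples are ours.

```python
def replacement_count(data: bytes) -> int:
    """How many U+FFFD characters Python's UTF-8 decoder produces for `data`."""
    return data.decode("utf-8", errors="replace").count("\ufffd")


examples = {
    b"\xe2\x82": "truncated 3-byte sequence",
    b"\xc0\xaf": "overlong encoding of '/'",
    b"\xed\xa0\x80": "encoded surrogate U+D800",
    b"\xf4\x90\x80\x80": "value above U+10FFFF",
}
for data, label in examples.items():
    print(f"{label}: {replacement_count(data)}")
```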

Re: Proposal to add standardized variation sequences for chess notation

2017-04-12 Thread Martin J. Dürst via Unicode
On 2017/04/12 00:44, Philippe Verdy via Unicode wrote: Some Asian chess boards include also diagonal lines or dots on top of their crossing (notably 9x9 boards are subdivided into nine 3x3 subgroups by such dots). These chess boards do not alternate white and black "squares" ; beside this, the

Re: Unicode vs. Unikod

2017-04-10 Thread Martin J. Dürst via Unicode
"Unicodzie" (locative singular), moreover there is no doubt how to pronounce it. This is probably the reason why, to my surprise, the word was introduced also in some other Slavonic languages, e.g. https://en.wiktionary.org/wiki/Unikod. My point is only that both "What is Unicode" and Polish Wik

Re: Standaridized variation sequences for the Desert alphabet?

2017-04-06 Thread Martin J. Dürst
. ranging from abstract to concrete, and so on. On 29 Mar 2017, at 11:12, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote: - That suggests that IF this script is in current use, You don’t even know? You’re kidding, right? Everything is relative. And without being part of the user com

Re: Proposal to add standardized variation sequences for chess notation

2017-04-05 Thread Martin J. Dürst
On 2017/04/05 23:49, Michael Everson wrote: Oh, here is the answer to your question. It took me 15 seconds to change the background and text colour in Quark XPress. It has nothing to do with the proposal for variation sequences.

Re: Proposal to add standardized variation sequences for chess notation

2017-04-04 Thread Martin J. Dürst
On 2017/04/03 23:41, Kent Karlsson wrote: Hence the chess board lines should be displayed in a strong left-to-right context (either via bidi markup characters, or via some higher order bidi markup mechanism, such as the "bidi" attribute in HTML). Though in most cases (not Arabic/Hebrew/...
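The chess symbols have Bidi_Class ON (other neutral), which is why their display order otherwise depends on the surrounding text. Using bidi control characters, one option is to wrap each board line in U+2066 LEFT-TO-RIGHT ISOLATE and U+2069 POP DIRECTIONAL ISOLATE, available since Unicode 6.3. Kent does not say which controls he had in mind, so this is only one possibility. In HTML, the corresponding markup would be the `dir` attribute or the `bdi` element. The helper name is ours.

```python
import unicodedata

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def ltr_isolate(text: str) -> str:
    """Wrap `text` so it is laid out as an isolated left-to-right run."""
    return LRI + text + PDI


board_row = "\u2656\u2658\u2657\u2655\u2654\u2657\u2658\u2656"  # white back rank
line = ltr_isolate(board_row)
print(unicodedata.bidirectional("\u2656"))  # ON
print([f"U+{ord(ch):04X}" for ch in line])
```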

Re: Proposal to add standardized variation sequences for chess notation

2017-04-03 Thread Martin J. Dürst
On 2017/04/03 01:27, Richard Wordingham wrote: We seem to agree that it should be a graphic modification, rather than as semantic modification. The question I pose is, "Is it just a graphic modification in this case?". I'm not convinced that it is. A player starts with two

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread Martin J. Dürst
On 2017/03/30 06:17, Christoph Päper wrote: Mark Davis ☕️ : That isn't really the case. In particular, vendors can propose adding additional subdivisions to the recommended list. Awesome, "vendors" can do that. (._.m) If I made an open-source emoji font that contained
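For context, the subdivision flags discussed here are emoji tag sequences. Each one consists of U+1F3F4 WAVING BLACK FLAG, the subdivision code spelled with tag characters (each an ASCII character offset by 0xE0000), and U+E007F CANCEL TAG. Emoji 5.0 recommended only three of them (England, Scotland, Wales). Whether any other well-formed sequence is shown as a flag depends on the font and vendor. The helper name is ours.

```python
BLACK_FLAG = "\U0001F3F4"  # WAVING BLACK FLAG
CANCEL_TAG = "\U000E007F"  # CANCEL TAG
TAG_OFFSET = 0xE0000       # e.g. TAG LATIN SMALL LETTER A = U+E0061


def subdivision_flag(code: str) -> str:
    """Build an emoji tag sequence for a subdivision code such as 'gbsct'."""
    code = code.lower()
    if not (code.isascii() and code.isalnum()):
        raise ValueError("subdivision codes use ASCII letters and digits only")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in code)
    return BLACK_FLAG + tags + CANCEL_TAG


scotland = subdivision_flag("gbsct")
print([f"U+{ord(ch):04X}" for ch in scotland])
```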

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-29 Thread Martin J. Dürst
nd deciding to split because history is way more important than modern practice. In that light, some more comments lower down. On 2017/03/28 22:56, Michael Everson wrote: On 28 Mar 2017, at 11:39, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote: An æ ligature is a ligature of a and

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Martin J. Dürst
On 2017/03/29 01:47, Philippe Verdy wrote: 2017-03-28 18:30 GMT+02:00 Asmus Freytag : On 3/28/2017 6:56 AM, Michael Everson wrote: An æ ligature is a ligature of a and of e. It is not some sort of pretzel. We need a pretzel emoji. We need a broken tooth emoji too !

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Martin J. Dürst
Hello Doug, On 2017/03/29 03:41, Doug Ewell wrote: If this story sounds vaguely familiar to old-timers, it's exactly the path that was followed the last time Plane 14 tag characters were under discussion, between 1998 and 2000: someone wrote an RFC to embed language tags in plain text using

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Martin J. Dürst
On 2017/03/27 21:59, Michael Everson wrote: On 27 Mar 2017, at 08:05, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote: Consider 2EBC ⺼ CJK RADICAL MEAT and 2E9D ⺝ CJK RADICAL MOON which are apparently really supposed to have identical glyphs, though we use an old-fashioned style in the

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Martin J. Dürst
On 2017/03/28 01:20, Michael Everson wrote: Ken transcribes into modern type a letter by Shelton dated 1859, in which “boy” is written В<ЃІ>, “few” as Й<ІЋ>, “truefully” [sic] as ГС<ІЋ>ЙЋТІ, and “you” as Џ<ІЋ>. These are all 1859 variants, yes? That would just show that these variants

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Martin J. Dürst
On 2017/03/28 01:49, Michael Everson wrote: Sorry, but typographic control of that sort is grand for typesetting, where you can select ranges of text and language-tag it (assuming your program accepts and supports all the language tags you might need (which they don’t)) and you can select

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Martin J. Dürst
I agree with Alastair. The list of font technology options was mostly to show that there are already a lot of options (some might even say too many), so font technology doesn't really limit our choices. Regards, Martin. On 2017/03/27 23:04, Alastair Houghton wrote: On 27 Mar 2017, at

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Martin J. Dürst
Hello Michael, others, On 2017/03/27 21:07, Michael Everson wrote: On 27 Mar 2017, at 06:42, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote: The characters in question have different and undisputed origins, undisputed. If you change that to the somewhat more neutral "the shapes

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Martin J. Dürst
On 2017/03/28 01:03, Michael Everson wrote: On 27 Mar 2017, at 16:56, John H. Jenkins wrote: The 1857 St Louis punches definitely included both the 1855 EW Ч and the 1859 OI <ЃІ>. Ken Beesley shows them in smoke proofs in his 2004 paper on Metafont. Good to have some

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread Martin J. Dürst
On 2017/03/24 23:37, Michael Everson wrote: On 24 Mar 2017, at 11:34, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote: On 2017/03/23 22:48, Michael Everson wrote: Indeed I would say to John Jenkins and Ken Beesley that the richness of the history of the Deseret alphabet would be impove

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread Martin J. Dürst
On 2017/03/27 01:20, Michael Everson wrote: On 26 Mar 2017, at 16:45, Asmus Freytag wrote: Consider 2EBC ⺼ CJK RADICAL MEAT and 2E9D ⺝ CJK RADICAL MOON which are apparently really supposed to have identical glyphs, though we use an old-fashioned style in the charts

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Martin J. Dürst
On 2017/03/26 22:15, Michael Everson wrote: On 26 Mar 2017, at 09:12, Martin J. Dürst <due...@it.aoyama.ac.jp> wrote: That's a good point: any disunification requires showing examples of contrasting uses. Fully agreed. The default position is NOT “everything is encoded unified

Re: Diaeresis vs. umlaut (was: Re: Standaridized variation sequences for the Desert alphabet?)

2017-03-26 Thread Martin J. Dürst
On 2017/03/25 03:33, Doug Ewell wrote: Philippe Verdy wrote: But Unicode just preferred to keep the roundtrip compatibility with earlier 8-bit encodings (including existing ISO 8859 and DIN standards) so that "ü" in German and French also have the same canonical decomposition even if the

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Martin J. Dürst
On 2017/03/26 11:24, Philippe Verdy wrote: That's a good point: any disunification requires showing examples of contrasting uses. Fully agreed. We haven't yet heard of any contrasting uses for the letter shapes we are discussing. Now depending on individual publications, authors would use

Re: Standaridized variation sequences for the Deseret alphabet?

2017-03-24 Thread Martin J. Dürst
On 2017/03/23 22:32, Michael Everson wrote: What is right for Deseret has to be decided by and for Deseret users, rather than by script historians. Odd. That view doesn’t seem to be applicable to CJK unification. Well, it may not seem to you, but actually it is. I have had a lot of

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-24 Thread Martin J. Dürst
On 2017/03/23 22:48, Michael Everson wrote: Indeed I would say to John Jenkins and Ken Beesley that the richness of the history of the Deseret alphabet would be impoverished by treating the 1859 letters as identical to the 1855 letters. Well, I might be completely wrong, but John Jenkins

Re: Standaridized variation sequences for the Deseret alphabet?

2017-03-23 Thread Martin J. Dürst
Hello Michael, others, [Fixed script name in subject.] On 2017/03/23 09:03, Michael Everson wrote: On 22 Mar 2017, at 21:39, David Starner wrote: There's the same characters here, written in different ways. No, it’s not. It's the same diphthong (a sound) written with

Re: Superscript and Subscript Characters in General Use

2017-01-11 Thread Martin J. Dürst
On 2017/01/11 17:32, Richard Wordingham wrote: The truly straight Unicode approach in HTML is to use 1945. Just entering those 5 characters into a text entry box in Firefox gave me a properly formatted vulgar fraction. That is how vulgar fractions are supposed to work. Unfortunately, one may

Re: WAP Pictogram Specification as Emoji Source

2017-01-06 Thread Martin J. Dürst
On 2017/01/07 08:21, Christoph Päper wrote: I just discovered the WAP Pictogram specification (WAP-213-WAPInterPic), last published in April 2001 and updated in November 2001. I haven’t found any reference or vendor-specific images, by the way, and if it wasn’t just used as an example

Re: IdnaTest.txt and RFC 5893

2017-01-04 Thread Martin J. Dürst
5893? Kind regards, Alastair. -- http://alastairs-place.net . -- Prof. Dr.sc. Martin J. Dürst Department of Intelligent Information Technology College of Science and Engineering Aoyama Gakuin University Fuchinobe 5-1-10, Chuo-ku, Sagamihara 252-5258 Japan

Re: Best practices for replacing UTF-8 overlongs

2016-12-19 Thread Martin J. Dürst
On 2016/12/20 11:35, Tex Texin wrote: Shawn, Ok, but that begs the questions of what to do... "All bets are off" is not instructive. Well, it may be instructive in that it's difficult to get software to decide what happened. A human may be in a better position to analyze the error and the

Re: Mixed-Script confusables in prog.languages

2016-12-05 Thread Martin J. Dürst
On 2016/12/05 04:07, Philippe Verdy wrote: In more technical programming languages however, you can usually be much more restrictive as the identifiers used are generally abbreviated and simplified: you can kill lettercase differences for example, In some languages maybe. But languages such

Re: Mixed-Script confusables in prog.languages

2016-12-05 Thread Martin J. Dürst
On 2016/12/05 17:31, Reini Urban wrote: ψ_S contains Greek U+03C8, Common and Latin. Since Latin and Common are always allowed, the only new script is Greek. The first non-default script is automatically and silently allowed, only a mix with another non-default script, such as Cyrillic

Re: "Oh that's what you meant!: reducing emoji misunderstanding"

2016-11-18 Thread Martin J. Dürst
ly try it anyway. Best regards, James Kass

Re: Possible to add new precomposed characters for local language in Togo?

2016-11-15 Thread Martin J. Dürst
Hello Marcel, On 2016/11/12 07:35, Marcel Schneider wrote: For lack of anything better, and faced with Microsoftʼs one weekʼs silence, I now suggest to make a wider use of the Vietnamese text representation scheme that Microsoft implemented for Vietnamese, that is documented in TUS [1], and
