Re: Is the binaryness/textness of a data format a property?

2020-03-22 Thread Markus Scherer via Unicode
On Sat, Mar 21, 2020 at 12:35 PM Doug Ewell via Unicode wrote: > I thought the whole premise of GB18030 was that it was Unicode mapped into > a GB2312 framework. What characters exist in GB18030 that don't exist in > Unicode, and have they been proposed for Unicode yet, and why was none of > the

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-12 Thread Markus Scherer via Unicode
On Wed, Feb 12, 2020 at 11:37 AM Marius Spix via Unicode < unicode@unicode.org> wrote: > In my opinion, this is an invalid character, which should not be > included in Unicode. > Please remember that feedback that you want the committee to look at needs to go through

Fwd: ICU 66preview available

2019-12-05 Thread Markus Scherer via Unicode
Dear Unicoders, If you use ICU, then testing with ICU 66*preview* is a good way of trying out Unicode 13 *beta*. (Just please don't use these snapshots in production releases.) Best regards,

Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-02 Thread Markus Scherer via Unicode
On Mon, Dec 2, 2019 at 5:47 PM विश्वासो वासुकिजः (Vishvas Vasuki) via Unicode wrote: > But that says that the definitions are at >> > >> https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml >> , >> but all one currently gets from that is an error message 'XML

Re: Proposal to add Roman transliteration schemes to ISO 15924.

2019-12-02 Thread Markus Scherer via Unicode
On Mon, Dec 2, 2019 at 8:42 AM Roozbeh Pournader via Unicode < unicode@unicode.org> wrote: > You don't need an ISO 15924 script code. You need to think in terms of BCP > 47. Sanskrit in Latin would be sa-Latn. > Right! Now, if you want to distinguish the different transcription systems for >

Re: Encoding the Nsibidi script (African) for writing the Igbo language

2019-11-11 Thread Markus Scherer via Unicode
On Mon, Nov 11, 2019 at 4:03 AM Philippe Verdy via Unicode < unicode@unicode.org> wrote: > But first there's still no code in ISO 15924 (first step easy to complete > before encoding in the UCS). > That's not first; it's nearly last. The script code standard says "In general, script codes shall

Re: Pure Regular Expression Engines and Literal Clusters

2019-10-11 Thread Markus Scherer via Unicode
On Fri, Oct 11, 2019 at 12:05 PM Richard Wordingham via Unicode < unicode@unicode.org> wrote: > On Thu, 10 Oct 2019 15:23:00 -0700 > Markus Scherer via Unicode wrote: > > > [c \q{ch}]h should work like (ch|c)h. Note that the order matters in > > the alternation --
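
The point about order in the alternation can be tried out in any backtracking engine. The following is a minimal sketch using Python's built-in `re` module, which does not support the `\q{...}` notation from UTS #18; the alternations `(ch|c)` and `(c|ch)` stand in for the literal cluster, and the helper name `first_match` is made up for this illustration.

```python
import re

def first_match(pattern, text):
    """Return (whole match, group 1) for the first match at the start of text."""
    m = re.match(pattern, text)
    return (m.group(0), m.group(1)) if m else None

# Longest alternative first: "ch" is tried before "c".
longest_first = first_match(r"(ch|c)h", "chh")

# Shortest alternative first: "c" is tried first, and "c" + "h" already succeeds.
shortest_first = first_match(r"(c|ch)h", "chh")

# When the longer alternative leaves nothing for the trailing "h",
# the engine backtracks and falls back to "c".
fallback = first_match(r"(ch|c)h", "ch")

print(longest_first, shortest_first, fallback)
```

With the longer alternative listed first, the engine prefers the cluster "ch" and only falls back to "c" when the rest of the pattern would otherwise fail.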

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-11 Thread Markus Scherer via Unicode
On Fri, Oct 11, 2019 at 4:37 AM Fred Brennan via Unicode < unicode@unicode.org> wrote: > Many users are asking me and I'm not sure of the answer (nor how to find > it > out). > You can find out by looking at the data files that are being developed for Unicode 13. Look at the latest

Re: Pure Regular Expression Engines and Literal Clusters

2019-10-10 Thread Markus Scherer via Unicode
On Tue, Oct 8, 2019 at 7:28 AM Richard Wordingham via Unicode < unicode@unicode.org> wrote: > An example UTS#18 gives for matching a literal cluster can be simplified > to, in its notation: > > [c \q{ch}] > > This is interpreted as 'match against "ch" if possible, otherwise > against "c". Thus

Re: Manipuri/Meitei customary writing system

2019-10-04 Thread Markus Scherer via Unicode
On Fri, Oct 4, 2019 at 2:05 PM Richard Wordingham via Unicode < unicode@unicode.org> wrote: > > >> Is the use of the Meitei script aspirational or customary? > > >> Which script is being used for major newspapers, popular books, > > >> and video captions? > > > > > > This may give you some more

Manipuri/Meitei customary writing system

2019-10-03 Thread Markus Scherer via Unicode
Dear Unicoders, Is Manipuri/Meitei customarily written in Bangla/Bengali script or in Meitei script? I am looking at https://en.wikipedia.org/wiki/Meitei_language#Writing_systems which seems to describe writing practice in transition, and I can't quite tell where it stands. Is the use of the

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Markus Scherer via Unicode
There are lots of ways to implement the UCA. When you want fast string comparison, the zero weights are useful for processing -- and you don't actually assemble a sort key. People who want sort keys usually want them to be short, so you spend time on compression. You probably also build sort

Re: Dealing with Georgian capitalization in programming languages

2018-10-02 Thread Markus Scherer via Unicode
On Tue, Oct 2, 2018 at 12:50 AM Martin J. Dürst via Unicode < unicode@unicode.org> wrote: > ... The only > operation that can cause problems is 'capitalize'. > > When I say "cause problems", I mean producing mixed-case output. I > originally thought that 'capitalize' would be fine. It is fine for

Re: Diacritic marks in parentheses

2018-07-26 Thread Markus Scherer via Unicode
I would not expect Ä+combining () above = Ä᪻ to look right except with specialized fonts. http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%84%5Cu1ABB==0 Even if it worked widely, I think it would be confusing. I think you are best off writing Arzt/Ärztin. Viele Grüße, markus

Re: preliminary proposal: New Unicode characters for Arabic music half-flat and half-sharp symbols

2018-05-15 Thread Markus Scherer via Unicode
On Tue, May 15, 2018 at 10:47 AM, Johnny Farraj via Unicode < unicode@unicode.org> wrote: > Dear Unicode list members, > > I wish to get feedback about a new symbol submission proposal. > Just to clarify, this is a discussion list where you may get some useful feedback. This is not where you

Re: [Unicode] Re: Fonts and font sizes used in the Unicode

2018-03-05 Thread Markus Scherer via Unicode
On Mon, Mar 5, 2018 at 9:03 AM, suzuki toshiya via Unicode < unicode@unicode.org> wrote: > I have a question; if some people try to make a > translated version of Unicode, they should contact > all font contributors and ask for the license? > Unicode Consortium cannot give any sublicense? > If

Re: Fonts and font sizes used in the Unicode

2018-03-04 Thread Markus Scherer via Unicode
On Sun, Mar 4, 2018 at 6:10 AM, Helena Miton via Unicode < unicode@unicode.org> wrote: > Greetings. Is there a way to know which font and font size have been used > in the Unicode charts (for various writing systems)? Many thanks! > What are you trying to do? Many of the fonts are unique to the

Re: Emoji blooper

2018-02-13 Thread Markus Scherer via Unicode
On my machine (Chromebox+Gmail), the trumpets point down to the lower left. If you want to convey precise images, then send images... markus

Re: Internationalization & Unicode Conference 2018

2018-01-24 Thread Markus Scherer via Unicode
If your presentation is accepted for the conference, you should get a hotel discount. markus

Re: Minimal Implementation of Unicode Collation Algorithm

2017-12-04 Thread Markus Scherer via Unicode
On Mon, Dec 4, 2017 at 5:30 AM, Richard Wordingham via Unicode < unicode@unicode.org> wrote: > May a collation algorithm that always compares all strings as equal be a > compliant implementation of the Unicode Collation Algorithm (UTS #10)? > If not, by which clause is it not compliant?

Re: implicit weight base for U+2CEA2

2017-09-27 Thread Markus Scherer via Unicode
On Wed, Sep 27, 2017 at 4:07 PM, James Tauber wrote: > Ah yes, I was just going by membership in the CJK Unified Ideographs > Extension E block, not actual assignment. > > So the lack of assignment means it should fail the Unified_Ideograph > membership in
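
For readers following the arithmetic, here is a small sketch of the implicit weight computation as described in UTS #10. The function name `implicit_weights` is hypothetical (it is not pyuca's API), and the base is passed in by the caller: the sketch assumes FB80 for Unified_Ideograph code points outside the main CJK blocks and FBC0 for any other code point, including unassigned ones.

```python
def implicit_weights(cp, base):
    """Compute the two primary weights [AAAA, BBBB] for a code point.

    AAAA = base + (cp >> 15)
    BBBB = (cp & 0x7FFF) | 0x8000
    The choice of base (for example 0xFB80 or 0xFBC0) is left to the caller.
    """
    aaaa = base + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb

# U+2CEA2 treated as an unassigned code point (assumed base FBC0) ...
as_unassigned = implicit_weights(0x2CEA2, 0xFBC0)

# ... versus treated as a Unified_Ideograph outside the main blocks (assumed base FB80).
as_ideograph = implicit_weights(0x2CEA2, 0xFB80)

print([hex(w) for w in as_unassigned], [hex(w) for w in as_ideograph])
```

Only the first weight differs between the two cases, which is why picking the base from the block instead of from the actual assignment changes the sort position.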

Re: implicit weight base for U+2CEA2

2017-09-27 Thread Markus Scherer via Unicode
On Wed, Sep 27, 2017 at 1:49 PM, James Tauber via Unicode < unicode@unicode.org> wrote: > I recently updated pyuca[1], my pure Python implementation of the Unicode > Collation Algorithm to work with 8.0.0, 9.0.0, and 10.0.0 but to get all > the tests to work, I had to special case the implicit

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-09-23 Thread Markus Scherer via Unicode
FYI, I changed the ICU behavior for the upcoming ICU 60 release (pending code review). Proposal & description: https://sourceforge.net/p/icu/mailman/message/35990833/ Code changes: http://bugs.icu-project.org/trac/review/13311 Best regards, markus On Thu, Aug 3, 2017 at 5:34 PM, Mark Davis ☕️

Re: Emoji Space

2017-07-17 Thread Markus Scherer via Unicode
On Mon, Jul 17, 2017 at 5:25 AM, Christoph Päper via Unicode < unicode@unicode.org> wrote: > As you may know, the combined original Japanese emoji set included three > whitespace characters: one was the full width of a (square) emoji, one was > half that and the last one was a quarter blank.

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-06-03 Thread Markus Scherer via Unicode
On Wed, May 31, 2017 at 5:12 AM, Henri Sivonen wrote: > On Sun, May 21, 2017 at 7:37 PM, Mark Davis ☕️ via Unicode > wrote: > > There is plenty of time for public comment, since it was targeted at > Unicode > > 11, the release for about a year from

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-26 Thread Markus Scherer via Unicode
On Fri, May 26, 2017 at 3:28 AM, Martin J. Dürst wrote: > But there's plenty in the text that makes it absolutely clear that some > things cannot be included. In particular, it says > > > The term “maximal subpart of an ill-formed subsequence” refers to the code >

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-24 Thread Markus Scherer via Unicode
On Wed, May 24, 2017 at 3:56 PM, Karl Williamson wrote: > On 05/24/2017 12:46 AM, Martin J. Dürst wrote: > >> That's wrong. There was a public review issue with various options and >> with feedback, and the recommendation has been implemented and in use >> widely (among

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-23 Thread Markus Scherer via Unicode
On Tue, May 23, 2017 at 7:05 AM, Asmus Freytag via Unicode < unicode@unicode.org> wrote: > So, if the proposal for Unicode really was more of a "feels right" and not > a "deviate at your peril" situation (or necessary escape hatch), then we > are better off not making a RECOMMENDATION that goes

Re: Comparing Raw Values of the Age Property

2017-05-22 Thread Markus Scherer via Unicode
On Mon, May 22, 2017 at 2:44 PM, Richard Wordingham via Unicode < unicode@unicode.org> wrote: > Given two raw values of the Age property, defined in UCD file > DerivedAge.txt, how is a computer program supposed to compare them? > Apart from special handling for the value "Unassigned" and its
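
One possible reading of the question, sketched in Python: raw Age values such as "9.0" and "10.0" do not compare correctly as strings, so a program can split them into numeric (major, minor) pairs first. The helper name `parse_age` is hypothetical, and mapping "Unassigned" to `None` is only an assumption made for this sketch, not something the thread prescribes.

```python
def parse_age(raw):
    """Turn a raw Age value like "6.1" into a comparable (major, minor) tuple.

    "Unassigned" is mapped to None here; how to order it is left to the caller.
    """
    if raw == "Unassigned":
        return None
    major, minor = raw.split(".")
    return int(major), int(minor)

# Plain string comparison gets the order wrong once the major version has two digits.
string_order_is_wrong = "10.0" < "9.0"

# Numeric tuples compare as expected.
tuple_order_is_right = parse_age("10.0") > parse_age("9.0")

print(string_order_is_wrong, tuple_order_is_right)
```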

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Markus Scherer via Unicode
Let me try to address some of the issues raised here. The proposal changes a recommendation, not a requirement. Conformance applies to finding and interpreting valid sequences properly. This includes not consuming parts of valid sequences when dealing with illegal ones, as explained in the
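
The point about not consuming parts of valid sequences can be observed with any conformant decoder. The sketch below uses Python's built-in UTF-8 decoder with `errors="replace"` purely as an illustration; it is not ICU, and the wrapper name `decode_lenient` is made up here. How many U+FFFD characters a decoder emits for a given ill-formed sequence is the part that the recommendation under discussion concerns.

```python
def decode_lenient(data):
    """Decode bytes as UTF-8, replacing ill-formed sequences with U+FFFD."""
    return data.decode("utf-8", errors="replace")

# A truncated three-byte sequence (E2 82) followed by ASCII "A":
# the "A" is a valid sequence of its own and must survive decoding.
truncated_then_ascii = decode_lenient(b"\xe2\x82A")

# A lead byte directly followed by ASCII: again the "A" must not be swallowed.
lead_then_ascii = decode_lenient(b"\xe2A")

print(repr(truncated_then_ascii), repr(lead_then_ascii))
```

In both cases the decoded text still ends in "A"; only the bytes that cannot be part of a valid sequence are replaced.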