Re: New Unicode Working Group: Message Formatting

2020-01-14 Thread Philippe Verdy via Unicode
People's names are NOT transliterated freely. It's up to each person to document their romanized name; it should not be invented by automatic processes. And frequently the (official) romanized name does not match the original name in another script: this is very frequent for Chinese people, as well

Re: Geological symbols

2020-01-13 Thread Philippe Verdy via Unicode
It is possible with some other markup languages, including HTML, by using ruby notation and other interlinear notations for creating special vertical layouts inside a horizontal line. There are difficulties, however, caused by line wraps which may occur before the vertical layout, or even inside it

Re: New Unicode Working Group: Message Formatting

2020-01-11 Thread Philippe Verdy via Unicode
You seem to have never seen how translation packages work and are used in common projects (not just CLDR, but you could find them as well in Wikimedia projects, or translation packages for a lot of open source packages). The purpose is to allow translating the UI of these applications for user's dema

Re: emojis for mouse buttons?

2020-01-01 Thread Philippe Verdy via Unicode
te triangle > > pointing up inside) > > MOUSE SCROLL DOWN (mouse with middle button black and white triangle > > pointing down inside) > > > > These characters are pretty useful in software manuals, training > > materials and user interfaces. > > > > Happ

Re: emojis for mouse buttons?

2019-12-31 Thread Philippe Verdy via Unicode
Playing with the filling of the middle cell to mean a double click is a bad idea; it would be better to add one or two rounded borders separated from the button, or simply display two icons in sequence for a double click. Note that the glyphs do not necessarily have to show a mouse, it could as

Re: emojis for mouse buttons?

2019-12-31 Thread Philippe Verdy via Unicode
n three cells by horizontal and vertical strokes, and one of the three cells filled (representing the wire or the wireless waves is not necessary). On Tue, 31 Dec 2019 at 14:57, Shriramana Sharma wrote: > Why are these called "emojis" for mouse buttons rather than just > "

emojis for mouse buttons?

2019-12-31 Thread Philippe Verdy via Unicode
A lot of applications need to document their keymap and want to display keys. For now there are emojis for mice (several variants: 1, 2 or 3 buttons), independently of the button actually pressed. However there's no simple emoji to represent the very common mouse click buttons used in a lot of UIs.

Re: Encoding the Nsibidi script (African) for writing the Igbo language

2019-11-11 Thread Philippe Verdy via Unicode
On Mon, 11 Nov 2019 at 17:31, Markus Scherer wrote: > We generally assign the script code when the script is in the pipeline for > a near-future version of Unicode, which demonstrates that it's "a candidate > for encoding". We also want the name of the script to be settled, so that > the scrip

Re: Encoding the Nsibidi script (African) for writing the Igbo language

2019-11-11 Thread Philippe Verdy via Unicode
ation across countries or caused by wars, invasions, diplomacy, or commercial interests) On Mon, 11 Nov 2019 at 17:31, Markus Scherer wrote: > On Mon, Nov 11, 2019 at 4:03 AM Philippe Verdy via Unicode < > unicode@unicode.org> wrote: > >> But first there's still no code

Encoding the Nsibidi script (African) for writing the Igbo language

2019-11-11 Thread Philippe Verdy via Unicode
Encoding the Nsibidi script (African) for writing the Efik, Ekoi, Ibibio, Igbo language. See this site as an example of use, with links to published educational books. http://blog.nsibiri.org/ Also this online dictionary: https://fr.scribd.com/doc/281219778/Ikpokwu Other links: https://en.wikiped

Re: comma ellipses

2019-10-07 Thread Philippe Verdy via Unicode
Commas may be used instead of dots by users of French keyboards (it's easier to type the comma, when the dot/full stop requires pressing the SHIFT key). I may be wrong, but I've quite frequently seen commas or semicolons instead of dot/full stops under normal orthography. But the web and notably so

Re: Acute/apostrophe diacritic in Võro for palatalized consonants

2019-08-19 Thread Philippe Verdy via Unicode
I must add that the current version of Wikipedia in Võro seems to have completely given up encoding this combining mark (no acute, no apostrophe), probably because of the lack of a proper encoding in Unicode and the difficulty of harmonizing its orthography. It may be a good argument for the addition of th

Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Philippe Verdy via Unicode
t;> Anshuman Pandey did preliminary research on this in 2011. >> >> http://www.unicode.org/L2/L2011/11144-magar-akkha.pdf >> >> It would be premature to assign an ISO 15924 script code, pending the >> research to determine whether this script should be separately encoded

Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Philippe Verdy via Unicode
Also we can note that "mgp" (Eastern Magari) is severely endangered according to multiple sources including Ethnologue and the Linguist List. This is still not the case for Western Magari (mostly in Nepal, not in Sikkim, India), where evidence is probably easier to find (where the encoding of a new sc

Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Philippe Verdy via Unicode
On Mon, 22 Jul 2019 at 18:43, Ken Whistler wrote: > See the entry for "Magar Akkha" on: > > http://linguistics.berkeley.edu/sei/scripts-not-encoded.html > > Anshuman Pandey did preliminary research on this in 2011. > That's what I said: 8 years ago already. > http://www.unicode.org/L2/L201

Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Philippe Verdy via Unicode
According to Ethnologue, the Eastern Magar language (mgp) is written in two scripts: Devanagari and "Akkha". But the "Akkha" script does not seem to have any ISO 15924 code. The Ethnologue currently assigns a private use code (Qabl) for this script. Was the addition delayed due to lack of evidence

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-20 Thread Philippe Verdy via Unicode
wrote: > > On 7/17/2019 4:54 PM, Philippe Verdy via Unicode wrote: > > then the Unicode version (age) used for Hieroglyphs should also be > assigned to Hieratic. > > It is already. > > > In fact the ligatures system for the "cursive" Egyptian Hieratic is s

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-17 Thread Philippe Verdy via Unicode
But my concern is in fact valid as well for Egyptian Hieratic (considered in Chapter 14 to be "unified" with the Hieroglyphs, and being a cursive variant, currently not supported in any font because of the very complex set of ligatures this would require, and that may not even work properly with th

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-17 Thread Philippe Verdy via Unicode
Sorry, I misread (with an automated tool) an old dataset where these "3.0" versions were indicated in an incorrect form. On Thu, 18 Jul 2019 at 01:07, Philippe Verdy wrote: > Note also that there are variants registered with Unicode versions (Age) > for symbols, even if they don't have any ass

Re: ISO 15924 : missing indication of support for Syriac variants

2019-07-17 Thread Philippe Verdy via Unicode
Note also that there are variants registered with Unicode versions (Age) for symbols, even if they don't have any assigned Unicode alias, but this is not a problem.
994 Zinh Code for inherited script codet pour écriture héritée Inherited 2009-02-23
995 *Zmth

ISO 15924 : missing indication of support for Syriac variants

2019-07-17 Thread Philippe Verdy via Unicode
The ISO 15924/RA reference page contains indications of support in Unicode for variants of various scripts such as Aran, Latf, Latg, Hanb, Hans, Hant:
160 *Arab* Arabic arabe Arabic 1.1 2004-05-01
161 *Aran* Arabic (Nastaliq variant) arabe (variante nastalique) 1.1 2014-11-15
...
503 *Hanb* Han wit

Fwd: Numeric group separators and Bidi

2019-07-09 Thread Philippe Verdy via Unicode
> Well my first feeling was that U+202F should work all the time, but I > found cases where this is not always the case. So this must be bugs in > those renderers. > I think we can attribute these bugs to the fact that this character is insufficiently known, and not even accessible in most input t

Re: Numeric group separators and Bidi

2019-07-09 Thread Philippe Verdy via Unicode
2F is a correct choice if you need > the look of a narrow space. > > Another possibility is to embed the number in a LRI...PDI block, as > e.g. https://unicode.org/cldr/utility/bidic.jsp does with the "1–3%" > fragment of its default example. > > cheers, > egmont
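
The LRI...PDI suggestion quoted above can be sketched as follows. This is only an illustration, assuming the number is grouped with U+202F NARROW NO-BREAK SPACE and wrapped in U+2066 LEFT-TO-RIGHT ISOLATE and U+2069 POP DIRECTIONAL ISOLATE; the function name `isolate_number` is hypothetical and does not come from the thread.

```python
LRI = "\u2066"    # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"    # POP DIRECTIONAL ISOLATE
NNBSP = "\u202F"  # NARROW NO-BREAK SPACE, used here as the group separator


def isolate_number(value: int) -> str:
    """Group digits by thousands with U+202F and wrap the result in LRI...PDI.

    Hypothetical helper: the isolate keeps the digits and separators
    together as one left-to-right run inside right-to-left text.
    """
    grouped = f"{value:,}".replace(",", NNBSP)
    return f"{LRI}{grouped}{PDI}"


if __name__ == "__main__":
    # Print the code points so the invisible controls are visible.
    print([f"U+{ord(ch):04X}" for ch in isolate_number(1234567)])
```

Whether a given renderer honours the isolate is a separate question; the sketch only shows how the string would be built.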

Numeric group separators and Bidi

2019-07-09 Thread Philippe Verdy via Unicode
Is there a narrow space usable as a numeric group separator, and that also has the same bidi property as digits (i.e. neutral outside the span of digits and separators, but inheriting the implied directionality of the previous digit) ? I can't find a way to use narrow spaces instead of punctuation

Re: Unicode "no-op" Character?

2019-07-03 Thread Philippe Verdy via Unicode
Also consider that C0 controls (like STX and ETX) can already be used for packetizing, but immediately comes the need for escaping (DLE has been used for that goal, just before the character to preserve in the stream content, notably before DLE itself, or STX and ETX). There's then no need at all o
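
The STX/ETX framing with DLE escaping described above can be sketched like this. It is a minimal illustration on raw bytes, not a particular protocol; the function names `frame` and `unframe` are hypothetical.

```python
STX, ETX, DLE = 0x02, 0x03, 0x10  # C0 controls: start of text, end of text, data link escape


def frame(payload: bytes) -> bytes:
    """Wrap payload in STX ... ETX, prefixing any STX/ETX/DLE in the content with DLE."""
    out = bytearray([STX])
    for byte in payload:
        if byte in (STX, ETX, DLE):
            out.append(DLE)  # escape the control so it is read as content
        out.append(byte)
    out.append(ETX)
    return bytes(out)


def unframe(packet: bytes) -> bytes:
    """Reverse frame(); raise ValueError on a malformed packet."""
    if len(packet) < 2 or packet[0] != STX or packet[-1] != ETX:
        raise ValueError("packet is not delimited by STX ... ETX")
    out = bytearray()
    escaped = False
    for byte in packet[1:-1]:
        if escaped:
            out.append(byte)  # byte after DLE is always literal content
            escaped = False
        elif byte == DLE:
            escaped = True
        elif byte in (STX, ETX):
            raise ValueError("unescaped delimiter inside packet")
        else:
            out.append(byte)
    if escaped:
        raise ValueError("dangling DLE at end of packet")
    return bytes(out)
```

For Unicode text, the payload would be the encoded bytes (for example UTF-8), so the escaping never touches the characters themselves.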

Re: Unicode "no-op" Character?

2019-07-03 Thread Philippe Verdy via Unicode
On Wed, 3 Jul 2019 at 06:09, Sławomir Osipiuk wrote: > I don’t think you understood me at all. I can packetize a string with any > character that is guaranteed not to appear in the text. > Your goal is **impossible** to reach with Unicode. Assume such character is "added" to the UCS, then it

Re: Unicode "no-op" Character?

2019-06-29 Thread Philippe Verdy via Unicode
If you want to "packetize" arbitrarily long Unicode text, you don't need any new magic character. Just prepend your packet with a base character used as a syntactic delimiter, that does not combine with what follows in any normalization. There's a fine character for that: the TAB control. Except th

Symbols of colors used in Portugal for transport

2019-04-27 Thread Philippe Verdy via Unicode
A very useful thing to add to Unicode (for colorblind people)! http://bestinportugal.com/color-add-project-brings-color-identification-to-the-color-blind Has it been proposed to add these as new symbols?

Re: Emoji Haggadah

2019-04-19 Thread Philippe Verdy via Unicode
I cannot; it definitely requires first a good knowledge of English (to find possible synonyms, plus phonetic approximations, including using abbreviatable words), and of Hebrew culture (to guess names and the context). All this text looks completely random and makes no sense otherwise. On Tue, 16 Apr 2

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-17 Thread Philippe Verdy via Unicode
On Fri, 8 Feb 2019 at 13:56, Egmont Koblinger wrote: > Philippe, I hate to say it, but at the risk of being impolite, I just > have to. > Resist this idea, I've not been impolite. I just want to show you that terminals are legacy environments that are far behind what is needed for proper int

Re: Bidi paragraph direction in terminal emulators

2019-02-14 Thread Philippe Verdy via Unicode
On Tue, 12 Feb 2019 at 14:16, Egmont Koblinger via Unicode < unicode@unicode.org> wrote: > > There is nothing magic about the grid of cells, and once you introduce > new escape sequences, you might as well truly modernise the terminal. > > The magic about the grid of cells is all the software

Re: Encoding colour (from Re: Encoding italic)

2019-02-11 Thread Philippe Verdy via Unicode
On Sun, 10 Feb 2019 at 02:33, wjgo_10...@btinternet.com via Unicode < unicode@unicode.org> wrote: > Previously I wrote: > > > A stateful method, though which might be useful for plain text streams > > in some applications, would be to encode as characters some of the > > glyphs for indicating

Re: Encoding italic

2019-02-11 Thread Philippe Verdy via Unicode
On Sun, 10 Feb 2019 at 16:42, James Kass via Unicode wrote: > > Philippe Verdy wrote, > > >> ...[one font file having both italic and roman]... > > The only case where it happens in real fonts is for the mapping of > > Mathematical Symbols which have a distinct encoding for some > > varia

Re: Bidi paragraph direction in terminal emulators

2019-02-10 Thread Philippe Verdy via Unicode
On Sat, 9 Feb 2019 at 20:55, Egmont Koblinger via Unicode < unicode@unicode.org> wrote: > Hi Asmus, > > > On quick reading this appears to be a strong argument why such emulators > will > > never be able to be used for certain scripts. Effectively, the model > described works > > well with any

Re: Encoding italic

2019-02-10 Thread Philippe Verdy via Unicode
On Sun, 10 Feb 2019 at 05:34, James Kass via Unicode wrote: > > Martin J. Dürst wrote, > > >> Isn't that already the case if one uses variation sequences to choose > >> between Chinese and Japanese glyphs? > > > > Well, not necessarily. There's nothing prohibiting a font that includes >

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Philippe Verdy via Unicode
Adding a single bit of protection in cell attributes to indicate they are either protected or become transparent (and the rest of the attributes/character field indicates the id of another terminal grid or rendering plugin creating its own layer and having its own scrolling state and dimensions) c

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Philippe Verdy via Unicode
On Thu, 7 Feb 2019 at 19:38, Egmont Koblinger wrote: > As you can see from previous discussions, there's a whole lot of > confusion about the terminology. And it was exactly the subject of my first message sent to this thread! You probably missed it. > Philippe, with all due respect, I h

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-07 Thread Philippe Verdy via Unicode
On Thu, 7 Feb 2019 at 13:29, Egmont Koblinger wrote: > Hi Philippe, > > > There's some rules for correct display including with Bidi: > > In what sense are these "rules"? Where are these written, in what kind > of specification or existing practice? > "Rules" are not formally written, they a

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-06 Thread Philippe Verdy via Unicode
d one, but in most > terminals, a newline is not such a control function.) > > Anyway, please also see my previous email; I hope that clarifies a lot > for you, too. > > > cheers, > egmont > > On Tue, Feb 5, 2019 at 5:53 PM Philippe Verdy via Unicode > wrote: > >

Re: Bidi paragraph direction in terminal emulators (was: Proposal for BiDi in terminal emulators)

2019-02-05 Thread Philippe Verdy via Unicode
I think that before making any decision we must make some decision about what we mean by "newlines". There are in fact 3 different functions: - (1) soft line breaks (which are used to enforce a maximum display width between paragraph margins): these are equivalent to breakable and compressible whit

Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Philippe Verdy via Unicode
Actually not all U+E0020 through U+E007E are "un-deprecated" for this use. For now emoji flags only use: - U+E0041 through U+E005A (mapping to ASCII letters A through Z used in 2-letter ISO3166-1 codes). These are usable in pairs, without requiring any modifier (and only for ISO3166-1 registered

Re: Encoding italic

2019-02-01 Thread Philippe Verdy via Unicode
the proposal would contradict the goals of variation selectors and would pollute their variation sequences registry (possibly even creating conflicts). And if we admit it for italics, then another VSn will be dedicated to bold, and another for monospace, and finally many would follow for various sty

Re: Encoding italic

2019-01-28 Thread Philippe Verdy via Unicode
So you used "bold I.e, you converted from ASCII to tag characters the full HTML sequences "" and "", including the HTML element name. I see little interest for that approach. Additionally this means that U+E003C is the tag identifier and its scope does not end for the rest of the text (the HTML c

Re: Encoding italic

2019-01-27 Thread Philippe Verdy via Unicode
You're not very explicit about the Tag encoding you use for these styles. Of course it must not be a language tag so the introducer is not U+E0001, or a cancel-all tag so it is not prefixed by U+E007F. It also cannot use letter-like, digit-like and hyphen-like tag characters for its introduction. S

Re: Ancient Greek apostrophe marking elision

2019-01-27 Thread Philippe Verdy via Unicode
For Volapük, it looks much more like U+02BE (right half ring modifier letter) than like U+02BC (apostrophe "modifier" letter), according to the PDF at https://archive.org/details/cu31924027111453/page/n12 The half ring makes a clear distinction with the regular apostrophe (for elisions) or quotati

Re: Encoding italic (was: A last missing link)

2019-01-17 Thread Philippe Verdy via Unicode
If encoding italics means reencoding the normal linguistic usage, the answer is no! We already have the nightmares caused by partial encoding of Latin and Greek (also a few Hebrew characters) for maths notations or IPA notations, but they are restricted to a well delimited scope of use and subset, and at l

Re: NNBSP (was: A last missing link for interoperable representation)

2019-01-17 Thread Philippe Verdy via Unicode
On Thu, 17 Jan 2019 at 05:01, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On 16/01/2019 21:53, Richard Wordingham via Unicode wrote: > > > > On Tue, 15 Jan 2019 13:25:06 +0100 > > Philippe Verdy via Unicode wrote: > > > >> If yo

Re: A last missing link for interoperable representation

2019-01-15 Thread Philippe Verdy via Unicode
Note that even if this NNBSP character is not mapped in a font, it should be rendered correctly with all modern renderers (the mapping is necessary only when a font design wants to tune its metrics, because its width varies between 1/8 and 1/6 em (the narrow space is a bit narrower in traditional E

Re: A last missing link for interoperable representation

2019-01-15 Thread Philippe Verdy via Unicode
On Mon, 14 Jan 2019 at 20:25, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On 14/01/2019 06:08, James Kass via Unicode wrote: > > > > Marcel Schneider wrote, > > > >> There is a crazy typeface out there, misleadingly called 'Courier > >> New', as if the foundry didn’t anticipat

Re: UCA unnecessary collation weight 0000

2018-11-04 Thread Philippe Verdy via Unicode
roperties, algorithms, CLDR, ICU) that have a higher > benefit. > > You can continue flogging this horse all you want, but I'm muting this > thread (and I suspect I'm not the only one). > > Mark > > > On Sun, Nov 4, 2018 at 2:37 AM Philippe Verdy via Unicode <

Re: Encoding

2018-11-04 Thread Philippe Verdy via Unicode
I can take another example about what I call "legacy encoding" (which really means that such encoding is just an "approximation" from which no semantic can be clearly inferred, except by using a non-deterministic heuristic, which can frequently make "false guesses"). Consider the case of the legacy H

Re: Encoding

2018-11-04 Thread Philippe Verdy via Unicode
Note also that some other scripts have their own dedicated "abbreviation mark" encoded, but as distinctive punctuations or modifier letters: they are NOT combining. I do not advocate changing these scripts at all. As well I don't propose to instruct authors to use an after Latin/Greek/Letters/Ara

Re: Encoding

2018-11-04 Thread Philippe Verdy via Unicode
On Sun, 4 Nov 2018 at 18:34, Marcel Schneider wrote: > On 04/11/2018 17:45, Philippe Verdy wrote: > Marcel > * As already repeatedly stated, I’m taking the one bit where TUS states > that all natural languages shall be given a semantically unambiguous (ie > not introducing new ambiguity) and i

Re: Encoding

2018-11-04 Thread Philippe Verdy via Unicode
On Sun, 4 Nov 2018 at 18:34, Marcel Schneider wrote: > On 04/11/2018 17:45, Philippe Verdy wrote: > Beyond that, the problem with *COMBINING ABBREVIATION MARK is that it > needs OpenType support to work, while direct encoding of preformatted > superscripts and use as abbreviation indicators fo

Re: Encoding (was: Re: A sign/abbreviation for "magister")

2018-11-04 Thread Philippe Verdy via Unicode
Note that I actually propose not just one rendering for the but two possible variants (that would be equally valid without preference). Use it after any base cluster (including with diacritics if needed, like combining underlines). - the first one can be to render the previous cluster as superscrip

Re: UCA unnecessary collation weight 0000

2018-11-03 Thread Philippe Verdy via Unicode
On Fri, 2 Nov 2018 at 22:27, Ken Whistler wrote: > > On 11/2/2018 10:02 AM, Philippe Verdy via Unicode wrote: > > I was replying not about the notational representation of the DUCET data > table (using [....] unnecessarily) but about the text of UTR#10 itself. > Wh

Re: UCA unnecessary collation weight 0000

2018-11-03 Thread Philippe Verdy via Unicode
On Fri, 2 Nov 2018 at 22:27, Ken Whistler wrote: > > On 11/2/2018 10:02 AM, Philippe Verdy via Unicode wrote: > > I was replying not about the notational representation of the DUCET data > table (using [....] unnecessarily) but about the text of UTR#10 itself. > Wh

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
It should be noted that the algorithmic complexity for this NFLD normalization ("legacy") is exactly the same as for NFKD ("compatibility"). However NFLD is versioned (like also NFLC), so NFLD can take a second parameter: the maximum Unicode version which can be used to filter which decomposition

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
On Sat, 3 Nov 2018 at 23:36, Philippe Verdy wrote: > - this new decomposition mapping file for NFLC and NFLD, where NFLC is >> defined to be NFC(NFLD), has some stability requirements and it must be >> guaranteed that NFD(NFLD) = NFD >> > Oops! fix my typo: it must be guaranteed that NFD(NFLD)

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
> > Unlike NFKC and NFKD, the NFLC and NFLD would be an extensible superset > based on MUTABLE character properties (this can also be "decompositions > mappings" except that once a character is added to the new property file, > they won't be removed, and can have some stability as well, where the >

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
I can give other interesting examples about why the Unicode "character encoding model" is the best option. Just consider how the Hangul alphabet is (now) encoded: its consonant letters are encoded "twice" (leading and trailing jamos) because they carry semantic distinctions for efficient processin

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
ome names (not enough for most users to be aware that > variations can be encoded explicitly and compliantly) > > > On Sat, 3 Nov 2018 at 20:41, Philippe Verdy wrote: > >> >> >> On Fri, 2 Nov 2018 at 20:01, Marcel Schneider via Unicode < >> unicode@unicode.o

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
20:41, Philippe Verdy wrote: > > > On Fri, 2 Nov 2018 at 20:01, Marcel Schneider via Unicode < > unicode@unicode.org> wrote: > >> On 02/11/2018 17:45, Philippe Verdy via Unicode wrote: >> [quoted mail] >> > >> > Using variation selectors is o

Re: A sign/abbreviation for "magister"

2018-11-03 Thread Philippe Verdy via Unicode
On Fri, 2 Nov 2018 at 20:01, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On 02/11/2018 17:45, Philippe Verdy via Unicode wrote: > [quoted mail] > > > > Using variation selectors is only appropriate for these existing > > (preencoded) superscrip

Re: UCA unnecessary collation weight 0000

2018-11-02 Thread Philippe Verdy via Unicode
icu-project.org/icu-bin/collation.html and turn on "raw >>> collation elements" and "sort keys" to see the transformed collation >>> elements (from the DUCET + CLDR) and the resulting sort keys. >>> >>> a =>[29,05,_05] => 29 , 05 , 05

Re: A sign/abbreviation for "magister"

2018-11-02 Thread Philippe Verdy via Unicode
On Fri, 2 Nov 2018 at 16:20, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > That seems to me a regression, after the front has moved in favor of > recognizing Latin script needs preformatted superscript. The use case is > clear, as we have ª, º, and n° with degree sign, and so on

Re: UCA unnecessary collation weight 0000

2018-11-02 Thread Philippe Verdy via Unicode
urn on "raw > collation elements" and "sort keys" to see the transformed collation > elements (from the DUCET + CLDR) and the resulting sort keys. > > a =>[29,05,_05] => 29 , 05 , 05 . > a\u0300 => [29,05,_05][,8A,_05] => 29 , 45 8A , 06 . > à => >

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
As well the step 2 of the algorithm speaks about a single "array" of collation elements. Actually it's best to create one separate array per level, and append weights for each level in the relevant array for that level. The steps S2.2 to S2.4 can do this, including for derived collation elements in
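
The "one array per level" idea in the message above can be sketched as follows. This is a toy illustration, not the UTS #10 reference implementation: collation elements are assumed to be plain (primary, secondary, tertiary) tuples in which 0 means "no weight at this level", the function name `sort_key` is hypothetical, and the final join still uses the 0000 level separator that the standard describes, so the output stays comparable with the published figures.

```python
from typing import List, Sequence, Tuple

CollationElement = Tuple[int, int, int]  # (primary, secondary, tertiary)


def sort_key(elements: Sequence[CollationElement], levels: int = 3) -> List[int]:
    """Build a sort key by collecting weights into one list per level."""
    per_level: List[List[int]] = [[] for _ in range(levels)]
    for element in elements:
        for level, weight in enumerate(element[:levels]):
            if weight != 0:  # ignorable at this level: nothing is appended
                per_level[level].append(weight)
    key: List[int] = []
    for level, weights in enumerate(per_level):
        if level > 0:
            key.append(0x0000)  # level separator, as in UTS #10
        key.extend(weights)
    return key


if __name__ == "__main__":
    # Weights for "cab" as quoted from Figure 3 of UTS #10 in this thread.
    cab = [(0x0706, 0x0020, 0x0002), (0x06D9, 0x0020, 0x0002), (0x06EE, 0x0020, 0x0002)]
    print(" ".join(f"{w:04X}" for w in sort_key(cab)))
```

Dropping the separator, as the thread argues for, would only be safe if the weight ranges of the levels never overlap; the sketch does not assume that.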

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
So it should be clear in the UCA algorithm and in the DUCET datatable that "0000" is NOT a valid weight. It is just a notational placeholder used as ".0000", only indicating in the DUCET format that there's NO weight assigned at the indicated level, because the collation element is ALWAYS ignorable

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
In summary, this step given in the algorithm is completely unneeded and can be dropped completely: *S3.2* If L is not 1, append a *level separator*. *Note:* The level separator is zero (0000), which is guaranteed to be lower than any weight in the resulting s

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
The 0000 is there in the UCA only because the DUCET is published in a format that uses it, but here also this format is useless: you never need any [.0000], or [.0000.0000] in the DUCET table as well. Instead the DUCET just needs to indicate what is the minimum weight assigned for every level (exce

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
On Thu, 1 Nov 2018 at 21:31, Philippe Verdy wrote: > so you can use these two last functions to write the first one: > > bool isIgnorable(int level, string element) { > return getLevel(getWeightAt(element, 0)) > getMinWeight(level); > } > correction: return getWeightAt(element, 0)

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
On Thu, 1 Nov 2018 at 21:08, Markus Scherer wrote: > When you want fast string comparison, the zero weights are useful for >> processing -- and you don't actually assemble a sort key. >> > And no, I see absolutely no case where any 0000 weight is useful during processing, it does not distinguish a

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
I'm not speaking just about how collation keys will finally be stored (as uint16 or bytes, or sequences of bits with variable length); I'm just referring to the sequence of weights you generate. You absolutely NEVER need ANYWHERE in the UCA algorithm any 0000 weight, not even during processing, or

Re: UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
For example, Figure 3 in the UTR#10 contains:
Figure 3. Comparison of Sort Keys
   String  Sort Key
1  cab     *0706* 06D9 06EE *0000* 0020 0020 *0020* *0000* *0002* 0002 0002
2  Cab     *0706* 06D9 06EE *0000* 0020 0020 *0020* *0000* *0008* 0002

UCA unnecessary collation weight 0000

2018-11-01 Thread Philippe Verdy via Unicode
I just noticed that there's absolutely NO utility of the 0000 collation weight anywhere in the algorithm. For example in UTR #10, section 3.3.1 gives a collation element: [.0000.0021.0002] for COMBINING GRAVE ACCENT. However it can also be simply: [.0021.0002] for a simple reason: the secon

Re: A sign/abbreviation for "magister"

2018-10-31 Thread Philippe Verdy via Unicode
As is "Mgr" for Monseigneur in French ("Mgr" without superscripts makes little sense, and if "Mr" is sometimes found as an abbreviation for "Monsieur", its standard abbreviation is "M.", and its plural "Messieurs" is noted "MM" without any abbreviation dot or superscript, but normally never as "Mrs

Re: A sign/abbreviation for "magister"

2018-10-29 Thread Philippe Verdy via Unicode
For the case of "Mister" vs. "Magister", the (double) underlining is not just a stylistic option but conveys semantics as an explicit abbreviation mark! We are here at the line between what is pure visual encoding (e.g. using superscript letters), and logical encoding (as done everywhere else in un

Re: A sign/abbreviation for "magister"

2018-10-28 Thread Philippe Verdy via Unicode
here was a "missing base character" to apply before a known combining mark or extender) On Sun, 28 Oct 2018 at 18:54, Philippe Verdy wrote: > On Sun, 28 Oct 2018 at 18:28, Janusz S. Bień > wrote: >> On Sun, Oct 28 2018 at 15:19 +0100, Philippe Verdy via Unicode wrote:

Re: A sign/abbreviation for "magister"

2018-10-28 Thread Philippe Verdy via Unicode
On Sun, 28 Oct 2018 at 18:28, Janusz S. Bień wrote: > On Sun, Oct 28 2018 at 15:19 +0100, Philippe Verdy via Unicode wrote: > > Given the "squiggle" below letters are actually given distinctive > > semantics, I think it should be encoded as a combining character (to b

Re: A sign/abbreviation for "magister"

2018-10-28 Thread Philippe Verdy via Unicode
Given the "squiggle" below letters are actually given distinctive semantics, I think it should be encoded as a combining character (to be written not after a "superscript" but after any normal base letter, possibly with other combining characters, or CGJ if needed because of the compatibility equivalen

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
04:21, Garth Wallace via Unicode wrote: > I learned that one as a kid, as the "pigpen cipher". I'm not aware of any > numerological significance (which is easy enough to "find" in anything). > > On Sat, Oct 27, 2018 at 7:43 PM Philippe Verdy via Unicode <

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
So in summary this Masonic "alphabet" uses 13 square "letters" and a single combining mark (the central dot), possibly extended with the minus and plus signs and space. It's possible that the central dot is used as a spacing mark to note a punctuation. The assignment of Latin (or Hebrew) letters to

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
I must add that the Masonic 3x3 grid alphabet has been proposed as an alternative to Braille, easier to learn and memorize, easier and faster to draw with a pen on paper without any physical guide, and easier also to recognize using only tactile contact by a finger tip, but more difficult to form wi

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
More interesting: the Masonic alphabet http://tallermasonico.com/0diccio1.htm - 18 letters of the Latin alphabet (or Hebrew), from A to T (excluding J and K), are arranged in groups of 2 letters in a 3x3 square grid, whose global outer sides are not marked on the outer border of the grid but on lin

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
Are you talking about this one? https://www.magisterdaire.com/magister-symbol-black-sq/ It looks like a personal graphic signature for the author of this esoteric book, even if it looks like an interesting composition of several of our existing Unicode symbols, glued together in a vertical ligature, r

Re: A sign/abbreviation for "magister"

2018-10-27 Thread Philippe Verdy via Unicode
On Sat, 27 Oct 2018 at 15:06, Asmus Freytag via Unicode wrote: > First question is: how do you interpret the symbol? For me it is > definitely the capital M followed by the superscript "r" (written in an > old style no longer used in Poland), but there is something below the > superscript. It

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
laiming these are practices > despite the standards? If so, are these just tolerated by parsers, or are > they actually generated by encoders? > > > > What would be the rationale for supporting unnecessary whitespace? If > linebreaks are forced at some line length they can presum

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
ary whitespace? If >>> linebreaks are forced at some line length they can presumably be removed at >>> that length and not treated as part of the encoding. >>> >>> Maybe we differ on define where the encoding begins and ends, and where >>> higher level protocols prescribe how they

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
linebreaks are forced at some line length they can presumably be removed at >> that length and not treated as part of the encoding. >> >> Maybe we differ on define where the encoding begins and ends, and where >> higher level protocols prescribe how they are embedded within the

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
n the protocol. > > > > Tex > > > > > > > > > > *From:* Unicode [mailto:unicode-boun...@unicode.org] *On Behalf Of *Philippe > Verdy via Unicode > *Sent:* Sunday, October 14, 2018 1:41 AM > *To:* Adam Borowski > *Cc:* unicode Unicode Discussion >

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
eated as part of the encoding. > > Maybe we differ on define where the encoding begins and ends, and where > higher level protocols prescribe how they are embedded within the protocol. > > > > Tex > > > > > > > > > > *From:* Unicode [mailto:unicode

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-14 Thread Philippe Verdy via Unicode
On Sun, 14 Oct 2018 at 21:21, Doug Ewell via Unicode wrote: > Steffen Nurpmeso wrote: > > > Base64 is defined in RFC 2045 (Multipurpose Internet Mail Extensions > > (MIME) Part One: Format of Internet Message Bodies). > > Base64 is defined in RFC 4648, "The Base16, Base32, and Base64 Data > En

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-14 Thread Philippe Verdy via Unicode
It's also interesting to look at https://tools.ietf.org/html/rfc3501 - which defines (for IMAP v4) another "BASE64" encoding, - and also defines a "Modified UTF-7" encoding using it, deviating from Unicode's definition of UTF-7, - and adding other requirements (which forbids alternate encodings per

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-14 Thread Philippe Verdy via Unicode
s. On Sun, 14 Oct 2018 at 03:47, Adam Borowski via Unicode wrote: > On Sun, Oct 14, 2018 at 01:37:35AM +0200, Philippe Verdy via Unicode wrote: > > On Sat, 13 Oct 2018 at 18:58, Steffen Nurpmeso via Unicode < > > unicode@unicode.org> wrote: > > > The only

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-13 Thread Philippe Verdy via Unicode
On Sat, 13 Oct 2018 at 18:58, Steffen Nurpmeso via Unicode < unicode@unicode.org> wrote: > Philippe Verdy via Unicode wrote in w9+jearw4ghyk...@mail.gmail.com>: > |You forget that Base64 (as used in MIME) does not follow these rules \ > |as it allows multiple different

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-13 Thread Philippe Verdy via Unicode
In summary, two distinct implementations are allowed to return different values t and t' of Base64_Encode(d) for the same message d, but Base64_Decode(t') and Base64_Decode(t) will be equal and MUST both return d exactly. There's an allowed choice of implementation for Base64_Encode() but B
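
The point above can be sketched in Python, assuming the standard-library `base64` module stands in for the two implementations: `b64encode` emits one unbroken line, while `encodebytes` wraps lines MIME-style. The helper names (`encode_single_line`, `encode_mime_wrapped`, `decode_lenient`) are hypothetical and only illustrate the idea; they are not taken from the thread.

```python
import base64


def encode_single_line(d: bytes) -> bytes:
    """Hypothetical encoder A: one unbroken Base64 line."""
    return base64.b64encode(d)


def encode_mime_wrapped(d: bytes) -> bytes:
    """Hypothetical encoder B: MIME-style output with embedded newlines."""
    return base64.encodebytes(d)


def decode_lenient(t: bytes) -> bytes:
    """Decoder that discards characters outside the Base64 alphabet
    (such as line breaks) before decoding, which is b64decode's default."""
    return base64.b64decode(t)


d = bytes(range(256))          # the same message d for both encoders
t = encode_single_line(d)
t_prime = encode_mime_wrapped(d)

print(t == t_prime)                                          # False
print(decode_lenient(t) == d and decode_lenient(t_prime) == d)  # True
```

The two encoded texts differ only in line breaks here, yet both decode to the same bytes, so comparing Base64 texts is not a reliable way to compare the underlying data.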

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-13 Thread Philippe Verdy via Unicode
You forget that Base64 (as used in MIME) does not follow these rules as it allows multiple different encodings for the same source binary. MIME actually splits a binary object into multiple fragments at random positions, and then encodes these fragments separately. Also MIME uses an extension of Ba

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-12 Thread Philippe Verdy via Unicode
I think the reverse is also true! Decoding a Base64 entity does not guarantee that it will return valid text in any known encoding. So Unicode normalization of the output cannot apply. Even if it represents text, nothing indicates that the result will be encoded with some Unicode encoding form (u
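
As a minimal illustration of that claim, assuming Python's standard `base64` module and UTF-8 as the candidate encoding, the sketch below decodes Base64 and only then checks whether the bytes happen to be valid text. The function name `decode_as_utf8` is hypothetical.

```python
import base64
from typing import Optional


def decode_as_utf8(b64_text: str) -> Optional[str]:
    """Decode Base64, then try to interpret the bytes as UTF-8.

    Returns None when the decoded bytes are not valid UTF-8, which shows
    that a well-formed Base64 text need not carry text at all.
    """
    raw = base64.b64decode(b64_text)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


text_payload = base64.b64encode("é".encode("utf-8")).decode("ascii")
binary_payload = base64.b64encode(b"\xff\xfe").decode("ascii")

print(decode_as_utf8(text_payload))    # é
print(decode_as_utf8(binary_payload))  # None
```

Only in the first case would it make sense to normalize the result; in the second there is no text to normalize.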

Re: Dealing with Georgian capitalization in programming languages

2018-10-02 Thread Philippe Verdy via Unicode
I see no easy way to convert ALL UPPERCASE text with consistent casing as there's no rule, except by using dictionary lookups. In reality data should be input using default casing (as in dictionary entries), independently of their position in sentences, paragraphs or titles, and the contextual co
