Re: symbols/codepoints for necessity and possibility in modal logic

2013-07-19 Thread Asmus Freytag
Unicode cannot be the arbiter of mathematical (or other) notation, but, within limits, you could ask for some annotations if this would help ensure that there's some uniformity in how people pick symbols for certain purposes. Why not contact the relevant publishers and find out what they are

Re: Ways to show Unicode contents on Windows?

2013-07-29 Thread Asmus Freytag
On 7/29/2013 4:25 PM, Ilya Zakharevich wrote: On Wed, Jul 10, 2013 at 04:24:36AM +, Murray Sargent wrote: Ilya asked, Are there any other ways to show Unicode on Windows? You can download Unibook (http://www.unicode.org/unibook/) and set up your fonts for the ranges. That's the way The

Re: Unicode code page and ?.net

2013-07-30 Thread Asmus Freytag
On 7/30/2013 11:39 AM, Buck Golemon wrote: I shudder to imagine the circumstances that forced you to learn this information. I shudder to imagine the state of mind that prompted you to make this valuable contribution. A./

Re: Unicode code page and ?.net

2013-07-30 Thread Asmus Freytag
On 7/30/2013 12:26 PM, Doug Ewell wrote: Buck Golemon buck at yelp dot com replied to Richard Wordingham richard dot wordingham at ntlworld dot com: There are no Unicode code pages. Just to be pedantic, there are several on Windows. They encode the coding form (Unicode codes being best

Re: Unicode code page and ?.net

2013-07-30 Thread Asmus Freytag
On 7/30/2013 2:15 PM, Doug Ewell wrote: Asmus Freytag asmusf at ix dot netcom dot com wrote: A code page is not, in general, the same as an encoding scheme. What is, then, the proper definition of a code page? I might not be able to do better than Potter Stewart here. I think of a code page

Re: What to backup after corruption of code units?

2013-08-28 Thread Asmus Freytag
On 8/27/2013 9:34 PM, Stephan Stiller wrote: All good replies It means the program needs to go back (a.k.a. back up) but I'd say backtracking would make for better wording in TUS. I tend to disagree, because back up seems to me the one expression that people dealing in code point conversion

Re: Can a single text document use multiple character encodings?

2013-08-28 Thread Asmus Freytag
On 8/28/2013 4:15 PM, Stephan Stiller wrote: To appease the nit pickers: I totally didn't know there's nitpickers on this list, like, those that reply to and pick on each other. Interesting! It's called life-long learning. A./

Re: What to backup after corruption of code units?

2013-08-28 Thread Asmus Freytag
On 8/28/2013 3:29 PM, Xue Fuqiao wrote: I see. Thanks for all your replies! BTW I have a further question: On Wed, Aug 28, 2013 at 1:44 PM, Philippe Verdy verd...@wanadoo.fr wrote: - in UTF-8, you'll need to look backward between 1 to 3 positions before your start position to find the
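The backward scan described in the quoted reply can be sketched as follows. This is a minimal illustration, not code from the thread: the function names are hypothetical, and it assumes the only goal is to find the first byte of the sequence that contains a given position.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 trail bytes, which have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def back_up_to_start(data: bytes, pos: int) -> int:
    """Return the index of the first byte of the sequence containing pos.

    Looks back at most 3 positions, because a well-formed UTF-8 sequence
    has at most 3 trail bytes. If no lead byte turns up in that window
    the input is corrupt, and pos is returned unchanged.
    """
    start = pos
    for _ in range(3):
        if start == 0 or not is_continuation(data[start]):
            break
        start -= 1
    return start if not is_continuation(data[start]) else pos


# 1-byte, 3-byte and 4-byte sequences: 61 | E2 82 AC | F0 9F 98 80
sample = "a€😀".encode("utf-8")
print(back_up_to_start(sample, 3))  # last byte of the euro sign
print(back_up_to_start(sample, 7))  # last byte of the emoji
```

A caller that lands on a corrupt run still has to decide what to do next; this sketch only reports that no start was found by handing back the original position.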

Re: Can a single text document use multiple character encodings?

2013-08-28 Thread Asmus Freytag
On 8/28/2013 1:00 PM, Stephan Stiller wrote: For Web formats (HTML, etc.), the answer is no. The obvious follow-up to the list: It'd be interesting to know where the answer is yes. People will occasionally mention ISO/IEC 2022, which can be thought of as a meta-encoding or encoding template or

Re: What to backup after corruption of code units?

2013-08-28 Thread Asmus Freytag
On 8/28/2013 5:19 PM, Doug Ewell wrote: Actually 0xC2, according to the rules of UTF-8. Hmm. What you are referring to is that 0xC0 and 0xC1 don't occur because of the requirement for minimal length encoding. However, a check for >= 0xC0 will give the correct result for backing up, assuming

Re: What to backup after corruption of code units?

2013-08-28 Thread Asmus Freytag
On 8/28/2013 6:25 PM, Karl Williamson wrote: On 08/28/2013 06:52 PM, Asmus Freytag wrote: On 8/28/2013 5:19 PM, Doug Ewell wrote: Actually 0xC2, according to the rules of UTF-8. Hmm. What you are referring to is that 0xC0 and 0xC1 don't occur because of the requirement for minimal length

Re: Can a single text document use multiple character encodings?

2013-08-28 Thread Asmus Freytag
On 8/28/2013 6:31 PM, Doug Ewell wrote: He didn't ask if such a practice was common, or confusing, or a good idea, though perhaps those were underlying questions. The answer may well have depended on the underlying question. But until he comes back with an elaboration, the discussion might

Re: ASCII control codes in sequences of multibyte character sets

2013-09-02 Thread Asmus Freytag
On 9/2/2013 5:08 PM, Doug Ewell wrote: I asked because, as Philippe said, an octet is the same as an 8-bit byte. Yes, that's the standard definition of octet, er 8-bit byte. Never having encountered a non-8-bit byte anywhere in the wild, I've always ceded the field of octets to nitpickers.

Re: ASCII control codes in sequences of multibyte character sets

2013-09-02 Thread Asmus Freytag
On 9/2/2013 6:47 PM, Doug Ewell wrote: In any case, there is nothing about multi-octet versus multi-byte that makes one fixed-length and the other variable-length. Yep. A./

Re: Why blackletter letters?

2013-09-10 Thread Asmus Freytag
Good question, Jean-François. I seem to recall that typographers may make a distinction between black-letter and fraktur forms, but even if they do, the differences are typographical, not essential. For the purpose of *character* encoding, one would need to make a very strong rationale for

Re: Why blackletter letters?

2013-09-10 Thread Asmus Freytag
On 9/10/2013 11:05 AM, Michael Everson wrote: On 10 Sep 2013, at 18:01, Asmus Freytag asm...@ix.netcom.com wrote: This rationale is absent in document WG2 N3907 that requests these characters. Therefore, it seems these two additions should not have been made. I disagree. The mathematical

Re: Why blackletter letters?

2013-09-11 Thread Asmus Freytag
On 9/10/2013 12:09 PM, Michael Everson wrote: On 10 Sep 2013, at 20:04, Asmus Freytag asm...@ix.netcom.com wrote: The proper thing would be to deprecate these accidental duplications forthwith. Nonsense. And blackletter isn't identical to Fraktur. It is not different enough to base

Re: Why blackletter letters?

2013-09-11 Thread Asmus Freytag
On 9/11/2013 1:13 PM, Michael Everson wrote: Nonsense. And blackletter isn't identical to Fraktur. It is not different enough to base a character encoding distinction on it. Why don't we code times and garamond shapes then as characters as well. The Mathematical Alphanumeric Symbols block

Re: Why blackletter letters?

2013-09-12 Thread Asmus Freytag
On 9/11/2013 9:50 PM, Charlie Ruland ☘ wrote: One final remark: Thinking about it I have the impression that the blackletter vs. antiqua distinction once made in German very much resembles that made between Hiragana and Katakana in Japanese. In both cases the underlying systems of the

Re: Why blackletter letters?

2013-09-12 Thread Asmus Freytag
On 9/12/2013 1:36 AM, Gerrit Ansmann wrote: On Thu, 12 Sep 2013 06:50:23 +0200, Charlie Ruland ☘ rul...@luckymail.com wrote: One final remark: Thinking about it I have the impression that the blackletter vs. antiqua distinction once made in German very much resembles that made between

Re: Origin of Ellipsis (was: RE: Empty set)

2013-09-13 Thread Asmus Freytag
On 9/13/2013 10:54 AM, Whistler, Ken wrote: Stephan Stiller noted: Maybe ... and the origin of the single-glyph ellipsis remains a mystery to me. As Philippe surmised, it is a compatibility character, originally included in the Unicode 1.0 repertoire for cross-mapping to existing legacy

Re: Origin of Ellipsis

2013-09-14 Thread Asmus Freytag
On 9/14/2013 6:24 AM, Michael Everson wrote: On 14 Sep 2013, at 14:16, Stephan Stiller stephan.stil...@gmail.com wrote: Books never used it. The tradition in typing was developed to assist typesetters to navigate the typewritten text they were setting. The typesetters never put two spaces

Re: Origin of Ellipsis and double spacing after a sentence.

2013-09-14 Thread Asmus Freytag
On 9/14/2013 12:19 PM, Michael Everson wrote: And as a book designer and publisher, I think that having large spaces after a full stop is both unnecessary and vulgar. Quote from the blog: While the modern convention is the single space, it is no less arbitrary than any other, and if

Re: Code point vs. scalar value (was: RE: Origin of Ellipsis (was: RE: Empty set))

2013-09-16 Thread Asmus Freytag
On 9/16/2013 1:41 PM, Doug Ewell wrote: This has nothing to do with UTF-Anything or Normalization Form Anything. But all with keeping the discussion alive for any reason, however insignificant :) A./

Re: Code point vs. scalar value (was: RE: Origin of Ellipsis (was: RE: Empty set))

2013-09-16 Thread Asmus Freytag
On 9/16/2013 2:18 PM, Doug Ewell wrote: Asmus Freytag asmusf at ix dot netcom dot com wrote: On 9/16/2013 1:41 PM, Doug Ewell wrote: This has nothing to do with UTF-Anything or Normalization Form Anything. But all with keeping the discussion alive for any reason, however insignificant :) I

Re: Henry Luce Foundation Grant to Unicode in Support of Encoding Tangut

2013-09-16 Thread Asmus Freytag
On 9/16/2013 2:05 PM, Steffen Daode Nurpmeso wrote: But this may be a sign that the Unicode Consortium is about to have its own status changed to become a non-profit charity foundation dedicated to worldwide promotion of education and culture. Thanks. But this should be clear, and some

Re: Henry Luce Foundation Grant to Unicode in Support of Encoding Tangut

2013-09-16 Thread Asmus Freytag
On 9/16/2013 3:01 PM, Philippe Verdy wrote: Please stop, I've enough replies about the Unicode Consortium status. But my questions about consequences of **dedicated** grants remain as it affects how you'll organize works and manage it, within a limited timeframe. We've not seen this discussed

Re: Henry Luce Foundation Grant to Unicode in Support of Encoding Tangut

2013-09-17 Thread Asmus Freytag
It seems this post is a bit inappropriate for this forum in its content and, given its rather bizarre immaturity of interaction with other members, seems altogether more fitting for a kindergarten playground in . It would be nice if such posts could be kept off this list. A./ On 9/17/2013 8:15

Re: Code point vs. scalar value

2013-09-17 Thread Asmus Freytag
On 9/17/2013 2:55 PM, Stephan Stiller wrote: [AF:] It is the wording in your posts that adds to the confusion. My fundamental point is, has been, and continues to be that whenever people use the more general word code point instead of the more appropriate scalar value, that will add to the

Re: Code point vs. scalar value

2013-09-18 Thread Asmus Freytag
On 9/17/2013 8:40 PM, Philippe Verdy wrote: In what way does UTF-16 use surrogate code /points/? An encoding form is a mapping. Let's look at this mapping: * One _inputs_ scalar values (not surrogate code points). In fact the input is one code point. Then only if that code
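The mapping under discussion can be made concrete with a small sketch. It follows the view stated in the post that the input to the encoding form is a scalar value, so surrogate code points are rejected up front; the function name is hypothetical.

```python
def utf16_code_units(scalar: int) -> list[int]:
    """Map one Unicode scalar value to its UTF-16 code unit sequence."""
    # Scalar values are the code points 0..10FFFF minus the surrogate range.
    if not 0 <= scalar <= 0x10FFFF or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {scalar:#x}")
    if scalar < 0x10000:
        return [scalar]                      # one code unit, value unchanged
    offset = scalar - 0x10000                # 20 bits, split into 10 + 10
    return [0xD800 + (offset >> 10),         # high (lead) surrogate code unit
            0xDC00 + (offset & 0x3FF)]       # low (trail) surrogate code unit


print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
```

In this sketch the surrogate values only ever appear on the output side, as code units, which is one way to read the distinction the thread is arguing about.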

Re: Code point vs. scalar value

2013-09-18 Thread Asmus Freytag
On 9/18/2013 2:42 AM, Philippe Verdy wrote: There are scalar values used in so many other unrelated domains (notably in mathematics, where a scalar value is an identifiable object that remains constant in relation with some operations and independent of its context, unlike functions,

Re: Code point vs. scalar value

2013-09-18 Thread Asmus Freytag
On 9/18/2013 3:14 PM, Philippe Verdy wrote: I would propose exactly the opposite of what you want: avoid using scalar value alone. But only speak about 'Unicode scalar value' character property. If it is a property, it would be a code point property... Still, I support your general point.

Re: COMBINING OVER MARK?

2013-10-01 Thread Asmus Freytag
A superscript glyph would in my view normally be larger than a glyph for a combining superscript character. The reason is that the former just has to appear raised and smaller, while the latter has to fit somehow in the space above x-height. The

Re: Dotted Circle plus Combining Mark as Text

2013-10-20 Thread Asmus Freytag
On 10/20/2013 1:47 AM, Jukka K. Korpela wrote: 2013-10-20 2:38, Richard Wordingham wrote: Is a sequence of a U+25CC DOTTED CIRCLE plus a combining mark plain text? Well, is <h1>hello</h1> plain text? The answer is that any string of characters may be considered as plain text and any string of

Re: Dotted Circle plus Combining Mark as Text

2013-10-20 Thread Asmus Freytag
On 10/20/2013 3:45 PM, Philippe Verdy wrote: 2013/10/20 Asmus Freytag asm...@ix.netcom.com mailto:asm...@ix.netcom.com Incidentally, the dotted circle shown in the Unicode Code charts is *not* 25CC, and if I were to implement a show dotted circle feature in a program I would

Re: ¥ instead of \

2013-10-22 Thread Asmus Freytag
On 10/22/2013 11:38 AM, Jean-François Colson wrote: Hello. I know that in some Japanese encodings (JIS, EUC), \ was replaced by a ¥. On my computer, there are some Japanese fonts where the characters seem coded following Unicode, except for the \ which remained a ¥. Is that acceptable from

Re: Engmagate?

2013-12-12 Thread Asmus Freytag
On 12/12/2013 2:25 PM, Leo Broukhis wrote: Hmmm... As a person with Russian as the first language I can assure you that from any literate Russian-speaking person's perspective italic ū is an unacceptable and *WRONG* representation of п (because in Russian, unlike Serbian, there is й). Should

Re: Engmagate?

2013-12-12 Thread Asmus Freytag
the circle. That's exactly right. Leo On Thu, Dec 12, 2013 at 2:52 PM, Asmus Freytag asm...@ix.netcom.com mailto:asm...@ix.netcom.com wrote: On 12/12/2013 2:25 PM, Leo Broukhis wrote: Hmmm... As a person with Russian as the first language I can assure you that from any literate

Re: Engmagate?

2013-12-12 Thread Asmus Freytag
On 12/12/2013 6:38 PM, Leo Broukhis wrote: Italic is not plain text. Is this the only thing that would have stopped you from advocating disunification? Yeah. To heck with the end user and their pathetic preferences. Is a preference to have traditional and simplified CJK characters

Re: The Ruble sign has been approved

2013-12-12 Thread Asmus Freytag
[mailto:unicode-bou...@unicode.org] *On behalf of *Marc Blanchet *Sent:* 13 December 2013 00:00 *To:* Asmus Freytag *Cc:* verd...@wanadoo.fr; William_J_G Overington; Michael Everson; unicode Unicode Discussion *Subject:* Re: The Ruble sign has been approved On 2013-12-12 at 13:42, Asmus

Re: Commercial minus as italic variant of division sign in German and Scandinavian context

2014-01-15 Thread Asmus Freytag
I find it unhelpful to consider 2052 as the italic variant of 00F7, and further find the evidence for that not all that germane. Both are variants of the - sign, and so ipso facto are variants of each other. However, to identify something as italic to me would require that one form is used in

Re: Commercial minus as italic variant of division sign in German and Scandinavian context

2014-01-16 Thread Asmus Freytag
Halvard Silli Asmus Freytag, Wed, 15 Jan 2014 23:17:46 -0800: I find it unhelpful to consider 2052 as the italic variant of 00F7, and further find the evidence for that not all that germane. Both are variants of the - sign, and so ipso facto are variants of each other. However, to identify

Re: proposal for new character 'soft/preferred line break'

2014-02-05 Thread Asmus Freytag
I agree, the use of nobreak markup is more appropriate to the problem. This is not a plain text issue and it even fails the smell test for an issue that is more elegantly solved by format characters than markup. A./ On 2/5/2014 2:27 PM, Jukka K. Korpela wrote: 2014-02-05 23:44, Rhavin Grobert

Re: ?MP = Multi*lingual* plane?

2014-02-27 Thread Asmus Freytag
On 2/27/2014 2:32 AM, Shriramana Sharma wrote: Given that Unicode encodes scripts and not languages, how appropriate is it to call the BMP and the SMP as the multi*lingual* planes? Isn't it lovely how these things work? A./

Re: Romanized Singhala got great reception in Sri Lanka

2014-03-16 Thread Asmus Freytag
On 3/16/2014 9:05 AM, William_J_G Overington wrote: So, everyone, can the Romanized Singhala system be used with a QWERTY keyboard to produce Unicode-encoded text, thereby producing a good combined system? Could this be achieved if a text-processing software package were produced that could

Re: Dead and Compose keys (was: Re: Romanized Singhala got great reception in Sri Lanka)

2014-03-18 Thread Asmus Freytag
On 3/18/2014 1:57 PM, Tom Gewecke wrote: On Mar 18, 2014, at 1:48 PM, Marc Durdin wrote: Can anyone who is more knowledgeable in Unicode Sinhala tell me which is the correct rendering? See graphic below. image002.png The OS X version is the most correct according to my limited knowledge of

Re: Editing Sinhala and Similar Scripts

2014-03-20 Thread Asmus Freytag
On 3/19/2014 9:17 PM, J. Leslie Turriff wrote: Perhaps it might be useful to be able to distinguish between an editing mode and a composition mode: editing mode would be active when a document is first loaded into the editor, when the editor has no keystroke history to consult, and in

Re: New symbol to denote true open access (e.g. to scholarly literature), analogous to the copyright symbol

2014-03-21 Thread Asmus Freytag
On 3/21/2014 8:22 AM, Jan Velterop wrote: But are the chances nil? Essentially you are trying to create a symbol for "this material is placed in the public domain". If you get that symbol adopted by similar authorities as those that created ©, then you would see it encoded in due time. If

Re: Does regular Unicode have a character that looks like a space to a human yet is not treated as a space by software please?

2014-03-29 Thread Asmus Freytag
On managing some types of spacing between elements in running text: On 3/27/2014 8:04 AM, Jukka K. Korpela wrote: 2014-03-27 15:10, Kalvesmaki, Joel wrote: William, try the U+2000..U+200A glyphs under General Punctuation--I think that's what you're looking for to manage precise widths of

Re: Bidi reordering of soft hyphen

2014-04-01 Thread Asmus Freytag
I think this calls for an implementation note on UAX#9 along these lines. - During line breaking, if a line is broken at the location of a SHY, the text around the line break may change. A common case is the replacement of the invisible SHY by a visible HYPHEN, but see
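The common case mentioned in the proposed note, where the invisible SHY is swapped for a visible hyphen at a chosen break, might look like the sketch below. It is an illustration under stated assumptions only: the function name is hypothetical, U+2010 HYPHEN is an assumed choice for the visible glyph, and language-specific spelling changes around the break are ignored.

```python
SHY = "\u00ad"      # SOFT HYPHEN, invisible unless a line breaks here
HYPHEN = "\u2010"   # assumed visible replacement; a renderer may pick another


def break_at_soft_hyphen(text: str, index: int) -> tuple[str, str]:
    """Split text into two display lines at the SHY found at index.

    The chosen SHY becomes a visible hyphen at the end of the first line;
    every other SHY stays invisible and is simply dropped from the output.
    """
    if text[index] != SHY:
        raise ValueError("no soft hyphen at the requested break position")
    first = text[:index].replace(SHY, "") + HYPHEN
    second = text[index + 1:].replace(SHY, "")
    return first, second


word = "hy" + SHY + "phen" + SHY + "ation"
print(break_at_soft_hyphen(word, 7))
```

Because the text shown on each line differs from the stored text, any later step that works on the displayed lines sees the replacement rather than the original SHY, which is the situation the note is meant to flag.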

Re: Bidi reordering of soft hyphen

2014-04-01 Thread Asmus Freytag
On 4/1/2014 4:12 PM, Jonathan Rosenne wrote: The use of soft hyphen is a cultural matter. In Hebrew, Classic and Israeli, soft hyphens are not used. More to the point, how does software render a soft hyphen included in inserted LTR text, when the outer text is Hebrew? Would it always be

Re: Bidi reordering of soft hyphen

2014-04-02 Thread Asmus Freytag
On 4/2/2014 12:36 AM, Richard Wordingham wrote: On Tue, 1 Apr 2014 23:41:48 + Whistler, Ken ken.whist...@sap.com wrote: Is it legitimate to truncate the context to a single line? The BiDi algorithm is attempting to interpret unlabelled text as embedded text (it's not an arbitrary dance),

Re: FYI: More emoji from Chrome

2014-04-02 Thread Asmus Freytag
On 4/2/2014 1:42 AM, Christopher Fynn wrote: Rather than Emoji it might be better if people learnt Han ideographs which are also compact (and a far more developed system of communication than emoji). One CJK character can also easily replace dozens of Latin characters - which is what is being

Re: FYI: More emoji from Chrome

2014-04-02 Thread Asmus Freytag
On 4/2/2014 4:05 AM, Koji Ishii wrote: On Apr 2, 2014, at 7:19 PM, Asmus Freytag asm...@ix.netcom.com wrote: On 4/2/2014 1:42 AM, Christopher Fynn wrote: Rather than Emoji it might be better if people learnt Han ideographs which are also compact (and a far more developed system

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-20 Thread Asmus Freytag
On 4/20/2014 3:24 AM, Eli Zaretskii wrote: Would someone please help understand the following subtleties and obscure language in the UBA document found at http://www.unicode.org/reports/tr9/? Thanks in advance. Eli, I've tried to give you some explanations - in some places, I concur with

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-21 Thread Asmus Freytag
On 4/20/2014 6:54 PM, James Clark wrote: On Mon, Apr 21, 2014 at 2:58 AM, Asmus Freytag asm...@ix.netcom.com mailto:asm...@ix.netcom.com wrote: On 4/20/2014 3:24 AM, Eli Zaretskii wrote: Would someone please help understand the following subtleties and obscure language in the UBA

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-21 Thread Asmus Freytag
On 4/21/2014 1:33 AM, Eli Zaretskii wrote: Date: Sun, 20 Apr 2014 23:03:20 -0700 From: Asmus Freytag asm...@ix.netcom.com CC: Eli Zaretskii e...@gnu.org, unicode@unicode.org, Kenneth Whistler k...@unicode.org Note that the current embedding level is not changed by this rule

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-21 Thread Asmus Freytag
On 4/21/2014 12:55 AM, Eli Zaretskii wrote: in some places, I concur with you that the wording could be improved and that such improved wording should be proposed to the UTC (or its editorial committee) for incorporation into a future update. How do we do that? You file a problem report using

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-21 Thread Asmus Freytag
algorithms in a conformance test case. 2014-04-21 16:32 GMT+02:00 Asmus Freytag asm...@ix.netcom.com mailto:asm...@ix.netcom.com: On 4/21/2014 1:33 AM, Eli Zaretskii wrote: Date: Sun, 20 Apr 2014 23:03:20 -0700 From: Asmus Freytagasm...@ix.netcom.com mailto:asm...@ix.netcom.com

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-21 Thread Asmus Freytag
to be less dependent on the sample implementation. 2014-04-21 19:48 GMT+02:00 Asmus Freytag asm...@ix.netcom.com mailto:asm...@ix.netcom.com: Philippe, I fail to understand how your post contributes to the topic. The issue was unclear wording of the specification, not deficiencies

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-21 Thread Asmus Freytag
On 4/21/2014 11:14 AM, Doug Ewell wrote: From: Asmus Freytag asmusf at ix dot netcom dot com wrote: In general, I heartily dislike specifications that just narrate a particular implementation... I agree completely. I see this with CLDR as well; there is a more or less implicit assumption

Re: Glyphs designed for the internationalization of the web-based on-line shops of museums and art galleries

2014-04-21 Thread Asmus Freytag
On 4/21/2014 2:47 AM, William_J_G Overington wrote: I am hoping to attach images showing the designs to other posts in this thread. Please find attached an image of the designs of the colourful glyphs. The language I would use for my reaction to this, is just too colorful to reproduce here

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-21 Thread Asmus Freytag
On 4/21/2014 1:54 PM, Philippe Verdy wrote: My intent was not to demonstrate a bug in the algorithm, I have not even claimed that, but to make sure that (less common) usages of paired brackets that do not obey a pure hierarchy (because these notations use different types of brackets, they

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-21 Thread Asmus Freytag
Ilya, I appreciate your taking the time to take apart Philippe's message. That aspect of it was not obvious to me. A./ PS: more comments below On 4/21/2014 4:41 PM, Ilya Zakharevich wrote: On Mon, Apr 21, 2014 at 02:44:14PM -0700, Asmus Freytag wrote: On 4/21/2014 1:54 PM, Philippe Verdy

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-21 Thread Asmus Freytag
On 4/21/2014 5:44 PM, Whistler, Ken wrote: So one may ask: what will be the result of the CURRENT UNICODE parsing applied to Phillipe’s example? This is an [«] example [»] for demonstration only. That is easily answered. Let's crank up the bidi reference code with a shorter example

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-22 Thread Asmus Freytag
On 4/21/2014 8:32 PM, Ilya Zakharevich wrote: On Mon, Apr 21, 2014 at 06:08:12PM -0700, Asmus Freytag wrote: Here's the text I supplied, with numbers added for discussion. It definitely needs some editing, but the point of the exercise would be to see what: 1. A bracket pair is a pair

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-22 Thread Asmus Freytag
On 4/22/2014 2:19 AM, Ilya Zakharevich wrote: I think the crucial problem is with 1( 2[ 3( 4] 5) 5b] 6) I have two possible interpretations: one matches 2 with 5b, another leaves 2 unmatched. Ilya, if you read UAX#9, the way the algorithm works is by pushing openers on a stack,
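The stack procedure described in the reply can be tried on the disputed example with a short sketch. It is a simplification, not the normative UAX#9 text: the two-entry bracket table stands in for BidiBrackets.txt, and the stack size limit, canonical equivalents and isolating run sequences are all left out. The function name is hypothetical.

```python
# closer -> opener; a tiny stand-in for the data in BidiBrackets.txt
PAIRS = {")": "(", "]": "["}
OPENERS = set(PAIRS.values())


def bracket_pairs(text: str) -> list[tuple[int, int]]:
    """Return (opener position, closer position) pairs, sorted by opener."""
    stack = []   # entries are (opening character, text position)
    pairs = []
    for pos, ch in enumerate(text):
        if ch in OPENERS:
            stack.append((ch, pos))
        elif ch in PAIRS:
            # Walk down from the top of the stack looking for a match.
            for depth in range(len(stack) - 1, -1, -1):
                if stack[depth][0] == PAIRS[ch]:
                    pairs.append((stack[depth][1], pos))
                    # Pop the match and every opener stacked above it.
                    del stack[depth:]
                    break
            # A closer with no match is ignored; the stack is untouched.
    return sorted(pairs)


# The example from the thread: 1( 2[ 3( 4] 5) 5b] 6)
print(bracket_pairs("([(])])"))  # [(0, 4), (1, 3)]
```

Under these rules opener 2 pairs with closer 4, which discards opener 3; opener 1 then pairs with closer 5, and closers 5b and 6 find an empty stack and stay unmatched.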

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-22 Thread Asmus Freytag
On 4/22/2014 9:02 AM, Eli Zaretskii wrote: an resolve it, so we match 1) and 6). But that's wrong, isn't it? Yes, brain fart. I agree, but let me try to say the same more concisely: A bracket pair is a pair of an opening paired bracket and a closing paired bracket characters

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-22 Thread Asmus Freytag
On 4/22/2014 10:11 AM, Eli Zaretskii wrote: Date: Tue, 22 Apr 2014 09:52:43 -0700 From: Asmus Freytag asm...@ix.netcom.com CC: nospam-ab...@ilyaz.org, verd...@wanadoo.fr, k...@unicode.org, j...@jclark.com, unicode@unicode.org I agree, but let me try to say the same more concisely

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-22 Thread Asmus Freytag
On 4/22/2014 2:17 PM, Ilya Zakharevich wrote: On Tue, Apr 22, 2014 at 07:08:56PM +0300, Eli Zaretskii wrote: Sorry, I do not see any definition here. Just a collection of words which looks like a definition, but only locally… Any definition is just a collection of words, of course. Can you

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-23 Thread Asmus Freytag
On 4/23/2014 12:35 AM, Ilya Zakharevich wrote: On Tue, Apr 22, 2014 at 09:06:27AM -0700, Asmus Freytag wrote: if you read UAX#9, the way the algorithm works is by pushing openers on a stack, then, on finding the first closer, going down the stack and attempting to locate a match

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-23 Thread Asmus Freytag
On 4/23/2014 4:41 PM, Ilya Zakharevich wrote: On Wed, Apr 23, 2014 at 09:21:04AM -0700, Asmus Freytag wrote: a parsing is good if it satisfies all conditions below: 0) Some delimiters in the string are marked as “non-matching”; the rest is broken into disjoint “matched” pairs

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Asmus Freytag
On 4/23/2014 7:37 PM, Philippe Verdy wrote: Thanks for the clear reply, now I know that my example in a prior message would work appropriately with UBA: This is an [«] ARABIC EXAMPLE [»] for demonstration only. Because: - the opening guillemet is not stripped out of the context stack when

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Asmus Freytag
On 4/24/2014 8:20 AM, Eli Zaretskii wrote: So nothing (at least not the reason of the GC which is just an intermediate but incomplete helper) forbids the guillemets to be listed in BidiBrackets.txt. They don't satisfy the conditions for that. From BidiBrackets.txt: Philippe is incorrect once

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Asmus Freytag
On 4/24/2014 7:39 AM, Eli Zaretskii wrote: This is _*incorrect*_, see the text in blue/bold in the definition copied below. The second bullet in item 3 of the second second-level bullet of the third top-level bullet of BD16 clearly says that all elements that are above the matched element are

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Asmus Freytag
On this side show, Philippe finally is correct, because I received his message without ASCII-i-fication; he cc'd me directly, and I never saw the mangled text. It's a bit embarrassing for a Unicode mail list to not even be able to let guillemets through unmolested. But this shall not distract

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-05-01 Thread Asmus Freytag
This has seen off-line discussion with the mail manager and we're good. A./ On 5/1/2014 3:44 PM, Richard Wordingham wrote: On Thu, 24 Apr 2014 17:19:57 -0700 Asmus Freytag asm...@ix.netcom.com wrote: On this side show, Philippe finally is correct, because I received his message without ASCII

Re: Preliminary inquiry: Sigla for James Joyce's Finnegans Wake

2014-05-08 Thread Asmus Freytag
On 5/8/2014 9:09 AM, catherine butler wrote: We're struggling to master the intricacies of proposing new Unicode characters specific to the James Joyce masterpiece Finnegans Wake. http://fwpages.blogspot.com/2014/05/unicode-for-james-joyce-needed.html There are somewhere from two to two-dozen

Re: Preliminary inquiry: Sigla for James Joyce's Finnegans Wake

2014-05-09 Thread Asmus Freytag
On 5/9/2014 10:45 AM, catherine butler wrote: What is needed is an authoritative and complete inventory of these, using *images* from the works and notes to show their shapes (and a few images to document that they are indeed part of running text). I don't have access to the manuscript

Re: Indic Syllabic Categories

2014-05-09 Thread Asmus Freytag
On 5/9/2014 6:32 PM, Shriramana Sharma wrote: Dear Richard, It is true that Vowel_Independent can behave like Consonant characters. Given that consonant letters also have an inherent vowel in these scripts, IMO there is not really much to distinguish *technically*. At least in *Indian* Indic

Re: Corrigendum #9

2014-05-30 Thread Asmus Freytag
On 5/30/2014 11:26 AM, Karl Williamson wrote: I'm having a problem with this http://www.unicode.org/versions/corrigendum9.html You are not alone. Some people now think it means that noncharacters are really no different from private-use characters, and should be treated very similarly if
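For readers unsure which code points the thread is about, the set of noncharacters is small and fixed, and can be tested as in the sketch below. The function name is hypothetical; the ranges are those defined by the standard (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes).

```python
def is_noncharacter(cp: int) -> bool:
    """True if cp is one of the 66 Unicode noncharacter code points."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # U+FDD0..U+FDEF, or any code point whose low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


count = sum(is_noncharacter(cp) for cp in range(0x110000))
print(count)  # 66: 32 in the FDD0 block plus 2 per plane
```

Private-use code points such as U+E000 are not in this set, which is why treating the two categories alike is the point of contention in the post.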

Re: Corrigendum #9

2014-05-31 Thread Asmus Freytag
On 5/31/2014 4:09 AM, Philippe Verdy wrote: 2014-05-30 20:49 GMT+02:00 Asmus Freytag asm...@ix.netcom.com mailto:asm...@ix.netcom.com: This might have been possible at the time these were added, but now it is probably not feasible. One of the reasons is that block names are exposed

Re: Corrigendum #9

2014-05-31 Thread Asmus Freytag
On 5/31/2014 12:36 PM, Philippe Verdy wrote: Maybe; but there's real doubt that a regular expression that would need this property would be severely broken if that property was corrected. There are many other properties that are more useful (and much more used) whose associated set of

Re: Corrigendum #9

2014-06-01 Thread Asmus Freytag
On 5/31/2014 10:06 PM, Philippe Verdy wrote: I've not proposed to move these characters elsewhere (or to reencode them); why do you think that? I just challenge your statement that a block cannot be discontinuous, Well, go ahead and challenge that. As implemented in the current nameslist

Re: Corrigendum #9

2014-06-01 Thread Asmus Freytag
On 6/1/2014 9:07 AM, Markus Scherer wrote: On Sun, Jun 1, 2014 at 7:49 AM, Karl Williamson pub...@khwilliamson.com mailto:pub...@khwilliamson.com wrote: Thanks, I had not thought about that. I'm thinking wording something like this is more appropriate Noncharacters may be openly

Re: Corrigendum #9

2014-06-02 Thread Asmus Freytag
On 6/2/2014 9:27 AM, Mark Davis ☕️ wrote: On Mon, Jun 2, 2014 at 6:21 PM, Shawn Steele shawn.ste...@microsoft.com mailto:shawn.ste...@microsoft.com wrote: The “problem” is now that previously these characters were illegal The problem was that we were inconsistent in standard and

Re: Corrigendum #9

2014-06-02 Thread Asmus Freytag
On 6/2/2014 9:08 AM, Mark Davis ☕️ wrote: The problem is where to draw the line. In today's world, what's an app? You may have a cooperating system of apps, where it is perfectly reasonable to interchange sentinel values (for example). The way to draw the line is to insist on there being an

Re: Corrigendum #9

2014-06-02 Thread Asmus Freytag
On 6/2/2014 9:38 AM, Shawn Steele wrote: I agree with Markus; I think the FAQ is pretty clear. (And if not, that's where we should make it clearer.) But the formal wording of the standard should reflect that clarity, right? I don't tend to read the FAQ :) FAQs are useful, but they are not

Re: Corrigendum #9

2014-06-02 Thread Asmus Freytag
On 6/2/2014 2:53 PM, Markus Scherer wrote: On Mon, Jun 2, 2014 at 1:32 PM, David Starner prosfil...@gmail.com wrote: I would especially discourage any web browser from handling these; they're noncharacters used for unknown purposes that are undisplayable

Re: Corrigendum #9

2014-06-03 Thread Asmus Freytag
On 6/2/2014 3:08 PM, Asmus Freytag wrote: On 6/2/2014 2:53 PM, Markus Scherer wrote: On Mon, Jun 2, 2014 at 1:32 PM, David Starner prosfil...@gmail.com wrote: I would especially discourage any web browser from handling these; they're noncharacters used

Re: Use of Unicode Symbol 26A0

2014-06-03 Thread Asmus Freytag
Michelle, Unicode normally does not document all known usages of symbols. Occasionally, if a symbol is used in ways that might be unexpected from its name, the standard may add an alias or annotation. This is done in particular, when there is a question of whether a given symbol is the

Re: Corrigendum #9

2014-06-03 Thread Asmus Freytag
Nicely put. A./ On 6/3/2014 12:09 AM, Martin J. Dürst wrote: On 2014/06/03 07:08, Asmus Freytag wrote: On 6/2/2014 2:53 PM, Markus Scherer wrote: On Mon, Jun 2, 2014 at 1:32 PM, David Starner prosfil...@gmail.com wrote: I would especially discourage any web

Re: Use of Unicode Symbol 26A0

2014-06-04 Thread Asmus Freytag
On 6/3/2014 10:17 AM, Jukka K. Korpela wrote: On the practical side, it might be in order to warn against usage that relies on some particular interpretation like that. What I mean is that it is OK to use WARNING SIGN as warning about risk of personal injury, but questionable to expect that

Re: Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

2014-06-04 Thread Asmus Freytag
On 6/4/2014 11:26 AM, Doug Ewell wrote: Sorry, I left out an important detail. I wrote: 3. U+FEFF at the beginning of a stream (note: not packet or arbitrary cutoff point) I meant U+FEFF as a zero-width no-break space. Obviously it is very common to see U+FEFF as a signature or BOM. My

Re: Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

2014-06-04 Thread Asmus Freytag
On 6/4/2014 12:21 PM, Richard Wordingham wrote: On Wed, 04 Jun 2014 11:40:11 -0700 Asmus Freytag asm...@ix.netcom.com wrote: On 6/4/2014 11:26 AM, Doug Ewell wrote: I meant U+FEFF as a zero-width no-break space. Obviously it is very common to see U+FEFF as a signature or BOM. The semantics

Re: Corrigendum #9

2014-06-07 Thread Asmus Freytag
On 6/7/2014 9:19 PM, Karl Williamson wrote: On 06/02/2014 11:00 AM, Shawn Steele wrote: To further my understanding, can someone provide examples of how these are used in actual practice? I can't think of any offhand and the closest I get is like the old escape characters to get a dot matrix

Re: Characters that should be displayed?

2014-06-29 Thread Asmus Freytag
On 6/29/2014 11:44 AM, Koji Ishii wrote: Surrogate code points, private-use characters, and control characters are not given the Default_Ignorable_Code_Point property. To avoid security problems, such characters or code points, when not interpreted and not displayable by normal rendering,

Re: Characters that should be displayed?

2014-07-01 Thread Asmus Freytag
On 6/30/2014 10:55 PM, Koji Ishii wrote: Thanks for the reply. It’s very likely that the page contains images, borders, background, etc., so I can recognize all the text are missing. But neither of text missing nor text garbled suggests me how to fix it. I’d try another browser, then give up

Re: Corrigendum #9

2014-07-02 Thread Asmus Freytag
On 7/2/2014 8:02 AM, Karl Williamson wrote: Corrigendum #9 has changed this so much that people are coming to me and saying that inputs may very well have non-characters, and that the default should be to pass them through. Since we have no published wording for how the TUS will absorb

Re: Corrigendum #9

2014-07-03 Thread Asmus Freytag
On 7/3/2014 11:02 AM, Richard COOK wrote: On Jul 2, 2014, at 8:02 AM, Karl Williamson pub...@khwilliamson.com wrote: Corrigendum #9 has changed this so much that people are coming to me and saying that inputs may very well have non-characters, and that the default should be to pass them
