Hello Andrew,
On 2016/10/07 11:11, Andrew Cunningham wrote:
Considering the mess that ad hoc fonts create, what is the best way forward?
That's very clear: Use Unicode.
Zwekabin, Mon, Zawgyi, and Zawgyi-Tai and their ilk?
Most government translations I am seeing in Australia for Burmese are
On 2016/10/04 19:35, Marcel Schneider wrote:
On Mon, 3 Oct 2016 13:47:09 -0700, Asmus Freytag (c) wrote:
Later, the beta and gamma were encoded for phonetic notation, but not the
alpha.
As a result, you can write basic formulas for select compounds, but not all.
Given that these basic
68
October 23 Gregorian.
Meiji 1 January 1 Lunar (and Keio 4 January 1 Lunar) is 1868 January 25
Gregorian.
My best guess is that the author of Table 22-8 picked up the year value
from a spreadsheet showing "1867-12-31" in local time, originally intended to
show merely "1868-01".
upon the day after Emperor
Akihito's succession to the throne on 7 January 1989.
--
Martin J. Dürst
Department of Intelligent Information Technology
College of Science and Engineering
Aoyama Gakuin University
Fuchinobe 5-1-10, Chuo-ku, Sagamihara
252-5258 Japan
Or, if such an attempt (evaluating the support level of emoji by checking
some codepoints) is wrong, is there any good method to evaluate the
support level of emoji in a given font?
Regards,
mpsuzuki
On 2016/08/08 08:08, Sean Leonard wrote:
On 8/6/2016 11:30 AM, Doug Ewell wrote:
Additionally, in UTF-8, either LS or PS actually takes more bytes than
CR plus LF, so the "increased text size" argument also discouraged use
of the new controls.
That is true, it takes 3 bytes. However, the
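The byte counts are easy to verify directly, for example in Ruby (mentioned elsewhere in these threads):

```ruby
# UTF-8 byte sizes: LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR
# (U+2029) take three bytes each, while the CR+LF pair takes only two.
ls   = "\u2028"  # LINE SEPARATOR
ps   = "\u2029"  # PARAGRAPH SEPARATOR
crlf = "\r\n"    # CARRIAGE RETURN + LINE FEED

puts ls.bytesize    # => 3
puts ps.bytesize    # => 3
puts crlf.bytesize  # => 2
```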
Hello Don,
I agree with Doug that creating a good keyboard layout is a good thing
to do. Among the people on this list, you probably have the best
contacts, and can help create some test layouts and see how people react.
Also, creating fonts that have the necessary coverage but are encoded
On 2016/03/31 06:42, Philippe Verdy wrote:
The use of "ÿ" in Dutch should also be considered as an orthographic fault,
and it should be corrected into "ij" (to solve the capitalization problem),
but there are occurrences in Dutch of "ÿ" which are correct (notably in
borrowed French toponyms such
Thanks everybody for the feedback.
On 2016/03/19 04:33, Marcel Schneider wrote:
On Fri, Mar 18, 2016, 08:43:56, Martin J. Dürst wrote:
b) Convert to upper (or lower), which may simplify implementation.
For example, 'Džinsi' (jeans) would become 'DžINSI' with a), 'DŽINSI' (or
'džinsi') with b
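Option b) can be sketched in Ruby, which has had full Unicode case mapping since 2.4; U+01C5 is the titlecase digraph Dž:

```ruby
# U+01C4 = DŽ (upper), U+01C5 = Dž (titlecase), U+01C6 = dž (lower).
word = "\u01C5insi"  # "Džinsi", starting with the titlecase digraph

puts word.upcase    # "DŽINSI" -- option b) toward upper
puts word.downcase  # "džinsi" -- option b) toward lower
```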
On 2016/03/19 04:55, Garth Wallace wrote:
On Fri, Mar 18, 2016 at 11:48 AM, Philippe Verdy wrote:
2016-03-18 19:11 GMT+01:00 Garth Wallace :
Rotation is definitely not salient in standard go kifu like it is in
fairy chess notation. Go variants for more
I'm working on extending the case conversion methods for the programming
language Ruby from the current ASCII only to cover all of Unicode.
Ruby comes with four methods for case conversion. Three of them, upcase,
downcase, and capitalize, are quite clear. But we have hit a question
for the
On 2016/03/10 07:52, Ken Whistler wrote:
I don't know the answer to this. But I suspect that the source
was from one of the collection of fonts associated with the STIX
project research that led to the collection of mathematical symbols
additions noted in L2/01-067 (superseded by
On 2016/02/09 02:10, James Tauber wrote:
http://jktauber.com/2016/01/28/polytonic-greek-unicode-is-still-not-perfect/
Hello James,
I read your article. I just wanted to point out that in your problem 3,
the two sequences aren't normalized because if the acute accent is
first, that would be
I agree to a certain extent with Julian. There are a great many
subjects industry surely would like computer science students to learn
in college, and internationalization/Unicode is only one of them.
On the other hand, I think that universities teach about integer and
floating point
Hello Marc,
On 2015/12/10 14:35, Marc Blanchet wrote:
This is an interesting example of a phenomenon that turns up in many
other contexts, too. A similar example is the use of accents on
upper-case letters in French in France where 'officially', upper-case
letters are written without accents.
On 2015/12/10 09:30, Mark E. Shoulson wrote:
I remember when we went through all this the first time around, encoding
ẞ in the first place. People were saying "But the Duden says no!!!" And
someone then pointed out, "Please close your Duden and cast your gaze
upon ITS FRONT COVER, where you
Hello Plug,
I suggest using HTML:
बक ्ष
Regards, Martin.
On 2015/12/09 12:24, Plug Gulp wrote:
Hi,
I am trying to understand if there is a way to use Devanagari
characters (and grapheme clusters) as subscript and/or superscript in
unicode text. It will help if someone could please direct
On 2015/11/28 04:55, Plug Gulp wrote:
The Unicode standard 8.0 states in chapter 23, section titled "Cursive
Connection and Ligatures" (printed page #814, PDF page #850) that:
"The zero width joiner and non-joiner characters are designed for use
in plain text; they should not be used where
Hello Doug,
Thanks for making us aware of this very sad event. Michael did a lot for
Unicode, and fought bravely with his illness. I hope we can all remember
him this week at the Unicode Conference, where he gave so many amazing
talks.
I also hope that somebody somehow will be able to
On 2015/10/24 02:11, Rick McGowan wrote:
William,
All right... This is likely to be my last posting on the subject...
... there has been much objection to my invention in this mailing list
over the years, with no good reason ever stated, ...
If this invention had been made in the research
You can also design your own version of the emoji you want to use. [I'm
not a lawyer, but as far as I understand,] what's protected is the
individual design, not the idea of a "donut" or "frowning face" emoji as
such.
Regards, Martin.
On 2015/10/12 09:51, Shervin Afshar wrote:
Those
On 2015/10/01 13:11, Jonathan Rosenne wrote:
For languages such as Java, passwords should be handled as byte arrays rather
than strings. This may make it difficult to apply normalization.
Well, they should be received from the user interface as strings, then
normalized, then converted to
Some additional concerns:
- Input methods for Chinese, Japanese,... need visual feedback to check
that the correct Han character was selected. That may show (some parts
of) the password to bystanders.
- Length limitations of 8 bytes are few and far between these days, but
they still exist.
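The string-then-normalize-then-bytes order suggested above might look like this in Ruby (the choice of NFC here is an assumption; password-profile specs variously use NFC or NFKC):

```ruby
# Receive the password as a string, normalize, then convert to bytes
# for hashing. NFC makes "e" + COMBINING ACUTE equal to precomposed "é".
password = "cafe\u0301"  # "café" typed with a combining acute accent
bytes = password.unicode_normalize(:nfc).encode("UTF-8").bytes

p bytes  # => [99, 97, 102, 195, 169] -- same as for precomposed "café"
```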
On 2015/10/05 04:30, Asmus Freytag (t) wrote:
On 10/4/2015 6:02 AM, Richard Wordingham wrote:
In the absence of a specific tailoring, is the combination of a lone
surrogate and a combining mark a user-perceived character? Does a lone
surrogate constitute a user-perceived character?
In an
Hello Doug,
On 2015/09/22 00:42, Doug Ewell wrote:
I was thinking that something like "non–Basic-Latin Unicode" might be
Is that non-Basic Latin or not Basic-Latin?
useful. It avoids the confusion of referring to ASCII as a range of code
points instead of a separate encoding standard.
Hello Sean,
On 2015/09/20 23:48, Sean Leonard wrote:
What is the most concise term for characters or code points
So we already have two different things we might need a term for.
outside of
the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these
as "extended characters"
Hello Ken,
You write "The bitcoin sign and baht symbol are two unrelated symbols
that have some visual similarity.", but don't really give any supporting
information for that claim.
For example, searching for images of bitcoin and baht symbols shows that
the bitcoin sign usually has two vertical
Sorry to be late. Just some background information.
On 2015/04/28 14:57, Makoto Kato wrote:
Although I read JIS X 4051, it doesn't say that half-width katakana
and full-width katakana are treated differently.
I was on the committee that updated JIS X 4051 (mostly liaison/observer
role). The
Sorry, one correction:
On 2015/08/27 16:39, Martin J. Dürst wrote:
In practice, technical restrictions in early implementations (one byte ==
one (half-width) character cell) led to a typographic distinction. The
fact that half-width Kana used less space was exploited in fixed-pitch
screen design
On 2015/07/29 23:27, Andrew West wrote:
On 29 July 2015 at 14:42, William_J_G Overington
My diet can include soya
There already is: you can write "My diet can include soya".
If you are likely to swell up and die if you eat a peanut (for
example), you will not want to trust your life to an
Hello Richard,
On 2015/07/15 16:49, Richard Wordingham wrote:
What mark-up schemes exist to show that a sequence of letters and
combining marks constitutes a single word?
Such mark-up would be useful when using spell checkers. At present, I
use U+2060 WORD JOINER (WJ) to indicate the absence
On 2015/06/22 05:37, Frédéric Grosshans wrote:
I don't know if it's what you're looking for but Google brought me to the
following URL.
https://www.itscj.ipsj.or.jp/itscj_english/iso-ir/ISO-IR.pdf
I managed to download the pdf without problems. I also successfully
downloaded a standard (
On 2015/06/04 17:03, Chris wrote:
I wish Steve Jobs was here to give this lecture.
Well, if Steve Jobs were still around, he could think about whether (and
how many) users really want their private characters, and whether it was
worth the time to have his engineers working on the solution.
On 2015/06/03 07:55, Chris wrote:
As you point out, “The UCS will not encode characters without a demonstrated
usage.” But there are use cases for characters that don’t meet UCS’s criteria for a
world wide standard, but are necessary for more specific use cases, like specialised
regional,
On 2015/05/29 11:37, John wrote:
If I had a large document that reused a particular character thousands of times,
Then it would be either a very boring document (containing almost only
that same character) or it would be a very large document.
would this HTML markup require embedding that
On 2015/02/20 05:17, Eli Zaretskii wrote:
From: Philippe Verdy verd...@wanadoo.fr
Date: Thu, 19 Feb 2015 20:31:07 +0100
Cc: Julian Bradfield jcb+unic...@inf.ed.ac.uk,
unicode Unicode Discussion unicode@unicode.org
The decompositions are not needed for plain text searches, that can use
On 2015/02/19 20:47, Julian Bradfield wrote:
On 2015-02-19, Eli Zaretskii e...@gnu.org wrote:
Does anyone know why does the UCD define compatibility decompositions
for Arabic initial, medial, and final forms, but doesn't do the same
for Hebrew final letters, like U+05DD HEBREW LETTER FINAL MEM?
What's better on this keyboard when compared to the Dvorak layout?
At first sight, it looks heavily right-handed; all the letters that the
Dvorak keyboard has on the home row are on the right hand.
Regards, Martin.
P.S.: I'm a happy Dvorak user.
On 2015/01/26 06:54, Robert Wheelock wrote:
On 2014/12/24 09:50, Tex Texin wrote:
True, however as William points out, apparently the rules have changed,
I hope the rules get clarified to clearly state that these are exceptions.
so it isn’t unreasonable to ask again whether the rules now allow it, or if
people that dismissed the idea
On 2014/12/18 06:49, Michael Everson wrote:
Clearly the plural of emoji is emojis.
Not in Japanese, where there are no plural forms. The question of what
it is/will be in English will be decided by usage, not by grammar. I'd
use 'emoji', but then I'm too biased towards Japanese to be
On 2014/10/24 10:21, Asmus Freytag wrote:
Peter is correct.
The only fonts that should be released to the public are those that are
Unicode encoded and have the correct shaping tables.
Unlike the public, the code chart editors for Unicode have tools that
can correctly handle not only
On 2014/07/24 15:37, Richard Wordingham wrote:
No. The text samples I could find quickly show scripta continua, but I
suspect the line breaks are occurring at word or syllable boundaries.
If I am right about the constraint on line break position, then this
can be recovered by marking the
On 2014/06/03 07:08, Asmus Freytag wrote:
On 6/2/2014 2:53 PM, Markus Scherer wrote:
On Mon, Jun 2, 2014 at 1:32 PM, David Starner prosfil...@gmail.com wrote:
I would especially discourage any web browser from handling
these; they're noncharacters used for
On 2014/04/02 20:08, Christopher Fynn wrote:
On 02/04/2014, Asmus Freytag asm...@ix.netcom.com wrote:
On 4/2/2014 1:42 AM, Christopher Fynn wrote:
Rather than Emoji it might be better if people learnt Han ideographs
which are also compact (and a far more developed system of
communication than
On 2014/04/03 02:00, James Lin wrote:
Emoji or 顔文字, literally means Face word or Face Characters, essentially,
Emoji is 絵文字 (picture character), 顔文字 is kaomoji (face character).
Regards, Martin.
provides an emotional state in the context of words. Emoji is very
popular in APJ, and
Now that it's no longer April 1st (at least not here in Japan), I can
add a (moderately) serious comment.
On 2014/04/02 01:43, Ilya Zakharevich wrote:
On Tue, Apr 01, 2014 at 09:01:39AM +0200, Mark Davis ☕️ wrote:
More emoji from Chrome:
J. Dürst due...@it.aoyama.ac.jp
On 2014/03/16 14:36, Philippe Verdy wrote:
You may still want to promote it at some government or education
institution, in order to promote it as a national standard, except that
there's little chance it will ever happen when all countries in ISO have
stopped
I was informed today by your IT Dept. that the mail below never went
out. Resent herewith. Martin.
Original Message
Subject: Re: Romanized Singhala got great reception in Sri Lanka
Date: Mon, 17 Mar 2014 14:37:00 +0900
From: Martin J. Dürst due...@it.aoyama.ac.jp
On 2014
Hello Henry,
Some comments on your specific questions, which may trigger some
additional discussion.
On 2013/12/12 1:43, Henry S. Thompson wrote:
I'm one of the editors of a proposed replacement for RFC3023 [1], the
media type registration for application/xml, text/xml and 3 others.
The
On 2013/10/23 4:22, Asmus Freytag wrote:
On 10/22/2013 11:38 AM, Jean-François Colson wrote:
Hello.
I know that in some Japanese encodings (JIS, EUC), \ was replaced by a ¥.
On my computer, there are some Japanese fonts where the characters
seem to be coded following Unicode, except for the \
On 2013/10/02 9:52, Leo Broukhis wrote:
Thanks! That comes out exactly right, although using math markup for
linguistic purposes is, IMO, a stretch.
Why? Surely like in other fields (Math to start with), there somewhere
is a boundary between plain text and rich text. Of course it's not
On 2013/07/05 16:04, Denis Jacquerye wrote:
On Thu, Jul 4, 2013 at 12:07 PM, Michael Everson ever...@evertype.com wrote:
The problem is in pretending that a cedilla and a comma below are equivalent
because in some script fonts in France or Turkey routinely write some sort of
On 2013/07/05 17:25, Stephan Stiller wrote:
What I had in mind was more specific: Germans are supposed to convert
[ä,ö,ü,ß] to [ae,oe,ue,ss], though I don't know what's considered
best/legal wrt documents required for entering the US, for example.
I have always used Duerst on plane tickets
On 2013/06/22 0:32, Michael Everson wrote:
On 21 Jun 2013, at 16:20, Khaled Hosny khaledho...@eglug.org wrote:
Yeah, I don't believe that you can language-tag individual file names for such
display as that is markup.
Why do you need to? You only need one language, it is not like file names
On 2013/04/23 18:01, William_J_G Overington wrote:
On Monday 22 April 2013, Asmus Freytag asm...@ix.netcom.com wrote:
I'm always suspicious if someone wants to discuss scope of the standard before
demonstrating a compelling case on the merits of wide-spread actual use.
The reason that I
On 2013/04/11 16:30, Michael Everson wrote:
On 11 Apr 2013, at 00:09, Shriramana Sharma samj...@gmail.com wrote:
Or was the Khmer model of an invisible joiner a *later* bright idea?
Yes.
Later, yes. Bright? Most Cambodian experts disagree.
Regards, Martin.
Hello Roger,
The conclusion to your question below is a very clear NO. The reason is
that most text is already in NFC. In fact, as I wrote a few days or
weeks ago, NFC was defined to capture what's usually around on the Web
(and in other places, too). Trying to recommend that everything be in
On 2013/01/22 1:12, Denis Jacquerye wrote:
Does anybody have any idea of how much of the Web is normalized in NFC
or NFD? Or how much not normalized?
I have never measured this. But at one time, there was only NFD (and
NFKD). The Unicode Consortium, with input from W3C, then defined NFC
(and
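The difference between the two forms is easy to see in Ruby:

```ruby
# NFD decomposes into base letter plus combining marks; NFC recomposes
# into precomposed characters where they exist.
nfd = "\u00E9".unicode_normalize(:nfd)  # é -> "e" + U+0301
nfc = nfd.unicode_normalize(:nfc)       # back to a single code point

p nfd.codepoints  # => [101, 769]  (e, COMBINING ACUTE ACCENT)
p nfc.codepoints  # => [233]       (LATIN SMALL LETTER E WITH ACUTE)
```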
On 2013/01/08 14:43, Stephan Stiller wrote:
Wouldn't the clean way be to ensure valid strings (only) when they're
built
Of course, the earlier erroneous data gets caught, the better. The
problem is that error checking is expensive, both in lines of code and
in execution time (I think there
On 2013/01/08 3:27, Markus Scherer wrote:
Also, we commonly read code points from 16-bit Unicode strings, and
unpaired surrogates are returned as themselves and treated as such (e.g.,
in collation). That would not be well-formed UTF-16, but it's generally
harmless in text processing.
Things
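As an illustration of the well-formedness point: a lone surrogate has no valid UTF-8 encoding either, and Ruby will flag the bytes a naive encoder would produce:

```ruby
# The three bytes ED A0 80 are what a naive encoder would emit for the
# unpaired high surrogate U+D800; well-formed UTF-8 forbids them.
bogus = "\xED\xA0\x80".b.force_encoding("UTF-8")

puts bogus.valid_encoding?  # => false
```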
On 2013/01/06 7:21, Costello, Roger L. wrote:
Does this mean that when exchanging Unicode data across the Internet the
endianness is not relevant?
Are these stated correctly:
When Unicode data is in a file we would say, for example, "The file contains
UTF-32BE data."
When Unicode
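In labels such as UTF-32BE the endianness is part of the name, so it can indeed be stated for a file; for example, in Ruby:

```ruby
# UTF-32 stores each code point in four bytes; BE/LE fix the byte order.
be = "A".encode("UTF-32BE").bytes
le = "A".encode("UTF-32LE").bytes

p be  # => [0, 0, 0, 65]  -- most significant byte first
p le  # => [65, 0, 0, 0]  -- least significant byte first
```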
On 2012/12/21 0:59, Asmus Freytag wrote:
There have been efforts at a Japanese translation of the text of the
standard, I have no idea whether that contains translated names for
characters.
JIS X 0221-1995, which is a translation of ISO 10646, contains some
Japanese character names, but this
I'm looking for a (preferably online) tool that converts Unicode
characters to Unicode character names. Richard Ishida's tools
(http://rishida.net/tools/conversion/) do a lot of conversions, but not
names.
Regards, Martin.
Well, first, it is 17 planes (or have we switched to using hexadecimal
numbers on the Unicode list already?).
Second, of course this is in connection with UTF-16. I wasn't involved
when UTF-16 was created, but it must have become clear that 2^16 (^
denotes exponentiation (to the power of))
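The arithmetic behind the 17 planes, as a small Ruby check:

```ruby
# A surrogate pair combines one of 1024 high surrogates with one of
# 1024 low surrogates, covering 0x100000 supplementary code points.
supplementary = 1024 * 1024           # reachable via surrogate pairs
planes = 1 + supplementary / 0x10000  # the BMP plus 16 more planes

puts planes                                      # => 17
puts format("%#x", 0x10000 + supplementary - 1)  # the last code point, U+10FFFF
```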
On 2012/11/17 12:54, Buck Golemon wrote:
On Fri, Nov 16, 2012 at 4:11 PM, Doug Ewell d...@ewellic.org wrote:
Buck Golemon wrote:
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
to map it to the equally-non-semantic U+81 ?
U+0081 (there are always at least four
To this, my mother would say: "Why keep it simple when we can make it
complicated?"
Regards, Martin.
On 2012/11/27 21:01, Philippe Verdy wrote:
That's a valid computation if the extension was limited to use only
2-surrogate encodings for supplementary planes.
If we could use 3-surrogate
On 2012/11/21 16:23, Peter Krefting wrote:
Doug Ewell d...@ewellic.org:
Somewhat off-topic, I find it amusing that tolerance of poorly
encoded input is considered justification for changing the underlying
standards,
The encoding work at W3C, at least as far as I see it, is not an attempt
to
On 2012/11/17 9:56, Philippe Verdy wrote:
True. HTML5 makes its own reinterpretation of the IETF's MIME standard,
defining its own protocol (which means that it is no longer fully
compatible with MIME and its IANA database, because the mapping of the
value of a charset= pseudo-attribute is
Just in case it helps, Ruby (since version 1.9) also uses 3).
Regards, Martin.
On 2012/11/17 6:48, Buck Golemon wrote:
When decoding bytes to unicode using the latin1 scheme, there are three
options for bytes not defined in the ISO-8859-1 standard.
1) Throw an error.
2) Insert the
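Since a snippet above notes that Ruby uses 3), here it is in action; the reading of option 3) as "map each byte straight to the code point with the same value" is taken from the thread:

```ruby
# Option 3): map each byte straight to the Unicode code point with the
# same value, so 0x81 becomes U+0081 instead of raising an error.
byte    = "\x81".b.force_encoding("ISO-8859-1")
decoded = byte.encode("UTF-8")

puts format("U+%04X", decoded.ord)  # => U+0081
```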
On 2012/11/17 9:45, Doug Ewell wrote:
If he is targeting HTML5, then none of this matters, because HTML5 says
that ISO 8859-1 is really Windows-1252.
Yes. But unless Python wants to limit its use to HTML5, this should be
handled on a separate level (mapping an iso-8859-1 label to the
On 2012/11/13 21:49, Eli Zaretskii wrote:
I'd welcome that. Although the reality flies in the face of user
requirements in this case: most bidi-aware editors, including my own
work in Emacs, don't have 2 carets, for some reason. Maybe the
developers didn't consider that important enough, or
On 2012/11/08 19:15, Michael Everson wrote:
On 8 Nov 2012, at 09:59, Simon Montagu smont...@smontagu.org wrote:
Please take into account that the half-stars should be symmetric-swapped in RTL
text. I attach an example from an advertisement for a movie published in Haaretz
2 November 2012
I
Richard - Complex script usually refers to scripts where rendering isn't
just simply putting glyphs side by side. That includes stuff with
combining marks, ligatures, reordering, stacking, and the like.
Regards, Martin.
On 2012/10/03 7:09, Richard Wordingham wrote:
On Tue, 02 Oct 2012
So in order to get something going here, why doesn't Doug draft a letter
to these guys (possibly based on the one from a few years ago) and then
Mark sends it off in his position at Unicode, which hopefully will
impress them more than just a personal contribution.
Being upset in this list
Hello Karl,
On 2012/07/21 0:41, Karl Pentzlin wrote:
Looking for an example of plain text which is obvious to anybody,
it seems to me that the Subject field of e-mails is a good example.
Common e-mail software lets you enter any text but never gives you
access to any higher-level protocol.
On 2012/07/21 7:01, David Starner wrote:
I'm concerned about the statement/implication that one can optimize
for ASCII and Latin-1. It's too easy for a lot of developers to test
speed with the English/European documents they have around and test
correctness only with Chinese. I see the argument
Hello Doug,
On 2012/07/18 0:35, Doug Ewell wrote:
For those who haven't had enough of this debate yet, here's a link
to an informative blog (with some informative comments) from Michael
Kaplan:
Every character has a story #4: U+feff (alternate title: UTF-8 is the
BOM, dude!)
On 2012/07/18 16:35, Leif Halvard Silli wrote:
Martin J. Dürst, Wed, 18 Jul 2012 11:00:42 +0900:
The best reason is simply that nobody should be using
crutches as long as they can walk with their own legs.
Crutches, in that sense, is only about authoring convenience. And, of
course
Hello Leif,
I think that more and more, we are on the wrong mailing list.
Regards, Martin.
On 2012/07/18 18:47, Leif Halvard Silli wrote:
Martin J. Dürst, Wed, 18 Jul 2012 17:20:31 +0900:
On 2012/07/18 16:35, Leif Halvard Silli wrote:
Martin J. Dürst, Wed, 18 Jul 2012 11:00:42 +0900
On 2012/07/13 22:31, Jukka K. Korpela wrote:
2012-07-13 16:12, Leif Halvard Silli wrote:
The kind of BOM intolerance I know about in user agents is that some
text browsers and IE5 for Mac (abandoned) convert the BOM into a
(typically empty) line at the start of the body element.
I wonder if
On 2012/07/14 1:33, Philippe Verdy wrote:
From: Jukka K. Korpela jkorp...@cs.tut.fi
When the BOM is used in web pages or editors for UTF-8 encoded content it
can sometimes introduce blank spaces or short sequences of strange-looking
characters (such as ). For this reason, it is usually best
On 2012/07/17 17:22, Leif Halvard Silli wrote:
And an argument was put forward in the WHATWG mailinglist
earlier this year/end of previous year, that a page with strict ASCII
characters inside could still contain character entities/references for
characters outside ASCII.
Of course they can.
Hello Leif,
Sorry to be late with my answer.
On 2012/07/13 20:44, Leif Halvard Silli wrote:
Martin J. Dürst, Fri, 13 Jul 2012 18:17:05 +0900:
On 2012/07/13 0:12, Leif Halvard Silli wrote:
Doug Ewell, Wed, 11 Jul 2012 09:12:46 -0600:
and people who want to create or modify UTF-8 files which
Hello Leif,
On 2012/07/18 4:35, Leif Halvard Silli wrote:
But is the Windows Notepad really to blame?
Pretty much so. There may have been other products from Microsoft that
also did it, but with respect to forcing browsers and XML parsers to
accept a UTF-8 BOM as a signature, Notepad was
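One practical consequence is that consumers have to tolerate the signature. Ruby's IO layer, for example, can strip a leading UTF-8 BOM on request:

```ruby
require "tempfile"

# Write a Notepad-style file: UTF-8 with a leading BOM (U+FEFF).
Tempfile.create("bom-demo") do |f|
  f.write("\uFEFFhello")
  f.flush

  plain    = File.read(f.path, encoding: "UTF-8")    # BOM kept in the text
  stripped = File.read(f.path, mode: "r:bom|utf-8")  # BOM consumed by IO

  puts plain.start_with?("\uFEFF")  # => true
  puts stripped                     # => hello
end
```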
Hello Philippe,
On 2012/07/18 3:37, Philippe Verdy wrote:
2012/7/17 Julian Bradfield jcb+unic...@inf.ed.ac.uk:
On 2012-07-16, Philippe Verdy verd...@wanadoo.fr wrote:
I am also convinced that even Shell interpreters on Linux/Unix should
recognize and accept the leading BOM before the hash/bang
Hello Jukka,
On 2012/07/17 23:31, Jukka K. Korpela wrote:
2012-07-17 17:11, Leif Halvard Silli wrote:
For instance, early on in 'the Web', some
appeared to think that all non-ASCII had to be represented as entities.
Yes indeed. There's still some such stuff around. It's mostly
unnecessary,
On 2012/07/13 0:12, Leif Halvard Silli wrote:
Doug Ewell, Wed, 11 Jul 2012 09:12:46 -0600:
and people who want to create or modify UTF-8 files which will
be consumed by a process that is intolerant of the signature
should not use Notepad. That goes for HTML (pre-5) pages [snip]
HTML5-parsers
On 2012/07/11 4:37, Asmus Freytag wrote:
I recall, with certainty, having seen the ":" in the context of
elementary instruction in arithmetic,
as in "4 : 2 = ?", but am no longer positive about seeing "÷" in the same
context.
I remember this very well. In grade school, we had to learn two ways to
On 2012/07/11 10:35, Stephan Stiller wrote:
About Martin Dürst's content re "geteilt"/"gemessen" (divided/measured):
When I attended the German school system in approx the 1990s this
distinction wasn't mentioned or taught. (I prefer to not give details
about specific time and place for privacy reasons.)
Sorry, but
On 2012/07/11 11:04, Mark E. Shoulson wrote:
Ever start to feel that we would have been better off not to give
official descriptive names at all? Or else really vague ones like
LETTERLIKE THINGY NUMBER 5412? So much blood-pressure raised over the
names...
I'm feeling that way since about the
On 2012/05/30 4:42, Roozbeh Pournader wrote:
Just look what happened when the Japanese did their own font/character set
hack. The backslash/yen problem is still with us, to this day...
To be fair, the Japanese Yen at 0x5C was there long before Unicode, in
the Japanese version of ISO 646.
On 2012/05/29 17:43, Asmus Freytag wrote:
On 5/27/2012 5:52 PM, Michael Everson wrote:
Get over it. Please just get over it. It doesn't matter. It's a blort.
Time to agree with Michael.
Get over it, is good advice here.
Sovereign countries are free to decree currency symbols, whatever their
On 2012/04/29 18:58, Szelp, A. Sz. wrote:
While there are good reasons the authors of HTML5 brought forward to ignore SCSU or
BOCU-1, having excluded UTF-32, which is the most direct, one-to-one mapping
of Unicode codepoints to byte values, seems shortsighted.
Well, except that it's hopelessly
On 2012/04/28 4:26, Mark Davis ☕ wrote:
Actually, if the goal is to get as many characters in as possible, Punycode
might be the best solution. That is the encoding used for internationalized
domains. In that form, it uses a smaller number of bytes per character, but
a parameterization allows
On 2012/04/28 7:29, Cristian Secară wrote:
On Fri, 27 Apr 2012 12:26:25 -0700, Mark Davis ☕ wrote:
Actually, if the goal is to get as many characters in as possible,
Punycode might be the best solution. That is the encoding used for
internationalized domains. In that form, it uses a
On 2012/04/27 17:06, Cristian Secară wrote:
It turned out that they (ETSI its groups) created a way to solve the
70 characters limitation, namely “National Language Single Shift” and
“National Language Locking Shift” mechanism. This is described in 3GPP
TS 23.038 standard and it was introduced
On 2011/11/21 5:54, Asmus Freytag wrote:
On 11/20/2011 8:00 AM, Joó Ádám wrote:
Leaving aside that CSS is presentation and not content, and is
definitely not markup. HTML is a better candidate.
Á
The details of the appearance of the mark would be presentation.
The scoping, like for applying
I tried to find something like a normative description of the default
bidi class of unassigned code points.
In UTR #9, it says
(http://www.unicode.org/reports/tr9/tr9-23.html#Bidirectional_Character_Types):
Unassigned characters are given strong types in the algorithm. This is
an explicit
How can one use the Forum to comment on URI/IRI issues when one gets a
message:
Your message contains too many URLs. The maximum number of URLs allowed
is 8.
I never liked this forum stuff too much, and this hasn't made things
better :-(.
Regards, Martin.
I'm hoping to get some advice from people with experience with various
Unicode/transcoding libraries.
RFC 3987 (the current IRI spec) has the following text:
Note: Some older software transcoding to UTF-8 may produce illegal
output for some input, in particular for characters outside