I'd like to invite everyone to support this worthwhile project:
http://www.kickstarter.com/projects/1496420787/the-endangered-alphabets-project/
Michael Everson * http://www.evertype.com/
It is about time we allocated a significant space within the Unicode
code space to work in the old-fashioned code-page provisioning mode.
I'm not calling for any change to existing major allocations. However, it
is about time we allocated (not PUA) a large number of codes to code-page
based sub
On 08/19/2011 04:43 PM, srivas sinnathurai wrote:
All those in favour of creating code pages, please say yes, and others
please say why not.
Sinnathurai, 7000 code pages are not enough. To replace Unicode, you
should create at least 65536 code pages, because Unicode is represented
in UTF-16
In what way is this not what the PUA is all about?
--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell
From: srivas sinnathurai
Sent: Friday, August 19, 2011 5:13
To: Michael Everson
Cc: unicode Unicode Discussion ;
PUA is not structured and not officially programmable to accommodate
numerous code pages.
Take ISO 8859-1, 2, 3, and so on.
These allocate the same code points to many languages and to
other purposes.
Similarly, a structured and official allocation to many requirements
Hello,
I would like to ask why there are no PUA parts which would be reserved
for RTL scripts (i.e. would have the directionality set to strong RTL).
Thanks!
P.T.
--
Petr Tomasek http://www.etf.cuni.cz/~tomasek
Jabber: but...@jabbim.cz
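The default Petr is asking about is easy to inspect. Python's standard `unicodedata` module exposes the Unicode Character Database values for the three private-use ranges; the sketch below prints the size, General_Category and Bidi_Class of each range. The exact database depends on the Unicode version bundled with your Python build, but the PUA entries are expected to report `Co` and `L`.

```python
import unicodedata

# The three Private Use Areas defined by Unicode (inclusive ranges).
PUA_RANGES = {
    "BMP PUA": (0xE000, 0xF8FF),
    "Supplementary PUA-A (plane 15)": (0xF0000, 0xFFFFD),
    "Supplementary PUA-B (plane 16)": (0x100000, 0x10FFFD),
}

def pua_defaults():
    """Return (size, General_Category, Bidi_Class) for the first code point of each PUA range."""
    report = {}
    for name, (first, last) in PUA_RANGES.items():
        ch = chr(first)
        report[name] = (
            last - first + 1,               # number of code points in the range
            unicodedata.category(ch),       # expected 'Co' (private use)
            unicodedata.bidirectional(ch),  # expected 'L' (strong left-to-right)
        )
    return report

for name, (size, gc, bc) in pua_defaults().items():
    print(f"{name}: {size} code points, gc={gc}, bc={bc}")
```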
On 19 Aug 2011, at 14:29, Petr Tomasek wrote:
I would like to ask why there are no PUA parts which would be reserved for
RTL scripts (i.e. would have the directionality set to strong RTL).
Thanks!
P.T.
This is a very good question.
Michael Everson * http://www.evertype.com/
On Fri, Aug 19, 2011 at 02:43:56PM +0100, Michael Everson wrote:
On 19 Aug 2011, at 14:29, Petr Tomasek wrote:
I would like to ask why there are no PUA parts which would be reserved for
RTL scripts (i.e. would have the directionality set to strong RTL).
Thanks!
P.T.
This is a
Petr Tomasek tomasek at etf dot cuni dot cz wrote:
I would like to ask why there are no PUA parts which would be reserved
for RTL scripts (i.e. would have the directionality set to strong
RTL).
The PUA is supposed to be a free and open sandbox, without reserved or
allocated zones. There was
On 08/19/2011 07:13 PM, Michael Everson wrote:
This is a very good question.
It seems Michael speaks tongue-in-cheek.
I personally don't see the point in allocating RTL areas in the PUA. It
is after all the *P*UA. Do you expect rendering engines to support the PUA?
Yeah OK maybe simply
I would like to ask why there are no PUA parts which would be
reserved for RTL scripts (i.e. would have the directionality set to
strong RTL).
This is a very good question.
Probably no one had such an idea until now.
Werner
srivas sinnathurai sisrivas at blueyonder dot co dot uk wrote:
PUA is not structured
It's not supposed to be. It's a private-use area. You use it the way
you see fit.
and not officially programmable to accommodate
numerous code pages.
None of Unicode is designed around code-page
On 08/19/2011 09:29 AM, Petr Tomasek wrote:
Hello,
I would like to ask why there are no PUA parts which would be reserved
for RTL scripts (i.e. would have the directionality set to strong RTL).
I have long wondered about this, and I'm pretty sure the discussion has
surfaced here once or twice
On 19 Aug 2011, at 15:13, Doug Ewell wrote:
The PUA is supposed to be a free and open sandbox, without reserved or
allocated zones.
Nevertheless, inherent directionality is something that computers take notice
of. There would be no harm in having an RTL PUA area.
My question would be why
On 08/19/2011 07:43 PM, Doug Ewell wrote:
My question would be why the PUA is designated as 'L' by default at all,
instead of, say, 'ON'.
...
do present the impression that these code points are somehow reserved
for strong-LTR characters, and also for non-reordrant characters (i.e.
no
On 08/19/2011 10:13 AM, Doug Ewell wrote:
So your private agreement, in addition to specifying the meaning of your
PUA characters and probably some sample glyphs, can also specify their
properties, overriding the default properties.
I don't know if you can even do this. My understanding of
On 19 Aug 2011, at 15:24, Shriramana Sharma wrote:
On 08/19/2011 07:13 PM, Michael Everson wrote:
This is a very good question.
It seems Michael speaks tongue-in-cheek.
Not at all. I think there should be a RTL PUA.
I personally don't see the point in allocating RTL areas in the PUA. It
From: Michael Everson everson_at_evertype.com
On 19 Aug 2011, at 14:29, Petr Tomasek wrote:
I would like to ask why there are no PUA parts which would be reserved for
RTL scripts (i.e. would have the directionality set to strong RTL).
Thanks!
P.T.
This is a very good
On 08/19/2011 08:14 PM, William_J_G Overington wrote:
I am wondering if the following idea would be of any usefulness
towards solving the problem without needing any code point
allocations in Unicode.
Pardon me for not understanding if I entirely missed your point, but why
can't these
On 08/19/2011 10:24 AM, Shriramana Sharma wrote:
I also wonder what the following, from
http://unicode.org/reports/tr9/#Bidirectional_Character_Types, means:
Private-use characters can be assigned different values by a
conformant implementation.
Best I can guess is You can write your own
On 19 Aug 2011, at 15:34, Shriramana Sharma wrote:
On 08/19/2011 07:43 PM, Doug Ewell wrote:
My question would be why the PUA is designated as 'L' by default at all,
instead of, say, 'ON'.
...
do present the impression that these code points are somehow reserved
for strong-LTR characters,
On 08/19/2011 08:11 PM, vanis...@boil.afraid.org wrote:
why there weren't private use Variation Selectors.
Because you are already free to use PUA codepoints as VSs?
--
Shriramana Sharma
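For comparison with the idea of using PUA code points as variation selectors: the standard variation selectors sit in two fixed ranges, and generic software can recognize them by their properties (General_Category `Mn`, Bidi_Class `NSM`). A PUA code point instead carries the defaults `Co`/`L`, so it will not be treated as combining unless the application is told otherwise. A small sketch using Python's `unicodedata`; the helper name `is_variation_selector` is made up for illustration.

```python
import unicodedata

# Standard variation selectors: VS1..VS16 in the BMP, VS17..VS256 in plane 14.
VS_RANGES = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]

def is_variation_selector(ch: str) -> bool:
    """True if ch falls in one of the standard variation selector ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VS_RANGES)

total = sum(hi - lo + 1 for lo, hi in VS_RANGES)
print(total)  # 16 + 240 = 256

# Compare a real VS with a PUA code point that a private agreement might use as one.
for ch in ("\uFE0F", "\U000E0100", "\uE000"):
    print(
        f"U+{ord(ch):04X}",
        is_variation_selector(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )
```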
Michael Everson everson at evertype dot com wrote:
So your private agreement, in addition to specifying the meaning of
your PUA characters and probably some sample glyphs, can also specify
their properties, overriding the default properties.
Gods know I wouldn't have any idea how to get
Mark E. Shoulson mark at kli dot org wrote:
So your private agreement, in addition to specifying the meaning of
your PUA characters and probably some sample glyphs, can also specify
their properties, overriding the default properties.
I don't know if you can even do this. My understanding
On 19 Aug 2011, at 15:51, Shriramana Sharma wrote:
On 08/19/2011 08:11 PM, vanis...@boil.afraid.org wrote:
why there weren't private use Variation Selectors.
Because you are already free to use PUA codepoints as VSs?
Because the existing VSs are sufficient?
Michael Everson *
On 08/19/2011 08:34 PM, Mark E. Shoulson wrote:
But they work Just Great for LTR scripts in the PUA, but not for RTL
scripts. Isn't that kind of bias counter to the whole point of the PUA
and Unicode in general? And it isn't only due to implementors, either:
Unicode specifies LTR
On 08/19/2011 08:36 PM, Michael Everson wrote:
On 19 Aug 2011, at 15:51, Shriramana Sharma wrote:
On 08/19/2011 08:11 PM, vanis...@boil.afraid.org wrote:
why there weren't private use Variation Selectors.
Because you are already free to use PUA codepoints as VSs?
Because the existing VSs
On 19 Aug 2011, at 16:03, Shriramana Sharma wrote:
There is plenty of space. There would be no difficulty in assigning
some rows to an RTL PUA.
It is not a question of availability of space. It is a question of principles.
Sure. People who are happy with LTR directionality have PUA code
On 19 Aug 2011, at 15:57, Doug Ewell wrote:
Most applications don't care about the PUA, or assume it's only for that
particular vendor's custom Latin-script ligatures and dictionary symbols.
And guess what: since my ligatures and dictionary symbols are LTR, I have no
problem because the
I am wondering if the following idea would be of any usefulness towards solving
the problem without needing any code point allocations in Unicode.
Suppose that a concept of an Endangered Language Code Page is invented.
Suppose that the letter sequence ELCP is used to designate an endangered
On 19 Aug 2011, at 16:04, Mark E. Shoulson wrote:
I didn't say that applications or rendering engines are able to accept
your overridden properties and apply them, right out of the box, at
least not today.
But they work Just Great for LTR scripts in the PUA, but not for RTL scripts.
I think you want ISO 2022.
In any event, this will never happen in Unicode, because this is the exact
opposite of what Unicode is all about, unless I misunderstand you. Unicode's
goal is for every code unit to have a fixed interpretation. So far as many
people involved in the original
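ISO 2022 is the stateful, escape-sequence-driven switching model being contrasted with Unicode here. Python ships an `iso2022_jp` codec, which makes the difference easy to see: the same byte values stand for different characters depending on which escape sequence preceded them. This sketch uses that existing codec to illustrate the general mechanism; it is not a model of anything proposed in the thread.

```python
# ISO-2022-JP: ESC ( B selects ASCII, ESC $ B selects JIS X 0208 (two bytes per character).
text = "abc\u65e5\u672c"  # "abc" followed by two kanji
encoded = text.encode("iso2022_jp")
print(encoded)

# Pull out the bytes between the JIS X 0208 escape and the return-to-ASCII escape.
start = encoded.index(b"\x1b$B") + 3
end = encoded.index(b"\x1b(B", start)
kanji_bytes = encoded[start:end]

# Those bytes are all in 0x21..0x7E, so read without the shift state they are
# just ASCII punctuation and letters: the meaning lives in the state, not the byte.
print(kanji_bytes, kanji_bytes.decode("ascii"))

assert encoded.decode("iso2022_jp") == text  # a stateful decoder recovers the text
```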
On 19 Aug 2011, at 16:05, Shriramana Sharma wrote:
On 08/19/2011 08:03 PM, Michael Everson wrote:
On 19 Aug 2011, at 15:13, Doug Ewell wrote:
The PUA is supposed to be a free and open sandbox, without reserved
or allocated zones.
Nevertheless, inherent directionality is something that
On 08/19/2011 11:03 AM, Shriramana Sharma wrote:
In effect, changing the existing BC=L to ON is no worse than changing
it to R.
I think making the directionality of the PUA L instead of ON was a
mistake in the first place, yes, but does even the PUA fall under the
commandment Thou shalt
From: Michael Everson everson_at_evertype.com
On 19 Aug 2011, at 15:51, Shriramana Sharma wrote:
On 08/19/2011 08:11 PM, vanisaac_at_boil.afraid.org wrote:
why there weren't private use Variation Selectors.
Because you are already free to use PUA codepoints as VSs?
Because the
Doug,
First of all, the flat code space is the primary functionality of Unicode, and I am not
calling for any changes to existing encodings.
What I propose is to assign about 16,000 codes to a code-page switching model.
Why this suggestion?
With the current flat space, one code point is only allocated to one and
On 19 Aug 2011, at 16:16, Shriramana Sharma wrote:
Which then again brings us back to Doug's previous point that these should be
(have been) assigned some more neutral BC such as ON.
That train has left the station, though.
Michael Everson * http://www.evertype.com/
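Why a neutral default would matter: under the Unicode Bidirectional Algorithm (UAX #9), a strong `L` character always keeps its own direction, while a run of neutrals (`ON`) between strong characters of the same direction takes that direction (rule N1) and otherwise falls back to the embedding direction (rule N2). The sketch below implements only those two rules over a flat list of `L`/`R`/`ON` classes, with no explicit embeddings, numbers, or weak-type resolution, so it is a toy illustration and not a conforming UBA implementation.

```python
def resolve_neutrals(classes: list[str], paragraph_rtl: bool) -> list[str]:
    """Toy version of UAX #9 rules N1/N2 for one level run containing only L, R and ON."""
    embedding = "R" if paragraph_rtl else "L"
    resolved = list(classes)
    i = 0
    while i < len(resolved):
        if resolved[i] != "ON":
            i += 1
            continue
        j = i
        while j < len(resolved) and resolved[j] == "ON":
            j += 1  # [i, j) is a maximal run of neutrals
        # At the edges of the run, the embedding direction stands in for sos/eos.
        before = resolved[i - 1] if i > 0 else embedding
        after = resolved[j] if j < len(resolved) else embedding
        direction = before if before == after else embedding  # N1, else N2
        resolved[i:j] = [direction] * (j - i)
        i = j
    return resolved

# Two characters of a private RTL script embedded in Hebrew (R) text:
print(resolve_neutrals(["R", "L", "L", "R"], paragraph_rtl=True))    # PUA defaulting to L
print(resolve_neutrals(["R", "ON", "ON", "R"], paragraph_rtl=True))  # PUA defaulting to ON
```

With `ON` the private characters at least follow surrounding RTL text; on their own in an LTR paragraph they would still resolve to `L`, which is why a separate range defaulting to `R` also comes up in the thread.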
On Fri, Aug 19, 2011 at 04:22:19PM +0100, Michael Everson wrote:
On 19 Aug 2011, at 16:04, Mark E. Shoulson wrote:
I didn't say that applications or rendering engines are able to accept
your overridden properties and apply them, right out of the box, at
least not today.
But they work
On 19 Aug 2011, at 16:31, Mark E. Shoulson wrote:
On 08/19/2011 11:03 AM, Shriramana Sharma wrote:
In effect, changing the existing BC=L to ON is no worse than changing it to
R.
I think making the directionality of the PUA L instead of ON was a
mistake in the first place, yes, but
On 08/19/2011 11:21 AM, Michael Everson wrote:
Directionality is a very deep property. A CSUR LTR script works fine out of the box on
all platforms at least as far as directionality goes. A CSUR RTL script simply can't, and
do you really think that defining the properties will effectively
We are keeping Unicode as it is and asking it to support code
pages within, say, 25% of the allocations. That is entirely different from
making all of Unicode code-page switchable.
We will have plenty of time on our hands to avoid any disasters, as we are not
touching the primary purpose while
Michael Everson everson at evertype dot com wrote:
It's part of the private agreement. I can't personally tell an OS or
application (unless I write it) how to interpret those properties,
but they are out there, and it would be theoretically possible for an
OS or app to accept those
Michael Everson everson at evertype dot com wrote:
The PUA *already* defines its characters as LTR. That's been done. It is
*part* of the definition and functionality of the PUA. It's irrelevant
whether it should be or not. It *is*.
It isn't. It's just a default, though admittedly one
Michael Everson everson at evertype dot com wrote:
Which then again brings us back to Doug's previous point that these
should be (have been) assigned some more neutral BC such as ON.
That train has left the station, though.
I thought this property was mutable, and indeed that even some
On 08/19/2011 09:00 PM, vanis...@boil.afraid.org wrote:
Quote from 16.4: Standardized variation sequences are defined in the file
StandardizedVariants.txt in the Unicode Character Database. Ideographic
variation sequences are defined by the registration process defined in Unicode
Technical
On 19 Aug 2011, at 16:38, Mark E. Shoulson wrote:
It's pretty disingenuous to say Well, if you want a private-use RTL script,
you should be prepared to write an engine that can render it, ignoring the
fact that LTR people can get by with just a font. Why should it be so much
harder to
srivas sinnathurai sisrivas at blueyonder dot co dot uk wrote:
Why this suggestion?
With the current flat space, one code point is only allocated to one and
only one purpose.
We can run out of code space soon.
Argument over. There are not 800,000 more characters that need to be
encoded for
On 08/19/2011 09:51 PM, Doug Ewell wrote:
I thought this property was mutable, and indeed that even some assigned
characters had had their Bidi_Class changed over the years. I could be
wrong.
Perhaps you refer to this:
http://www.unicode.org/versions/corrigendum8.html ?
--
Shriramana
All of the property assignments to PUA characters (except the GC) are purely
informative. The property assignments that are there are simply based on the
likelihood of property assignment, and can be freely overridden by
implementations. It is just more likely that PUA characters are bc:L than
Asmus Freytag asmusf at ix dot netcom dot com wrote:
Nevertheless, N4085 is a German NB document, the criteria in question
are those suggested by the German NB and not WG2 (and the document
makes note of this distinction), and it is an error to portray this
passage as representing either a
On 08/19/2011 08:59 PM, Michael Everson wrote:
Please write me a rendering engine that will correctly
process http://www.evertype.com/standards/csur/engsvanyali.html on
the Mac OS, Linux, and Windows. Thanks.
Heh, it seems from a superficial look through that you could stick the
required
+1
Mark
*Il meglio è l’inimico del bene* (The best is the enemy of the good.)
On Fri, Aug 19, 2011 at 08:41, John Cowan co...@mercury.ccil.org wrote:
Michael Everson scripsit:
I'd like to invite everyone to support this worthwhile project:
Worthwhile it may be, but surely misinformed as well. Does Mr. Brooks
actually
On 19 Aug 2011, at 17:35, Mark Davis ☕ wrote:
All of the property assignments to PUA characters (except the GC) are purely
informative. The property assignments that are there are simply based on the
likelihood of property assignment, and can be freely overridden by
implementations.
How?
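One partial answer to "How?", limited to software you control: UAX #9 says private-use characters can be assigned different values by a conformant implementation, so an application can layer its own table over the UCD defaults for the PUA code points covered by a private agreement. The `PRIVATE_BIDI` table and its assignments below are hypothetical, invented for illustration, and the override only affects code that calls this `bidi_class()` helper; the OS text stack will still see `L`.

```python
import unicodedata

# Hypothetical private agreement: a conscript at U+E000..U+E01F is declared
# right-to-left (R), and U+E020 is declared an other-neutral (ON).
PRIVATE_BIDI = {cp: "R" for cp in range(0xE000, 0xE020)}
PRIVATE_BIDI[0xE020] = "ON"

def is_private_use(cp: int) -> bool:
    """True for code points whose General_Category is Co (private use)."""
    return unicodedata.category(chr(cp)) == "Co"

def bidi_class(ch: str) -> str:
    """Bidi_Class with private-agreement overrides applied to PUA code points only."""
    cp = ord(ch)
    if is_private_use(cp) and cp in PRIVATE_BIDI:
        return PRIVATE_BIDI[cp]
    return unicodedata.bidirectional(ch)  # everything else keeps its UCD value

# Latin a, two overridden PUA characters, one non-overridden PUA character, Hebrew alef.
print([bidi_class(c) for c in "a\ue000\ue020\ue021\u05d0"])
```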
On 08/19/2011 08:47 PM, Michael Everson wrote:
Indic scripts have LTR directionality. They can use PUA and do
whatever is needed for the *other* challenges inherent in Indic
fonts. A private RTL script cannot use the PUA and have the same
level of support.
In OT, without help from the
On 08/19/2011 09:54 PM, Michael Everson wrote:
On 19 Aug 2011, at 16:38, Mark E. Shoulson wrote:
It's pretty disingenuous to say Well, if you want a private-use
RTL script, you should be prepared to write an engine that can
render it, ignoring the fact that LTR people can get by with
just a
Shriramana Sharma samjnaa at gmail dot com wrote:
I thought this property was mutable, and indeed that even some
assigned characters had had their Bidi_Class changed over the years.
I could be wrong.
Perhaps you refer to this:
http://www.unicode.org/versions/corrigendum8.html ?
No, I
On 08/19/2011 10:05 PM, Mark Davis ☕ wrote:
All of the property assignments to PUA characters (except the GC) are purely
informative. The property assignments that are there are simply based on the
likelihood of property assignment, and can be freely overridden by
implementations.
Glad to hear
I'd rather see code pages become endangered, and code-page switching an
obscure footnote on the pages of history.
Please, don't invent any new code page systems.
Steven
On 08/19/2011 08:40 AM, srivas sinnathurai wrote:
Doug,
First of all, the flat code space is the primary functionality of
On 08/19/2011 09:08 PM, Mark E. Shoulson wrote:
It's pretty disingenuous to say Well, if you want a private-use RTL
script, you should be prepared to write an engine that can render it,
ignoring the fact that LTR people can get by with just a font. Why
should it be so much harder to write
On 08/19/2011 09:01 PM, Mark E. Shoulson wrote:
On 08/19/2011 11:03 AM, Shriramana Sharma wrote:
In effect, changing the existing BC=L to ON is no worse than changing
it to R.
I think making the directionality of the PUA L instead of ON was a
mistake in the first place, yes, but does even
William_J_G Overington wjgo underscore 10009 at btinternet dot com
wrote:
Suppose that a concept of an Endangered Language Code Page is invented.
The original Endangered Alphabets subject line was hijacked, almost
immediately, into a thread about defining code pages within the Unicode
On 19 Aug 2011, at 18:01, Shriramana Sharma wrote:
Even though it isn't encoded? That is, my understanding is that we *can't*
change the PUA to ON now, but that there is a suggestion that some *new*
hunk of PUA be created that is R, in order to balance the existing L. Is
that right?
srivas sinnathurai wrote on 19 Aug 2011, 9:40 AM:
Why this suggestion?
With the current flat space, one code point is only allocated to one and only one
purpose.
We can run out of code space soon.
There are a couple of problems here.
We currently have over 860,000 unassigned code points. Surveys
John H. Jenkins:
there would have to be a *lot* of writing systems out there we don't know
about to fill up planes 4 through 14
That’s quite possible, though; the universe is huge. The question rather is
whether we will ever know about them. It’s quite possible we won’t.
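The code-space figures traded in this part of the thread can be checked with a few lines of arithmetic: 17 planes of 65,536 code points, minus the ranges that are structurally unavailable for characters (surrogates, noncharacters) or set aside for private use. This sketch does not compute the number of unassigned code points, since that depends on the Unicode version; `plane_of` is an illustrative helper name.

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane
PLANES = 17           # planes 0..16

def plane_of(cp: int) -> int:
    """Plane number of a code point: everything above the low 16 bits."""
    return cp >> 16

total = PLANES * PLANE_SIZE                          # 1,114,112
surrogates = 0xDFFF - 0xD800 + 1                     # 2,048 (reserved for UTF-16)
private_use = (0xF8FF - 0xE000 + 1) + 2 * 65534      # 6,400 + 2 x 65,534 = 137,468
noncharacters = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES   # 32 + 2 per plane = 66
public_pool = total - surrogates - private_use - noncharacters  # 974,530
planes_4_to_14 = 11 * PLANE_SIZE                     # 720,896

print(total, surrogates, private_use, noncharacters, public_pool, planes_4_to_14)
print(plane_of(0xE0001), plane_of(0x10FFFF))         # Plane 14 (tags), plane 16
```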
On Friday 19 August 2011, Doug Ewell d...@ewellic.org wrote:
Sorry, in my attempt to avoid naming names I made it look as though Karl made
that claim. He did not. William's message was the one that attempted to
connect the dots between official WG2 policy and the German NB proposal.
Michael Everson wrote on 19 Aug 2011, 11:15 AM:
On 19 Aug 2011, at 18:01, Shriramana Sharma wrote:
Even though it isn't encoded? That is, my understanding is that we *can't*
change the PUA to ON now, but that there is a suggestion that some *new*
hunk of PUA be created that is R, in order to
Maybe we should step back a bit:
I'm not calling for any change to existing major allocations. However,
it is about time we allocated (not PUA) a large number of codes to
code-page-based sub-codes so that not only all 7000+ languages can
freely use it without INTERFERENCE from Unicode and
On 19 Aug 2011, at 18:24, John H. Jenkins wrote:
We currently have over 860,000 unassigned code points. Surveys of all known
writing systems indicate that only a small fraction of these will be needed.
Indeed, although it looks likely that Han will spill out of the SIP into
plane 3, all
On Friday 19 August 2011, Doug Ewell d...@ewellic.org wrote:
William_J_G Overington wjgo underscore 10009 at btinternet dot com
wrote:
Suppose that a concept of an Endangered Language Code Page is invented.
The original Endangered Alphabets subject line was hijacked, almost
John H. Jenkins jenkins at apple dot com wrote:
Put an RTL PUA zone in Plane 14, which is mostly empty, and expected to
remain so, and you're done.
No, you're not, because the OSs/rendering engines would have to rev, and to
be honest, there won't be a lot of enthusiasm for doing something
On 08/19/2011 12:39 PM, Shriramana Sharma wrote:
And I grant you your point of free font making software being
available, but still proper OT/Graphite/AAT tables have to be made (to
render all those contextual forms in this conscript), which calls for
some expertise at least. You would then
On 08/19/2011 01:24 PM, John H. Jenkins wrote:
In order to get the UTC and WG2 to agree to a major architectural
change such as you're suggesting, you'd have to have some very solid
evidence that it's needed—not an interesting idea, not potentially
useful, but seriously *needed*. That's how
Mark E. Shoulson mark at kli dot org wrote:
And indeed, it went the other way too, back when ISO-10646 had not 17,
but 65536 *planes* and someone provided some reasonable evidence (or
just plain reasoned arguments) that 4.3 *billion* characters was
probably overkill.
Technically, I think
20.8.2011 0:07, Doug Ewell wrote:
Of course, 2.1 billion characters is also overkill, but the advent of
UTF-16 was how we ended up with 17 planes.
And now we think that a little over a million is enough for everyone,
just as they thought in the late 1980s that 16 bits is enough for everyone.
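Why UTF-16 fixes the ceiling at 17 planes: a surrogate pair combines one of 1,024 high surrogates with one of 1,024 low surrogates, giving 1,048,576 (16 x 65,536) supplementary code points on top of the BMP. A minimal encoder sketch, checked against Python's built-in `utf-16-be` codec; `utf16_units` is an illustrative name, not a library function.

```python
def utf16_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as a list of UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x10000:
        return [cp]                                # BMP: a single unit
    v = cp - 0x10000                               # 20 bits left to split
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]

pairs = 1024 * 1024                                # high x low surrogates
print(pairs, pairs // 0x10000 + 1)                 # 1048576 supplementary, 17 planes

for cp in (0x0041, 0xF0000, 0x10FFFF):
    units = utf16_units(cp)
    ref = chr(cp).encode("utf-16-be")              # reference encoding from the stdlib
    assert units == [int.from_bytes(ref[i:i + 2], "big") for i in range(0, len(ref), 2)]
    print(f"U+{cp:04X} ->", " ".join(f"{u:04X}" for u in units))
```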
On 08/19/2011 05:07 PM, Doug Ewell wrote:
Mark E. Shoulson mark at kli dot org wrote:
And indeed, it went the other way too, back when ISO-10646 had not 17,
but 65536 *planes* and someone provided some reasonable evidence (or
just plain reasoned arguments) that 4.3 *billion* characters was
On 20 Aug 2011, at 00:35, Jukka K. Korpela wrote:
And now we think that a little over a million is enough for everyone,
just as they thought in the late 1980s that 16 bits is enough for everyone.
Whenever somebody talks about needing 31 bits for Unicode, I always think of
the hypothetical
Jukka K. Korpela jkorpela at cs dot tut dot fi wrote:
And now we think that a little over a million is enough for everyone,
just as they thought in the late 1980s that 16 bits is enough for
everyone.
I know this is an enjoyable exercise — people love to ridicule Bill
Gates for his comment in
On 8/19/2011 2:07 PM, Doug Ewell wrote:
Technically, I think 10646 was always limited to 32,768 planes so that
one could always address a code point with a 32-bit signed integer (a
nod to the Java fans).
Well, yes, but it didn't really have anything to do with Java. Remember
that Java
wasn't
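The arithmetic behind the plane counts above: the original UCS-4 layout (128 groups of 256 planes) gives 32,768 planes, i.e. 2^31 code points, so the highest code point, 0x7FFFFFFF, is exactly the largest signed 32-bit integer. The 4.3 billion figure corresponds to 2^32. A quick check, using `struct` to show where a signed 32-bit field overflows:

```python
import struct

PLANE_SIZE = 1 << 16                    # 65,536 code points per plane
INT32_MAX = (1 << 31) - 1               # largest signed 32-bit value

ucs4_planes = 32768                     # 128 groups x 256 planes (original UCS-4 layout)
ucs4_space = ucs4_planes * PLANE_SIZE   # 2,147,483,648, about 2.1 billion
print(ucs4_space, ucs4_space - 1 == INT32_MAX)
print(1 << 32)                          # 4,294,967,296, about 4.3 billion
print(17 * PLANE_SIZE)                  # 1,114,112 once UTF-16 set the limit

struct.pack(">i", ucs4_space - 1)       # the top UCS-4 code point fits in an int32
try:
    struct.pack(">i", ucs4_space)       # one past the end does not
except struct.error as exc:
    print("overflow:", exc)
```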
Benjamin M Scarborough wrote on 19 Aug 2011, 3:53 PM:
Whenever somebody talks about needing 31 bits for Unicode, I always think of
the hypothetical situation of discovering some extraterrestrial civilization
and trying to add all of their writing systems to Unicode. I imagine there
would be
On 8/19/2011 2:53 PM, Benjamin M Scarborough wrote:
Whenever somebody talks about needing 31 bits for Unicode, I always think of
the hypothetical situation of discovering some extraterrestrial civilization
and trying to add all of their writing systems to Unicode. I imagine there
would be
On 8/19/2011 2:35 PM, Jukka K. Korpela wrote:
20.8.2011 0:07, Doug Ewell wrote:
Of course, 2.1 billion characters is also overkill, but the advent of
UTF-16 was how we ended up with 17 planes.
And now we think that a little over a million is enough for everyone,
just as they thought in the
On 8/19/2011 3:24 PM, Ken Whistler wrote:
On 8/19/2011 2:07 PM, Doug Ewell wrote:
Technically, I think 10646 was always limited to 32,768 planes so that
one could always address a code point with a 32-bit signed integer (a
nod to the Java fans).
Well, yes, but it didn't really have anything
On 08/19/2011 11:19 PM, John H. Jenkins wrote:
Saying that does not make it possible for people to use PUA
characters with RTL directionality, since all the OSes treat them
as LTR.
Mac OS has a mechanism to override that default assumption, the
'prop' table.
Which proves my point that the