In chapter 8, regarding Hebrew, the standard says:
Positioning. Marks may combine with vowels and other points, and there are
complex typographic rules for positioning these combinations.
I understand that this sentence should be regarded as being normative.
Clause 4.3 uses the word "tend".
On Dec 5, 2004, at 07:02 PM, Doug Ewell wrote:
A word-based encoding for English could automatically assume spaces
where they are appropriate. The sentence:
What means this, my lord?
would have seven encodable elements: the five words, the comma, and the
question mark. Spaces would be
From: D. Starner [EMAIL PROTECTED]
If you're talking about a language that hides the structure of strings
and has no problem with variable length data, then it wouldn't matter
what the internal processing of the string looks like. You'd need to
use iterators and discourage the use of arbitrary
Elaine Keown
Vancouver
Dear Philippe and Lists:
In all your searches and in your proposals, did you
try to segregate the proposed additional characters
into two separate categories: those needed
for inclusion within many modern studies, and those
The Samaritan marks are still
Elaine in Vancouver
Dear Mark:
Thanks, I guess.
This is the one I'm going to comment on, since it's
the one I know best.
I know that Michael Everson and I are working on a
Samaritan proposal,
It appears to me that my proposal came first, no? By
some months...I have some
On 07/12/2004 07:52, Jony Rosenne wrote:
...
Consequently, there is not and cannot be anything wrong with Unicode (at least
in this respect) and it does support ANY sequence of Hebrew vowels and
consonants.
I do maintain that in some cases the typographic process would require out
of band assistance
RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)
Doug Ewell wrote:
John Cowan jcowan at reutershealth dot com wrote:
Windows filesystems do know what encoding they use. But a filename on
a Unix(oid) file system is a mere sequence of octets, of which only 00
and 2F are
Richard Cook rscook at socrates dot berkeley dot edu wrote:
Well, why stop with words, my lord? Why not just encode all sentences,
paragraphs, pages, chapters, books, libraries, or your higher level
unit of choice, for that matter.
...
Whether you choose to associate a single glyph with your
Philippe stated, and I need to correct:
UTF-24 already exists as an encoding form (it is identical to UTF-32), if
you just consider that encoding forms just need to be able to represent a
valid code range within a single code unit.
This is false.
Unicode encoding forms exist by virtue of
Thanks to Peter Constable, John Hudson, Tom Gewecke, Christopher Fynn, and
others, for taking the time to address my question.
Gary
---
Gary Grosso
Arbortext, Inc.
Ann Arbor, MI, USA
Yes, and pigs could fly, if they had big enough wings.
An 8-foot wingspan should do it. For a picture of said flying pig, see:
http://www.cincinnati.com/bigpiggig/profile_091700.html
http://www.cincinnati.com/bigpiggig/images/pig091700.jpg
Rick
E. Keown wrote:
In the so-called 'deprecated' block, the 2nd Hebrew
block in the BMP, are composed Hebrew points which I
plan to go on using. And I expect everyone else to go
on using them also, all Hebraists. We think they are
needed for 'text representation' of shin and sin.
It really is a
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of E. Keown
In the so-called 'deprecated' block, the 2nd Hebrew
block in the BMP, are composed Hebrew points which I
plan to go on using. And I expect everyone else to go
on using them also, all Hebraists. We think they are
At 11:52 PM 12/6/2004, Jony Rosenne wrote:
In chapter 8, regarding Hebrew, the standard says:
Positioning. Marks may combine with vowels and other points, and there are
complex typographic rules for positioning these combinations.
I understand that this sentence should be regarded as being
From: Kenneth Whistler [EMAIL PROTECTED]
Yes, and pigs could fly, if they had big enough wings.
Once again, this is a creative comment. As if Unicode had to be bound on
architectural constraints such as the requirement of representing code units
(which are architectural for a system) only as
At 09:50 PM 12/6/2004, John Hudson wrote:
I don't know. I try to avoid politics, if possible. The significance of
what I'm saying is that you have made a good start in your proposal, that
it has some shortcomings, and that I hope to be able to help put something
more complete together.
It
From: D. Starner [EMAIL PROTECTED]
(Sorry for sending this twice, Marcin.)
Marcin 'Qrczak' Kowalczyk writes:
UTF-8 is poorly suitable for internal processing of strings in a
modern programming language (i.e. one which doesn't already have a
pile of legacy functions working on bytes, but which can
RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)
I know what you mean here:
most Linux/Unix filesystems (as well as many legacy filesystems for Windows
and MacOS...) do not track the encoding with which filenames were encoded
and, depending on local user preferences when that user created that
Philippe continued:
As if Unicode had to be bound on
architectural constraints such as the requirement of representing code units
(which are architectural for a system) only as 16-bit or 32-bit units,
Yes, it does. By definition. In the standard.
ignoring the fact that technologies do
On 06/12/2004 22:41, E. Keown wrote:
...
1.
Proposal to add Samaritan Pointing to the UCS
http://www.lashonkodesh.org/samarpro.pdf
WG2 number: N2748
I notice that Elaine is here proposing a HEBREW SAMARITAN PUNCTUATION
WORD DIVIDER - and this should be in the BMP as Samaritan is a script in
Lars,
I'm going to step in here, because this argument seems to
be generating more heat than light.
I never said it doesn't violate any existing rules. Stating that it does,
doesn't help a bit. Rules can be changed.
I ask you to step back and try to see the big picture.
First, I'm going to
John Hudson scripsit:
OpenType is a trademark of Microsoft and a proprietary font format
jointly developed by Microsoft and Adobe.
The question is, is it an open standard? That is, is anyone free to
create OpenType fonts, OpenType font tools, OpenType font renderers?
Is the documentation
John Cowan wrote:
OpenType is a trademark of Microsoft and a proprietary font format
jointly developed by Microsoft and Adobe.
The question is, is it an open standard? That is, is anyone free to
create OpenType fonts, OpenType font tools, OpenType font renderers?
Is the documentation freely
Peter Kirk scripsit:
I notice that Elaine is here proposing a HEBREW SAMARITAN PUNCTUATION
WORD DIVIDER - and this should be in the BMP as Samaritan is a script in
modern use. But there is already in the pipeline a PHOENICIAN WORD
SEPARATOR, provisionally U+1091F, and already defined
Kenneth Whistler scripsit:
Storage of UNIX filenames on Windows databases, for example,
can be done with BINARY fields, which correctly capture the
identity of them as what they are: an unconvertible array of
byte values, not a convertible string in some particular
code page.
This solution,
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
An alternative can then be a mixed encoding selection:
- choose a legacy encoding that will most often be able to represent
valid filenames without loss of information (for example ISO-8859-1,
or Cp1252).
- encode the filename with
Kenneth Whistler kenw at sybase dot com wrote:
I do not think this is a proposal to amend UTF-8 to allow
invalid sequences. So we should get that off the table.
I hope you are right.
Apparently Lars is currently using PUA U+E080..U+E0FF
(or U+EE80..U+EEFF ?) for this purpose, enabling the
RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)
Lars Kristan wrote:
I never said it doesn't violate any existing rules. Stating that it
does, doesn't help a bit. Rules can be changed. Assuming we understand
the consequences. And that is what we should be discussing. By stating
what should