On Fri, 10 Dec 2004 13:28:59 -0800, Michael (michka) Kaplan
<[EMAIL PROTECTED]> wrote:
> From: "Kenneth Whistler" <[EMAIL PROTECTED]>
>
> > On the other hand, for many English speakers, "RSVP" is simply
> > learned as an unanalyzed verb, pronounced "aressveepee", meaning
> > "send a response to th
From: "Kenneth Whistler" <[EMAIL PROTECTED]>
That it has been morphologically reanalyzed is demonstrated by the
fact that it takes regular English verb endings, as in:
"I RSVPed yesterday, right after I got the email."
As I said, it is now a bona fide English verb, and most
English speakers will tre
Philippe Verdy scripsit:
> And I disagree with you about the fact the U+ can't be used in XML
> documents. It can be used in URI through URI escaping mechanism, as
> explicitly indicated in the XML specification...
You have a hold of the right stick but at the wrong end. U+ can be
enco
Philippe Verdy scripsit:
> >Okay, I'm confused. Does ≮ open a tag? Does it matter if it's
> >composed or decomposed?
>
> It does not open an XML tag.
> It does matter if it's composed (won't open a tag) or decomposed (will
> open a tag, but with a combining character, invalid as an identifier
>
Philippe Verdy scripsit:
> If you look at the XML 1.0 Second Edition
The Second Edition has been superseded by the Third.
> Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
> [#x10000-#x10FFFF]
That is normative.
> But the comment following it specifies:
That comment is not n
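
A minimal sketch of the normative Char production as a predicate, for readers who want to test a code point against it. The function name is_xml_char is made up for illustration; the ranges are those of the XML 1.0 Char production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF).

```python
def is_xml_char(ch: str) -> bool:
    """Return True if the single code point `ch` matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # BMP below the surrogate block
        or 0xE000 <= cp <= 0xFFFD       # BMP above surrogates, minus U+FFFE/U+FFFF
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )
```

Under this production NUL, the surrogate code points, U+FFFE and U+FFFF are all rejected, regardless of what any non-normative comment suggests.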
At 12:50 PM 12/10/2004, Kenneth Whistler wrote:
Tim Greenwood asked:
> > ... a perfectly normal linguistic process of
> > attributive disambiguation of a term which had grown ambiguous
> > in usage.
>
> Is that like the 'Please RSVP' that I see all too often? Or should
> that not be excused?
*grins
Philippe,
> RSVP is a French acronym for "Répondez, s'il vous plaît".
Yes, we know that.
But it is also a reanalyzed English verb which means
"reply to a message (or invitation)".
That it has been morphologically reanalyzed is demonstrated by the
fact that it takes regular English verb endings, a
From: "D. Starner" <[EMAIL PROTECTED]>
Okay, I'm confused. Does ≮ open a tag? Does it matter if it's
composed or decomposed?
It does not open an XML tag.
It does matter if it's composed (won't open a tag) or decomposed (will open
a tag, but with a combining character, invalid as an identifier star
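
The composed/decomposed distinction discussed here can be checked with a short sketch using Python's standard unicodedata module. It only shows which code points are present in each form; how a given XML parser reacts to them is not modelled.

```python
import unicodedata

composed = "\u226e"  # NOT LESS-THAN as a single code point
decomposed = unicodedata.normalize("NFD", composed)

# The canonical decomposition is '<' followed by COMBINING LONG SOLIDUS OVERLAY.
code_points = [f"U+{ord(c):04X}" for c in decomposed]

# A scanner working on code points sees '<' only in the decomposed form.
has_lt_composed = "<" in composed
has_lt_decomposed = "<" in decomposed
```

Here code_points is ['U+003C', 'U+0338'], has_lt_composed is False and has_lt_decomposed is True, which is why the two forms behave differently for a parser that works code point by code point.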
This is just a confusion among the hoi polloi.
-Mark
- Original Message -
From: "Asmus Freytag" <[EMAIL PROTECTED]>
To: "Kenneth Whistler" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, December 10, 2004 17:38
Subject: Re: US-ASCII (was: Re: Invalid UTF-8
> Is that like the 'Please RSVP' that I see all too often? Or should
> that not be excused?
*grins* Well, technically, that is not a case of attributive
disambiguation, but rather ignorant redundancy.
RSVP is a French acronym for "Répondez, s'il vous plaît".
SVP is also the well-known French acronym
From: "John Cowan" <[EMAIL PROTECTED]>
Marcin 'Qrczak' Kowalczyk scripsit:
http://www.w3.org/TR/2000/REC-xml-20001006#charsets
implies that the appropriate level for parsing XML is code points.
You are reading the XML Recommendation incorrectly. It is not defined
in terms of codepoints (8-bit, 16-
John Cowan writes:
> You are reading the XML Recommendation incorrectly. It is not defined
> in terms of codepoints (8-bit, 16-bit, or 32-bit) but in terms of
> characters. XML processors are required to process UTF-8 and UTF-16,
> and may process other character encodings or not. But the inter
Marcin 'Qrczak' Kowalczyk scripsit:
> http://www.w3.org/TR/2000/REC-xml-20001006#charsets
> implies that the appropriate level for parsing XML is code points.
You are reading the XML Recommendation incorrectly. It is not defined
in terms of codepoints (8-bit, 16-bit, or 32-bit) but in terms of
c
From: "Philippe Verdy" <[EMAIL PROTECTED]>
From: "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]>
"Philippe Verdy" <[EMAIL PROTECTED]> writes:
The XML/HTML core syntax is defined with fixed behavior of some
individual characters like '&', '<', quotation marks, and with special
behavior for spaces.
T
"Marcin 'Qrczak' Kowalczyk" writes:
> "D. Starner" writes:
>
> > This implies that every programmer needs an in-depth knowledge of
> > Unicode to handle simple strings.
>
> There is no way to avoid that.
Then there's no way that we're ever going to get reliable Unicode
support.
> If the ru
John Cowan <[EMAIL PROTECTED]> writes:
>> > The XML/HTML core syntax is defined with fixed behavior of some
>> > individual characters like '&', '<', quotation marks, and with special
>> > behavior for spaces.
>>
>> The point is: what "characters" mean in this sentence. Code points?
>> Combining
- Original Message -
From: "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, December 10, 2004 8:35 PM
Subject: Re: Nicest UTF
"Philippe Verdy" <[EMAIL PROTECTED]> writes:
The XML/HTML core syntax is defined with fixed behavior of some
individual chara
John Cowan <[EMAIL PROTECTED]> writes:
>> > The XML/HTML core syntax is defined with fixed behavior of some
>> > individual characters like '&', '<', quotation marks, and with special
>> > behavior for spaces.
>>
>> The point is: what "characters" mean in this sentence. Code points?
>> Combining
From: "Carl W. Brown" <[EMAIL PROTECTED]>
Philippe,
Also a broken opening tag for HTML/XML documents
In addition to not having endian problems UTF-8 is also useful when tracing
intersystem communications data because XML and other tags are usually in
the ASCII subset of UTF-8 and stand out making
From: "Kenneth Whistler" <[EMAIL PROTECTED]>
> On the other hand, for many English speakers, "RSVP" is simply
> learned as an unanalyzed verb, pronounced "aressveepee", meaning
> "send a response to this message". And to castigate such speakers
> for politely prepending a "please" to that verb is
Kenneth Whistler scripsit:
> On the other hand, for many English speakers, "RSVP" is simply
> learned as an unanalyzed verb, pronounced "aressveepee", meaning
> "send a response to this message". And to castigate such speakers
> for politely prepending a "please" to that verb is a little
> too muc
Marcin 'Qrczak' Kowalczyk scripsit:
> > The XML/HTML core syntax is defined with fixed behavior of some
> > individual characters like '&', '<', quotation marks, and with special
> > behavior for spaces.
>
> The point is: what "characters" mean in this sentence. Code points?
> Combining character
Philippe,
Also a broken opening tag for HTML/XML documents
In addition to not having endian problems UTF-8 is also useful when tracing
intersystem communications data because XML and other tags are usually in
the ASCII subset of UTF-8 and stand out making it easier to find the
specific data you a
Tim Greenwood asked:
> > ... a perfectly normal linguistic process of
> > attributive disambiguation of a term which had grown ambiguous
> > in usage.
>
> Is that like the 'Please RSVP' that I see all too often? Or should
> that not be excused?
*grins* Well, technically, that is not a case of at
On Fri, 10 Dec 2004 12:06:12 -0800 (PST), Kenneth Whistler
<[EMAIL PROTECTED]> wrote:
> In addition to Doug's historical clarification, you need to
> understand this as a perfectly normal linguistic process of
> attributive disambiguation of a term which had grown ambiguous
> in usage.
Is that li
Arcane Jill va escriure:
> And yet, in an expression such as tolower(trim(s)), the second
> validation is unnecessary. The input to tolower() /must/ be valid,
> because it is the output of trim(). But on the other hand, tolower()
> could be called with arbitrary input, so I can't skip the validati
> If any
> criticism was present, it referred to the redundant "US-" prefix in
> "US-ASCII", not to Unicode, and even that wasn't really criticism, just my
> lack of understanding /why/.
In addition to Doug's historical clarification, you need to
understand this as a perfectly normal linguistic
"Philippe Verdy" <[EMAIL PROTECTED]> writes:
> The XML/HTML core syntax is defined with fixed behavior of some
> individual characters like '&', '<', quotation marks, and with special
> behavior for spaces.
The point is: what "characters" mean in this sentence. Code points?
Combining character se
"D. Starner" <[EMAIL PROTECTED]> writes:
>> String equality in a programming language should not treat composed
>> and decomposed forms as equal. Not this level of abstraction.
>
> This implies that every programmer needs an in-depth knowledge of
> Unicode to handle simple strings.
There is no way
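
To make the disagreement concrete, here is a small sketch of the two notions of equality: plain code point comparison, and comparison after canonical normalization. The helper name canonically_equal is hypothetical, and choosing NFC rather than NFD is an arbitrary assumption (either works for a canonical-equivalence test).

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "\u00e9"      # e with acute as one code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

plain_equal = composed == decomposed
normalized_equal = canonically_equal(composed, decomposed)
```

plain_equal is False and normalized_equal is True: the built-in operator stays at the code point level, and the caller opts in to canonical equivalence explicitly.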
Arcane Jill wrote:
> Here's something that's been bothering me. Suppose I write a function
> - let's call it trim(), which removes leading and trailing spaces from
> a string, represented as one of the UTFs. If I've understood this
> correctly, I'm supposed to validate the input, yes?
>
> Okay, n
Arcane Jill wrote:
Here's something that's been bothering me. Suppose I write a function -
[ that processes strings in one of the UTFs]
> I'm supposed to validate the input, yes?
You are designing the API - you get to choose what it does.
An application as a whole needs to validate external input t
Philippe,
> Also a broken opening tag for HTML/XML documents
In addition to not having endian problems UTF-8 is also useful when tracing
intersystem communications data because XML and other tags are usually in
the ASCII subset of UTF-8 and stand out making it easier to find the
specific data you
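
A quick sketch of the tracing point, comparing the raw bytes of the same small document in UTF-8 and in UTF-16. The sample element name and content are made up for illustration.

```python
sample = "<name>Zo\u00eb</name>"

utf8_bytes = sample.encode("utf-8")
utf16_bytes = sample.encode("utf-16-le")

# In UTF-8 the ASCII tag bytes appear unchanged in a raw dump;
# in UTF-16 every ASCII character is interleaved with a zero byte.
tag_visible_utf8 = b"<name>" in utf8_bytes
tag_visible_utf16 = b"<name>" in utf16_bytes
```

tag_visible_utf8 is True and tag_visible_utf16 is False, so a byte-level search or hex dump finds the tags directly only in the UTF-8 stream.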
Jill,
I think that the best practice is to validate input.
Besides the overhead of revalidating, there is the issue of what to do
with data that contains invalid characters. This has to be handled
explicitly. Once validated all transforms should maintain valid data. If
you also provide a mo
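
One way to read "validate once, then let transforms preserve validity" is to validate at the boundary and pass a distinct type around afterwards. The following is only a sketch under that assumption: the ValidText class and its from_utf8 constructor are hypothetical names, while trim() and tolower() are the functions from Jill's example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ValidText:
    """Text that has already been checked at the input boundary."""
    value: str

    @classmethod
    def from_utf8(cls, raw: bytes) -> "ValidText":
        # Strict decoding raises UnicodeDecodeError on ill-formed UTF-8,
        # so invalid data is handled explicitly, in exactly one place.
        return cls(raw.decode("utf-8"))

def trim(s: ValidText) -> ValidText:
    # No revalidation: stripping spaces from valid text keeps it valid.
    return ValidText(s.value.strip(" "))

def tolower(s: ValidText) -> ValidText:
    # No revalidation here either.
    return ValidText(s.value.lower())

result = tolower(trim(ValidText.from_utf8(b"  \xc3\x89t\xc3\xa9  ")))
```

With this shape, tolower(trim(s)) performs no redundant checks, and tolower() cannot be handed raw, unchecked bytes by accident because it only accepts ValidText.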
Use of the Unicode standard does *not* require constant validation of
strings. The standard carefully distinguishes between Unicode strings
(D29a-d, page 74) and UTFs. The Unicode strings are in-memory
representations of Unicode, but do not have to be valid UTFs; so all Unicode
X-bit strings are va
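
The distinction between a Unicode 16-bit string and well-formed UTF-16 can be sketched as a check over a list of 16-bit code units. The function name is_well_formed_utf16 is an assumption for illustration; the surrogate ranges are the standard ones (D800-DBFF leading, DC00-DFFF trailing).

```python
def is_well_formed_utf16(units: list[int]) -> bool:
    """Return True if every surrogate code unit is part of a proper pair."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A leading surrogate must be followed by a trailing surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # A trailing surrogate with no leading surrogate before it.
            return False
        i += 1
    return True
```

A sequence such as [0x0061, 0xD800, 0x0062] is a perfectly usable in-memory 16-bit string, yet the function returns False for it; the check only has to run where the data is claimed to be UTF-16, for example on output.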
"Arcane Jill" <[EMAIL PROTECTED]> writes:
> Here's something that's been bothering me. Suppose I write a function -
> let's call it trim(), which removes leading and trailing spaces from a
> string, represented as one of the UTFs. If I've understood this
> correctly, I'm supposed to validate the
Here's something that's been bothering me. Suppose I write a function -
let's call it trim(), which removes leading and trailing spaces from a
string, represented as one of the UTFs. If I've understood this correctly,
I'm supposed to validate the input, yes?
Okay, now suppose I write a second f
36 matches