Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-06 Thread James Mastros

On 12/05/2002 12:18 PM, Michael Lazzaro wrote:


On Thursday, December 5, 2002, at 02:11  AM, James Mastros wrote:


On 12/04/2002 3:21 PM, Larry Wall wrote:


\x and \o are then just shortcuts.


Can we please also have \0 as a shortcut for \0x0?


\0 in addition to \x, meaning the same thing?  I think that would get 
us back to where we were with octal, wouldn't it?  I'm not real keen 
on leading zero meaning anything, personally...  :-P 

You misinterpret.  I meant \0 meaning the same as \c[NUL], IE the same 
as chr(0), a null character.  (I suppose I should have said \0x[0].)

Which means that the only way to get a string with a literal 0xFF 
byte in it is with qq:u1[\xFF]? (Larry, I don't know that this has 
been mentioned before: is that right?)  chr:u1(0xFF) might do it too, 
but we're getting ahead of ourselves.

Hmm... does this matter?


Sorry.  It does, in fact, not matter... momentarily stopped thinking in 
terms of utf8 encoding being a completely transparent process.

Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-05 Thread Larry Wall

On Thu, Dec 05, 2002 at 09:18:21AM -0800, Michael Lazzaro wrote:
: 
: On Thursday, December 5, 2002, at 02:11  AM, James Mastros wrote:
: 
: >On 12/04/2002 3:21 PM, Larry Wall wrote:
: >>\x and \o are then just shortcuts.
: >Can we please also have \0 as a shortcut for \0x0?
: 
: \0 in addition to \x, meaning the same thing?  I think that would get 
: us back to where we were with octal, wouldn't it?  I'm not real keen on 
: leading zero meaning anything, personally...  :-P

\0 still means chr(0).  I don't think there's much conflict with
the new \0x, \0o, \0b, and \0d, since \0 almost always occurs at the
end of a string, if anywhere.

: >>There ain't no such thing as a "wide" character.  \xff is exactly
: >>the same character as \x[ff].
: >Which means that the only way to get a string with a literal 0xFF byte 
: >in it is with qq:u1[\xFF]? (Larry, I don't know that this has been 
: >mentioned before: is that right?)  chr:u1(0xFF) might do it too, but 
: >we're getting ahead of ourselves.
: 
: Hmm... does this matter?  I'm a bit rusty on my Unicode these days, but 
: I was assuming that \xFF and \x00FF always pointed to the same 
: character, and that you in fact _don't_ have the ability to put 
: individual bytes in a string, because Perl is deciding how to place the 
: characters for you (how long they should be, etc.)  So if you wanted 
: more explicit control, you'd use C.

A "byte" string is any string whose characters are all under 256.  It's
up to an interface to coerce this to actual bytes if it needs them.

We'll presumably have something like "use bytes" that turns off all
multi-byte processing, in which case you have to deal with any UTF that
comes in by hand.  But in general it'll be better if the interface coerces
to types like "str8", which is presumably pronounced "straight".

Don't ask me how str16 and str32 are pronounced.  (But generally you should
be using utf16 instead of str16 in any event, unless your interface truly
doesn't know how to deal with surrogates.)  In other words, str16 is
the name of the obsolescent UCS-2, and str32 is the name for UCS-4, which
is more or less the same as UTF-32, except that UTF-32 is not allowed to
use code points above 0x10FFFF (that is, planes above 0x10).

So anyway, we've got all these types:

str8    utf8
str16   utf16
str32   utf32

where the "str" version is essentially just a compact integer array.  One could
alias str8 to "latin1" since the default coercion from Unicode to str8 would
have those semantics.

It's not clear exactly what the bare "str" type is.  "Str" is obviously
the abstract string type, but "str" probably means the default C string
type for the current architecture/OS/locale/whatever.  In other words,
it might be str8, or it might be utf8.  Let's hope it's utf8, because
that will work forever, give or take an eon.
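The distinction between the "str" and "utf" types can be illustrated with a small Python sketch. It is only an analogy: Python's "latin-1", "utf-8", "utf-16-be" and "utf-32-be" codecs stand in for the proposed str8, utf8, utf16 and utf32 types, and `fits_ucs2` is a hypothetical helper, not anything from the design.

```python
# A string whose code points are all below 256 can be coerced to one byte
# per character (the "str8"/latin1 case); UTF-8 may need more bytes.
byte_like = "na\u00efve \u00ff"
assert all(ord(ch) < 256 for ch in byte_like)
as_str8 = byte_like.encode("latin-1")
as_utf8 = byte_like.encode("utf-8")
print(len(byte_like), len(as_str8), len(as_utf8))

# A code point above 0xFFFF does not fit in a single 16-bit unit.
# UTF-16 represents it with a surrogate pair; UCS-2 cannot represent it.
astral = "\U0001F600"
as_utf16 = astral.encode("utf-16-be")   # two 16-bit units (a surrogate pair)
as_utf32 = astral.encode("utf-32-be")   # one 32-bit unit


def fits_ucs2(text):
    """Hypothetical check: every code point fits in one 16-bit unit."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(as_utf16.hex(), as_utf32.hex())
print(fits_ucs2(byte_like), fits_ucs2(astral))
```

Coercing `astral` to "latin-1" raises an error in Python, which is roughly what an interface expecting str8 would have to deal with.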

: >Also, an annoying corner case: is "\0x1ff" eq "\0x[1f]f", or is it eq 
: >"\0x[1ff]"?  What about other bases?  Is "\0x1x" eq "\0x[1]", or is it 
: >eq "\0x[1x]" (IE illegal).  (Now that I put those three questions 
: >together, the only reasonable answer seems to be that the number ends 
: >in the last place it's valid to end if you don't use explicit 
: >brackets.)
: 
: Yeah, my guess is that it's as you say... it goes till it can't goes no 
: more, but never gives an error (well, maybe for "\0xz", where there are 
: zero valid digits?)  But I would suspect that the bracketed form is 
: *strongly* recommended.  At least, that's what I plan on telling 
: people.  :-)

Sounds good to me.  Dwimming is wonderful, but so is dwissing.

Larry
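The rule agreed on above (an unbracketed number runs for as long as its digits stay valid, while brackets mark the end explicitly) can be sketched in Python. This is an illustration only: `decode_escape` and `RADIX_DIGITS` are hypothetical names, and raising an error when no valid digit follows the radix letter is an assumption, since the thread leaves that case open.

```python
import string

# Radix letter -> (base, characters accepted as digits in that base).
RADIX_DIGITS = {
    "x": (16, string.hexdigits),
    "o": (8, string.octdigits),
    "d": (10, string.digits),
    "b": (2, "01"),
}


def decode_escape(text):
    """Decode one escape at the start of text; return (character, rest)."""
    if not text.startswith("\\0"):
        raise ValueError("not a \\0 escape")
    radix_letter = text[2:3]
    if radix_letter not in RADIX_DIGITS:
        # A bare \0 is the NUL character, chr(0).
        return "\0", text[2:]
    base, digits = RADIX_DIGITS[radix_letter]
    body = text[3:]
    if body.startswith("["):
        # Bracketed form: the closing bracket ends the number explicitly.
        end = body.index("]")  # raises ValueError if the bracket is unclosed
        number, rest = body[1:end], body[end + 1:]
        if not number or any(ch not in digits for ch in number):
            raise ValueError("invalid digits in brackets: %r" % number)
    else:
        # Unbracketed form: consume digits until one is no longer valid.
        n = 0
        while n < len(body) and body[n] in digits:
            n += 1
        number, rest = body[:n], body[n:]
        if not number:
            # Assumption: zero valid digits is an error.
            raise ValueError("no valid digits after \\0" + radix_letter)
    return chr(int(number, base)), rest


print(decode_escape("\\0x1ff"))    # all three hex digits are consumed
print(decode_escape("\\0x[1f]f"))  # brackets stop the number after "1f"
print(decode_escape("\\0x1x"))     # "x" is not a hex digit, so it is left over
```

Under this reading "\0x1ff" matches "\0x[1ff]", not "\0x[1f]f", which is why the bracketed form is the safer one to recommend.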

Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-05 Thread Michael Lazzaro


On Thursday, December 5, 2002, at 02:11  AM, James Mastros wrote:


On 12/04/2002 3:21 PM, Larry Wall wrote:

\x and \o are then just shortcuts.

Can we please also have \0 as a shortcut for \0x0?


\0 in addition to \x, meaning the same thing?  I think that would get 
us back to where we were with octal, wouldn't it?  I'm not real keen on 
leading zero meaning anything, personally...  :-P

There ain't no such thing as a "wide" character.  \xff is exactly
the same character as \x[ff].

Which means that the only way to get a string with a literal 0xFF byte 
in it is with qq:u1[\xFF]? (Larry, I don't know that this has been 
mentioned before: is that right?)  chr:u1(0xFF) might do it too, but 
we're getting ahead of ourselves.

Hmm... does this matter?  I'm a bit rusty on my Unicode these days, but 
I was assuming that \xFF and \x00FF always pointed to the same 
character, and that you in fact _don't_ have the ability to put 
individual bytes in a string, because Perl is deciding how to place the 
characters for you (how long they should be, etc.)  So if you wanted 
more explicit control, you'd use C.

Also, an annoying corner case: is "\0x1ff" eq "\0x[1f]f", or is it eq 
"\0x[1ff]"?  What about other bases?  Is "\0x1x" eq "\0x[1]", or is it 
eq "\0x[1x]" (IE illegal).  (Now that I put those three questions 
together, the only reasonable answer seems to be that the number ends 
in the last place it's valid to end if you don't use explicit 
brackets.)

Yeah, my guess is that it's as you say... it goes till it can't goes no 
more, but never gives an error (well, maybe for "\0xz", where there are 
zero valid digits?)  But I would suspect that the bracketed form is 
*strongly* recommended.  At least, that's what I plan on telling 
people.  :-)

Design team: If we're wrong on these, please correct.  :-)

MikeL

Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-05 Thread James Mastros

On 12/04/2002 3:21 PM, Larry Wall wrote:

On Wed, Dec 04, 2002 at 11:38:35AM -0800, Michael Lazzaro wrote:
: We still need to verify whether we can have, in qq strings:
: 
:   \033   - octal   (p5; deprecated but allowed in p6?)

I think it's disallowed.

Thank the many gods ... or One True God, or Larry, or whatever your 
personal preference may be.  ("So have a merry Christmas, Happy Hanukah, 
Kwazy Kwanzaa, a tip-top Tet, and a solemn, dignified Ramadan.")

   \0o33   - octal
   \0x1b   - hex
   \0d123  - decimal
   \0b1001 - binary

\x and \o are then just shortcuts.

Can we please also have \0 as a shortcut for \0x0?


\c[^H], for instance.  We can overload the \c notation to our heart's
desire, as long as we don't conflict with its use for named characters:

\c[GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI]

Very Cool.  (BTW, for those that don't follow Unicode, this means that 
everything matching /^[^A-Z ]$/ is fair game for us; Unicode limits 
character names to that to minimize chicken-and-egg problems.  We 
/probably/ shouldn't take anything in /^[A-Za-z ]$/, to allow people to 
say the much more readable "\c[Greek Capital Letter Omega with Pepperoni 
and Pineapple]".)
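For comparison, Python already has a named-character escape of this kind. The sketch below uses the standard `unicodedata` module as a stand-in for the proposed \c[...] notation; it says nothing about how Perl 6 would implement it.

```python
import string
import unicodedata

NAME = "GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI"

# Look the character up by name, then map it back to its name.
omega = unicodedata.lookup(NAME)
print("U+%04X" % ord(omega), unicodedata.name(omega))

# The same character written with Python's own named escape.
same = "\N{GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI}"

# This particular name uses only capital letters and spaces.
only_caps_and_spaces = set(NAME) <= set(string.ascii_uppercase + " ")
print(omega == same, only_caps_and_spaces)
```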

: There is also the question of what the bracketed format does.  "Wide" 
: chars, e.g. for Unicode, seem appropriate only in hex.  But it would 
: seem useful to allow a bracketed form for the others that prevents 
: ambiguities:
: 
:"\o164" ne "\o{16}4"
:"\d100" ne "\d{10}0"
: 
: Whether that means you can actually specify wide chars in \o, \d, and 
: \b or it's just a disambiguification of the Latin-1 case is open to 
: question.

There ain't no such thing as a "wide" character.  \xff is exactly
the same character as \x[ff].

Which means that the only way to get a string with a literal 0xFF byte 
in it is with qq:u1[\xFF]? (Larry, I don't know that this has been 
mentioned before: is that right?)  chr:u1(0xFF) might do it too, but 
we're getting ahead of ourselves.

Also, an annoying corner case: is "\0x1ff" eq "\0x[1f]f", or is it eq 
"\0x[1ff]"?  What about other bases?  Is "\0x1x" eq "\0x[1]", or is it 
eq "\0x[1x]" (IE illegal).  (Now that I put those three questions 
together, the only reasonable answer seems to be that the number ends in 
the last place it's valid to end if you don't use explicit brackets.)

(BTW, in HTML and XML, numeric character escapes are decimal by default, 
you have to add an x for hex.  In Windows and several other OSes (I 
think, I like to play with Unicode but have little actual use for it), 
ALT-0nnn is spelt in decimal only.  Decimal Unicode ordinals are 
fundamentally flawed (since blocks are always on nice even hex numbers, 
but ugly decimal ones), but useful anyway).

	-=- James Mastros
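The HTML behaviour mentioned above can be checked with Python's standard `html` module: a numeric character reference is decimal unless an x follows the #. The block-start values are included only to show why hex reads more naturally for Unicode; the variable names are illustrative.

```python
import html

# Decimal and hexadecimal references to the same code point, U+00FF.
from_decimal = html.unescape("&#255;")
from_hex = html.unescape("&#xFF;")
print(from_decimal == from_hex == "\u00ff")

# Block boundaries are round numbers in hex but not in decimal.
greek_block_start = 0x0370
arabic_block_start = 0x0600
print(greek_block_start, arabic_block_start)
```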

Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-04 Thread Damian Conway

Larry wrote:


: But I think we'd definitely like to introduce \d.

Can't, unless we change \d to <digit> in regexen.


Which we ought to be very wary of, given how very frequently it's
used in regexes.

Damian

Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-04 Thread Larry Wall

On Wed, Dec 04, 2002 at 11:38:35AM -0800, Michael Lazzaro wrote:
: We still need to verify whether we can have, in qq strings:
: 
:   \033   - octal   (p5; deprecated but allowed in p6?)

I think it's disallowed.

:   \o33   - octal   (p5)
:   \x1b   - hex     (p5)
:   \d123  - decimal (?)
:   \b1001 - binary  (?)

Can't really have \d and \b if they keep their current regex meanings.
I think the general form is:

   \0o33   - octal
   \0x1b   - hex
   \0d123  - decimal
   \0b1001 - binary

\x and \o are then just shortcuts.

: and if so, if these are allowed too:
: 
:   \o{777}  - (?)
:   \x{1b}   - "wide" hex  (p5)
:   \d{123}  - (?)
:   \b{1001} - (?)

The general form could be

   \0o[33]   - octal
   \0x[1b]   - hex
   \0d[123]  - decimal
   \0b[1001] - binary

Or it could be

   \c[0o33]   - octal
   \c[0x1b]   - hex
   \c[0d123]  - decimal
   \c[0b1001] - binary

since \c is taking over \N's (rather ill-defined) duties.

: Note that \b conflicts with backspace.  I'd rather keep backspace than 
: binary, personally; I have yet to feel the need to call out a char in 
: binary.  :-)  Or we can make it dependent on the trailing digits, or 
: require the brackets, or require backspace to be spelt differently.

\c[^H], for instance.  We can overload the \c notation to our heart's
desire, as long as we don't conflict with its use for named characters:

\c[GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI]
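The \c[^H] spelling for backspace follows the usual caret notation for control characters, where the control character is the named character with bit 0x40 flipped. A minimal Python sketch, with `caret` as a hypothetical helper name:

```python
def caret(symbol):
    """Return the control character written as ^symbol in caret notation."""
    return chr(ord(symbol.upper()) ^ 0x40)


print(repr(caret("H")))  # backspace
print(repr(caret("@")))  # NUL
print(repr(caret("[")))  # escape
print(repr(caret("?")))  # delete
```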

: But I think we'd definitely like to introduce \d.

Can't, unless we change \d to <digit> in regexen.

: There is also the question of what the bracketed format does.  "Wide" 
: chars, e.g. for Unicode, seem appropriate only in hex.  But it would 
: seem useful to allow a bracketed form for the others that prevents 
: ambiguities:
: 
:"\o164" ne "\o{16}4"
:"\d100" ne "\d{10}0"
: 
: Whether that means you can actually specify wide chars in \o, \d, and 
: \b or it's just a disambiguification of the Latin-1 case is open to 
: question.

There ain't no such thing as a "wide" character.  \xff is exactly
the same character as \x[ff].  A character in Perl is an abstract
codepoint number--how it's represented is of no concern to the
programmer (though it might be of concern to any interface to the
outside world, of course).  Do not think of Perl 6 strings as arrays
of bytes (except when they are (and probably not even then...)).

Larry
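Python 3 strings follow the same model of abstract code points, so they make a convenient illustration of the point above. This is an analogy, not a description of Perl 6 internals.

```python
# Three spellings of the same one-character string, code point 0xFF.
ch = "\xff"
print(ch == "\u00ff" == chr(0xFF), len(ch))

# How that character becomes bytes is a matter for the interface
# to the outside world, and depends on the encoding chosen there.
print(ch.encode("latin-1"))  # one byte
print(ch.encode("utf-8"))    # two bytes

# A raw 0xFF byte, by contrast, is not a character at all until it is decoded.
raw = bytes([0xFF])
print(raw.decode("latin-1") == ch)
```

Decoding `raw` as UTF-8 fails, because 0xFF never appears in valid UTF-8.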

Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-04 Thread Dave Whipp

"Michael Lazzaro" <[EMAIL PROTECTED]> wrote
> Note that \b conflicts with backspace.  I'd rather keep backspace than
> binary, personally; I have yet to feel the need to call out a char in
> binary.  :-)  Or we can make it dependent on the trailing digits, or
> require the brackets, or require backspace to be spelt differently.
>
> But I think we'd definitely like to introduce \d.
>

Our numeric literals use # for radix stuff. So perhaps we could use "\#..."
to introduce explicit codings:

 "\#d13"
 "\#h0d"
 "\#b1101"
 "\#{ 1<<6 - 20 * 2 - 9#1:2 }"

would all be synonyms!
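The arithmetic behind the four spellings can be checked in Python. Two assumptions are made here: "9#1:2" is read as the base-9 number with digits 1 and 2, and the shift is parenthesised so that it is evaluated before the subtractions.

```python
values = [
    int("13", 10),                         # \#d13
    int("0d", 16),                         # \#h0d
    int("1101", 2),                        # \#b1101
    (1 << 6) - 20 * 2 - int("12", 9),      # \#{ 1<<6 - 20 * 2 - 9#1:2 }
]
print(values)

# All four name the same character, a carriage return.
chars = {chr(v) for v in values}
print(chars == {"\r"})
```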


Dave.

ps. how did this thread migrate from p6d to p6l?

Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-04 Thread Michael Lazzaro

We still need to verify whether we can have, in qq strings:

   \033   - octal   (p5; deprecated but allowed in p6?)
   \o33   - octal   (p5)
   \x1b   - hex     (p5)
   \d123  - decimal (?)
   \b1001 - binary  (?)

and if so, if these are allowed too:

   \o{777}  - (?)
   \x{1b}   - "wide" hex  (p5)
   \d{123}  - (?)
   \b{1001} - (?)

Only four of these nine constructs are allowed in Perl5.

Note that \b conflicts with backspace.  I'd rather keep backspace than 
binary, personally; I have yet to feel the need to call out a char in 
binary.  :-)  Or we can make it dependent on the trailing digits, or 
require the brackets, or require backspace to be spelt differently.

But I think we'd definitely like to introduce \d.

There is also the question of what the bracketed format does.  "Wide" 
chars, e.g. for Unicode, seem appropriate only in hex.  But it would 
seem useful to allow a bracketed form for the others that prevents 
ambiguities:

   "\o164" ne "\o{16}4"
   "\d100" ne "\d{10}0"

Whether that means you can actually specify wide chars in \o, \d, and 
\b or it's just a disambiguification of the Latin-1 case is open to 
question.

MikeL

*Not to be confused with an eigendisambiguification, of course.
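The two ambiguity examples above can be worked through in Python, using `chr` with explicit octal and decimal values in place of the proposed \o and \d escapes (the variable names are illustrative):

```python
# "\o164" read as one three-digit octal number.
whole_octal = chr(0o164)
# "\o{16}4" read as the octal number 16 followed by a literal "4".
split_octal = chr(0o16) + "4"
print(repr(whole_octal), repr(split_octal))

# "\d100" read as one decimal number, versus "\d{10}0".
whole_decimal = chr(100)
split_decimal = chr(10) + "0"
print(repr(whole_decimal), repr(split_decimal))
```

In both cases the unbracketed form is one character and the bracketed form is two, so the strings differ.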
