Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-06 Thread James Mastros
On 12/05/2002 12:18 PM, Michael Lazzaro wrote:


On Thursday, December 5, 2002, at 02:11  AM, James Mastros wrote:


On 12/04/2002 3:21 PM, Larry Wall wrote:


\x and \o are then just shortcuts.


Can we please also have \0 as a shortcut for \0x0?


\0 in addition to \x, meaning the same thing?  I think that would get 
us back to where we were with octal, wouldn't it?  I'm not real keen 
on leading zero meaning anything, personally...  :-P 

You misinterpret.  I meant \0 meaning the same as \c[NUL], IE the same 
as chr(0), a null character.  (I suppose I should have said \0x[0].)

Which means that the only way to get a string with a literal 0xFF 
byte in it is with qq:u1[\xFF]? (Larry, I don't know that this has 
been mentioned before: is that right?)  chr:u1(0xFF) might do it too, 
but we're getting ahead of ourselves.

Hmm... does this matter?


Sorry.  It does, in fact, not matter... momentarily stopped thinking in 
terms of utf8 encoding being a completely transparent process.




Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-05 Thread Larry Wall
On Thu, Dec 05, 2002 at 09:18:21AM -0800, Michael Lazzaro wrote:
: 
: On Thursday, December 5, 2002, at 02:11  AM, James Mastros wrote:
: 
: >On 12/04/2002 3:21 PM, Larry Wall wrote:
: >>\x and \o are then just shortcuts.
: >Can we please also have \0 as a shortcut for \0x0?
: 
: \0 in addition to \x, meaning the same thing?  I think that would get 
: us back to where we were with octal, wouldn't it?  I'm not real keen on 
: leading zero meaning anything, personally...  :-P

\0 still means chr(0).  I don't think there's much conflict with
the new \0x, \0o, \0b, and \0d, since \0 almost always occurs at the
end of a string, if anywhere.

: >>There ain't no such thing as a "wide" character.  \xff is exactly
: >>the same character as \x[ff].
: >Which means that the only way to get a string with a literal 0xFF byte 
: >in it is with qq:u1[\xFF]? (Larry, I don't know that this has been 
: >mentioned before: is that right?)  chr:u1(0xFF) might do it too, but 
: >we're getting ahead of ourselves.
: 
: Hmm... does this matter?  I'm a bit rusty on my Unicode these days, but 
: I was assuming that \xFF and \x00FF always pointed to the same 
: character, and that you in fact _don't_ have the ability to put 
: individual bytes in a string, because Perl is deciding how to place the 
: characters for you (how long they should be, etc.)  So if you wanted 
: more explicit control, you'd use C.

A "byte" string is any string whose characters are all under 256.  It's
up to an interface to coerce this to actual bytes if it needs them.

We'll presumably have something like "use bytes" that turns off all
multi-byte processing, in which case you have to deal with any UTF that
comes in by hand.  But in general it'll be better if the interface coerces
to types like "str8", which is presumably pronounced "straight".

Don't ask me how str16 and str32 are pronounced.  (But generally you should
be using utf16 instead of str16 in any event, unless your interface truly
doesn't know how to deal with surrogates.)  In other words, str16 is
the name of the obsolescent UCS-2, and str32 is the name for UCS-4, which
is more or less the same as UTF-32, except that UTF-32 is not allowed to
use the bits above 0x10.

So anyway, we've got all these types:

str8    utf8
str16   utf16
str32   utf32

where the "str" version is essentially just a compact integer array.  One could
alias str8 to "latin1" since the default coercion from Unicode to str8 would
have those semantics.

It's not clear exactly what the bare "str" type is.  "Str" is obviously
the abstract string type, but "str" probably means the default C string
type for the current architecture/OS/locale/whatever.  In other words,
it might be str8, or it might be utf8.  Let's hope it's utf8, because
that will work forever, give or take an eon.

: >Also, an annoying corner case: is "\0x1ff" eq "\0x[1f]f", or is it eq 
: >"\0x[1ff]"?  What about other bases?  Is "\0x1x" eq "\0x[1]", or is it 
: >eq "\0x[1x]" (IE illegal).  (Now that I put those three questions 
: >together, the only reasonable answer seems to be that the number ends 
: >in the last place it's valid to end if you don't use explicit 
: >brackets.)
: 
: Yeah, my guess is that it's as you say... it goes till it can't goes no 
: more, but never gives an error (well, maybe for "\0xz", where there are 
: zero valid digits?)  But I would suspect that the bracketed form is 
: *strongly* recommended.  At least, that's what I plan on telling 
: people.  :-)

Sounds good to me.  Dwimming is wonderful, but so is dwissing.

Larry
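
(Aside: the distinction drawn above, between a string of codepoints and the
bytes an interface eventually needs, can be tried out in Python.  This is
only a rough sketch and not Perl 6: Python's str stands in for the abstract
"Str", the helper names is_byte_string, to_str8 and to_utf8 are made up for
illustration, and the latin-1 codec is assumed to match the "default
coercion" to str8 described in the message.)

```python
# Sketch only: Python's str stands in for the abstract string type, and the
# codec names stand in for the proposed str8 / utf8 coercions.

def is_byte_string(s: str) -> bool:
    """True if every character's codepoint is under 256."""
    return all(ord(ch) < 256 for ch in s)

def to_str8(s: str) -> bytes:
    """Coerce to one byte per character (Latin-1 semantics, assumed)."""
    return s.encode("latin-1")  # raises UnicodeEncodeError above 0xFF

def to_utf8(s: str) -> bytes:
    """Coerce to UTF-8; characters above 0x7F take more than one byte."""
    return s.encode("utf-8")

s = "\xff"
assert s == "\u00ff"              # same character, however it is written
assert is_byte_string(s)
assert to_str8(s) == b"\xff"      # one byte at the interface
assert to_utf8(s) == b"\xc3\xbf"  # two bytes at the interface
assert not is_byte_string("\u03a9")
```

The string itself never changes; only the coercion chosen at the interface
decides whether 0xFF goes out as one byte or two.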



Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-05 Thread Michael Lazzaro

On Thursday, December 5, 2002, at 02:11  AM, James Mastros wrote:


On 12/04/2002 3:21 PM, Larry Wall wrote:

\x and \o are then just shortcuts.

Can we please also have \0 as a shortcut for \0x0?


\0 in addition to \x, meaning the same thing?  I think that would get 
us back to where we were with octal, wouldn't it?  I'm not real keen on 
leading zero meaning anything, personally...  :-P

There ain't no such thing as a "wide" character.  \xff is exactly
the same character as \x[ff].

Which means that the only way to get a string with a literal 0xFF byte 
in it is with qq:u1[\xFF]? (Larry, I don't know that this has been 
mentioned before: is that right?)  chr:u1(0xFF) might do it too, but 
we're getting ahead of ourselves.

Hmm... does this matter?  I'm a bit rusty on my Unicode these days, but 
I was assuming that \xFF and \x00FF always pointed to the same 
character, and that you in fact _don't_ have the ability to put 
individual bytes in a string, because Perl is deciding how to place the 
characters for you (how long they should be, etc.)  So if you wanted 
more explicit control, you'd use C.

Also, an annoying corner case: is "\0x1ff" eq "\0x[1f]f", or is it eq 
"\0x[1ff]"?  What about other bases?  Is "\0x1x" eq "\0x[1]", or is it 
eq "\0x[1x]" (IE illegal).  (Now that I put those three questions 
together, the only reasonable answer seems to be that the number ends 
in the last place it's valid to end if you don't use explicit 
brackets.)

Yeah, my guess is that it's as you say... it goes till it can't goes no 
more, but never gives an error (well, maybe for "\0xz", where there are 
zero valid digits?)  But I would suspect that the bracketed form is 
*strongly* recommended.  At least, that's what I plan on telling 
people.  :-)

Design team: If we're wrong on these, please correct.  :-)

MikeL



Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-05 Thread James Mastros
On 12/04/2002 3:21 PM, Larry Wall wrote:

On Wed, Dec 04, 2002 at 11:38:35AM -0800, Michael Lazzaro wrote:
: We still need to verify whether we can have, in qq strings:
: 
:    \033  - octal   (p5; deprecated but allowed in p6?)

I think it's disallowed.
Thank the many gods ... or One True God, or Larry, or whatever your 
personal preference may be.  ("So have a merry Christmas, Happy Hanukah, 
Kwazy Kwanzaa, a tip-top Tet, and a solemn, dignified Ramadan.")

   \0o33   - octal
   \0x1b   - hex
   \0d123  - decimal
   \0b1001 - binary
\x and \o are then just shortcuts.
Can we please also have \0 as a shortcut for \0x0?


\c[^H], for instance.  We can overload the \c notation to our heart's
desire, as long as we don't conflict with its use for named characters:

\c[GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI]

Very Cool.  (BTW, for those that don't follow Unicode, this means that 
everything matching /^[^A-Z ]$/ is fair game for us; Unicode limits 
character names to that to minimize chicken-and-egg problems.  We 
/probably/ shouldn't take anything in /^[A-Za-z ]$/, to allow people to 
say the much more readable "\c[Greek Capital Letter Omega with Pepperoni 
and Pineapple]".)

: There is also the question of what the bracketed format does.  "Wide" 
: chars, e.g. for Unicode, seem appropriate only in hex.  But it would 
: seem useful to allow a bracketed form for the others that prevents 
: ambiguities:
: 
:    "\o164" ne "\o{16}4"
:    "\d100" ne "\d{10}0"
: 
: Whether that means you can actually specify wide chars in \o, \d, and 
: \b or it's just a disambiguification of the Latin-1 case is open to 
: question.

There ain't no such thing as a "wide" character.  \xff is exactly
the same character as \x[ff].
Which means that the only way to get a string with a literal 0xFF byte 
in it is with qq:u1[\xFF]? (Larry, I don't know that this has been 
mentioned before: is that right?)  chr:u1(0xFF) might do it too, but 
we're getting ahead of ourselves.

Also, an annoying corner case: is "\0x1ff" eq "\0x[1f]f", or is it eq 
"\0x[1ff]"?  What about other bases?  Is "\0x1x" eq "\0x[1]", or is it 
eq "\0x[1x]" (IE illegal).  (Now that I put those three questions 
together, the only reasonable answer seems to be that the number ends in 
the last place it's valid to end if you don't use explicit brackets.)

(BTW, in HTML and XML, numeric character escapes are decimal by default; 
you have to add an x after the # for hex.  In windows and several other 
OSes (I think, I like to play with Unicode but have little actual use for 
it), ALT-0nnn is spelt in decimal only.  Decimal Unicode ordinals are 
fundamentally flawed (since blocks are always on nice even hex numbers, 
but ugly decimal ones), but useful anyway).

	-=- James Mastros
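
(Aside: the "number ends in the last place it's valid to end" rule for the
unbracketed form can be sketched in a few lines of Python.  Nothing here is
Perl 6 itself; decode_unbracketed and its two tables are hypothetical names,
and raising an error when there are zero valid digits is just one possible
reading of the open question about "\0xz".)

```python
import string

# Digits accepted after each radix letter in the proposed \0x, \0o, \0d, \0b forms.
DIGITS = {
    "x": string.hexdigits,
    "o": string.octdigits,
    "d": string.digits,
    "b": "01",
}
BASE = {"x": 16, "o": 8, "d": 10, "b": 2}

def decode_unbracketed(text: str) -> str:
    r"""Decode one escape at the start of text, e.g. \0x1ff.

    Consumes the longest run of digits valid for the radix and returns the
    decoded character followed by whatever text was left over.
    """
    if not text.startswith("\\0") or len(text) < 3 or text[2] not in BASE:
        raise ValueError("not a backslash-zero radix escape")
    radix = text[2]
    end = 3
    while end < len(text) and text[end] in DIGITS[radix]:
        end += 1
    if end == 3:
        raise ValueError("no valid digits after the radix letter")
    return chr(int(text[3:end], BASE[radix])) + text[end:]

# "\0x1ff" is read as "\0x[1ff]", not "\0x[1f]f".
assert decode_unbracketed(r"\0x1ff") == "\u01ff"
# "\0x1x" is read as "\0x[1]" followed by a literal "x".
assert decode_unbracketed(r"\0x1x") == "\x01x"
```

Under this reading the bracketed form is the only way to get chr(0x1f)
followed by a literal "f", which is one reason to recommend brackets.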



Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-04 Thread Damian Conway
Larry wrote:


: But I think we'd definitely like to introduce \d.

Can't, unless we change \d to <digit> in regexen.


Which we ought to be very wary of, given how very frequently it's
used in regexes.

Damian




Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-04 Thread Larry Wall
On Wed, Dec 04, 2002 at 11:38:35AM -0800, Michael Lazzaro wrote:
: We still need to verify whether we can have, in qq strings:
: 
:    \033  - octal   (p5; deprecated but allowed in p6?)

I think it's disallowed.

:    \o33   - octal   (p5)
:    \x1b   - hex     (p5)
:    \d123  - decimal (?)
:    \b1001 - binary  (?)

Can't really have \d and \b if they keep their current regex meanings.
I think the general form is:

   \0o33   - octal
   \0x1b   - hex
   \0d123  - decimal
   \0b1001 - binary

\x and \o are then just shortcuts.

: and if so, if these are allowed too:
: 
:    \o{777}   - (?)
:    \x{1b}    - "wide" hex  (p5)
:    \d{123}   - (?)
:    \b{1001}  - (?)

The general form could be

   \0o[33]   - octal
   \0x[1b]   - hex
   \0d[123]  - decimal
   \0b[1001] - binary

Or it could be

   \c[0o33]   - octal
   \c[0x1b]   - hex
   \c[0d123]  - decimal
   \c[0b1001] - binary

since \c is taking over \N's (rather ill-defined) duties.

: Note that \b conflicts with backspace.  I'd rather keep backspace than 
: binary, personally; I have yet to feel the need to call out a char in 
: binary.  :-)  Or we can make it dependent on the trailing digits, or 
: require the brackets, or require backspace to be spelt differently.

\c[^H], for instance.  We can overload the \c notation to our heart's
desire, as long as we don't conflict with its use for named characters:

\c[GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI]

: But I think we'd definitely like to introduce \d.

Can't, unless we change \d to <digit> in regexen.

: There is also the question of what the bracketed format does.  "Wide" 
: chars, e.g. for Unicode, seem appropriate only in hex.  But it would 
: seem useful to allow a bracketed form for the others that prevents 
: ambiguities:
: 
:    "\o164" ne "\o{16}4"
:    "\d100" ne "\d{10}0"
: 
: Whether that means you can actually specify wide chars in \o, \d, and 
: \b or it's just a disambiguification of the Latin-1 case is open to 
: question.

There ain't no such thing as a "wide" character.  \xff is exactly
the same character as \x[ff].  A character in Perl is an abstract
codepoint number--how it's represented is of no concern to the
programmer (though it might be of concern to any interface to the
outside world, of course).  Do not think of Perl 6 strings as arrays
of bytes (except when they are (and probably not even then...)).

Larry
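
(Aside: a small Python sketch of the bracketed general form proposed above,
to show that the four radix spellings are just different ways of naming one
codepoint.  The expand function, the BASES table and the regular expression
are illustrative assumptions and not part of any Perl 6 design; only the
\0o[...], \0x[...], \0d[...] and \0b[...] spellings come from the message.)

```python
import re

BASES = {"o": 8, "x": 16, "d": 10, "b": 2}

# Matches \0o[33], \0x[1b], \0d[123], \0b[1001]
ESCAPE = re.compile(r"\\0([oxdb])\[([0-9A-Fa-f]+)\]")

def expand(text: str) -> str:
    """Replace each bracketed radix escape with the character it names."""
    def repl(match):
        radix, digits = match.group(1), match.group(2)
        # int() raises ValueError if a digit is invalid for the radix.
        return chr(int(digits, BASES[radix]))
    return ESCAPE.sub(repl, text)

# Octal 33 and hex 1b name the same codepoint (ESC, 27).
assert expand(r"\0o[33]") == expand(r"\0x[1b]") == "\x1b"

# Brackets remove the ambiguity raised in the quoted question:
assert expand(r"\0o[164]") == "t"       # one character, chr(0o164)
assert expand(r"\0o[16]4") == "\x0e4"   # chr(0o16) followed by "4"
```

Because the result is always a single abstract codepoint, nothing in the
sketch needs to know how many bytes that character will later occupy.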



Re: Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-04 Thread Dave Whipp
"Michael Lazzaro" <[EMAIL PROTECTED]> wrote
> Note that \b conflicts with backspace.  I'd rather keep backspace than
> binary, personally; I have yet to feel the need to call out a char in
> binary.  :-)  Or we can make it dependent on the trailing digits, or
> require the brackets, or require backspace to be spelt differently.
>
> But I think we'd definitely like to introduce \d.
>

Our numeric literals use # for radix stuff. So perhaps we could use "\#..."
to introduce explicit codings:

 "\#d13"
 "\#h0d"
 "\#b1101"
 "\#{ 1<<6 - 20 * 2 - 9#1:2 }"

would all be synonyms!


Dave.

ps. how did this thread migrate from p6d to p6l?





Usage of \[oxdb] (was Re: String Literals, take 2)

2002-12-04 Thread Michael Lazzaro
We still need to verify whether we can have, in qq strings:

   \033   - octal   (p5; deprecated but allowed in p6?)
   \o33   - octal   (p5)
   \x1b   - hex     (p5)
   \d123  - decimal (?)
   \b1001 - binary  (?)

and if so, if these are allowed too:

   \o{777}   - (?)
   \x{1b}    - "wide" hex  (p5)
   \d{123}   - (?)
   \b{1001}  - (?)

Only four of these nine constructs are allowed in Perl5.

Note that \b conflicts with backspace.  I'd rather keep backspace than 
binary, personally; I have yet to feel the need to call out a char in 
binary.  :-)  Or we can make it dependent on the trailing digits, or 
require the brackets, or require backspace to be spelt differently.

But I think we'd definitely like to introduce \d.

There is also the question of what the bracketed format does.  "Wide" 
chars, e.g. for Unicode, seem appropriate only in hex.  But it would 
seem useful to allow a bracketed form for the others that prevents 
ambiguities:

   "\o164" ne "\o{16}4"
   "\d100" ne "\d{10}0"

Whether that means you can actually specify wide chars in \o, \d, and 
\b or it's just a disambiguification of the Latin-1 case is open to 
question.

MikeL

*Not to be confused with an eigendisambiguification, of course.



Re: String Literals, take 2

2002-12-04 Thread Larry Wall
It's o, not c.

Larry



Re: String Literals, take 2

2002-12-04 Thread Luke Palmer
> Date: Tue, 03 Dec 2002 18:39:27 -0500
> From: James Mastros <[EMAIL PROTECTED]>
>
> Huh?  In that case, somebody should tell Angel Faus; "Numeric literals,
> take 3" says 0c777, and nobody dissented.  IIRC, in fact, nobody's
> dissented to 0c777 since it was first suggested.

Well, except Larry.  I remember him saying initially that it should be
0o777, not just in the most recent one.  I'm not much of a thread
scavenger, so I can't point you to the message.

> > (But since I assume you can use \d, \b, \h anywhere you use \o, you 
> > won't have to use octal at all if you don't want to.)
> \d is pure speculation on my part.  (As is \0 == chr(0).)
> 
> p6l guys and the Design Team, if you haven't been following the
> conversation, here's how it goes:
> In perl5, octal numbers are specified as 0101 -- with a leading zero,
> and octal characters in strings are specified as "\101".  In perl6, our
> current documentation lists 0c101 as being the new way to write octal
> numbers, because it lets people use leading zeros in numbers in an
> intuitive way, and 0o101 was decided to be too difficult to read.  The
> last writing of Larry to address this, as far as I (or anybody else who
> I've noticed) knows, says 0o101.
> 
> It's generally been agreed on, I think, that 0c101 is the way to go.

I get a different impression.  I think it's generally a
non-controversial topic, and nobody really cares either way... aside
from you, perhaps.

> Now, we're working on string literals, and the question is how we write
> octal character literals.  The current writer of the string literal spec
> wants "\o101" to be the new way to write what is "\101" in perl5 (and
> C).  I'd prefer this to be "\c101", to match up with how the current doc
> says octal numerics are written.  Unfortunately, \c is taken for
> control-characters (ie "\c[" eq chr(ord '[' - 64) eq ESC), which is a
> more important use of \c.
> 
> What do we do, oh great and wonderful design team?
> 
> Numeric   String   Upside             Downside
> -------   ------   ------             --------
> 0101      \101     p5/C compatible    Unintuitive
> 0o101     \o101    Consistent         Hard to read

Not that I'm "great and wonderful design team," but this one is my
favorite.  I don't think 0o101 is terribly hard to read, and "o"
stands for "octal" a lot better than "c" does.  

That comes back in reading, too.  Once people figure out that's the
letter "o", and not a miniature zero, it will be perfectly clear what
is meant.  That's not true of "c".

Luke



Re: String Literals, take 2

2002-12-04 Thread James Mastros
On 12/03/2002 2:27 PM, Michael Lazzaro wrote:

I think we've been gravitating to a "language reference", geared 
primarily towards intermediate/advanced users.  Something much more 
rigorous than beginners would be comfortable with (since it defines 
things in much greater detail than beginners would need) and written to 
assume *no* prior knowledge of Perl5.  It will be useful to the 
developers -- in that it will describe required P6 behaviors in much 
greater detail than the Apocalypses and Exegesis -- but it will be 
written for users.
I quite agree... which still means we need more rigor than this document
has.  The definition of a pair and the semantics of \c[ and friends is
important so that users know exactly what "\c~" means ('>',
C<> ), and if C<> will work (no,
those aren't a matched Pi/Pf or Pb/Pe pair, they're just Misc. Shapes
that have no direction information, and we can't do them reasonably
without looking at every character in Unicode visually -- if somebody
wants to, be my guest!).


Do we want to change shorthand octal literal numbers to 0o123 (I 
don't like this, it's hard to read), change octal chars to \c123 
(can't do this without getting rid of, or changing,  \c for 
control-character), get rid of octal chars entirely, or something 
else?  (Barring a good "something else", I vote for killing octal chars.)
As of Larry's last writings, there will definitely be an octal (it still 
has good uses), and its syntax will definitely be 0o777 -- with an 'o', 
not a 'c'.  The 'o' is a little hard to read, but the best anyone can 
come up with.  It has to be lowercase 'o', not uppercase 'O', which 
helps *enormously*.  :-)
Huh?  In that case, somebody should tell Angel Faus; "Numeric literals,
take 3" says 0c777, and nobody dissented.  IIRC, in fact, nobody's
dissented to 0c777 since it was first suggested.


(But since I assume you can use \d, \b, \h anywhere you use \o, you 
won't have to use octal at all if you don't want to.)
\d is pure speculation on my part.  (As is \0 == chr(0).)

In fact, for this, and \o777 vs. whatever, I'm cc-ing perl6-language on
this.


p6l guys and the Design Team, if you haven't been following the
conversation, here's how it goes:
In perl5, octal numbers are specified as 0101 -- with a leading zero,
and octal characters in strings are specified as "\101".  In perl6, our
current documentation lists 0c101 as being the new way to write octal
numbers, because it lets people use leading zeros in numbers in an
intuitive way, and 0o101 was decided to be too difficult to read.  The
last writing of Larry to address this, as far as I (or anybody else who
I've noticed) knows, says 0o101.

It's generally been agreed on, I think, that 0c101 is the way to go.

Now, we're working on string literals, and the question is how we write
octal character literals.  The current writer of the string literal spec
wants "\o101" to be the new way to write what is "\101" in perl5 (and
C).  I'd prefer this to be "\c101", to match up with how the current doc
says octal numerics are written.  Unfortunately, \c is taken for
control-characters (ie "\c[" eq chr(ord '[' - 64) eq ESC), which is a
more important use of \c.

What do we do, oh great and wonderful design team?

Numeric   String        Upside             Downside
-------   ------        ------             --------
0101      \101          p5/C compatible    Unintuitive
0o101     \o101         Consistent         Hard to read
0c101     \o101         keeps \c for       Inconsistent
                        control-char
0c101     unsupported   Consistent         octal string chars
                                           unsupported
0t101     \t101         Consistent         what's tab?

Or something else?
All choices are bad, which one is best?

	-=- James Mastros
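
(Aside: for comparison only, Python 3 uses the 0o spelling for octal numeric
literals while keeping the C-style "\101" escape inside strings, so two rows
of the table can be checked directly.  The control_char helper is a made-up
name that spells out the "subtract 64" rule quoted above for "\c["; it is a
sketch of that arithmetic, not of any Perl 6 behaviour.)

```python
# Octal numeric literal with an explicit 'o' radix letter.
assert 0o101 == 65

# Octal character escape in a string: Python, like C and Perl 5, uses "\101".
assert "\101" == chr(0o101) == "A"

def control_char(ch: str) -> str:
    """Map '[' to ESC, 'H' to backspace, and so on, by subtracting 64."""
    return chr(ord(ch.upper()) - 64)

assert control_char("[") == "\x1b"  # ESC, as in the "\c[" example
assert control_char("H") == "\x08"  # backspace, i.e. Control-H
```

The numeric and string spellings differ here much as in the 0o101 / \101
pairing the table does not list, which shows the two choices can be made
independently.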