Re: Usage of \[oxdb] (was Re: String Literals, take 2)
On 12/05/2002 12:18 PM, Michael Lazzaro wrote:
> On Thursday, December 5, 2002, at 02:11 AM, James Mastros wrote:
>> On 12/04/2002 3:21 PM, Larry Wall wrote:
>>> \x and \o are then just shortcuts.
>> Can we please also have \0 as a shortcut for \0x0?
> \0 in addition to \x, meaning the same thing? I think that would get us back to where we were with octal, wouldn't it? I'm not real keen on leading zero meaning anything, personally... :-P

You misinterpret. I meant \0 meaning the same as \c[NUL], IE the same as chr(0), a null character. (I suppose I should have said \0x[0].)

>> Which means that the only way to get a string with a literal 0xFF byte in it is with qq:u1[\xFF]? (Larry, I don't know that this has been mentioned before: is that right?) chr:u1(0xFF) might do it too, but we're getting ahead of ourselves.
> Hmm... does this matter?

Sorry. It does, in fact, not matter... I momentarily stopped thinking in terms of utf8 encoding being a completely transparent process.
Re: Usage of \[oxdb] (was Re: String Literals, take 2)
On Thu, Dec 05, 2002 at 09:18:21AM -0800, Michael Lazzaro wrote:
:
: On Thursday, December 5, 2002, at 02:11 AM, James Mastros wrote:
:
: >On 12/04/2002 3:21 PM, Larry Wall wrote:
: >>\x and \o are then just shortcuts.
: >Can we please also have \0 as a shortcut for \0x0?
:
: \0 in addition to \x, meaning the same thing? I think that would get
: us back to where we were with octal, wouldn't it? I'm not real keen on
: leading zero meaning anything, personally... :-P

\0 still means chr(0). I don't think there's much conflict with the new \0x, \0o, \0b, and \0d, since \0 almost always occurs at the end of a string, if anywhere.

: >>There ain't no such thing as a "wide" character. \xff is exactly
: >>the same character as \x[ff].
: >Which means that the only way to get a string with a literal 0xFF byte
: >in it is with qq:u1[\xFF]? (Larry, I don't know that this has been
: >mentioned before: is that right?) chr:u1(0xFF) might do it too, but
: >we're getting ahead of ourselves.
:
: Hmm... does this matter? I'm a bit rusty on my Unicode these days, but
: I was assuming that \xFF and \x00FF always pointed to the same
: character, and that you in fact _don't_ have the ability to put
: individual bytes in a string, because Perl is deciding how to place the
: characters for you (how long they should be, etc.) So if you wanted
: more explicit control, you'd use C.

A "byte" string is any string whose characters are all under 256. It's up to an interface to coerce this to actual bytes if it needs them. We'll presumably have something like "use bytes" that turns off all multi-byte processing, in which case you have to deal with any UTF that comes in by hand. But in general it'll be better if the interface coerces to types like "str8", which is presumably pronounced "straight". Don't ask me how str16 and str32 are pronounced. (But generally you should be using utf16 instead of str16 in any event, unless your interface truly doesn't know how to deal with surrogates.)
In other words, str16 is the name of the obsolescent UCS-2, and str32 is the name for UCS-4, which is more or less the same as UTF-32, except that UTF-32 is not allowed to use code points above 0x10FFFF. So anyway, we've got all these types:

    str8    utf8
    str16   utf16
    str32   utf32

where the "str" version is essentially just a compact integer array. One could alias str8 to "latin1" since the default coercion from Unicode to str8 would have those semantics.

It's not clear exactly what the bare "str" type is. "Str" is obviously the abstract string type, but "str" probably means the default C string type for the current architecture/OS/locale/whatever. In other words, it might be str8, or it might be utf8. Let's hope it's utf8, because that will work forever, give or take an eon.

: >Also, an annoying corner case: is "\0x1ff" eq "\0x[1f]f", or is it eq
: >"\0x[1ff]"? What about other bases? Is "\0x1x" eq "\0x[1]", or is it
: >eq "\0x[1x]" (IE illegal). (Now that I put those three questions
: >together, the only reasonable answer seems to be that the number ends
: >in the last place it's valid to end if you don't use explicit
: >brackets.)
:
: Yeah, my guess is that it's as you say... it goes till it can't goes no
: more, but never gives an error (well, maybe for "\0xz", where there are
: zero valid digits?) But I would suspect that the bracketed form is
: *strongly* recommended. At least, that's what I plan on telling
: people. :-)

Sounds good to me. Dwimming is wonderful, but so is dwissing.

Larry
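To make Larry's str/utf distinction concrete, here is a minimal Python sketch (Python, not Perl 6, and every function name is hypothetical). It assumes Latin-1 as the str8 coercion, as Larry suggests, treats str16 as UCS-2 by rejecting anything above U+FFFF, and lets Python's UTF-16 and UTF-32 codecs stand in for utf16 and utf32.

```python
# Sketch only: Python codecs standing in for Larry's proposed Perl 6 types.
# All function names here are hypothetical.

MAX_CODEPOINT = 0x10FFFF  # UTF-32 (and Unicode itself) stops here


def is_byte_string(s: str) -> bool:
    """Larry's definition: a "byte" string has every character under 256."""
    return all(ord(ch) < 256 for ch in s)


def to_str8(s: str) -> bytes:
    """One byte per character with Latin-1 semantics (the suggested alias).
    Raises UnicodeEncodeError for any character at or above 256."""
    return s.encode("latin-1")


def to_str16(s: str) -> bytes:
    """UCS-2 style: exactly two bytes per character and no surrogate pairs,
    so anything above U+FFFF is rejected."""
    if any(ord(ch) > 0xFFFF for ch in s):
        raise ValueError("UCS-2 cannot represent characters above U+FFFF")
    return s.encode("utf-16-be")


def to_utf16(s: str) -> bytes:
    """UTF-16: characters above U+FFFF become a surrogate pair."""
    return s.encode("utf-16-be")


def to_utf32(s: str) -> bytes:
    """UTF-32: four bytes per character. Python's chr() already refuses
    values above MAX_CODEPOINT."""
    return s.encode("utf-32-be")


if __name__ == "__main__":
    smiley = "\U0001F600"  # U+1F600, outside the Basic Multilingual Plane
    print(is_byte_string("caf\xe9"), is_byte_string(smiley))
    print(to_utf16(smiley).hex(), to_utf32(smiley).hex())
```

Encoding U+1F600 shows the difference: utf16 needs a surrogate pair, while the UCS-2 style str16 refuses the character outright.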
Re: Usage of \[oxdb] (was Re: String Literals, take 2)
On Thursday, December 5, 2002, at 02:11 AM, James Mastros wrote:
> On 12/04/2002 3:21 PM, Larry Wall wrote:
>> \x and \o are then just shortcuts.
> Can we please also have \0 as a shortcut for \0x0?

\0 in addition to \x, meaning the same thing? I think that would get us back to where we were with octal, wouldn't it? I'm not real keen on leading zero meaning anything, personally... :-P

>> There ain't no such thing as a "wide" character. \xff is exactly the same character as \x[ff].
> Which means that the only way to get a string with a literal 0xFF byte in it is with qq:u1[\xFF]? (Larry, I don't know that this has been mentioned before: is that right?) chr:u1(0xFF) might do it too, but we're getting ahead of ourselves.

Hmm... does this matter? I'm a bit rusty on my Unicode these days, but I was assuming that \xFF and \x00FF always pointed to the same character, and that you in fact _don't_ have the ability to put individual bytes in a string, because Perl is deciding how to place the characters for you (how long they should be, etc.) So if you wanted more explicit control, you'd use C.

> Also, an annoying corner case: is "\0x1ff" eq "\0x[1f]f", or is it eq "\0x[1ff]"? What about other bases? Is "\0x1x" eq "\0x[1]", or is it eq "\0x[1x]" (IE illegal). (Now that I put those three questions together, the only reasonable answer seems to be that the number ends in the last place it's valid to end if you don't use explicit brackets.)

Yeah, my guess is that it's as you say... it goes till it can't goes no more, but never gives an error (well, maybe for "\0xz", where there are zero valid digits?) But I would suspect that the bracketed form is *strongly* recommended. At least, that's what I plan on telling people. :-)

Design team: If we're wrong on these, please correct. :-)

MikeL
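The "goes till it can't go no more" rule can be sketched in Python. This assumes the \0x, \0o, \0d and \0b spellings and the bracket syntax as floated in this thread, treats a bare \0 as chr(0) per Larry, and follows Mike's guess that zero valid digits is an error; parse_radix_escape and expand are hypothetical names, not a settled design.

```python
import string

# Hypothetical table for the proposed \0x, \0o, \0d and \0b escapes:
# radix letter -> (base, characters that count as digits in that base).
RADIXES = {
    "x": (16, set(string.hexdigits)),
    "o": (8, set(string.octdigits)),
    "d": (10, set(string.digits)),
    "b": (2, set("01")),
}


def parse_radix_escape(text: str, pos: int) -> tuple[str, int]:
    """Parse an escape such as \\0x1b or \\0x[1b] that starts at text[pos].

    Returns (character, index just past the escape). Without brackets the
    digits are consumed greedily, so the number ends at the last place it
    can validly end; zero valid digits is treated as an error.
    """
    base, valid = RADIXES[text[pos + 2]]
    i = pos + 3
    if i < len(text) and text[i] == "[":
        end = text.index("]", i)  # ValueError if the bracket is unclosed
        digits = text[i + 1:end]
        if not digits or not set(digits) <= valid:
            raise ValueError(f"bad digits in brackets: {digits!r}")
        return chr(int(digits, base)), end + 1
    start = i
    while i < len(text) and text[i] in valid:
        i += 1
    if i == start:
        raise ValueError("no valid digits after the radix letter")
    return chr(int(text[start:i], base)), i


def expand(text: str) -> str:
    """Expand radix escapes and a bare \\0 (still chr(0)); copy the rest."""
    out, i = [], 0
    while i < len(text):
        if text.startswith("\\0", i):
            if i + 2 < len(text) and text[i + 2] in RADIXES:
                ch, i = parse_radix_escape(text, i)
            else:
                ch, i = "\0", i + 2
            out.append(ch)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

Under this rule "\0x1ff" reads as "\0x[1ff]" (U+01FF), not "\0x[1f]f", and "\0x1x" reads as "\0x[1]" followed by a literal "x".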
Re: Usage of \[oxdb] (was Re: String Literals, take 2)
On 12/04/2002 3:21 PM, Larry Wall wrote:
> On Wed, Dec 04, 2002 at 11:38:35AM -0800, Michael Lazzaro wrote:
> : We still need to verify whether we can have, in qq strings:
> :
> :    \033 - octal (p5; deprecated but allowed in p6?)
>
> I think it's disallowed.

Thank the many gods ... or One True God, or Larry, or whatever your personal preference may be. ("So have a merry Christmas, Happy Hanukah, Kwazy Kwanzaa, a tip-top Tet, and a solemn, dignified Ramadan.")

>     \0o33   - octal
>     \0x1b   - hex
>     \0d123  - decimal
>     \0b1001 - binary
>
> \x and \o are then just shortcuts.

Can we please also have \0 as a shortcut for \0x0?

> \c[^H], for instance. We can overload the \c notation to our heart's desire, as long as we don't conflict with its use for named characters:
>
>     \c[GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI]

Very Cool. (BTW, for those that don't follow Unicode, this means that everything matching /^[^A-Z ]$/ is fair game for us; Unicode limits character names to that to minimize chicken-and-egg problems. We /probably/ shouldn't take anything in /^[A-Za-z ]$/, to allow people to say the much more readable "\c[Greek Capital Letter Omega with Pepperoni and Pineapple]".)

> : There is also the question of what the bracketed format does. "Wide"
> : chars, e.g. for Unicode, seem appropriate only in hex. But it would
> : seem useful to allow a bracketed form for the others that prevents
> : ambiguities:
> :
> :    "\o164" ne "\o{16}4"
> :    "\d100" ne "\d{10}0"
> :
> : Whether that means you can actually specify wide chars in \o, \d, and
> : \b or it's just a disambiguification of the Latin-1 case is open to
> : question.
>
> There ain't no such thing as a "wide" character. \xff is exactly the same character as \x[ff].

Which means that the only way to get a string with a literal 0xFF byte in it is with qq:u1[\xFF]? (Larry, I don't know that this has been mentioned before: is that right?) chr:u1(0xFF) might do it too, but we're getting ahead of ourselves.
Also, an annoying corner case: is "\0x1ff" eq "\0x[1f]f", or is it eq "\0x[1ff]"? What about other bases? Is "\0x1x" eq "\0x[1]", or is it eq "\0x[1x]" (IE illegal). (Now that I put those three questions together, the only reasonable answer seems to be that the number ends in the last place it's valid to end if you don't use explicit brackets.)

(BTW, in HTML and XML, numeric character escapes are decimal by default; you have to add an x for hex. In Windows and several other OSes (I think; I like to play with Unicode but have little actual use for it), ALT-0nnn is spelt in decimal only. Decimal Unicode ordinals are fundamentally flawed (since blocks always start on nice even hex numbers, but ugly decimal ones), but useful anyway.)

-=- James Mastros
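James's point about hex block boundaries, and the decimal-versus-hex convention of HTML numeric references, can be checked with Python's standard html module. The three block start points come from the Unicode block chart; the choice of blocks is illustrative only.

```python
import html

# Start of a few Unicode blocks: round in hex, rarely round in decimal.
BLOCK_STARTS = {
    "Greek and Coptic": 0x0370,
    "Cyrillic": 0x0400,
    "CJK Unified Ideographs": 0x4E00,
}

for name, start in BLOCK_STARTS.items():
    print(f"{name:24} U+{start:04X} = {start} decimal")

# HTML/XML numeric character references: &#NNN; is decimal, &#xHH; is hex.
assert html.unescape("&#255;") == html.unescape("&#xFF;") == "\xff"
```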
Re: Usage of \[oxdb] (was Re: String Literals, take 2)
Larry wrote:

> : But I think we'd definitely like to introduce \d.
>
> Can't, unless we change \d to in regexen.

Which we ought to be very wary of, given how very frequently it's used in regexes.

Damian
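Damian's objection is not unique to Perl. As an illustration only, Python's re module and string literals show the same letters doing double duty: in a pattern \d is the digit class and \b a word boundary, while in an ordinary string literal \b is the backspace character.

```python
import re

# In a regular expression, \d is the digit class and \b is a word boundary.
assert re.fullmatch(r"\d+", "2002")
assert re.search(r"\bcat\b", "the cat sat")
assert not re.search(r"\bcat\b", "concatenate")

# In an ordinary (non-raw) string literal, \b is instead the backspace
# character, U+0008: the same letter means two different things.
assert "\b" == "\x08"
```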
Re: Usage of \[oxdb] (was Re: String Literals, take 2)
On Wed, Dec 04, 2002 at 11:38:35AM -0800, Michael Lazzaro wrote:
: We still need to verify whether we can have, in qq strings:
:
:    \033    - octal (p5; deprecated but allowed in p6?)

I think it's disallowed.

:    \o33    - octal (p5)
:    \x1b    - hex (p5)
:    \d123   - decimal (?)
:    \b1001  - binary (?)

Can't really have \d and \b if they keep their current regex meanings. I think the general form is:

    \0o33   - octal
    \0x1b   - hex
    \0d123  - decimal
    \0b1001 - binary

\x and \o are then just shortcuts.

: and if so, if these are allowed too:
:
:    \o{777}   - (?)
:    \x{1b}    - "wide" hex (p5)
:    \d{123}   - (?)
:    \b{1001}  - (?)

The general form could be

    \0o[33]   - octal
    \0x[1b]   - hex
    \0d[123]  - decimal
    \0b[1001] - binary

Or it could be

    \c[0o33]   - octal
    \c[0x1b]   - hex
    \c[0d123]  - decimal
    \c[0b1001] - binary

since \c is taking over \N's (rather ill-defined) duties.

: Note that \b conflicts with backspace. I'd rather keep backspace than
: binary, personally; I have yet to feel the need to call out a char in
: binary. :-) Or we can make it dependent on the trailing digits, or
: require the brackets, or require backspace to be spelt differently.

\c[^H], for instance. We can overload the \c notation to our heart's desire, as long as we don't conflict with its use for named characters:

    \c[GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI]

: But I think we'd definitely like to introduce \d.

Can't, unless we change \d to in regexen.

: There is also the question of what the bracketed format does. "Wide"
: chars, e.g. for Unicode, seem appropriate only in hex. But it would
: seem useful to allow a bracketed form for the others that prevents
: ambiguities:
:
:    "\o164" ne "\o{16}4"
:    "\d100" ne "\d{10}0"
:
: Whether that means you can actually specify wide chars in \o, \d, and
: \b or it's just a disambiguification of the Latin-1 case is open to
: question.

There ain't no such thing as a "wide" character. \xff is exactly the same character as \x[ff].
A character in Perl is an abstract codepoint number--how it's represented is of no concern to the programmer (though it might be of concern to any interface to the outside world, of course). Do not think of Perl 6 strings as arrays of bytes (except when they are (and probably not even then...)). Larry
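Larry's suggestion that \c can be overloaded freely, as long as it never collides with character names, might look like this in Python. The dispatch order, the helper name expand_c, and the use of unicodedata.lookup as the name table are all assumptions for illustration; it also assumes, as Larry implies, that a leading ^ or a 0x-style prefix can never be mistaken for a character name.

```python
import unicodedata

# Hypothetical radix prefixes accepted inside \c[...].
RADIX = {"0x": 16, "0o": 8, "0d": 10, "0b": 2}


def expand_c(body: str) -> str:
    """Expand the contents of a hypothetical \\c[...] escape.

    Dispatch order (an assumption, not settled design):
      ^X                       control character, e.g. ^H -> backspace
      0x.., 0o.., 0d.., 0b..   numeric code point in that radix
      anything else            Unicode character name
    """
    if len(body) == 2 and body[0] == "^":
        # Caret notation: flip bit 6 of the (uppercased) character.
        return chr(ord(body[1].upper()) ^ 0x40)
    prefix = body[:2].lower()
    if prefix in RADIX and len(body) > 2:
        return chr(int(body[2:], RADIX[prefix]))
    return unicodedata.lookup(body)  # raises KeyError for unknown names
```

Every numeric form yields the same abstract character: "\c[0o33]" and "\c[0x1b]" are one code point, regardless of how it is later encoded.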
Re: Usage of \[oxdb] (was Re: String Literals, take 2)
"Michael Lazzaro" <[EMAIL PROTECTED]> wrote
> Note that \b conflicts with backspace. I'd rather keep backspace than
> binary, personally; I have yet to feel the need to call out a char in
> binary. :-) Or we can make it dependent on the trailing digits, or
> require the brackets, or require backspace to be spelt differently.
>
> But I think we'd definitely like to introduce \d.
>

Our numeric literals use # for radix stuff. So perhaps we could use "\#..." to introduce explicit codings:

    "\#d13"
    "\#h0d"
    "\#b1101"
    "\#{ 1<<6 - 20 * 2 - 9#1:2 }"

would all be synonyms!

Dave.

ps. how did this thread migrate from p6d to p6l?
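Dave's four spellings can be checked with a small Python sketch. decode_hash_escape and radix are hypothetical helpers; the sketch assumes the letter after \# selects the base (d, h, b) and reads 9#1:2 as the base-9 digits 1 and 2, that is 11. Python's << binds more loosely than -, so the arithmetic needs explicit parentheses to get the grouping Dave evidently intended: 64 - 40 - 11 = 13.

```python
# Hypothetical decoder for Dave's proposed "\#" escapes: the letter after
# the # picks the radix, mirroring the d/h/b letters in his examples.
RADIX_LETTERS = {"d": 10, "h": 16, "b": 2}


def decode_hash_escape(escape: str) -> str:
    """Decode a single escape such as \\#d13 into its character."""
    if not escape.startswith("\\#") or len(escape) < 4:
        raise ValueError(f"not a \\# escape: {escape!r}")
    base = RADIX_LETTERS[escape[2]]
    return chr(int(escape[3:], base))


def radix(base: int, *digits: int) -> int:
    """Value of explicit digits in the given base, e.g. radix(9, 1, 2)
    for what the proposal writes as 9#1:2 (an assumed reading)."""
    value = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} out of range for base {base}")
        value = value * base + d
    return value


# Dave's computed form, parenthesized so that 1 << 6 is evaluated first.
computed = chr((1 << 6) - 20 * 2 - radix(9, 1, 2))
```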
Usage of \[oxdb] (was Re: String Literals, take 2)
We still need to verify whether we can have, in qq strings:

    \033    - octal (p5; deprecated but allowed in p6?)
    \o33    - octal (p5)
    \x1b    - hex (p5)
    \d123   - decimal (?)
    \b1001  - binary (?)

and if so, if these are allowed too:

    \o{777}   - (?)
    \x{1b}    - "wide" hex (p5)
    \d{123}   - (?)
    \b{1001}  - (?)

Only four of these nine constructs are allowed in Perl5.

Note that \b conflicts with backspace. I'd rather keep backspace than binary, personally; I have yet to feel the need to call out a char in binary. :-) Or we can make it dependent on the trailing digits, or require the brackets, or require backspace to be spelt differently.

But I think we'd definitely like to introduce \d.

There is also the question of what the bracketed format does. "Wide" chars, e.g. for Unicode, seem appropriate only in hex. But it would seem useful to allow a bracketed form for the others that prevents ambiguities:

    "\o164" ne "\o{16}4"
    "\d100" ne "\d{10}0"

Whether that means you can actually specify wide chars in \o, \d, and \b or it's just a disambiguification* of the Latin-1 case is open to question.

MikeL

*Not to be confused with an eigendisambiguification, of course.
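The ambiguity Mike's brackets would remove shows up as soon as both readings are computed. Python has no \o or \d escapes, so this sketch builds each reading with chr(); it also shows Python's own answer for octal, which is to cap an escape at three digits rather than to use brackets.

```python
# Two readings of each of Mike's ambiguous escapes, built with chr().
greedy_octal = chr(0o164)        # "\o164"  read as one number
split_octal = chr(0o16) + "4"    # "\o{16}4" read as 16, then a literal 4
greedy_decimal = chr(100)        # "\d100"
split_decimal = chr(10) + "0"    # "\d{10}0"

assert greedy_octal == "t" and split_octal == "\x0e4"
assert greedy_decimal == "d" and split_decimal == "\n0"

# Python resolves the same ambiguity in its own octal escapes by width:
# at most three octal digits are consumed, so "\0164" is \016 then "4".
assert "\164" == "t"
assert "\0164" == "\x0e" + "4"
```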