In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote:
> utf8 and utf16 are both variable length encodings for space reasons.
> There's not much reason to space-compact something then expand the heck out
> of it. On the other hand, I'd really, *really* rather not have Unicode
> constants in anything other than UTF-32, so I'd as soon we chopped out the
> utf-8 and utf-16 constant support from this.
>
> A should be the prefix for US-ASCII characters.
> U should be the prefix for Unicode characters
> N should be the prefix for the native character set (and the default)
>
> Beyond that I'm not sure what, if anything, we should accommodate in the
> assembler.

Attached is a patch to drop the U8, U16 and U32 prefixes and add U and N
prefixes. I haven't added the A prefix because I'm still not clear which
encoding it is supposed to map to.

I can understand the following mappings:

  N => enc_native
  U => enc_utf32

but what is A supposed to map to exactly? Or is the assembler supposed to
mangle an A string into an N or U string and then put it in the bytecode
in one of those formats?

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Index: Assembler.pm
===================================================================
RCS file: /home/perlcvs/parrot/Parrot/Assembler.pm,v
retrieving revision 1.8
diff -u -w -r1.8 Assembler.pm
--- Assembler.pm 2001/10/09 02:45:36 1.8
+++ Assembler.pm 2001/10/09 21:25:28
@@ -279,7 +279,7 @@
 =cut
-my %encodings=('' => 0, 'U8' => 1, 'U16' => 2, 'U32' => 3);
+my %encodings=('' => 0, 'N' => 0, 'U' => 3);
 my %opcodes = Parrot::Opcode::read_ops( -f "../opcode_table" ? "../opcode_table" : "opcode_table" );
@@ -662,7 +662,7 @@
 sub replace_string_constants {
     my $code = shift;
-    $code =~ s/(U(?:8|16|32))?\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string($2,$1)/eg;
+    $code =~ s/([NU])?\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string($2,$1)/eg;
     return $code;
 }
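
For anyone who wants to see what the new prefix handling does without running the whole assembler, here is a minimal standalone sketch. The %encodings table and the regex are copied from the patch; constantize_string_demo and replace_string_constants_demo are hypothetical stand-ins for the real routines in Assembler.pm, and the "[sc:N]" placeholder is only an assumption made for illustration, not what the assembler actually emits.

```perl
use strict;
use warnings;

# Prefix-to-encoding table, as in the patched Assembler.pm:
# no prefix and N both mean native (0), U means UTF-32 (3).
my %encodings = ('' => 0, 'N' => 0, 'U' => 3);

# Hypothetical stand-in for constantize_string: records the constant
# together with its encoding and returns a placeholder token.
my @constants;
sub constantize_string_demo {
    my ($string, $prefix) = @_;
    $prefix = '' unless defined $prefix;    # unprefixed strings are native
    push @constants, { string => $string, encoding => $encodings{$prefix} };
    return '[sc:' . $#constants . ']';
}

# Same substitution as the patched replace_string_constants, but calling
# the demo routine above instead of the real one.
sub replace_string_constants_demo {
    my $code = shift;
    $code =~ s/([NU])?\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string_demo($2,$1)/eg;
    return $code;
}

my $out = replace_string_constants_demo(
    'set S0, U"hello"; set S1, "plain"; set S2, N"x\"y"'
);
print "$out\n";
printf "%d: encoding=%d string=%s\n",
    $_, $constants[$_]{encoding}, $constants[$_]{string}
    for 0 .. $#constants;
```

Note that the regex treats any N or U immediately before a double quote as a prefix, so an identifier ending in one of those letters and followed directly by a string would be misread. The sketch has the same limitation as the patch.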