Re: String representation

2000-12-18 Thread David Mitchell
Simon Cozens [EMAIL PROTECTED] IMHO, the first thing we need to design and code is the API and runtime library, since everything else builds on top of that, and we can design other stuff in parallel with coding it. (A lot of it will be grunt work.) Personally I feel that the string part of

Re: Opcodes (was Re: The external interface for the parser piece)

2000-12-18 Thread David Mitchell
Dan Sugalski [EMAIL PROTECTED] wrote: At 06:05 PM 12/12/00 +, David Mitchell wrote: Also, some of the standard permutations would also need to do some re-invoking, e.g. ($int - $num) would invoke Int-sub[NUM](sv1,sv2,0), which itself would just fall through to Num-sub[INT](sv2,sv1,1) -
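
The ($int - $num) case quoted above can be read as a table of subtraction routines indexed first by the left operand's type and then by the right operand's type, with a flag recording whether the operands were reversed. The following is only a rough sketch of that reading, not code from the thread: the names SV, INT, NUM, SUB and subtract are assumptions made for illustration, and the real proposal concerned C-level vtables.

```python
# Hypothetical sketch; SV, INT, NUM, SUB and subtract are illustrative
# names and do not come from the Perl internals or from the thread.
INT, NUM = 0, 1


class SV:
    """A toy scalar value tagged with its type."""

    def __init__(self, kind, value):
        self.kind = kind
        self.value = value


def int_sub_int(a, b, swapped):
    # swapped=True means the source expression was (b - a).
    left, right = (b, a) if swapped else (a, b)
    return SV(INT, left.value - right.value)


def num_sub_num(a, b, swapped):
    left, right = (b, a) if swapped else (a, b)
    return SV(NUM, left.value - right.value)


def num_sub_int(a, b, swapped):
    # a is the NUM operand, b the INT operand. When swapped is True the
    # source expression was (b - a), so restore that order first.
    left, right = (b, a) if swapped else (a, b)
    return SV(NUM, float(left.value) - float(right.value))


def int_sub_num(a, b, swapped):
    # No INT-specific code here: fall through to the NUM entry with the
    # operands reversed and the swapped flag flipped.
    return SUB[NUM][INT](b, a, not swapped)


# Outer key: type of the first argument. Inner key: type of the second.
SUB = {
    INT: {INT: int_sub_int, NUM: int_sub_num},
    NUM: {INT: num_sub_int, NUM: num_sub_num},
}


def subtract(sv1, sv2):
    """Dispatch sv1 - sv2 through the first operand's table."""
    return SUB[sv1.kind][sv2.kind](sv1, sv2, False)
```

With this layout, subtract(SV(INT, 5), SV(NUM, 1.5)) enters through the INT table, is re-invoked through the NUM table with the flag set, and still computes 5 - 1.5 rather than 1.5 - 5.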

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
David Mitchell [EMAIL PROTECTED] writes: Personally I feel that the string part of the SV API should include most (if not all) string functions, including regex matching and substitution. What are string functions in your view? m// s/// join() substr index lc, lcfirst, ... | ~

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
Simon Cozens [EMAIL PROTECTED] writes: So, before we start even thinking about what we need, it's time to look at the vexed question of string representation. How do we do Unicode without getting into the horrendous non-Latin1 cockups we're seeing on p5p right now? Well - my theorist's answer

Re: String representation

2000-12-18 Thread Nicholas Clark
On Mon, Dec 18, 2000 at 02:43:14PM +, Nick Ing-Simmons wrote: David Mitchell [EMAIL PROTECTED] writes: Personally I feel that the string part of the SV API should include most (if not all) string functions, including regex matching and substitution. [list of potential string

Re: String representation

2000-12-18 Thread Philip Newton
On Sun, 17 Dec 2000, Dan Sugalski wrote: I'm thinking for speed that binary and UTF-32 should be our internal representations, at least for the data that gets handed to the regex engine. Or at least we use a constant-width character that's 8 and 32 bits, if I'm misusing UTF-32. (UTF-8 is

Re: String representation

2000-12-18 Thread David Mitchell
Nick Ing-Simmons [EMAIL PROTECTED] wrote: What are string functions in your view? m// s/// join() substr index lc, lcfirst, ... | ~ ++ vec '.' '.=' It rapidly gets out of hand. Perhaps, but consider that somewhere within the perl internals there have to be

Re: String representation

2000-12-18 Thread Jarkko Hietaniemi
On Mon, Dec 18, 2000 at 10:30:53AM -0500, Philip Newton wrote: On Sat, 16 Dec 2000, Jarkko Hietaniemi wrote: On Fri, Dec 15, 2000 at 03:10:16PM -0500, Dan Sugalski wrote: At 11:18 AM 12/15/00 -0600, Jarkko Hietaniemi wrote: As painful as it may sound (codingwise) I would urge to

Re: String representation

2000-12-18 Thread Nicholas Clark
On Fri, Dec 15, 2000 at 11:18:00AM -0600, Jarkko Hietaniemi wrote: As painful as it may sound (codingwise) I would urge to spare some thought to using (internally) UTF-32 for those encodings for which UTF-8 would be *longer* than the UTF-32 (mainly the Asian scripts). most CPUs can load a 32

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
David Mitchell [EMAIL PROTECTED] writes: Personally I would not use such a beast But with different encodings implemented by different SV types - each with their own vtable - surely most of this will "come out in the wash", by the correct method automatically being called. I thought that was

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
Nicholas Clark [EMAIL PROTECTED] writes: On Fri, Dec 15, 2000 at 11:18:00AM -0600, Jarkko Hietaniemi wrote: As painful as it may sound (codingwise) I would urge to spare some thought to using (internally) UTF-32 for those encodings for which UTF-8 would be *longer* than the UTF-32 (mainly the

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
David Mitchell [EMAIL PROTECTED] writes: Nick Ing-Simmons [EMAIL PROTECTED] wrote: What are string functions in your view? m// s/// join() substr index lc, lcfirst, ... | ~ ++ vec '.' '.=' It rapidly gets out of hand. Perhaps, but consider that somewhere

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
Jarkko Hietaniemi [EMAIL PROTECTED] writes: On Mon, Dec 18, 2000 at 03:21:05PM +, Nick Ing-Simmons wrote: Simon Cozens [EMAIL PROTECTED] writes: So, before we start even thinking about what we need, it's time to look at the vexed question of string representation. How do we do Unicode

Re: String representation

2000-12-18 Thread Jarkko Hietaniemi
As I pointed out on p5p even EBCDIC machines can use that model - but the downside is that ord('A') == 65 which will break backward compatibility with EBCDIC scripts. Maybe we need $ENV{PERL_ENCODING} to control ord() and chr(), too? That was my suggestion last week some time -
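
To make the ord('A') concern above concrete, here is a small illustration of how an environment setting could select the character-to-number mapping. It is a sketch under assumptions, not Perl's behaviour: the helper names encoded_ord and encoded_chr are made up, the latin-1 default is arbitrary, and Python's cp037 codec stands in for one EBCDIC code page. PERL_ENCODING is used only because the thread floats that name.

```python
import os


def encoded_ord(ch, encoding=None):
    """Return the code of a single character under the chosen encoding.

    Hypothetical helper: the encoding comes from the argument, else from
    the PERL_ENCODING environment variable, else defaults to latin-1.
    Only meaningful for single-byte encodings.
    """
    enc = encoding or os.environ.get("PERL_ENCODING", "latin-1")
    return ch.encode(enc)[0]


def encoded_chr(code, encoding=None):
    """Inverse of encoded_ord for single-byte encodings."""
    enc = encoding or os.environ.get("PERL_ENCODING", "latin-1")
    return bytes([code]).decode(enc)


print(encoded_ord("A", "latin-1"))  # 65
print(encoded_ord("A", "cp037"))    # 193, the EBCDIC code for 'A'
print(encoded_chr(193, "cp037"))    # A
```

A script written for an EBCDIC machine that expects 193 would see 65 under the Unicode-style model, which is the backward-compatibility break being discussed.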

Re: String representation

2000-12-18 Thread Kai Henningsen
[EMAIL PROTECTED] (Jarkko Hietaniemi) wrote on 15.12.00 in [EMAIL PROTECTED]: On Fri, Dec 15, 2000 at 12:13:01PM +, Simon Cozens wrote: IMHO, the first thing we need to design and code is the API and runtime library, since everything else builds on top of that, and we can design