Re: String representation

2000-12-21 Thread Philip Newton
On 18 Dec 00, at 15:21, Nick Ing-Simmons wrote: There needs to be a hierarchy of _repertoires_ such that: ASCII is subset of Native is subset of wchar_t is subset of UNICODE. But we can't even rely on that. I can imagine a couple of Native encodings around that fiddle with ASCII (for
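
Philip's objection, that some native encodings do not keep ASCII as a subset, can be checked mechanically. A minimal sketch, using Python's standard codecs as stand-ins for "native" encodings (that choice is an assumption for illustration): an encoding preserves ASCII only if every 7-bit character encodes to its own byte value. EBCDIC (`cp037`) fails the test.

```python
def preserves_ascii(codec: str) -> bool:
    """True if every 7-bit ASCII character encodes to its own byte value
    under `codec`, i.e. ASCII really is a subset of that encoding."""
    for cp in range(128):
        try:
            encoded = chr(cp).encode(codec)
        except UnicodeEncodeError:
            return False  # the codec cannot represent this ASCII character at all
        if encoded != bytes([cp]):
            return False  # representable, but at a different byte value
    return True

for codec in ("latin-1", "cp1252", "cp037"):
    print(codec, preserves_ascii(codec))
```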

Re: String representation

2000-12-21 Thread Nicholas Clark
On Wed, Dec 20, 2000 at 11:07:39PM +, Nick Ing-Simmons wrote: The snag is that there are common pairs e.g. concat(utf8,ascii) / concat(ascii,utf8) or plus(NV,IV) / plus(IV,NV) where it is possible to get "smart" when one arg is a "special case" of the other. And
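
The "smart" case for concat(utf8,ascii) / concat(ascii,utf8) rests on ASCII bytes already being valid UTF-8 with the same meaning, so no transcoding is needed. A minimal sketch, assuming a hypothetical string value that is raw bytes tagged with an encoding name (`Str` and `concat` are illustrative, not a proposed API):

```python
from typing import NamedTuple

class Str(NamedTuple):
    """Hypothetical string value: raw bytes tagged with their encoding."""
    encoding: str  # e.g. "ascii", "utf-8", "latin-1"
    data: bytes

def concat(a: Str, b: Str) -> Str:
    # Same encoding: plain byte concatenation.
    if a.encoding == b.encoding:
        return Str(a.encoding, a.data + b.data)
    # ASCII is a "special case" of UTF-8: its bytes are already valid UTF-8,
    # so either argument order can be joined without transcoding.
    if {a.encoding, b.encoding} == {"ascii", "utf-8"}:
        return Str("utf-8", a.data + b.data)
    # General case: decode both sides and re-encode in a common encoding.
    text = a.data.decode(a.encoding) + b.data.decode(b.encoding)
    return Str("utf-8", text.encode("utf-8"))
```

The fast path is symmetric, so one branch covers both argument orders; only genuinely mixed encodings pay for a decode and re-encode.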

Re: String representation

2000-12-21 Thread Nick Ing-Simmons
Nicholas Clark [EMAIL PROTECTED] writes: where it is possible to get "smart" when one arg is a "special case" of the other. And similarly numbers must be convertible to "complex long double" or whatever is the top of the built-in tree? (NV I guess - complex is overkill.) It is the
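
Promoting to the top of the numeric tree (NV) lets plus(IV,NV) and plus(NV,IV) share one path. A minimal sketch, assuming IV is modelled as a Python `int` limited to 64 bits and NV as a Python `float` (both modelling choices are assumptions):

```python
IV_MIN, IV_MAX = -(2**63), 2**63 - 1  # assumes a 64-bit IV

def plus(x, y):
    """Add two numeric scalars: IV modelled as int, NV modelled as float."""
    if isinstance(x, int) and isinstance(y, int):
        result = x + y
        if IV_MIN <= result <= IV_MAX:
            return result            # IV + IV stays an IV when it fits
        return float(x) + float(y)   # overflow: fall back to NV
    # Any NV operand: convert both to the top of the tree (NV) and add.
    return float(x) + float(y)
```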

Re: String representation

2000-12-21 Thread Nicholas Clark
On Thu, Dec 21, 2000 at 05:36:05PM +, Nick Ing-Simmons wrote: Nicholas Clark [EMAIL PROTECTED] writes: where it is possible to get "smart" when one arg is a "special case" of the other. And similarly numbers must be convertible to "complex long double" or whatever is the top

Re: String representation

2000-12-20 Thread Nick Ing-Simmons
David Mitchell [EMAIL PROTECTED] writes: The problem is "what are the (types of) the arguments passed I don't really see why types of args are (in general) a problem. Hmm, you may be right at the level of your example, which may indeed be typical of pp_(). Perhaps PerlIO is so bothersome

Re: String representation

2000-12-19 Thread Nicholas Clark
On Tue, Dec 19, 2000 at 06:11:06PM +, David Mitchell wrote: Since in real life the types of args are often the same, this will usually be a win. I found that you have to make an effort to make them the same, else generally enough of them aren't that decision making code outweighs speed

Re: String representation

2000-12-18 Thread David Mitchell
Simon Cozens [EMAIL PROTECTED] IMHO, the first thing we need to design and code is the API and runtime library, since everything else builds on top of that, and we can design other stuff in parallel with coding it. (A lot of it will be grunt work.) Personally I feel that that string part of

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
David Mitchell [EMAIL PROTECTED] writes: Personally I feel that that string part of the SV API should include most (if not all) string functions, including regex matching and substitution. What are string functions in your view? m// s/// join() substr index lc, lcfirst, ... | ~

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
Simon Cozens [EMAIL PROTECTED] writes: So, before we start even thinking about what we need, it's time to look at the vexed question of string representation. How do we do Unicode without getting into the horrendous non-Latin1 cockups we're seeing on p5p right now? Well - my theorist's answer

Re: String representation

2000-12-18 Thread Nicholas Clark
On Mon, Dec 18, 2000 at 02:43:14PM +, Nick Ing-Simmons wrote: David Mitchell [EMAIL PROTECTED] writes: Personally I feel that that string part of the SV API should include most (if not all) string functions, including regex matching and substitution. [list of potential string

Re: String representation

2000-12-18 Thread Philip Newton
On Sun, 17 Dec 2000, Dan Sugalski wrote: I'm thinking for speed that binary and UTF-32 should be our internal representations, at least for the data that gets handed to the regex engine. Or at least we use a constant-width character that's 8 or 32 bits, if I'm misusing UTF-32. (UTF-8 is
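
The speed argument for a constant-width representation is that finding the i-th character becomes an offset computation instead of a scan. A minimal sketch contrasting the two (function names are hypothetical; the UTF-8 walker assumes well-formed input):

```python
import struct

def utf32_char_at(buf: bytes, i: int) -> int:
    """O(1): in UTF-32LE the i-th code point starts at byte offset 4*i."""
    (cp,) = struct.unpack_from("<I", buf, 4 * i)
    return cp

def _seq_len(lead: int) -> int:
    """UTF-8 sequence length, read from the lead byte's high bits."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    return 4  # assumes a well-formed 11110xxx lead byte

def utf8_char_at(buf: bytes, i: int) -> int:
    """O(i): skip i whole sequences, then decode the one at that position."""
    pos = 0
    for _ in range(i):
        pos += _seq_len(buf[pos])
    return ord(buf[pos:pos + _seq_len(buf[pos])].decode("utf-8"))
```

A regex engine that backtracks by character position calls something like `char_at` constantly, which is where the fixed-width layout pays for its extra memory.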

Re: String representation

2000-12-18 Thread David Mitchell
ustom string types. I would argue one does that by making the regex API more modular. Quite possibly, but once having split it into separate components, I might then make the case that certain of those components could be implemented as vtable ops (eg those components that are sensitive to t

Re: String representation

2000-12-18 Thread Jarkko Hietaniemi
On Mon, Dec 18, 2000 at 10:30:53AM -0500, Philip Newton wrote: On Sat, 16 Dec 2000, Jarkko Hietaniemi wrote: On Fri, Dec 15, 2000 at 03:10:16PM -0500, Dan Sugalski wrote: At 11:18 AM 12/15/00 -0600, Jarkko Hietaniemi wrote: As painful as it may sound (codingwise) I would urge to

Re: String representation

2000-12-18 Thread Nicholas Clark
On Fri, Dec 15, 2000 at 11:18:00AM -0600, Jarkko Hietaniemi wrote: As painful as it may sound (codingwise) I would urge to spare some thought to using (internally) UTF-32 for those encodings for which UTF-8 would be *longer* than the UTF-32 (mainly the Asian scripts). most CPUs can load a 32
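
For comparison, the storage cost of the candidate internal encodings can be measured directly. A minimal sketch (the helper name is hypothetical; BOM-free codecs are used so only character data is counted):

```python
def encoded_sizes(text: str) -> dict:
    """Bytes needed to store `text` in each candidate encoding (no BOM)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

print(encoded_sizes("Perl"))
print(encoded_sizes("\u4e2d\u6587"))  # two CJK characters
```

For Latin text UTF-8 is the most compact; for BMP CJK characters UTF-8 needs 3 bytes each, UTF-16 needs 2 and UTF-32 needs 4, so UTF-32's attraction there is the fixed width rather than the size.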

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
David Mitchell [EMAIL PROTECTED] writes: Personally I would not use such a beast But with different encodings implemented by different SV types - each with their own vtable - surely most of this will "come out in the wash", by the correct method automatically being called. I thought that was
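
The idea that "different encodings implemented by different SV types, each with their own vtable" makes the correct method get called automatically can be sketched with per-encoding method tables. A minimal sketch; `StringVtable`, `SV`, and the two encodings chosen are illustrative assumptions, not the design the thread settled on:

```python
from abc import ABC, abstractmethod

class StringVtable(ABC):
    """Hypothetical per-encoding vtable: each encoding supplies its own methods."""
    @abstractmethod
    def length(self, data: bytes) -> int: ...
    @abstractmethod
    def char_at(self, data: bytes, i: int) -> str: ...

class Latin1Vtable(StringVtable):
    def length(self, data):
        return len(data)              # one byte per character
    def char_at(self, data, i):
        return chr(data[i])           # byte value == code point

class Utf8Vtable(StringVtable):
    def length(self, data):
        return len(data.decode("utf-8"))
    def char_at(self, data, i):
        return data.decode("utf-8")[i]

class SV:
    """Scalar holding raw bytes plus the vtable that knows how to read them."""
    def __init__(self, vtable: StringVtable, data: bytes):
        self.vtable, self.data = vtable, data
    def length(self) -> int:
        return self.vtable.length(self.data)      # dispatch through the vtable
    def char_at(self, i: int) -> str:
        return self.vtable.char_at(self.data, i)
```

Callers never test the encoding themselves; the same `length()` call returns 4 for "café" whether it is stored as 4 Latin-1 bytes or 5 UTF-8 bytes.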

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
Nicholas Clark [EMAIL PROTECTED] writes: On Fri, Dec 15, 2000 at 11:18:00AM -0600, Jarkko Hietaniemi wrote: As painful as it may sound (codingwise) I would urge to spare some thought to using (internally) UTF-32 for those encodings for which UTF-8 would be *longer* than the UTF-32 (mainly the

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
David Mitchell [EMAIL PROTECTED] writes: Nick Ing-Simmons [EMAIL PROTECTED] wrote: What are string functions in your view? m// s/// join() substr index lc, lcfirst, ... | ~ ++ vec '.' '.=' It rapidly gets out of hand. Perhaps, but consider that somewhere

Re: String representation

2000-12-18 Thread Nick Ing-Simmons
Jarkko Hietaniemi [EMAIL PROTECTED] writes: On Mon, Dec 18, 2000 at 03:21:05PM +, Nick Ing-Simmons wrote: Simon Cozens [EMAIL PROTECTED] writes: So, before we start even thinking about what we need, it's time to look at the vexed question of string representation. How do we do Unicode

Re: String representation

2000-12-18 Thread Jarkko Hietaniemi
As I pointed out on p5p, even EBCDIC machines can use that model - but the downside is that ord('A') == 65, which will break backward compatibility with EBCDIC scripts. Maybe we need $ENV{PERL_ENCODING} to control ord() and chr(), too? That was my suggestion last week some time -
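
The $ENV{PERL_ENCODING} suggestion amounts to making ord() and chr() report byte values in a configured native encoding instead of Unicode code points. A minimal sketch; `native_ord`/`native_chr` are hypothetical, and the fallback to a `PERL_ENCODING` environment variable borrows the name from the thread while its semantics here are an assumption:

```python
import os
from typing import Optional

def native_ord(ch: str, encoding: Optional[str] = None) -> int:
    """Hypothetical ord(): byte value of `ch` in the native encoding, if one is set."""
    encoding = encoding or os.environ.get("PERL_ENCODING")
    if encoding is None:
        return ord(ch)                # Unicode model: ord('A') == 65 everywhere
    encoded = ch.encode(encoding)
    if len(encoded) != 1:
        raise ValueError(f"{ch!r} is not a single byte in {encoding}")
    return encoded[0]                 # EBCDIC model: 'A' is 193 under cp037

def native_chr(n: int, encoding: Optional[str] = None) -> str:
    """Hypothetical chr(): inverse of native_ord for single-byte encodings."""
    encoding = encoding or os.environ.get("PERL_ENCODING")
    if encoding is None:
        return chr(n)
    return bytes([n]).decode(encoding)
```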

Re: String representation

2000-12-18 Thread Kai Henningsen
other stuff in parallel with coding it. (A lot of it will be grunt work.) So, before we start even thinking about what we need, it's time to look at the vexed question of string representation. How do we do Unicode without getting into the horrendous non-Latin1 cockups we're seeing on p5p right

Re: String representation

2000-12-17 Thread Dan Sugalski
and runtime library, since everything else builds on top of that, and we can design other stuff in parallel with coding it. (A lot of it will be grunt work.) So, before we start even thinking about what we need, it's time to look at the vexed question of string

String representation

2000-12-15 Thread Simon Cozens
at the vexed question of string representation. How do we do Unicode without getting into the horrendous non-Latin1 cockups we're seeing on p5p right now? Larry suggested aeons ago that everything is an array of numbers, and Perl shouldn't care what those numbers represent. But at some point, it has
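
Larry's "everything is an array of numbers" model keeps encodings out of the core: bytes are decoded to numbers on input and encoded on output, and only those boundaries need to know what the numbers represent. A minimal sketch with hypothetical helper names:

```python
from typing import List

def to_numbers(data: bytes, encoding: str) -> List[int]:
    """Input boundary: the internal string is just a list of code points."""
    return [ord(c) for c in data.decode(encoding)]

def from_numbers(nums: List[int], encoding: str) -> bytes:
    """Output boundary: only here does the meaning of the numbers matter."""
    return "".join(map(chr, nums)).encode(encoding)
```

The same four numbers read from UTF-8 input can be written back out as Latin-1, and an EBCDIC byte 0xC1 becomes the number 65 internally.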

Re: String representation

2000-12-15 Thread Jarkko Hietaniemi
we start even thinking about what we need, it's time to look at the vexed question of string representation. How do we do Unicode without getting into the horrendous non-Latin1 cockups we're seeing on p5p right now? Larry As painful as it may sound (codingwise) I would urge to spare some thought

Re: String representation

2000-12-15 Thread Jarkko Hietaniemi
in parallel with coding it. (A lot of it will be grunt work.) So, before we start even thinking about what we need, it's time to look at the vexed question of string representation. How do we do Unicode without getting into the horrendous non-Latin1 cockups we're seeing on p5p right now? Larry