On 18 Dec 00, at 15:21, Nick Ing-Simmons wrote:
There needs to be a hierarchy of _repertoires_ such that:
ASCII is subset of Native is subset of wchar_t is subset of UNICODE.
But we can't even rely on that. I can imagine a couple of Native
encodings around that fiddle with ASCII (for
On Wed, Dec 20, 2000 at 11:07:39PM +, Nick Ing-Simmons wrote:
The snag is that there are common pairs
e.g. concat(utf8,ascii) / concat(ascii,utf8)
or
plus(NV,IV) / plus(IV,NV)
where it is possible to get "smart" when one arg is a "special case" of
the other.
And
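A minimal sketch of the "special case" idea for the concat(utf8,ascii) / concat(ascii,utf8) pair. The `Str` type and the `concat` function are hypothetical names used only for illustration, not anything from the Perl sources. The sketch assumes that ASCII bytes are already valid UTF-8, so the mixed pair can skip conversion, while other pairs fall back to a decode and re-encode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Str:
    """Hypothetical string value tagged with the name of its encoding."""
    encoding: str  # a Python codec name, e.g. "ascii", "utf8", "latin-1"
    data: bytes


def concat(a: Str, b: Str) -> Str:
    # Same encoding on both sides: plain byte concatenation.
    if a.encoding == b.encoding:
        return Str(a.encoding, a.data + b.data)
    # "Smart" case: ASCII is a special case of UTF-8, so the bytes can be
    # joined as-is, in either argument order, and the result is UTF-8.
    if {a.encoding, b.encoding} == {"ascii", "utf8"}:
        return Str("utf8", a.data + b.data)
    # General case: decode both sides and re-encode to a common form.
    text = a.data.decode(a.encoding) + b.data.decode(b.encoding)
    return Str("utf8", text.encode("utf-8"))
```

The same shape would apply to plus(NV,IV) / plus(IV,NV): one check covers both argument orders, and only unrelated pairs pay for a full conversion.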
Nicholas Clark [EMAIL PROTECTED] writes:
where it is possible to get "smart" when one arg is a "special case" of
the other.
And similarly numbers must be convertible to "complex long double" or
whatever is the top of the built-in tree? (NV I guess - complex is
overkill.)
It is the
On Thu, Dec 21, 2000 at 05:36:05PM +, Nick Ing-Simmons wrote:
Nicholas Clark [EMAIL PROTECTED] writes:
where it is possible to get "smart" when one arg is a "special case" of
the other.
And similarly numbers must be convertible to "complex long double" or
whatever is the top
David Mitchell [EMAIL PROTECTED] writes:
The problem is "what are the (types of) the arguments passed
I don't really see why types of args are (in general) a problem.
Hmm, you may be right at the level of your example, which may indeed
be typical of pp_(). Perhaps PerlIO is so bothersome
On Tue, Dec 19, 2000 at 06:11:06PM +, David Mitchell wrote:
Since in real life the types of args are often the same, this will usually
be a win.
I found that you have to make an effort to make them the same, else generally
enough of them aren't that decision making code outweighs speed
Simon Cozens [EMAIL PROTECTED]
IMHO, the first thing we need to design and code is the API and runtime
library, since everything else builds on top of that, and we can design other
stuff in parallel with coding it. (A lot of it will be grunt work.)
Personally I feel that the string part of
David Mitchell [EMAIL PROTECTED] writes:
Personally I feel that the string part of the SV API should include most
(if not all) string functions, including regex matching and substitution.
What are string functions in your view?
m//
s///
join()
substr
index
lc, lcfirst, ...
| ~
Simon Cozens [EMAIL PROTECTED] writes:
So, before we start even thinking about what we need, it's time to look at the
vexed question of string representation. How do we do Unicode without getting
into the horrendous non-Latin1 cockups we're seeing on p5p right now?
Well - my theorist's answer
On Mon, Dec 18, 2000 at 02:43:14PM +, Nick Ing-Simmons wrote:
David Mitchell [EMAIL PROTECTED] writes:
Personally I feel that the string part of the SV API should include most
(if not all) string functions, including regex matching and substitution.
[list of potential string
On Sun, 17 Dec 2000, Dan Sugalski wrote:
I'm thinking for speed that binary and UTF-32 should be our internal
representations, at least for the data that gets handed to the regex
engine. Or at least we use a constant-width character that's 8 and 32 bits,
if I'm misusing UTF-32. (UTF-8 is
custom string types.
I would argue one does that by making the regex API more modular.
Quite possibly, but once having split it into separate components, I
might then make the case that certain of those components could be
implemented as vtable ops (eg those components that are sensitive to
t
On Mon, Dec 18, 2000 at 10:30:53AM -0500, Philip Newton wrote:
On Sat, 16 Dec 2000, Jarkko Hietaniemi wrote:
On Fri, Dec 15, 2000 at 03:10:16PM -0500, Dan Sugalski wrote:
At 11:18 AM 12/15/00 -0600, Jarkko Hietaniemi wrote:
As painful as it may sound (codingwise) I would urge to
On Fri, Dec 15, 2000 at 11:18:00AM -0600, Jarkko Hietaniemi wrote:
As painful as it may sound (codingwise) I would urge to spare some
thought to using (internally) UTF-32 for those encodings for which
UTF-8 would be *longer* than the UTF-32 (mainly the Asian scripts).
most CPUs can load a 32
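A quick way to check the sizes involved is to encode the same text several ways and compare byte counts. This is only an illustrative sketch; the helper name `encoded_sizes` is made up here. For characters in the Basic Multilingual Plane, which covers most CJK text, UTF-8 takes 3 bytes per character, UTF-16 takes 2, and UTF-32 always takes 4.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in three Unicode encoding forms."""
    # The "-le" variants are used so that no byte-order mark is counted.
    return {name: len(text.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")}


print(encoded_sizes("abc"))                # three ASCII characters
print(encoded_sizes("\u65e5\u672c\u8a9e"))  # three CJK characters
```

The fixed width is what makes indexing cheap: character i of a UTF-32 buffer starts at byte 4*i, whereas UTF-8 needs a scan from the start of the buffer.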
David Mitchell [EMAIL PROTECTED] writes:
Personally I would not use such a beast
But with different encodings implemented by different SV types - each with their
own vtable - surely most of this will "come out in the wash", by the correct
method automatically being called. I thought that was
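A rough sketch of how per-encoding vtables could let the correct method be called automatically. Everything here (`VTable`, `SV`, `sv_length`, `sv_char_at`, and the two tables) is a hypothetical illustration rather than a proposed API: each value carries a table of operations for its own representation, and callers go through that table without testing the encoding themselves.

```python
from typing import Callable, NamedTuple


class VTable(NamedTuple):
    """Hypothetical table of per-representation string operations."""
    length: Callable[[bytes], int]
    char_at: Callable[[bytes, int], str]


# One byte per character.
LATIN1 = VTable(
    length=lambda buf: len(buf),
    char_at=lambda buf, i: chr(buf[i]),
)

# Four bytes per character, little-endian, no byte-order mark.
UTF32 = VTable(
    length=lambda buf: len(buf) // 4,
    char_at=lambda buf, i: chr(int.from_bytes(buf[4 * i:4 * i + 4], "little")),
)


class SV(NamedTuple):
    """Hypothetical scalar: a buffer plus the vtable that understands it."""
    vtable: VTable
    buf: bytes


def sv_length(sv: SV) -> int:
    # The caller never inspects the encoding; the vtable decides.
    return sv.vtable.length(sv.buf)


def sv_char_at(sv: SV, i: int) -> str:
    return sv.vtable.char_at(sv.buf, i)
```

This handles single-argument operations cleanly. Two-argument operations such as concatenation still have to decide which argument's table wins, which is where the type-pair questions raised elsewhere in the thread come in.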
Nicholas Clark [EMAIL PROTECTED] writes:
On Fri, Dec 15, 2000 at 11:18:00AM -0600, Jarkko Hietaniemi wrote:
As painful as it may sound (codingwise) I would urge to spare some
thought to using (internally) UTF-32 for those encodings for which
UTF-8 would be *longer* than the UTF-32 (mainly the
David Mitchell [EMAIL PROTECTED] writes:
Nick Ing-Simmons [EMAIL PROTECTED] wrote:
What are string functions in your view?
m//
s///
join()
substr
index
lc, lcfirst, ...
| ~
++
vec
'.'
'.='
It rapidly gets out of hand.
Perhaps, but consider that somewhere
Jarkko Hietaniemi [EMAIL PROTECTED] writes:
On Mon, Dec 18, 2000 at 03:21:05PM +, Nick Ing-Simmons wrote:
Simon Cozens [EMAIL PROTECTED] writes:
So, before we start even thinking about what we need, it's time to look at the
vexed question of string representation. How do we do Unicode
As I pointed out on p5p even EBCDIC machines can use that model - but
the downside is that ord('A') == 65 which will break backward compatibility
with EBCDIC scripts.
Maybe we need $ENV{PERL_ENCODING} to control ord() and chr(), too?
That was my suggestion last week some time -
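To make the ord('A') point concrete, here is a small sketch of the difference between a Unicode code point and a native EBCDIC byte value. The functions `native_ord` and `native_chr` are hypothetical, and passing the encoding as a parameter merely stands in for whatever a setting such as PERL_ENCODING might select. Code page 037 is used as one example of an EBCDIC encoding.

```python
EBCDIC = "cp037"  # one common EBCDIC code page, chosen for illustration


def native_ord(ch: str, encoding: str) -> int:
    """Return the byte value of a single character in the given encoding."""
    encoded = ch.encode(encoding)
    if len(encoded) != 1:
        raise ValueError("character is not a single byte in this encoding")
    return encoded[0]


def native_chr(code: int, encoding: str) -> str:
    """Return the character that a single byte value denotes in the encoding."""
    return bytes([code]).decode(encoding)


print(ord("A"))                 # Unicode code point
print(native_ord("A", EBCDIC))  # EBCDIC byte value for the same letter
```

Under the Unicode model 'A' is 65 everywhere, while an EBCDIC script written against native values expects 193 (0xC1), which is the compatibility break being discussed.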
other stuff in parallel with coding it. (A lot of it will be grunt work.)
So, before we start even thinking about what we need, it's time to look at
the vexed question of string representation. How do we do Unicode without
getting into the horrendous non-Latin1 cockups we're seeing on p5p right
and runtime library, since everything else builds on top of that, and we can
design other stuff in parallel with coding it. (A lot of it will be grunt work.)
So, before we start even thinking about what we need, it's time to look at the
vexed question of string
at the
vexed question of string representation. How do we do Unicode without getting
into the horrendous non-Latin1 cockups we're seeing on p5p right now? Larry
suggested aeons ago that everything is an array of numbers, and Perl shouldn't
care what those numbers represent. But at some point, it has
we start even thinking about what we need, it's time to look at the
vexed question of string representation. How do we do Unicode without getting
into the horrendous non-Latin1 cockups we're seeing on p5p right now? Larry
As painful as it may sound (codingwise) I would urge to spare some
thought
in parallel with coding it. (A lot of it will be grunt work.)
So, before we start even thinking about what we need, it's time to look at the
vexed question of string representation. How do we do Unicode without getting
into the horrendous non-Latin1 cockups we're seeing on p5p right now? Larry