[PATCH] ARGV in P0

2002-01-31 Thread Brent Dax

The patch below places the contents of argv into P0.  At the moment it
has the name of the script file in question in P0[0]; I haven't yet
decided if this is to be construed as a feature or a bug.  ;^)

A little test script to see that this is working right:

set I0, P0
set I1, 0

print "start\n"

LOOP:
ge I1, I0, OUT
set S0, P0, I1

print " "
print I1
print ": "
print S0
print "\n"

inc I1
branch LOOP

OUT:
print "done\n"
end

--Brent Dax
[EMAIL PROTECTED]
Parrot Configure pumpking and regex hacker
Check out the Parrot FAQ: http://www.panix.com/~ziggy/parrot.html (no,
it's not mine)

obra . hawt sysadmin chx0rs
lathos This is sad. I know of *a* hawt sysamin chx0r.
obra I know more than a few.
lathos obra: There are two? Are you sure it's not the same one?


--- /parrot-cvs/embed.c Wed Jan 30 06:41:34 2002
+++ /parrot/embed.c Thu Jan 31 01:49:26 2002
@@ -139,6 +139,9 @@

 void
 Parrot_runcode(struct Parrot_Interp *interpreter, int argc, char *argv[]) {
+    INTVAL i;
+    PMC* userargv;
+
     if(interpreter->flags & PARROT_DEBUG_FLAG) {
         fprintf(stderr, "Parrot VM: Debugging enabled.\n");

@@ -159,6 +162,24 @@
 exit(1);
 }
 #endif
+
+    if(interpreter->flags & PARROT_DEBUG_FLAG) {
+        fprintf(stderr, "Parrot VM: Setting up ARGV array in P0. Current argc: %d\n", argc);
+    }
+
+    userargv=pmc_new(interpreter, enum_class_PerlArray);
+
+    for(i=0; i < argc; i++) {
+        if(interpreter->flags & PARROT_DEBUG_FLAG) {
+            fprintf(stderr, "\t%d: %s\n", i, argv[i]);
+        }
+
+        userargv->vtable->set_string_index(interpreter, userargv,
+            string_make(interpreter, argv[i], strlen(argv[i]), 0, 0, 0), i
+        );
+    }
+
+    interpreter->pmc_reg.registers[0]=userargv;
 
     runops(interpreter, interpreter->code, 0);

--- /parrot-cvs/test_main.c Thu Jan 31 01:18:18 2002
+++ /parrot/test_main.c Thu Jan 31 01:45:20 2002
@@ -96,7 +96,7 @@
 }

 OUT:
-(*argc)--;
+
 return (*(argv++))[0];
 }





Re: Apoc 4: The skip keyword

2002-01-31 Thread Tomas Cerha

> "skip" was uncomfortable when I read it (I at first took it to mean
> "skip over the following" rather than "skip to the following"), but
> I find "nobreak" also a bit strange.  How about "proceed"?
>
> If we mean "fall-through", why invent a new term? Why not use the
> intent: C<fall_through>?


Wow, keyword with underscore. I like "proceed" much better.

Tomas.




Re: parrot rx engine

2002-01-31 Thread Peter Haworth

On Wed, 30 Jan 2002 17:45:58 +, Graham Barr wrote:
> On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
> > # rx_setprops P0, "i", 2
> > # branch $start0
> > # $advance:
> > # rx_advance P0, $fail
> > # $start0:
> > # rx_literal P0, "a", $advance
> > #
> > # First, we set the rx engine to case-insensitive. Why is that bad? It's
> > # setting a runtime property for what should be compile-time
> > # unicode-character-kung-fu. Assuming your CPU knows what the gritty
> > # details of unicode in the first place just feels wrong, but I digress.
> >
> > That "i" does a once-off case-folding operation on the target string.
> > All other input to the engine MUST already be case-folded for speed.
>
> Hm, is that going to work ? What about a rx like /^a(?i:b)C/ where the
> case insensitivity only applies to part of the pattern ?

Or worse, in /^a(b)c/i, where you want to capture the original character,
not the case-folded version?

-- 
Peter Haworth   [EMAIL PROTECTED]
The term `Internet' has the meaning given that term in
 section 230(f)(1) of the Communications Act of 1934.
-- H.R. 3028, Trademark Cyberpiracy Prevention Act



strings: sequence-of-integer ... list of chunks

2002-01-31 Thread Tim Bunce

On Wed, Jan 30, 2002 at 10:47:36AM -0800, Larry Wall wrote:
 
 For various reasons, some of which relate to the sequence-of-integer
 abstraction, and some of which relate to infinite strings and arrays,
 I think Perl 6 strings are likely to be represented by a list of
 chunks, where each chunk is a sequence of integers of the same size or
 representation, but different chunks can have different integer sizes
 or representations.  The abstract string interface must hide this from
 any module that wishes to work at the abstract string level.  In
 particular, it must hide this from the regex engine, which works on
 pure sequences in the abstract.

I hope someone volunteers to start looking into implementing that soon
(if no one has already).
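
For illustration only, here is a minimal C sketch of the list-of-chunks idea
Larry describes above. The names chunk, chunked_length and chunked_char_at
are made up for this example (they are not Parrot API), and it assumes only
1-, 2- and 4-byte elements. The point is that callers such as a regex engine
see a plain sequence of integers and never the chunk boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk: `len` elements, each `width` bytes wide. */
typedef struct chunk {
    const void *data;
    size_t len;            /* number of elements in this chunk */
    unsigned width;        /* bytes per element: 1, 2 or 4 (assumption) */
    struct chunk *next;    /* following chunk, or NULL */
} chunk;

/* Total number of elements across all chunks. */
static size_t chunked_length(const chunk *c)
{
    size_t total = 0;
    for (; c != NULL; c = c->next)
        total += c->len;
    return total;
}

/* Abstract accessor: element `index` of the whole string as an integer,
 * or -1 when `index` is past the end. Chunk widths stay hidden. */
static int64_t chunked_char_at(const chunk *c, size_t index)
{
    for (; c != NULL; c = c->next) {
        if (index < c->len) {
            switch (c->width) {
            case 1: return ((const uint8_t *)c->data)[index];
            case 2: return ((const uint16_t *)c->data)[index];
            case 4: return ((const uint32_t *)c->data)[index];
            default:
                assert(!"unsupported element width");
                return -1;
            }
        }
        index -= c->len;   /* skip this chunk and keep looking */
    }
    return -1;
}
```

A lazily generated (possibly infinite) string would need a chunk type that
produces its data on demand, which this sketch does not attempt.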

Tim.



RE: parrot rx engine

2002-01-31 Thread Brent Dax

Peter Haworth:
# Or worse, in /^a(b)c/i, where you want to capture the
# original character,
# not the case-folded version?

Parentheses just record a pair of indices, not a string.
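
A rough C sketch of that point, with made-up names (rx_capture, fold_ascii,
find_literal, capture_text) and ASCII-only folding as a simplifying
assumption: the match runs against a folded copy, but the capture is only an
index pair, so the captured text is read back from the original string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capture record: a half-open [start, end) index pair. */
typedef struct {
    size_t start;
    size_t end;
} rx_capture;

/* ASCII-only stand-in for case folding; dst needs strlen(src) + 1 bytes. */
static void fold_ascii(char *dst, const char *src)
{
    while (*src != '\0')
        *dst++ = (char)tolower((unsigned char)*src++);
    *dst = '\0';
}

/* Find an (already folded) literal in the folded input and record where
 * it matched. Returns 1 on success, 0 on failure. */
static int find_literal(const char *folded, const char *literal,
                        rx_capture *cap)
{
    const char *hit = strstr(folded, literal);
    if (hit == NULL)
        return 0;
    cap->start = (size_t)(hit - folded);
    cap->end = cap->start + strlen(literal);
    return 1;
}

/* Copy the captured range out of the ORIGINAL string, truncating to fit. */
static void capture_text(const char *original, const rx_capture *cap,
                         char *out, size_t outsize)
{
    size_t len = cap->end - cap->start;
    assert(outsize > 0);
    if (len >= outsize)
        len = outsize - 1;
    memcpy(out, original + cap->start, len);
    out[len] = '\0';
}
```

This only works as written when folding preserves length, which holds for
ASCII but not for full Unicode case folding.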

--Brent Dax
[EMAIL PROTECTED]




Re: [PATCH] ARGV in P0

2002-01-31 Thread Dan Sugalski

At 2:00 AM -0800 1/31/02, Brent Dax wrote:
> The patch below places the contents of argv into P0.  At the moment it
> has the name of the script file in question in P0[0]; I haven't yet
> decided if this is to be construed as a feature or a bug.  ;^)

Probably a bug, but in the specification.
-- 
 Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



Re: strings: sequence-of-integer ... list of chunks

2002-01-31 Thread Dan Sugalski

At 2:49 PM + 1/31/02, Tim Bunce wrote:
> On Wed, Jan 30, 2002 at 10:47:36AM -0800, Larry Wall wrote:
>
> > For various reasons, some of which relate to the sequence-of-integer
> > abstraction, and some of which relate to infinite strings and arrays,
> > I think Perl 6 strings are likely to be represented by a list of
> > chunks, where each chunk is a sequence of integers of the same size or
> > representation, but different chunks can have different integer sizes
> > or representations.  The abstract string interface must hide this from
> > any module that wishes to work at the abstract string level.  In
> > particular, it must hide this from the regex engine, which works on
> > pure sequences in the abstract.
>
> I hope someone volunteers to start looking into implementing that soon
> (if no one has already).

Yup, in progress.

There is an issue of time--what do we do, for example, in the case:

my $pi = Pi::Generate;
if ($pi =~ /[a-z]/) {
  print "There's a letter in here!\n";
}

if Pi::Generate returns a generator object that will calculate pi for 
you to however far you want, that regex will run forever or until it 
runs out of memory, whichever comes first.
-- 
 Dan




Re: parrot rx engine

2002-01-31 Thread Graham Barr

On Thu, Jan 31, 2002 at 08:54:21AM -0800, Brent Dax wrote:
 Parentheses just record a pair of indices, not a string.

Yes, I was assuming that. However what is to be gained by case
folding the input string ?

Because parts of an rx can be case-insensitive while other parts
are case-sensitive, we will probably need two sorts of ops anyway
(or a way to tell the op to be case-insensitive).  And you will
only be able to do the case folding when the whole rx is case-insensitive.

It also means creating a copy of the input string, which is something
the current rx engine in perl5 tries to avoid. And while I will agree
that it is often faster to do lc($str) =~ /.../ than $str =~ /.../i,
that is normally only the case for small-ish strings.

Graham.




Re: Jit on Solaris: using dis instead of objdump?

2002-01-31 Thread Andy Dougherty

On Wed, 30 Jan 2002, Jason Gloudon wrote:

> On Wed, Jan 30, 2002 at 03:27:18PM -0500, Andy Dougherty wrote:
> > objdump.  Is anyone with a Solaris system familiar enough with jit
> > internals to have a go at adapting it to use dis instead of GNU objdump?
>
> The difference was pretty minimal. It should work with 'dis'.

It doesn't.  (If it had, I would have posted a patch allowing 'dis'
instead of 'objdump' instead of asking for help.  Sorry I hadn't made
it clear originally that I had already tried the simple stuff.)

Today, I note that sun4-solaris.pm now always uses 'dis', and
sun4Generic.pm has been changed a bit.  However, it still doesn't
work.  I get

perl-cc jit2h.pl sun4 > include/parrot/jit_struct.h
/usr/ccs/bin/as: "t.s", line 2: error: invalid character (0x40)
as t.s failed at lib/Parrot/Jit/sun4Generic.pm line 164, <IN> line 10.
*** Error code 1

(Note that I'm using Sun's assembler.  That may be the difference.)

-- 
Andy Dougherty  [EMAIL PROTECTED]




Re: strings: sequence-of-integer ... list of chunks

2002-01-31 Thread Tim Bunce

On Thu, Jan 31, 2002 at 12:18:28PM -0500, Dan Sugalski wrote:
> There is an issue of time--what do we do, for example, in the case:
>
> my $pi = Pi::Generate;
> if ($pi =~ /[a-z]/) {
>   print "There's a letter in here!\n";
> }
>
> if Pi::Generate returns a generator object that will calculate pi for
> you to however far you want, that regex will run forever or until it
> runs out of memory, whichever comes first.

Right. So don't do that.

:-)

Tim.



Re: strings: sequence-of-integer ... list of chunks

2002-01-31 Thread Dan Sugalski

At 5:34 PM + 1/31/02, Tim Bunce wrote:
On Thu, Jan 31, 2002 at 12:18:28PM -0500, Dan Sugalski wrote:
> > There is an issue of time--what do we do, for example, in the case:
> >
> > my $pi = Pi::Generate;
> > if ($pi =~ /[a-z]/) {
> >   print "There's a letter in here!\n";
> > }
> >
> > if Pi::Generate returns a generator object that will calculate pi for
> > you to however far you want, that regex will run forever or until it
> > runs out of memory, whichever comes first.
>
> Right. So don't do that.
>
> :-)

Oh, sure, *be* sensible. Sheesh, some people... :)
-- 
 Dan




Re: strings: sequence-of-integer ... list of chunks

2002-01-31 Thread Alex Gough

On Thu, 31 Jan 2002, Dan Sugalski wrote:

> At 2:49 PM + 1/31/02, Tim Bunce wrote:
> > On Wed, Jan 30, 2002 at 10:47:36AM -0800, Larry Wall wrote:
> > >
> > > For various reasons, some of which relate to the sequence-of-integer
> > > abstraction, and some of which relate to infinite strings and arrays,
> >
> > I hope someone volunteers to start looking into implementing that soon
> > (if no one has already).
>
> Yup, in progress.
>
> There is an issue of time--what do we do, for example, in the case:
>
> my $pi = Pi::Generate;
> if ($pi =~ /[a-z]/) {
>   print "There's a letter in here!\n";
> }
>
> if Pi::Generate returns a generator object that will calculate pi for
> you to however far you want, that regex will run forever or until it
> runs out of memory, whichever comes first.

We simply guarantee that Perl will always give you enough rope to hang
yourself, you just need to ask nicely.

Alex Gough




Re: parrot rx engine

2002-01-31 Thread Tim Bunce

On Thu, Jan 31, 2002 at 05:15:49PM +, Graham Barr wrote:
 
> Yes, I was assuming that. However what is to be gained by case
> folding the input string ?
>
> Because parts of an rx can be case-insensitive while other parts
> are case-sensitive, we will probably need two sorts of ops anyway
> (or a way to tell the op to be case-insensitive).  And you will
> only be able to do the case folding when the whole rx is case-insensitive.

(Two sorts of ops makes most sense to me as the case-insensitive op
will need to know about fiddly charset conversion stuff whereas the
case-sensitive can just work with the list-of-integers abstraction.)

> It also means creating a copy of the input string, which is something
> the current rx engine in perl5 tries to avoid. And while I will agree
> that it is often faster to do lc($str) =~ /.../ than $str =~ /.../i
> that is normally only the case for small-ish strings.

Agreed on all counts.

Especially as the perl6 rx engine will have to be able to work directly on
non-trivial things like streams and generators and suchlike.

Tim [who's not really paying attention].



Re: Jit on Solaris: using dis instead of objdump?

2002-01-31 Thread Jason Gloudon


This should make solaris 'as' happy. There will be an assembler warning, but
it's harmless.


diff -r1.3 sun4Generic.pm
78c78
< return Parrot::Jit->Assemble("ld [\%o0], \%o0\njmpl \%o0, \%g0\n");
---
> return Parrot::Jit->Assemble("ld [\%o0], \%o0\njmpl \%o0, \%g0\nnop\n");
151c151
< .type main,@function
---
> .type main,#function

-- 
Jason



[PATCH] no need to rebuild everything all the time

2002-01-31 Thread Nicholas Clark

Dependencies in the Makefile are currently too broad brush.
I don't enjoy waiting for everything to recompile every time I try to tweak
the jit. The only file that #includes jit_struct.h is jit.c, so I feel
that the Makefile dependencies should reflect this, and not cause a
gratuitous recompile of everything.
There are probably other auto-generated header files that world+dog should
not depend on.

Nicholas Clark
-- 
EMCFT http://www.ccl4.org/~nick/CV.html

--- Makefile.in~	Wed Jan 30 10:31:28 2002
+++ Makefile.in Thu Jan 31 18:54:57 2002
@@ -57,13 +57,14 @@
 #
 ###
 
-H_FILES = $(INC)/config.h $(INC)/exceptions.h $(INC)/io.h $(INC)/op.h \
+GENERAL_H_FILES = $(INC)/config.h $(INC)/exceptions.h $(INC)/io.h $(INC)/op.h \
 $(INC)/register.h $(INC)/string.h $(INC)/events.h $(INC)/interpreter.h \
 $(INC)/memory.h $(INC)/parrot.h $(INC)/stacks.h $(INC)/packfile.h \
 $(INC)/global_setup.h $(INC)/vtable.h $(INC)/oplib/core_ops.h $(INC)/oplib/core_ops_prederef.h \
 $(INC)/runops_cores.h $(INC)/trace.h \
 $(INC)/pmc.h $(INC)/key.h $(INC)/resources.h $(INC)/platform.h \
-$(INC)/interp_guts.h ${jit_h} ${jit_struct_h} $(INC)/rx.h $(INC)/rxstacks.h $(INC)/embed.h
+$(INC)/interp_guts.h ${jit_h} $(INC)/rx.h $(INC)/rxstacks.h $(INC)/embed.h
+ALL_H_FILES = $(GENERAL_H_FILES) ${jit_struct_h}
 
 CLASS_O_FILES = classes/default$(O) classes/array$(O) \
 classes/perlint$(O) classes/perlstring$(O) classes/perlnum$(O) \
@@ -207,7 +208,7 @@
 #
 ###
 
-test_main$(O): test_main.c $(H_FILES)
+test_main$(O): test_main.c $(GENERAL_H_FILES)
 
 lib/Parrot/Jit.pm: lib/Parrot/Jit/${jitarchname}.pm lib/Parrot/Jit/${jitcpuarch}Generic.pm
 	$(PERL) -MFile::Copy=cp -e ${PQ}cp q|lib/Parrot/Jit/${jitarchname}.pm|, q|lib/Parrot/Jit.pm|${PQ}
@@ -261,70 +262,70 @@
 #
 ###
 
-global_setup$(O): $(H_FILES)
+global_setup$(O): $(GENERAL_H_FILES)
 
-pmc$(O): $(H_FILES)
+pmc$(O): $(GENERAL_H_FILES)
 
-jit$(O): $(H_FILES)
+jit$(O): $(GENERAL_H_FILES) ${jit_struct_h}
 
-key$(O): $(H_FILES)
+key$(O): $(GENERAL_H_FILES)
 
-resources$(O): $(H_FILES)
+resources$(O): $(GENERAL_H_FILES)
 
-platform$(O): $(H_FILES)
+platform$(O): $(GENERAL_H_FILES)
 
-string$(O): $(H_FILES)
+string$(O): $(GENERAL_H_FILES)
 
-chartype$(O): $(H_FILES)
+chartype$(O): $(GENERAL_H_FILES)
 
-encoding$(O): $(H_FILES)
+encoding$(O): $(GENERAL_H_FILES)
 
-chartype/usascii$(O): $(H_FILES)
+chartype/usascii$(O): $(GENERAL_H_FILES)
 
-chartype/unicode$(O): $(H_FILES)
+chartype/unicode$(O): $(GENERAL_H_FILES)
 
-exceptions$(O): $(H_FILES)
+exceptions$(O): $(GENERAL_H_FILES)
 
-encoding/singlebyte$(O): $(H_FILES)
+encoding/singlebyte$(O): $(GENERAL_H_FILES)
 
-encoding/utf8$(O): $(H_FILES)
+encoding/utf8$(O): $(GENERAL_H_FILES)
 
-encoding/utf16$(O): $(H_FILES)
+encoding/utf16$(O): $(GENERAL_H_FILES)
 
-encoding/utf32$(O): $(H_FILES)
+encoding/utf32$(O): $(GENERAL_H_FILES)
 
-interpreter$(O): interpreter.c $(H_FILES)
+interpreter$(O): interpreter.c $(GENERAL_H_FILES)
 
-io/io$(O): $(H_FILES)
+io/io$(O): $(GENERAL_H_FILES)
 
-io/io_stdio$(O): $(H_FILES)
+io/io_stdio$(O): $(GENERAL_H_FILES)
 
-io/io_unix$(O): $(H_FILES)
+io/io_unix$(O): $(GENERAL_H_FILES)
 
-io/io_win32$(O): $(H_FILES)
+io/io_win32$(O): $(GENERAL_H_FILES)
 
-memory$(O): $(H_FILES)
+memory$(O): $(GENERAL_H_FILES)
 
-packfile$(O): $(H_FILES)
+packfile$(O): $(GENERAL_H_FILES)
 
-parrot$(O): $(H_FILES)
+parrot$(O): $(GENERAL_H_FILES)
 
-register$(O): $(H_FILES)
+register$(O): $(GENERAL_H_FILES)
 
-rx$(O): $(H_FILES)
+rx$(O): $(GENERAL_H_FILES)
 
-rxstacks$(O): $(H_FILES)
+rxstacks$(O): $(GENERAL_H_FILES)
 
-stacks$(O): $(H_FILES)
+stacks$(O): $(GENERAL_H_FILES)
 
-embed$(O): $(H_FILES)
+embed$(O): $(GENERAL_H_FILES)
 
-core_ops$(O): $(H_FILES) core_ops.c
+core_ops$(O): $(GENERAL_H_FILES) core_ops.c
 
 core_ops.c $(INC)/oplib/core_ops.h: $(OPS_FILES) ops2c.pl lib/Parrot/OpsFile.pm lib/Parrot/Op.pm
 	$(PERL) ops2c.pl C $(OPS_FILES)
 
-core_ops_prederef$(O): $(H_FILES) core_ops_prederef.c
+core_ops_prederef$(O): $(GENERAL_H_FILES) core_ops_prederef.c
 
 core_ops_prederef.c $(INC)/oplib/core_ops_prederef.h: $(OPS_FILES) ops2c.pl lib/Parrot/OpsFile.pm lib/Parrot/Op.pm
 	$(PERL) ops2c.pl CPrederef $(OPS_FILES)



RE: parrot rx engine

2002-01-31 Thread Hong Zhang

> Because parts of an rx can be case-insensitive while other parts
> are case-sensitive, we will probably need two sorts of ops anyway
> (or a way to tell the op to be case-insensitive).  And you will
> only be able to do the case folding when the whole rx is
> case-insensitive.

I don't like your suggestion. I think we should have one set of
ops, but two input strings: one is the original, the other is
case-folded. Rx chooses the right one depending on the current
case-sensitivity. Two regex opcodes will be used for this purpose,
op-case-sensitive-start and op-case-insensitive-start. These opcodes
will switch the string's begin, end, position, etc.
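
To make the two-string idea concrete, here is a small C sketch. Everything
in it (rx_state, the op_* function names, a single shared position) is an
assumption for illustration and not existing Parrot code; it also assumes
the folded copy has the same length as the original, which is true for
ASCII but not for Unicode folding in general.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical matcher state: two views of the input, one position. */
typedef struct {
    const char *original;  /* input as given */
    const char *folded;    /* case-folded copy, same length (assumption) */
    const char *active;    /* view the ops currently read from */
    size_t pos;            /* current match position */
    size_t len;            /* length of either view */
} rx_state;

/* The two switching opcodes only change which view is active. */
static void op_case_sensitive_start(rx_state *rx)
{
    rx->active = rx->original;
}

static void op_case_insensitive_start(rx_state *rx)
{
    rx->active = rx->folded;
}

/* One literal op for both modes. In insensitive mode the pattern
 * character must already be folded at compile time. Returns 1 and
 * advances on a match, 0 otherwise. */
static int op_literal(rx_state *rx, char ch)
{
    assert(rx->active != NULL);
    if (rx->pos < rx->len && rx->active[rx->pos] == ch) {
        rx->pos++;
        return 1;
    }
    return 0;
}
```

With this, /^a(?i:b)C/ against "aBC" becomes: sensitive start, literal 'a',
insensitive start, literal 'b', sensitive start, literal 'C'.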

> It also means creating a copy of the input string, which is something
> the current rx engine in perl5 tries to avoid. And while I will agree
> that it is often faster to do lc($str) =~ /.../ than $str =~ /.../i
> that is normally only the case for small-ish strings.

I don't think the perl5 approach is the best choice. Unicode case folding
is much, much more expensive than malloc/free. And we can always use a
per-thread free list; unless the regex is nested or the string is very
big, we don't need to allocate any memory.

Hong



Re: parrot rx engine

2002-01-31 Thread Graham Barr

On Thu, Jan 31, 2002 at 11:18:58AM -0800, Hong Zhang wrote:
  Because parts of an rx can be case-insensitive while other parts
  are case-sensitive, we will probably need two sorts of ops anyway
  (or a way to tell the op to be case-insensitive).  And you will
  only be able to do the case folding when the whole rx is 
  case-insensitive.
 
 I don't like your suggestion. I think we should have one set of
 ops, but two input strings: one is the original, the other is case-
 folded. Rx chooses the right one depending on the current 
 case-sensitivity. 2 regex opcodes will be used for this purpose,
 op-case-sensitive-start and op-case-insensitive-start. The opcode
 will switch strings begins, ends, positions etc.
 
  It also means creating a copy of the input string, which is something
  the current rx engine in perl5 tries to avoid. And while I will agree
  that it is often faster todo lc($str) =~ /.../ than $str =~ /.../i
  that is normally only the case for small-ish strings.
 
 I don't think the perl5 approach is the best choice. Unicode case folding
 is much much more expensive than malloc/free. And we can always use
 per-thread free list, unless the regex is nested or the string is very
 big, we don't need to allocate any memory.

But as you say, case folding is expensive. And with this approach you
are going to case-fold every string that is matched against an rx
that has some part of it that is case-insensitive.

The case-folding should be done in the rx itself, at compile time if possible.
Then it is only done once, which will save a lot of time if the rx happens
to be used in a loop or something.

Graham.




Re: [PATCH] no need to rebuild everything all the time [APPLIED]

2002-01-31 Thread Dan Sugalski

At 7:04 PM + 1/31/02, Nicholas Clark wrote:

Dependencies in the Makefile are currently too broad brush.
I don't enjoy waiting for everything to recompile every time I try to tweak
the jit. The only file that #includes jit_struct.h is jit.c, so I feel
that the Makefile dependencies should reflect this, and not cause a
gratuitous recompile of everything.
There are probably other auto-generated header files that world+dog should
not depend on.

Applied, thanks.

-- 
 Dan




RE: parrot rx engine

2002-01-31 Thread Hong Zhang

> But as you say, case folding is expensive. And with this approach you
> are going to case-fold every string that is matched against an rx
> that has some part of it that is case-insensitive.

That is correct in general. But the regex compiler can be smarter than that.
For example, rx should optimize /a+/i to /[aA]+/ to avoid case-folding.
If it is too difficult for rx to do case-folding, I think it is better
to use some normalizer to do full case-folding.
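
As a sketch of that compile-time rewrite, the C function below turns one
case-insensitive literal into the text of an equivalent character class.
The name expand_caseless_literal is invented for this example, it handles
ASCII letters only (via the C locale's tolower/toupper), and it does no
escaping of regex metacharacters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Write the class text for one case-insensitive literal into `out`,
 * e.g. 'a' -> "[aA]". Characters without a case pair are copied as-is.
 * Returns the number of bytes written, not counting the NUL. */
static size_t expand_caseless_literal(char ch, char *out, size_t outsize)
{
    unsigned char c = (unsigned char)ch;
    int lo = tolower(c);
    int up = toupper(c);

    assert(outsize >= 5);  /* room for "[xX]" plus the NUL */
    if (lo == up) {
        out[0] = ch;
        out[1] = '\0';
        return 1;
    }
    out[0] = '[';
    out[1] = (char)lo;
    out[2] = (char)up;
    out[3] = ']';
    out[4] = '\0';
    return 4;
}
```

A compiler doing this for /a+/i would emit the class once and then match
the unmodified input, so no folded copy of the input is needed.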

> The case-folding should be done in the rx itself, at compile time if possible.
> Then it is only done once, which will save a lot of time if the rx happens
> to be used in a loop or something.

The regular expression itself is case-folded at compile time. But I am
talking about the input string here, not the regex.

Hong



RE: parrot rx engine

2002-01-31 Thread Brent Dax

Tim Bunce:
# On Thu, Jan 31, 2002 at 05:15:49PM +, Graham Barr wrote:
# 
#  Yes, I was assuming that. However what is to be gained by case
#  folding the input string ?
# 
#  Because parts of an rx can be case-insensitive while other parts
#  are case-sensitive, we will probably need two sorts of ops anyway
#  (or a way to tell the op to be case-insensitive).  And you will
#  only be able to do the case folding when the whole rx is
# case-insensitive.
#
# (Two sorts of ops makes most sense to me as the case-insensitive op
# will need to know about fiddly charset conversion stuff whereas the
# case-sensitive can just work with the list-of-integers abstraction.)
#
#  It also means creating a copy of the input string, which is
# something
#  the current rx engine in perl5 tries to avoid. And while I
# will agree
#  that it is often faster todo lc($str) =~ /.../ than $str =~ /.../i
#  that is normally only the case for small-ish strings.
#
# Agreed on all counts.
#
# Especially as the perl6 rx engine will have to be able to
# work directly on
# non-trivial things like streams and generators ans suchlike.

I have a suggestion similar to the ops suggestion but more flexible:
Regex vtables.

We'd probably need three:

-normal text match
-case-folded text match
-generic sequence match (the stuff Larry's been talking about)

This could probably be implemented without too much difficulty.
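
Roughly what I have in mind, as a C sketch. The struct layout and every name
here (rx_vtable, match_char, rx_literal as a C function) are hypothetical,
only the first two of the three vtables are shown, and the folded one uses
ASCII tolower as a stand-in for real case folding.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical regex vtable: one slot per primitive the ops need. */
typedef struct {
    /* Nonzero if the input element matches the pattern element. */
    int (*match_char)(int input, int pattern);
} rx_vtable;

static int match_exact(int input, int pattern)
{
    return input == pattern;
}

static int match_folded(int input, int pattern)
{
    return tolower(input) == tolower(pattern);
}

static const rx_vtable rx_normal_vtable = { match_exact };
static const rx_vtable rx_folded_vtable = { match_folded };

/* Match the literal `lit` at the start of `input` through a vtable.
 * Returns the number of elements consumed, or 0 on failure. */
static size_t rx_literal(const rx_vtable *vt, const char *input,
                         const char *lit)
{
    size_t i = 0;
    assert(vt != NULL && vt->match_char != NULL);
    while (lit[i] != '\0') {
        if (input[i] == '\0' ||
            !vt->match_char((unsigned char)input[i], (unsigned char)lit[i]))
            return 0;
        i++;
    }
    return i;
}
```

The op itself stays the same; only the vtable pointer changes between a
case-sensitive and a case-insensitive part of the pattern.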

Let me know if I'm brilliant, on crack, or both with this idea.

--Brent Dax
[EMAIL PROTECTED]




[BUG] Makefile assumes . is in my PATH

2002-01-31 Thread Nicholas Clark

$ echo $PATH
/home/nick/bin:/home/nick/bin:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin

$ make mopstest
cd examples && cd assembly && make mops.pbc PERL=perl5.7.2-i386-freebsd && cd .. && cd ..
perl5.7.2-i386-freebsd -I../../lib ../../assemble.pl mops.pasm > mops.pbc
test_parrot examples/assembly/mops.pbc
test_parrot:No such file or directory
*** Error code 1

Stop in /stuff/parrot/play-jit.

$ ./test_parrot examples/assembly/mops.pbc
Iterations:1
Estimated ops: 2
Elapsed time:  76.224550
M op/s:2.623827

happy (but slow). Or for more speed:

$ ./test_parrot -j examples/assembly/mops.pbc
Iterations:1
Estimated ops: 2
Elapsed time:  4.011099
M op/s:49.861645

I can't work out a portable non-hacky way to add the ./ on Unix.
No, I'm not going to add . to my PATH.

Nicholas Clark
-- 
EMCFT http://www.ccl4.org/~nick/CV.html



RE: parrot rx engine

2002-01-31 Thread Ashley Winters

--- Brent Dax [EMAIL PROTECTED] wrote:
 Tim Bunce:
 # On Thu, Jan 31, 2002 at 05:15:49PM +, Graham Barr wrote:
 #
 # Especially as the perl6 rx engine will have to be able to
 # work directly on
 # non-trivial things like streams and generators ans suchlike.
 
 I have a suggestion similar to the ops suggestion but more flexible:
 Regex vtables.
 
 We'd probably need three:
 
   -normal text match
   -case-folded text match
   -generic sequence match (the stuff Larry's been talking about)

Hmm... based on what I've read in Larry's message and the unicode spec,
some of this could be spirited away into a customizable and/or chained
unicode string iterator.

For instance, it (the iterator) could return case-folded (or not)
characters, it could convert  pairs into Ps/Pe quote pairs (for code
parsers) and remove comments (yay), and it could return locale-based
graphemes (I'm scared). Since graphemes at least will be
multi-character in some locales, I see how my objection to rx_literal
was a Bad Thing. And I expect to be able to write a grapheme-sending
unicode string iterator and plug it into a regex and have it DWIM in my
Distant $future, right?
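
Purely as a sketch of what a chained iterator might look like in C: rx_iter,
string_next and fold_next are names I made up, the folding stage is ASCII
only, and nothing here reflects an existing parrot convention.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical iterator: each stage pulls from an optional source. */
typedef struct rx_iter rx_iter;
struct rx_iter {
    int (*next)(rx_iter *self);  /* next element, or -1 at the end */
    rx_iter *source;             /* upstream iterator when chained */
    const char *text;            /* used by the string stage only */
};

/* Base stage: walk a NUL-terminated string one character at a time. */
static int string_next(rx_iter *self)
{
    if (*self->text == '\0')
        return -1;
    return (unsigned char)*self->text++;
}

/* Chained stage: fold whatever the upstream stage returns. */
static int fold_next(rx_iter *self)
{
    int c;
    assert(self->source != NULL);
    c = self->source->next(self->source);
    return c < 0 ? c : tolower(c);
}
```

A regex op would then call next() on the outermost stage and never care
whether folding, comment stripping or grapheme grouping sits underneath.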

Perhaps the backtracking mechanism should be *in* the iterator? Maybe
the iterator will be the home of some locale evil? Could we make it
handle locale character-ranges [a-o'] too? Okay, that last one's a bit
much. Still, Larry did mention that business with generalized
backtracking and bookkeeping... I can't wait for Apocalypse 5.

Is there a custom iterator syntax/convention in parrot?

I hope I don't give Larry any *new* scary ideas for the next
apocalypse. This is just for entertainment purposes, after all
</disclaimer>.

Ashley the Zealot




Re: parrot rx engine

2002-01-31 Thread Tim Bunce

On Thu, Jan 31, 2002 at 12:50:52PM -0800, Brent Dax wrote:
 
 Let me know if I'm brilliant, on crack, or both with this idea.

I've no idea :-)

Tim.



ARM JIT (just about)

2002-01-31 Thread Nicholas Clark

This just about implements a jit for ARM. It doesn't actually do any ops in
assembler yet, except for end. It's named on the basis that it's for v3 or
later instructions. (I may have all the names slightly wonky, but IIRC v3
is ARM600 and later cores. StrongARM and ARM8 are v4, but the machine I've
got has other hardware that won't cope with the half word loads that v4
brings.) Strictly it's something like little endian, APCS 32, ARM v3
[is it even APCS-R? (Arm Procedure Call Standard). As it's using a frame
 pointer does that mean there's more that should be in the name?
 Not that gdb thinks that I got the frame pointer correct]

Would it be useful for parrot to be able to use the 32*32=64 bit multiply
instructions that come in post ARM v3?

Problems that I remember that I encountered. (Comments in the code may
indicate more). Part of these were understanding things - it doesn't mean
that the current way is wrong, just that it wasn't obvious to me :-(

1: '}' is a necessary character in ARM assembler syntax, so jit2h.pl needs
   to be a bit smarter about deciding when to chop the end of a function

2: There is no terse way to load arbitrary 32 bit constants into a register
   with ARM instructions. There are 2 usual methods
   1: Put the constant in a constant pool within +- 4092 or so bytes of the
  PC, and load it with an offset from the PC.
   2: Make it with 1, 2 or 3 instructions. I believe that currently it is
  conjectured that it is possible to make any 32 bit value with 3 ARM
  instructions, and so far no-one has found any value that they couldn't
  make, but no-one has proved it possible and thereby made an algorithm
  that lets a program generate instructions to build a constant

   Either way, I found I was fighting the current jit which expects (at worst)
   to be able to split a 32 bit constant into 2 (possibly unequal) halves
   stored in two machine instructions. To be more flexible jit would need to
   know what some CPU registers contain (ie things like the current
   interpreter pointer), and be able to choose whether to get a value or
   pointer by arithmetic from a CPU register, by dereferencing a CPU register
   (possibly with offset) or by giving up and loading a constant

   This will make more sense to anyone who gets hold of an ARM machine and
   then tries to write ops :-)

3: I wanted to put the pointer to the current interpreter in r7. This made
   the default precompiled call function have its branch somewhere wonky.
   It seems to me that Parrot::Jit->call should be returning a 2 item list:
   the bytecode, and the offset of the branching instruction in there.

4: I think in a RISC way, so expect the offset to be of the start of the
   instruction that needs butchering, not the byte within it. (How the sparc
   position was expressed confused me for a while).
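
To make point 2 above a bit more concrete: an ARM data-processing immediate
is an 8 bit value rotated right by an even number of bits, which is why an
arbitrary 32 bit constant usually won't fit in one instruction. The C below is
only a sketch of that rule plus a naive greedy split into chunks that could
each go in a mov/orr. The function names are made up for illustration and are
not part of the patch or of Parrot. The greedy split can need up to 4
instructions, so it says nothing either way about the 3 instruction conjecture.

```c
#include <stdint.h>

/* Rotate a 32 bit value left by n bits. */
static uint32_t rol32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Return 1 if v is an 8 bit value rotated right by an even amount,
 * i.e. it can be the immediate operand of a single ARM instruction. */
static int arm_imm_encodable(uint32_t v)
{
    unsigned rot;
    for (rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot; an encodable value now fits in 8 bits. */
        if (rol32(v, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: naive greedy split of v into 8 bit windows that
 * start at even bit positions. OR-ing the chunks together gives v back.
 * Returns the number of chunks written (0 for v == 0, never more than 4). */
static int arm_split_const(uint32_t v, uint32_t chunks[4])
{
    int n = 0;
    unsigned i;
    for (i = 0; i < 32; i += 2) {
        if (v & (3u << i)) {
            uint32_t chunk = v & (0xFFu << i);
            chunks[n++] = chunk;
            v &= ~chunk;
            i += 6; /* with the loop's own += 2 this skips the whole window */
        }
    }
    return n;
}
```

For example, arm_split_const(0x12345678, chunks) yields the four chunks
0x278, 0x5400, 0x02340000 and 0x10000000, i.e. one mov and three orr, whereas
0xFF00 comes back as a single chunk.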

It's a slow beast, particularly with -g:

$ ./test_parrot  examples/assembly/mops.pbc   
Iterations:1
Estimated ops: 2
Elapsed time:  109.129854
M op/s:1.832679

This was the first working jit, with Fix_cpcf_call() as:
  ldr r0, [r0]
  mov pc, r0

Iterations:1
Estimated ops: 2
Elapsed time:  65.109552
M op/s:3.071746

This is the slightly faster jit, with Fix_cpcf_call() as ldr pc, [r0]:

Iterations:1
Estimated ops: 2
Elapsed time:  60.948834
M op/s:3.281441
Segmentation fault

Which dmesg reports as:

test_parrot: unhandled page fault at pc=0x, lr=0x (bad 
address=0x, code 0)

and I think it may be the irritating hardware bug in the early StrongARMs
(courtesy of a mistake by Digital's engineers) which causes problems on page
faults that load PC.

Anyway, it's not very tested, but it seems that just binning the runops loop
gets a speedup of roughly 68% to 79% (1.83 M op/s up to 3.07 or 3.28 M op/s
in the runs above). :-)

**Beware** - I've no idea if loading the addresses of registers actually
works. The .pm code is still from sun4Generic.pm

Nicholas Clark
-- 
EMCFT http://www.ccl4.org/~nick/CV.html

--- include/parrot/jit.h~   Tue Jan 29 14:05:45 2002
+++ include/parrot/jit.h   Thu Jan 31 16:52:40 2002
@@ -22,6 +22,10 @@
 static void write_32(char *instr_end, ptrcast_t value);
 typedef void (*jit_f)(void *int_reg, void *num_reg, void *str_reg);
 #endif
+#ifdef ARMV3
+typedef void (*jit_f)(void *int_reg, void *num_reg, void *str_reg,
+  void *cur_interpreter);
+#endif
 
 
 #define MAX_SUBSTITUTION 3
--- /dev/null   Mon Jul 16 22:57:44 2001
+++ jit/armv3/core.jit  Thu Jan 31 16:53:24 2002
@@ -0,0 +1,15 @@
+;
+;   armv3_core.jit 
+;
+; $Id:  $
+;
+
+Parrot_end {
+   ldmea   fp, {r4, r5, r6, r7, fp, sp, pc}
+}
+
+Parrot_noop {
+# Seems that as recognises this and assembles mov r0, r0 for a nop.
+   nop
+}
+
--- /dev/null   Mon Jul 16 22:57:44 2001
+++ lib/Parrot/Jit/armv3-linux.pm   Thu Jan 31 23:36:03 2002
@@ -0,0 +1,30 @@
+#
+# Parrot::Jit;
+#
+# $Id: $
+#
+
+package Parrot::Jit;
+
+use base qw(Parrot::Jit::armv3Generic);
+

Re: strings: sequence-of-integer ... list of chunks

2002-01-31 Thread Dave Storrs



On Thu, 31 Jan 2002, Dan Sugalski wrote:

> There is an issue of time--what do we do, for example, in the case:
>
> my $pi = Pi::Generate;
> if ($pi =~ /[a-z]/) {
>   print "There's a letter in here!\n";
> }
>
> if Pi::Generate returns a generator object that will calculate pi for
> you to however far you want, that regex will run forever or until it
> runs out of memory, whichever comes first.


Just a thought...the following would be *really* cool:

 my $pi = Pi::Generate;

 # Check the first 200 characters only; halt w/success if NO match
 print "There's a letter in here!\n"  if ($pi =~ /[a-z]/h200t);

 # Check the first 200 characters only; halt w/failure if NO match
 print "There's a letter in here!\n"  if ($pi =~ /[a-z]/h200f);


This would be useful for cases where you might be dealing with
infinite data, or when you are only going to need to use the first section
of a string.
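
For what it's worth, the underlying mechanism is just a scan that refuses to
pull more than N characters from the source. A rough C sketch of that idea
follows. Everything in it is an assumption made for illustration: the
callback type, the function names and the toy generator are hypothetical and
are not Parrot or Perl APIs, and it only handles the single character class
[a-z] rather than a general regex.

```c
#include <stddef.h>

/* Hypothetical lazy source: returns the next character, or -1 when the
 * source is exhausted. An infinite generator simply never returns -1. */
typedef int (*next_char_fn)(void *state);

/* Scan at most limit characters. Returns 1 if a character in [a-z] was
 * seen, 0 otherwise. It never reads more than limit characters, so it
 * terminates even when the source is infinite. */
static int letter_in_prefix(next_char_fn next, void *state, size_t limit)
{
    size_t i;
    for (i = 0; i < limit; i++) {
        int c = next(state);
        if (c < 0)
            return 0;            /* source ran dry before the limit */
        if (c >= 'a' && c <= 'z')
            return 1;            /* found a letter inside the window */
    }
    return 0;                    /* limit reached without a match */
}

/* Toy infinite generator, for illustration only: emits the digits 0-9
 * in a cycle forever, except for one 'x' at position letter_at. */
struct toy_gen {
    size_t pos;
    size_t letter_at;
};

static int toy_next(void *state)
{
    struct toy_gen *g = state;
    size_t p = g->pos++;
    if (p == g->letter_at)
        return 'x';
    return '0' + (int)(p % 10);
}
```

With the toy generator, a letter at position 150 is found inside a 200
character window, while a letter at position 500 is not, and in that case
exactly 200 characters are consumed before the scan gives up.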

Dave





Re: strings: sequence-of-integer ... list of chunks

2002-01-31 Thread Bryan C. Warnock

On Thursday 31 January 2002 21:03, Dave Storrs wrote:

>   Just a thought...the following would be *really* cool:
>
>   my $pi = Pi::Generate;
>
>   # Check the first 200 characters only; halt w/success if NO match
>   print "There's a letter in here!\n"  if ($pi =~ /[a-z]/h200t);

print "There's a letter in here!\n" if ($pi !~ /^.{0,199}?[a-z]/);


>   # Check the first 200 characters only; halt w/failure if NO match
>   print "There's a letter in here!\n"  if ($pi =~ /[a-z]/h200f);

print "There's a letter in here!\n" if ($pi =~ /^.{0,199}?[a-z]/);



>   This would be useful for cases where you might be dealing with
> infinite data, or when you are only going to need to use the first section
> of a string.

substr, maybe?

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: strings: sequence-of-integer ... list of chunks

2002-01-31 Thread Bryan C. Warnock

On Thursday 31 January 2002 22:03, Bryan C. Warnock wrote:

junk.  Too tired, I missed the point entirely.  

> On Thursday 31 January 2002 21:03, Dave Storrs wrote:
> > Just a thought...the following would be *really* cool:
> >
> >   my $pi = Pi::Generate;
> >
> >   # Check the first 200 characters only; halt w/success if NO match
> >   print "There's a letter in here!\n"  if ($pi =~ /[a-z]/h200t);
>
> print "There's a letter in here!\n" if ($pi !~ /^.{0,199}?[a-z]/);

print "There's a letter in here!\n" if (substr($pi, 0, 200) !~ /[a-z]/);


> >   # Check the first 200 characters only; halt w/failure if NO match
> >   print "There's a letter in here!\n"  if ($pi =~ /[a-z]/h200f);
>
> print "There's a letter in here!\n" if ($pi =~ /^.{0,199}?[a-z]/);

print "There's a letter in here!\n" if (substr($pi, 0, 200) =~ /[a-z]/);


> > This would be useful for cases where you might be dealing with
> > infinite data, or when you are only going to need to use the first
> > section of a string.
>
> substr, maybe?

I said it, but didn't do it.   (My first examples weren't extensible past a 
single letter.)  

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: [COMMIT] PerlArray fixes

2002-01-31 Thread Melvin Smith


> 2 - Add the PMC type to the array and hash indices

Poke poke. :)

This would be useful; is anyone working on this in the near term?

Also, just curious: how do we plan to unify the get_index_* stuff into one
function? By returning a PMC instead of a specific type?

-Melvin