[PATCH] MANIFEST update

2002-01-21 Thread Simon Glover


 Please, people, if you create new files, remember to add them to the
 MANIFEST.

 Simon

--- MANIFEST.old  Mon Jan 21 12:17:34 2002
+++ MANIFEST  Mon Jan 21 12:18:47 2002
@@ -75,6 +75,7 @@
 examples/assembly/call.pasm
 examples/assembly/euclid.pasm
 examples/assembly/fact.pasm
+examples/assembly/io1.pasm
 examples/assembly/jump.pasm
 examples/assembly/life.pasm
 examples/assembly/local_label.pasm
@@ -119,6 +120,7 @@
 include/parrot/resources.h
 include/parrot/runops_cores.h
 include/parrot/rx.h
+include/parrot/rxstacks.h
 include/parrot/stacks.h
 include/parrot/string.h
 include/parrot/trace.h
@@ -202,6 +204,7 @@
 runops_cores.c
 rx.c
 rx.ops
+rxstacks.c
 stacks.c
 string.c
 t/harness
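Simon's request can be checked mechanically. The Perl sketch below is a hypothetical helper (`unlisted_files` is not part of Parrot): given the text of a MANIFEST and a list of paths, it returns the paths that MANIFEST does not list. It assumes each non-blank MANIFEST line starts with a path, optionally followed by whitespace and a comment.

```perl
use strict;
use warnings;

# Hypothetical helper: list the paths in @files that MANIFEST does not
# mention. Each non-blank MANIFEST line is assumed to hold one path,
# optionally followed by whitespace and a comment.
sub unlisted_files {
    my ($manifest_text, @files) = @_;
    my %listed;
    for my $line (split /\n/, $manifest_text) {
        next unless $line =~ /\S/;
        my ($path) = split ' ', $line;   # first field is the path
        $listed{$path} = 1;
    }
    return grep { !$listed{$_} } @files;
}

# Example: a new source file that nobody added to MANIFEST.
my @missing = unlisted_files("rx.c\nrx.ops\nstacks.c\n",
                             qw(rx.c rxstacks.c stacks.c));
print "Not in MANIFEST: $_\n" for @missing;   # reports rxstacks.c
```

For a real checkout, ExtUtils::Manifest, which Configure.pl already loads for `manicheck`, also exports `filecheck`; running `perl -MExtUtils::Manifest=filecheck -e filecheck` in the source tree reports files that are not in MANIFEST.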
 




Re: [PATCH] MANIFEST update

2002-01-21 Thread Melvin Smith

At 12:21 PM 1/21/2002 +, Simon Glover wrote:

>  Please, people, if you create new files, remember to add them to the
>  MANIFEST.
>
>  Simon
>
>--- MANIFEST.old  Mon Jan 21 12:17:34 2002
>+++ MANIFEST  Mon Jan 21 12:18:47 2002
>@@ -75,6 +75,7 @@
>  examples/assembly/call.pasm
>  examples/assembly/euclid.pasm
>  examples/assembly/fact.pasm
>+examples/assembly/io1.pasm

Don't ask me how it didn't get committed; it's in my copy.

-Melvin






[Possible PATCH] IO ops docs

2002-01-21 Thread Simon Glover


 While you're online: now that you've split the io ops into their
 own separate file, their documentation isn't going to core_ops.pod 
 any more. The enclosed patch fixes this by autogenerating io_ops.pod
 in the same fashion that core_ops.pod is generated, but I'm not sure
 whether this is the right thing to do - do we want every ops lib to have
 separate documentation, or should we just keep all of the documentation
 in one place, in a single file?

 Simon

--- Makefile.old  Mon Jan 21 12:34:36 2002
+++ Makefile  Mon Jan 21 12:35:05 2002
@@ -1,7 +1,7 @@
 PERL = perl
 RM_F = rm -f
 
-all: packfile-c.pod packfile-perl.pod core_ops.pod
+all: packfile-c.pod packfile-perl.pod core_ops.pod io_ops.pod
 
 packfile-c.pod: ../packfile.c
perldoc -u ../packfile.c > packfile-c.pod
@@ -11,6 +11,9 @@
 
 core_ops.pod: ../core.ops
perldoc -u ../core.ops > core_ops.pod
+
+io_ops.pod: ../io.ops
+   perldoc -u ../io.ops > io_ops.pod
 
 clean:
$(RM_F) packfile-c.pod packfile-perl.pod
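If each ops library does get its own POD file, the per-file rules need not be written by hand. The sketch below is hypothetical (`pod_rules` is not in Parrot's build scripts); it produces the ops-related part of the `all` target and one `perldoc -u` rule per ops file, following the pattern of the core_ops.pod and io_ops.pod rules in the patch.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical generator for docs/Makefile: one POD target per ops
# file, built the same way as the core_ops.pod and io_ops.pod rules.
sub pod_rules {
    my @ops = sort @_;              # e.g. ("../core.ops", "../io.ops")
    my @targets;
    my $rules = '';
    for my $src (@ops) {
        (my $pod = basename($src)) =~ s/\.ops\z/_ops.pod/;
        push @targets, $pod;
        # Makefile recipe lines must begin with a tab character.
        $rules .= "$pod: $src\n\tperldoc -u $src > $pod\n\n";
    }
    return "all: @targets\n\n" . $rules;
}

print pod_rules("../core.ops", "../io.ops");
```

The input list could come from a glob such as `glob('../*.ops')`, much as Configure.pl already globs `*.ops`; the packfile POD targets would still have to be listed separately.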
 




Re: [Possible PATCH] IO ops docs

2002-01-21 Thread Simon Glover


 If you decide to apply the last patch, you should probably apply this
 one as well, so that people know about the new file. If not, then junk
 'em both.

 Simon

--- parrot.pod.old  Mon Jan 21 12:56:15 2002
+++ parrot.pod  Mon Jan 21 12:57:11 2002
@@ -31,6 +31,10 @@
 
 A description of the core operations in the Parrot assembly language.
 
+=item F<io_ops.pod>
+
+A description of the operations used in Parrot's IO subsystem.
+
 =item F
 
 The master list of Parrot assembly operations; not all of these have




Re: [Possible PATCH] IO ops docs

2002-01-21 Thread Melvin Smith

At 12:54 PM 1/21/2002 +, Simon Glover wrote:

>  While you're online: now that you've split the io ops into their
>  own separate file, their documentation isn't going to core_ops.pod
>  any more. The enclosed patch fixes this by autogenerating io_ops.pod
>  in the same fashion that core_ops.pod is generated, but I'm not sure
>  whether this is the right thing to do - do we want every ops lib to have
>  separate documentation, or should we just keep all of the documentation
>  in one place, in a single file?

My personal feeling is that this makes sense (separate pod), since
they are sort of an "API" compared to the core ops. I'll see what the rest of the
guys say first, then probably apply it.

As far as IO ops go, right now they are implemented as inline ops, but eventually
they will be replaced by method calls on the IO object and won't
show up in the core (except maybe some bootstrap print/printerr/readline,
etc.).

At least this is the way I see it, opinions may vary.

-Melvin




[PATCH] harness just the tests you want

2002-01-21 Thread Nicholas Clark

À la perl 5, it can be useful just to run 1 test script under the harness.

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- t/harness.orig  Wed Jan  2 19:19:09 2002
+++ t/harness   Mon Jan 21 11:46:54 2002
@@ -1,7 +1,9 @@
 #! perl -w
+# $Id: $
 
 use strict;
 use Test::Harness qw(runtests);
 
-my @tests = map { glob( "t/$_/*.t" ) } ( qw(op) );
+# Pass in a list of tests to run on the command line, else run all the tests.
+my @tests = @ARGV ? @ARGV : map { glob( "t/$_/*.t" ) } ( qw(op) );
 runtests(@tests);



[PATCH] Parrot::Assembler pod clean-up

2002-01-21 Thread Simon Glover


 Enclosed patch fixes the POD brokenness in Parrot::Assembler reported
 by Steve Fink, and generally makes it more aesthetically pleasing.

 I've also supplied the missing documentation for the 
 constantize_number and constantize_integer functions - could someone
 who knows check that I've explained them correctly?

 Also enclosed is a small patch to running.pod to remove the reference
 to the brokenness.

 Simon

--- running.pod.old Mon Jan 21 15:44:20 2002
+++ running.pod Mon Jan 21 15:46:08 2002
@@ -13,8 +13,9 @@
 
   assemble.pl foo.pasm > foo.pbc
 
-Usage information: no usage message available. There is some amount of
-malformed POD visible by running C.
+Usage information: no usage message available. Documentation for the
+C<Parrot::Assembler> module, around which C<assemble.pl> is a wrapper,
+can be viewed by running C.
 
 =item C
 
--- Assembler.pm.old  Mon Jan 21 14:05:23 2002
+++ Assembler.pm  Mon Jan 21 15:40:27 2002
@@ -67,6 +67,7 @@
 output_listing() if $options{'listing'};
 exit 0;
 
+=cut
 
 ###
 ###
@@ -85,6 +86,7 @@
 my $pf = $asm->assemble($code);
 exit $interp->run($pf);
 
+=cut
 
 ###
 ###
@@ -105,8 +107,8 @@
 
 =head2 %type_to_suffix
 
-type_to_suffix is used to change from an argument type to the suffix that
-would be used in the name of the function that contained that argument.
+This is used to change from an argument type to the suffix that would be 
+used in the name of the function that contained that argument.
 
 =cut
 
@@ -120,26 +122,26 @@
 
 =head2 @program
 
-@program will hold an array ref for each line in the program. Each array ref
-will contain:
+This holds an array ref for each line in the program. Each array ref
+contains: 
 
 =over 4
 
 =item 1
 
-The file name in which the source line was found
+The file name in which the source line was found.
 
 =item 2
 
-The line number in the file of the source line
+The line number in the file of the source line.
 
 =item 3
 
-The chomped source line without beginning and ending spaces
+The chomped source line without beginning and ending spaces.
 
 =item 4
 
-The chomped source line
+The chomped source line.
 
 =back
 
@@ -150,25 +152,17 @@
 
 ###
 
-=head2 $output
-=head2 $listing
-=head2 $bytecode
+=head2 $output 
 
-=over 4
+What is output to the bytecode file.
 
-=item $output
-
-will be what is output to the bytecode file.
-
-=item $listing
-
-will be what is output to the listing file.
+=head2 $listing
 
-=item $bytecode
+What is output to the listing file.
 
-is the program's bytecode (executable instructions).
+=head2 $bytecode
 
-=back
+The program's bytecode (executable instructions).
 
 =cut
 
@@ -177,14 +171,10 @@
 
 ###
 
-=head2 $file
-=head2 $line
-=head2 $pline
-=head2 $sline
-
-$file, $line, $pline, and $sline are used to reference information from the
-@program array.  Please look at the comments for @program for the description
-of each.
+=head2 $file, $line, $pline, $sline
+
+These variables are used to reference information from the C<@program> array.  
+Please look at the comments for C<@program> for the description of each.
 
 =cut
 
@@ -194,41 +184,31 @@
 ###
 
 =head2 %label
-=head2 %fixup
-=head2 %macros
-=head2 %local_label
-=head2 %local_fixup
-=head2 $last_label
 
-=over 4
-
-=item %label
-
-will hold each label and the PC at which it was defined.
-
-=item %fixup
+This holds each label and the PC at which it was defined.
 
-will hold labels that have not yet been defined, where they are used in
-the source code, and the PC at that point. It is used for backpatching.
+=head2 %fixup
 
-=item %macros
+This holds labels that have not yet been defined, the position they are 
+used in the source code, and the PC at that point. It is used for 
+backpatching.
 
-will map a macro name to an array of program lines with the same format
-as @program.
+=head2 %macros
 
-=item %local_label
+This maps a macro name to an array of program lines with the same format
+as C<@program>.
 
-will hold local label definitions,
+=head2 %local_label
 
-=item %local_fixup
+This holds local label definitions.
 
-will hold the occurances of local labels in the source file.
+=head2 %local_fixup
 
-=item $last_label
+This holds the occurrences of local labels in the source file.
 
-is the name of the last label seen
+=head2 $last_label
 
-=back
+This is the name of the last label seen.
 
 =cut
 
@@ -238,10 +218,12 @@
 ###
 
 =head2 $pc
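The `%label` and `%fixup` descriptions in this patch amount to single-pass backpatching: a use of a not-yet-defined label records the slot and leaves a placeholder, and the slot is filled once the label's PC is known. The Perl sketch below illustrates only that idea. The input syntax (`NAME:` label prefixes, comma-separated operands), the flat `@bytecode` list and the use of absolute positions are simplifying assumptions; it makes no claim about how Parrot::Assembler actually encodes branch targets.

```perl
use strict;
use warnings;

# Illustrative single-pass assembler showing %label/%fixup backpatching.
# Input: lines like "LOOP: inc I1" or "branch DONE"; operands separated
# by commas/whitespace. Output: a flat list of op names and operands.
# This is a simplification, not Parrot::Assembler's real encoding.
sub assemble_with_fixups {
    my @lines = @_;
    my @bytecode;   # op names and operands, in order
    my %label;      # label name => PC (index into @bytecode) where defined
    my %fixup;      # label name => slots in @bytecode awaiting that PC

    for my $line (@lines) {
        $line =~ s/^\s+//;
        if ($line =~ s/^(\w+):\s*//) {
            my $name = $1;
            $label{$name} = scalar @bytecode;
            # Backpatch every earlier forward reference to this label.
            $bytecode[$_] = $label{$name} for @{ delete $fixup{$name} || [] };
        }
        next unless $line =~ /\S/;

        my ($op, @args) = split /[\s,]+/, $line;
        push @bytecode, $op;
        for my $arg (@args) {
            if ($arg =~ /^[A-Za-z_]\w*$/ && $arg !~ /^[INSP]\d+$/) {
                if (exists $label{$arg}) {
                    push @bytecode, $label{$arg};              # backward reference
                } else {
                    push @{ $fixup{$arg} }, scalar @bytecode;  # remember the slot
                    push @bytecode, undef;                     # placeholder
                }
            } else {
                push @bytecode, $arg;   # register or constant
            }
        }
    }
    die "undefined label(s): @{[ sort keys %fixup ]}\n" if %fixup;
    return @bytecode;
}
```

Any label still in `%fixup` at the end was used but never defined, which is reported as an error rather than left as a placeholder.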

Re: on parrot strings

2002-01-21 Thread Dave Mitchell

Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> There is no string type built out of native eight-bit bytes.

In the good ol'days, one could usefully use regexes on 8-bit binary data,
eg

open G, 'myfile.gif' or die;
read G, $buf, 8192 or die;
if ($buf =~ /^GIF89a\x08\x02/) {
.

where it was clear to everyone that we are checking whether the first few
bytes of the file contain (0x47, 0x49, ..., 0x02)

Is this sort of thing now completely dead in the Brave New World of
Unicode, Locales etc etc? (yes, there's always pack, but pack is so... errr
hmm )

Dave.




Re: on parrot strings

2002-01-21 Thread Jarkko Hietaniemi

On Mon, Jan 21, 2002 at 04:37:46PM +, Dave Mitchell wrote:
> Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> > There is no string type built out of native eight-bit bytes.
> 
> In the good ol'days, one could usefully use regexes on 8-bit binary data,
> eg
> 
> open G, 'myfile.gif' or die;
> read G, $buf, 8192 or die;
> if ($buf =~ /^GIF89a\x08\x02/) {
> .
> 
> where it was clear to everyone that we are checking whether the first few
> bytes of the file contain (0x47, 0x49, ..., 0x02)
> 
> Is this sort of thing now completely dead in the Brave New World of

Of course not, I do not remember forbidding \xHH.  The default of
data coming in from filehandles could still be opaque 8-bit bytes.

> Unicode, Locales etc etc? (yes, there's always pack, but pack is so... errr
> hmm )

> Dave.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: on parrot strings

2002-01-21 Thread Dave Mitchell

Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> > In the good ol'days, one could usefully use regexes on 8-bit binary data,
> > eg
> > 
> > open G, 'myfile.gif' or die;
> > read G, $buf, 8192 or die;
> > if ($buf =~ /^GIF89a\x08\x02/) {
> > .
> > 
> > where it was clear to everyone that we are checking whether the first few
> > bytes of the file contain (0x47, 0x49, ..., 0x02)
> > 
> > Is this sort of thing now completely dead in the Brave New World of
> 
> Of course not, I do not remember forbiddding \xHH.  The default of
> data coming in from filehandles could still be opaque 8-bit bytes.

Good :-)

I'm not clear though, how binary data could get passed to parrot's
regex engine, unless there's a BINARY_8 CEF in addition to
UNICODE_CEF_UTF_8 etc in C

???




[PATCH] Parrot::Optimizer bugs

2002-01-21 Thread Simon Glover


 Enclosed patch fixes a couple of bugs in the optimizer. The first was 
 that the parser wasn't correctly recognising register names - it needs
 to check for these _before_ checking for labels, or else they're 
 incorrectly identified as labels. Strangely, this wasn't causing
 any problems with the optimized code, at least as far as I could see, 
 but this may be down to luck.

 The other bug is a misplaced ? in the regex checking for integers.
 This makes the match non-greedy, so 1000.0 (for example) gets
 split up into 100 (which matches the regex) and 0.0 (which matches
 as a float the next time around the loop). This means that code
 such as

 set N1, 1000.0

 gets converted to

 set N1, 100, 0.0

 which quite rightly fails to assemble. Removing the ? appears to make 
 everything work as intended.

 Simon

--- Optimizer.pm.old  Fri Dec 14 06:04:27 2001
+++ Optimizer.pm  Mon Jan 21 17:35:47 2002
@@ -53,16 +53,16 @@
 # Collect arbitrary parameters
 #
 while(/\S/) {
-  if(s/^([a-zA-Z][a-zA-Z0-9]+)//) {# global label
+  if(s/^([INSP]\d+\b)//) { # Register name
+push @{$line->{parameter}},{type=>'register',value=>$1};
+  }
+  elsif(s/^([a-zA-Z][a-zA-Z0-9]+)//) {# global label
 push @{$line->{parameter}},{type=>'label_global',value=>$1};
   }
   elsif(s/^(\$\w+)//) {# local label
 push @{$line->{parameter}},{type=>'label_local',value=>$1};
   }
-  elsif(s/^([INSP]\d+\b)//) {  # Register name
-push @{$line->{parameter}},{type=>'register',value=>$1};
-  }
-  elsif(s/^(-?\d+)(?!\.)//) {  # integer
+  elsif(s/^(-?\d+)(!\.)//) {  # integer
 push @{$line->{parameter}},{type=>'constant_i',value=>$1};
   }
   elsif(s/^(-?\d+\.\d+)//) {   # float
 






Re: on parrot strings

2002-01-21 Thread Jarkko Hietaniemi

On Mon, Jan 21, 2002 at 05:09:06PM +, Dave Mitchell wrote:
> Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> > > In the good ol'days, one could usefully use regexes on 8-bit binary data,
> > > eg
> > > 
> > > open G, 'myfile.gif' or die;
> > > read G, $buf, 8192 or die;
> > > if ($buf =~ /^GIF89a\x08\x02/) {
> > > .
> > > 
> > > where it was clear to everyone that we are checking whether the first few
> > > bytes of the file contain (0x47, 0x49, ..., 0x02)
> > > 
> > > Is this sort of thing now completely dead in the Brave New World of
> > 
> > Of course not, I do not remember forbiddding \xHH.  The default of
> > data coming in from filehandles could still be opaque 8-bit bytes.
> 
> Good :-)
> 
> I'm not clear though, how binary data could get passed to parrot's
> regex engine, unless there's a BINARY_8 CEF in addition to
> UNICODE_CEF_UTF_8 etc in C

Yes, that's somewhat problematic.  Making up "a byte CEF" would be
Wrong, though, because there is, by definition, no CCS to map, and
we would be dangerously close to conflating in CES, too...
ACR-CCS-CEF-CES.  Read the character model.  Understand the character
model.  Embrace the character model.  Be the character model.  (And
once you're it, read the relevant Unicode, XML, and Web standards.)

To highlight the difference between opaque numbers and characters,
the above should really be:

if ($buf =~ /^\x47\x49\x46\x38\x39\x61\x08\x02/) { ... }

I think what needs to be done is that \xHH must not be encoded as
literals (as it is now, 'A' and \x41 are identical (in ASCII)), but
instead as regex nodes of their own, storing the code points.  Then
the regex engine can try both the "right/new way" (the Unicode code
point), and the "wrong/legacy way" (the native code point).
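Dave's byte-level check still works in Perl 5 once the data is read in binary mode. The sketch below uses illustrative names (`read_header`, `is_gif89a_re`, `is_gif89a_unpack`) and does the comparison two ways: with a `\xHH` regex, where each escape denotes one octet value, and with `unpack 'C6'`, which returns the numeric octets and involves no character semantics at all.

```perl
use strict;
use warnings;

# Read up to $n bytes from $fh as opaque octets (no text decoding).
sub read_header {
    my ($fh, $n) = @_;
    binmode $fh;
    my $got = read($fh, my $buf, $n);
    die "read failed: $!\n" unless defined $got;
    return $buf;
}

# Regex version: each \xHH in the pattern stands for one octet value.
sub is_gif89a_re {
    my ($bytes) = @_;
    return $bytes =~ /\A\x47\x49\x46\x38\x39\x61/ ? 1 : 0;   # "GIF89a"
}

# unpack version: compare numeric octet values, no character semantics.
sub is_gif89a_unpack {
    my ($bytes) = @_;
    return 0 if length($bytes) < 6;
    my @got  = unpack 'C6', $bytes;
    my @want = (0x47, 0x49, 0x46, 0x38, 0x39, 0x61);
    for my $i (0 .. $#want) {
        return 0 unless $got[$i] == $want[$i];
    }
    return 1;
}
```

With a real file this would be `open my $fh, '<', 'myfile.gif' or die $!;` followed by `is_gif89a_re(read_header($fh, 8))`.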

String literals have the same problem.  What does "foo\x41" mean?
(Here, unlike with the regular expressions, we can't "try both",
unless we integrate Damian's quantum state variables to the core :-)
We have various options: there might be a pragma to tell what CCS
"naked codepoints" are to be understood in, or the default could be
grovelled out of environment settings (both these options could affect
the regex solution, too), and so forth.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



[maybe PATCH] use Term::ReadLine where possible

2002-01-21 Thread Nicholas Clark

I think that this is a good idea, but there may be arguments against it.
The stub Term::ReadLine has been in perl since pre 5.004, so it's quite safe
to use it. However, to actually get line editing one needs to have installed
either Term::ReadLine::Perl or Term::ReadLine::Gnu. Attached patch makes
Configure.pl use Term::ReadLine to give interactive editing if there's a real
Term::ReadLine present, else Configure.pl continues to use the old way.

I think that this is easier to use than cut and paste or the :rem{} :add{}
syntax that &prompt appears to offer.

Tested with Term::ReadLine::Gnu and Term::ReadLine::Perl
(and I don't know why Term::ReadLine::Perl later decided that it could do
multi-line editing when it initially was doing sideways scrolling)

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- Configure.pl.orig   Sun Jan 20 22:57:28 2002
+++ Configure.pl  Mon Jan 21 17:25:28 2002
@@ -13,10 +13,9 @@
 use Getopt::Long;
 use ExtUtils::Manifest qw(manicheck);
 use File::Copy;
-
+use Term::ReadLine; # The stub is present from earlier than 5.004
 use Parrot::BuildUtil;
 
-
 #
 # Read the array and scalar forms of the version.
 # from the VERSION file.
@@ -287,15 +286,17 @@
 # Ask questions
 #
 
-prompt("What C compiler do you want to use?", 'cc');
-prompt("How about your linker?", 'ld');
-prompt("What flags would you like passed to your C compiler?", 'ccflags');
-prompt("What flags would you like passed to your linker?", 'ldflags');
-prompt("Which libraries would you like your C compiler to include?", 'libs');
-prompt("How big would you like integers to be?", 'iv');
-prompt("And your floats?", 'nv');
-prompt("What is your native opcode type?", 'opcode_t');
+my $term = initialise_term();
 
+prompt($term, "What C compiler do you want to use?", 'cc');
+prompt($term, "How about your linker?", 'ld');
+prompt($term, "What flags would you like passed to your C compiler?",
+   'ccflags');
+prompt($term, "Which libraries would you like your C compiler to include?",
+   'libs');
+prompt($term, "How big would you like integers to be?", 'iv');
+prompt($term, "And your floats?", 'nv');
+prompt($term, "What is your native opcode type?", 'opcode_t');
 
 {
my(@ops)=glob("*.ops");
@@ -326,7 +327,7 @@
 Which opcode files would you like?
 END
 
-   prompt($msg, 'ops');
+   prompt($term, $msg, 'ops');
 }
 
 
@@ -428,7 +429,7 @@
 next unless $opt; # Ignore blank lines
 $c{cc_warn} .= " $opt";
 }
-prompt("What gcc warning flags do you want to use?", 'cc_warn');
+prompt($term, "What gcc warning flags do you want to use?", 'cc_warn');
 }
 
 #
@@ -708,21 +709,29 @@
 sub prompt {
 return if $opt_defaults;
 
-my($message, $field)=(@_);
+my($term, $message, $field)=(@_);
 my($input);
-print "$message [$c{$field}] ";
-chomp($input=<STDIN>);
+if ($term) {
+# Term::ReadLine::Gnu does a multiline edit just like bash.
+# Term::ReadLine::Perl does a sideways scrolling single line like ksh.
+print "$message [$c{$field}]\n";
+$input = $term->readline("", $c{$field});
+$term->addhistory($input) if $input =~ /\S/ and !$term->Features->{autohistory};
+} else {
+print "$message [$c{$field}] ";
+chomp($input=<STDIN>);
 
-if($input =~ s/^\+//) {
-$input="$c{$field} $input";
-}
-else {
-if($input =~ s/:rem\{(.*?)\}//) {
-$c{$field} =~ s/$_//g for split / /, $1;
+if($input =~ s/^\+//) {
+$input="$c{$field} $input";
 }
+else {
+if($input =~ s/:rem\{(.*?)\}//) {
+$c{$field} =~ s/$_//g for split / /, $1;
+}
 
-if($input =~ s/:add\{(.*?)\}//) {
-$input="$c{$field} $1 $input";
+if($input =~ s/:add\{(.*?)\}//) {
+$input="$c{$field} $1 $input";
+}
 }
 }
 
@@ -816,8 +825,32 @@
 
 exit 1;
 }
-else {
-print <<"END";
+}
+
+#
+# initialise_term()
+#
+
+sub initialise_term {
+my $term = Term::ReadLine->new ('Parrot configuration');
+undef $term if $term && $term->ReadLine eq "Term::ReadLine::Stub";
+
+if ($term) {
+my $type = $term->ReadLine;
+print <<"END";
+Okay, we found everything.  Next you'll need to answer
+a few questions about your system.  You have
+${ type} installed, so I'll use that to let
+you edit your answers interactively. I'll put the
+default in square brackets, and also prime the input
+line with the default. Just hit enter straight away to
+accept the default, or edit it to suit. Like Perl 5's
+Configure you can also choose the default by entering a
+zero length line.
+
+END
+} else {
+print <<"END";
 Okay, we found everything.  Next you'll need to answer
 a few questions about your system.  Defaults are in square
 brackets, and you can hit enter to accept them.  If you
@@ -827,8 +860,8 @@
 
 END
 }
+return $term;
 }
-
 
 #

HP-UX state

2002-01-21 Thread H . Merijn Brand

l1:/pro/3gl/CPAN/parrot-current 114 > perl Configure.pl --default
Parrot Version 0.0.3 Configure
Copyright (C) 2001-2002 Yet Another Society

Since you're running this script, you obviously have
Perl 5--I'll be pulling some defaults from its configuration.

Checking the MANIFEST to make sure you have a complete Parrot kit...

Okay, we found everything.  Next you'll need to answer
a few questions about your system.  Defaults are in square
brackets, and you can hit enter to accept them.  If you
don't want the default, type a new value in.  If that new
value starts with a '+', it will be concatenated to the
default value.


Determining if your C compiler is actually gcc (this could take a while):


Your C compiler is not gcc.


Probing Perl 5's configuration to determine which headers you have (this could
take a while on slow machines)...

Determining C data type sizes by compiling and running a small C program (this
could take a while):

  Building ./test.c   from test_c.in...

Figuring out the formats to pass to pack() for the various Parrot internal
types...
Figuring out what integer type we can mix with pointers...
We'll use 'unsigned int'.

Building a preliminary version of include/parrot/config.h, your Makefiles, and
other files:

  Building include/parrot/config.h        from config_h.in...
  Building ./Makefile                     from Makefile.in...
  Building ./classes/Makefile             from classes/Makefile.in...
  Building ./docs/Makefile                from docs/Makefile.in...
  Building ./languages/Makefile           from languages/Makefile.in...
  Building ./languages/jako/Makefile      from languages/jako/Makefile.in...
  Building ./languages/miniperl/Makefile  from languages/miniperl/Makefile.in...
  Building ./languages/scheme/Makefile    from languages/scheme/Makefile.in...
  Building Parrot/Types.pm                from Types_pm.in...
  Building Parrot/Config.pm               from Config_pm.in...

Checking some things by compiling and running another small C program (this
could take a while):

  Building ./testparrotsizes.c            from testparrotsizes_c.in...

Updating include/parrot/config.h:

  Building include/parrot/config.h        from config_h.in...

Okay, we're done!

You can now use `make' (or your platform's equivalent to `make') to build your
Parrot. After that, you can use `make test' to run the test suite.

Happy Hacking,

The Parrot Team

l1:/pro/3gl/CPAN/parrot-current 115 > make
perl vtable_h.pl
make: *** No rule to make target `include/parrot/rxstacks.h', needed by `test_main.o'.  Stop.
Exit 2
l1:/pro/3gl/CPAN/parrot-current 116 > cat .timestamp
1011556802
Sun Jan 20 20:00:02 2002 UTC

(time of this cvs update)
l1:/pro/3gl/CPAN/parrot-current 117 >

-- 
H.Merijn BrandAmsterdam Perl Mongers (http://amsterdam.pm.org/)
using perl-5.6.1, 5.7.2 & 631 on HP-UX 10.20 & 11.00, AIX 4.2, AIX 4.3,
  WinNT 4, Win2K pro & WinCE 2.11.  Smoking perl CORE: [EMAIL PROTECTED]
http:[EMAIL PROTECTED]/   [EMAIL PROTECTED]
send smoke reports to: [EMAIL PROTECTED], QA: http://qa.perl.org




Re: HP-UX state

2002-01-21 Thread Simon Glover



On Mon, 21 Jan 2002, H.Merijn Brand wrote:

> perl vtable_h.pl
> make: *** No rule to make target `include/parrot/rxstacks.h', needed by 
>`test_main.o'.  Stop.

 This exists (and has done for a couple of days) but isn't in the MANIFEST
 at the moment (I've already sent a patch). Could that be causing the
 problem?

 Simon




Re: [PATCH] Parrot::Optimizer bugs

2002-01-21 Thread Simon Glover



On Mon, 21 Jan 2002, Simon Glover wrote:

>  The other bug is a misplaced ? in the regex checking for integers.
>  This makes the match non-greedy, so 1000.0 (for example) gets
>  split up into 100 (which matches the regex) and 0.0 (which matches
>  as a float the next time around the loop). This means that code
>  such as
> 
>  set N1, 1000.0
> 
>  gets converted to
> 
>  set N1, 100, 0.0
> 
>  which quite rightly fails to assemble. Removing the ? appears to make 
>  everything work as intended.

 Forget this, this is garbage - the ? doesn't mean what I thought it
 meant. Correct patch to follow shortly.

 Simon





Re: [PATCH] Parrot::Optimizer bugs

2002-01-21 Thread Simon Glover


 Right: the real cause of the second bug is similar to what I thought it
 was - when it sees a float, the optimizer first checks to see if it
 is an integer by trying the substitution:

s/^(-?\d+)(?!\.)//

 The problem is that when, say, 1000.0 gets fed to this, and fails
 to match, the regex engine starts to back up, until it successfully
 matches (and erases) 100, leaving 0.0 to be parsed on the next
 iteration of the loop, and hence producing incorrect output.

 The best way that I've been able to think of to fix this is to swap
 the order of the integer and float comparisons, so that any floats
 get matched before we get to the above; if anyone else can think of a
 better way, or some reason why this won't work, I'd be glad to hear it.
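The effect of the reordering can be checked with a cut-down copy of the operand loop. `classify_operands` is a hypothetical stand-alone function: it uses the same regexes in the same order as the corrected patch (register, labels, float, integer), but strips the separators between operands itself and returns type/value pairs instead of filling `$line->{parameter}`.

```perl
use strict;
use warnings;

# Stand-alone sketch of the optimizer's operand loop, using the order
# from the corrected patch: register, labels, float, then integer.
sub classify_operands {
    local $_ = shift;
    my @params;
    while (/\S/) {
        s/^[\s,]+//;    # assumption: operands separated by commas/spaces
        if    (s/^([INSP]\d+\b)//)          { push @params, [register     => $1] }
        elsif (s/^([a-zA-Z][a-zA-Z0-9]+)//) { push @params, [label_global => $1] }
        elsif (s/^(\$\w+)//)                { push @params, [label_local  => $1] }
        elsif (s/^(-?\d+\.\d+)//)           { push @params, [constant_n   => $1] }
        elsif (s/^(-?\d+)(?!\.)//)          { push @params, [constant_i   => $1] }
        else  { die "cannot parse operand at '$_'\n" }
    }
    return @params;
}

# register => N1, then constant_n => 1000.0
printf "%s => %s\n", @$_ for classify_operands("N1, 1000.0");
```

Because the float rule now runs first, a number with a decimal point never reaches the integer rule, so the backtracking in that rule no longer matters for well-formed floats.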

 Correct(?) patch enclosed below - note that it replaces the one sent
 earlier, which should be ignored. 

 Simon

--- Optimizer.pm.old  Mon Jan 21 18:12:16 2002
+++ Optimizer.pm  Mon Jan 21 18:44:05 2002
@@ -53,20 +53,20 @@
 # Collect arbitrary parameters
 #
 while(/\S/) {
-  if(s/^([a-zA-Z][a-zA-Z0-9]+)//) {# global label
+  if(s/^([INSP]\d+\b)//) { # Register name
+push @{$line->{parameter}},{type=>'register',value=>$1};
+  }
+  elsif(s/^([a-zA-Z][a-zA-Z0-9]+)//) {# global label
 push @{$line->{parameter}},{type=>'label_global',value=>$1};
   }
   elsif(s/^(\$\w+)//) {# local label
 push @{$line->{parameter}},{type=>'label_local',value=>$1};
   }
-  elsif(s/^([INSP]\d+\b)//) {  # Register name
-push @{$line->{parameter}},{type=>'register',value=>$1};
+  elsif(s/^(-?\d+\.\d+)//) {   # float
+push @{$line->{parameter}},{type=>'constant_n',value=>$1};
   }
   elsif(s/^(-?\d+)(?!\.)//) {  # integer
 push @{$line->{parameter}},{type=>'constant_i',value=>$1};
-  }
-  elsif(s/^(-?\d+\.\d+)//) {   # float
-push @{$line->{parameter}},{type=>'constant_n',value=>$1};
   }
   elsif(s/^("(?:[^\\"]|(?:\\(?>["tnr\\])))*")// or # single-quoted string
 s/^('(?:[^\\']|(?:\\(?>['tnr\\])))*')//) { # double-quoted string
 
  
 
 





RE: [PATCH] Parrot::Optimizer bugs

2002-01-21 Thread Brent Dax

Simon Glover:
#  Right: the real cause of the second bug is similar to what I
# thought it
#  was - when it sees a float, the regex engine first checks to
# see if it
#  is an integer by trying the substitution:
#
# s/^(-?\d+)(?!\.)//
#
#  The problem is that when, say, 1000.0 gets fed to this, and fails
#  to match, the regex engine starts to back up, until it successfully
#  matches (and erases) 100, leaving 0.0 to be parsed on the next
#  iteration of the loop, and hence producing incorrect output.
#
#  The best way that I've been able to think of to fix this is to swap
#  the order of the integer and float comparisons, so that any floats
#  get matched before we get to the above; if anyone else can think of a
#  better way, or some reason why this won't work, I'd be glad
# to hear it.

If the problem is backtracking, can't you just use the (?>)
no-backtracking syntax?
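Brent's suggestion can be compared with the plain pattern directly. The two hypothetical helpers below differ only in the atomic group: with plain `(-?\d+)` the engine gives back one digit of `1000.0` and matches `100`, whereas inside `(?>...)` the digits, once matched, are never given back, so the match fails outright and a later float rule can take the whole number.

```perl
use strict;
use warnings;

# Integer rule as written in Optimizer.pm: \d+ may give back digits.
sub integer_plain {
    my ($s) = @_;
    return $s =~ /^(-?\d+)(?!\.)/ ? $1 : undef;
}

# Same rule with an atomic group: once (?>...) has matched the digits,
# the engine will not retry it with fewer of them.
sub integer_atomic {
    my ($s) = @_;
    return $s =~ /^(?>(-?\d+))(?!\.)/ ? $1 : undef;
}

for my $s ("1000.0", "1000") {
    my ($plain, $atomic) = (integer_plain($s), integer_atomic($s));
    printf "%-7s plain=%s atomic=%s\n", $s,
        map { defined $_ ? $_ : 'no match' } $plain, $atomic;
}
```

Perl 5.10 and later also accept the possessive quantifier `\d++`, which behaves the same way for this pattern.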

--Brent Dax
[EMAIL PROTECTED]
Parrot Configure pumpking and regex hacker

 . hawt sysadmin chx0rs
 This is sad. I know of *a* hawt sysamin chx0r.
 I know more than a few.
 obra: There are two? Are you sure it's not the same one?




RE: [PATCH] Parrot::Optimizer bugs

2002-01-21 Thread Simon Glover



On Mon, 21 Jan 2002, Brent Dax wrote:

> 
> If the problem is backtracking, can't you just use the (?>)
> no-backtracking syntax?
> 

 Didn't think of that. I'm a bit concerned at the large warning
 signs attached to it in perlre.pod, though.

 Simon





RE: on parrot strings

2002-01-21 Thread Hong Zhang

> But e` and e are different letters man. And re`sume` and resume are
> different words come to that. If the user wants something that'll
> match 'em both then the pattern should surely be:
> 
>/r[ee`]sum[ee`]/

I disagree. The difference between 'e' and 'e`' is similar to 'c'
and 'C'. The Unicode compatibility equivalence has a similar effect
too, such as "half width letter" and "full width letter".

It may just be my personal preference. But I don't think it is a
good idea to push this problem onto the users of regexes.

Hong



RE: on parrot strings

2002-01-21 Thread Hong Zhang

> Yes, that's somewhat problematic.  Making up "a byte CEF" would be
> Wrong, though, because there is, by definition, no CCS to map, and
> we would be dangerously close to conflating in CES, too...
> ACR-CCS-CEF-CES.  Read the character model.  Understand the character
> model.  Embrace the character model.  Be the character model.  (And
> once you're it, read the relevant Unicode, XML, and Web standards.)
> 
> To highlight the difference between opaque numbers and characters,
> the above should really be:
> 
>   if ($buf =~ /\x47\x49\x46\x38\x39\x61\x08\x02/) { ... }
> 
> I think what needs to be done is that \xHH must not be encoded as
> literals (as it is now, 'A' and \x41 are identical (in ASCII)), but
> instead as regex nodes of their own, storing the code points.  Then
> the regex engine can try both the "right/new way" (the Unicode code
> point), and the "wrong/legacy way" (the native code point).

My suggestion would be to add a binary mode, such as //b. When binary mode
is in effect, only ASCII characters (0 - 127) still carry text properties.
\p{IsLower} will only match ASCII a to z. All of 128 - 255 always have false
text properties. Any code point must be between 0 and 255. The regcomp
can easily check this upon compilation.

A dedicated binary mode will simplify many issues. And the regex will
be very readable. We can make binary mode exclusive with text mode,
i.e. any regex must be either binary or text, but not both.
(I am not sure if it is really useful to have mixed mode.)
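Hong's proposed rules can be sketched as two checks: a compile-time range check on the code points in a pattern, and a class test in which only ASCII letters have text properties. Both functions below are hypothetical illustrations of the proposal; neither Perl nor Parrot provides a `//b` modifier, and the escape scanner only understands `\xHH` and `\x{...}`.

```perl
use strict;
use warnings;

# Hypothetical compile-time check for a binary-mode pattern: every
# \xHH or \x{...} escape must denote an octet (0..255).
sub check_binary_pattern {
    my ($pattern) = @_;
    while ($pattern =~ /\\x\{([0-9A-Fa-f]+)\}|\\x([0-9A-Fa-f]{1,2})/g) {
        my $cp = hex(defined $1 ? $1 : $2);
        die sprintf("code point 0x%X out of range for binary mode\n", $cp)
            if $cp > 0xFF;
    }
    return 1;
}

# Hypothetical class test under binary mode: only ASCII a..z count as
# lowercase; octets 128..255 never have a text property.
sub binary_is_lower {
    my ($octet) = @_;
    return ($octet >= 0x61 && $octet <= 0x7A) ? 1 : 0;
}
```

Rejecting out-of-range escapes when the pattern is compiled means a binary-mode match never has to decide what a code point above 255 would mean.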

Hong



RE: on parrot strings

2002-01-21 Thread Garrett Goebel

From: Hong Zhang [mailto:[EMAIL PROTECTED]]
> 
> > But e` and e are different letters man. And re`sume` and resume are
> > different words come to that. If the user wants something that'll
> > match 'em both then the pattern should surely be:
> > 
> >/r[ee`]sum[ee`]/
> 
> I disagree. The difference between 'e' and 'e`' is similar to 'c'
> and 'C'. The Unicode compability equivalence has similar effect
> too, such as "half width letter" and "full width letter".

German to English
 schon => already
 schön => nice

2 totally different words.

I'm sure there are words in some language where the difference between an 'e'
and 'e`' can be the difference between an insult and a compliment.
 



RE: on parrot strings

2002-01-21 Thread Hong Zhang

> > But e` and e are different letters man. And re`sume` and resume are 
> > different words come to that. If the user wants something that'll 
> > match 'em both then the pattern should surely be: 
> > 
> >/r[ee`]sum[ee`]/ 
> 
> I disagree. The difference between 'e' and 'e`' is similar to 'c' 
> and 'C'. The Unicode compability equivalence has similar effect 
> too, such as "half width letter" and "full width letter". 

German to English 
 schon => already 
 schön => nice 

2 totally different words. 

I am talking about similar words, whereas you are talking about different words.
I don't mind if someone can search across languages. Some Chinese search
engines can do Chinese searches using English keywords (for people who have a
Chinese viewer but no Chinese input method). Of course, no one expects a
regex engine to do that.

The word "re`sume`" does appear in English sentences. The "[half|full] width letter"
forms are in the same language.

Hong



Re: on parrot strings

2002-01-21 Thread Russ Allbery

Hong Zhang <[EMAIL PROTECTED]> writes:

> I disagree. The difference between 'e' and 'e`' is similar to 'c'
> and 'C'.

No, it's not.

In many languages, an accented character is a completely different letter.
It's alphabetized separately, it's pronounced differently, and there are
many words that differ only in the presence of an accent.

Changing the capitalization of C does not change the word.  Adding or
removing an accent does.

> The Unicode compatibility equivalence has similar effect too, such as
> "half width letter" and "full width letter".

You'll find that the Unicode compatibility equivalence does nothing as
ill-conceived as unifying e and e', for the very good reason that it
would be a horrible mistake.

-- 
Russ Allbery ([EMAIL PROTECTED]) 



Re: on parrot strings

2002-01-21 Thread Bryan C. Warnock

On Monday 21 January 2002 16:43, Russ Allbery wrote:
> Changing the capitalization of C does not change the word. 

Er, most of the time. 

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



RE: on parrot strings

2002-01-21 Thread Stephen Howard

Not to get modifier-happy, but it seems like a user-oriented solution would be to let 
the user specify a modifier:

 "caseinsensitive" =~ m/CaseInsensitive/i

 "resume" =~ m/re`sume`/d (diacritic modifier?)

-Stephen

-Original Message-
From: Hong Zhang [mailto:[EMAIL PROTECTED]]
Sent: Monday, January 21, 2002 04:10 PM
Cc: [EMAIL PROTECTED]
Subject: RE: on parrot strings






Re: on parrot strings

2002-01-21 Thread Russ Allbery

Bryan C Warnock <[EMAIL PROTECTED]> writes:
> On Monday 21 January 2002 16:43, Russ Allbery wrote:

>> Changing the capitalization of C does not change the word. 

> Er, most of the time. 

No, pretty much all of the time.  There are differences between proper
nouns and common nouns, but those are differences routinely quashed as a
typesetting decision; if you write both proper nouns and common nouns in
all caps as part of a headline, the lack of distinction is not considered
a misspelling.  Similarly, if you capitalize the common noun because it
occurs at the beginning of the sentence, that doesn't transform its
meaning.

Whereas adding or removing an accent is always considered a misspelling,
at least in some languages.  It's like adding or removing random letters
from the word.

re'sume' and resume are two different words.  It so happens that in
English re'sume' is a variant spelling for one meaning of resume.  I don't
believe that regexes should try to automatically pick up variant
spellings.  Should the regex /aerie/ match /eyrie/?  That makes as much
sense as a search for /resume/ matching /re'sume'/.

-- 
Russ Allbery ([EMAIL PROTECTED]) 



[PATCH] are characters unsigned?

2002-01-21 Thread Nicholas Clark

This warning:

string.c: In function `string_transcode':
string.c:194: warning: passing arg 2 of pointer to function as unsigned due to prototype

represents a can of worms. The summary is "are characters signed or unsigned?"

I am of the opinion that they are UINTVAL, not INTVAL. (and EOF being a
negative value such as -1 is only needed for C stdio, and I seem to remember
that Dan has strong opinions on C stdio, and what C can do with it)

This is not a very considered opinion, I should add. It just feels safer with
them as unsigned, on the assumption that our code doesn't do EOF.

In which case, the following rather involved patch is needed. Or something
similar. And it's scary because it redefines chartypes, so please could
someone sanity check it.

I thought that it should be this

INTVAL (*get_digit)(UINTVAL c);

not this

UINTVAL (*get_digit)(UINTVAL c);

as I'd not be surprised if Unicode contains a glyph in some script that is
for a digit with negative value. (And if there isn't the Klingons will
invent one to be awkward)
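A minimal sketch of the convention the patch moves towards, assuming `UINTVAL`/`INTVAL` are plain 32-bit types (Parrot picks the real ones at Configure time) and replacing `internal_exception()` with a return code so the fragment stands alone. With an unsigned code point the `c < 0` test disappears, while a digit's value can stay signed. Unlike the patch, `usascii_is_digit()` here range-checks before calling `isdigit()`, because `isdigit()` is only defined for values representable as `unsigned char` (or `EOF`).

```c
#include <ctype.h>
#include <stdint.h>

/* Assumed stand-ins for Parrot's Configure-selected types. */
typedef uint32_t UINTVAL;
typedef int32_t  INTVAL;

/* Unicode -> US-ASCII. Because c is unsigned, "c < 0" can never be true,
 * so the upper bound is the only check needed. The patch raises
 * INVALID_CHARACTER via internal_exception() instead of returning 0. */
static int usascii_from_unicode(UINTVAL c, UINTVAL *out)
{
    if (c > 127)
        return 0;
    *out = c;
    return 1;
}

/* isdigit() is only defined for unsigned char values and EOF, so the
 * range check comes before the cast to int. */
static int usascii_is_digit(UINTVAL c)
{
    return c <= 127 && isdigit((int)c);
}

/* Characters are unsigned, but the digit value stays signed. */
static INTVAL usascii_get_digit(UINTVAL c)
{
    return (INTVAL)c - '0';
}
```

A stray `-1` passed in converts to `UINT32_MAX` and is rejected by the single `c > 127` test, so nothing is lost by dropping the signed check.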

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- include/parrot/chartype.h~  Thu Dec 27 18:50:28 2001
+++ include/parrot/chartype.h   Mon Jan 21 19:12:16 2002
@@ -13,15 +13,15 @@
 #if !defined(PARROT_CHARTYPE_H_GUARD)
 #define PARROT_ENCODING_H_GUARD
 
-typedef INTVAL (*CHARTYPE_TRANSCODER)(INTVAL c);
+typedef UINTVAL (*CHARTYPE_TRANSCODER)(UINTVAL c);
 
 typedef struct {
 const char *name;
 const char *default_encoding;
 CHARTYPE_TRANSCODER (*transcode_from)(const char *from);
 CHARTYPE_TRANSCODER (*transcode_to)(const char *to);
-BOOLVAL (*is_digit)(INTVAL c);
-INTVAL (*get_digit)(INTVAL c);
+BOOLVAL (*is_digit)(UINTVAL c);
+INTVAL (*get_digit)(UINTVAL c);
 } CHARTYPE;
 
 const CHARTYPE *
--- ../parrot/string.c  Tue Jan 15 23:14:51 2002
+++ string.c	Mon Jan 21 19:28:24 2002
@@ -186,7 +186,7 @@
 destend = deststart;
 
 while (srcstart < srcend) {
-INTVAL c = src->encoding->decode(srcstart);
+UINTVAL c = src->encoding->decode(srcstart);
 
 if (transcoder1) c = transcoder1(c);
 if (transcoder2) c = transcoder2(c);
@@ -424,7 +424,7 @@
 }
 
 if (len == 1) {
-INTVAL c = s->encoding->decode(s->bufstart);
+UINTVAL c = s->encoding->decode(s->bufstart);
 if (s->type->is_digit(c) && s->type->get_digit(c) == 0) {
 return 0;
 }
@@ -456,7 +456,7 @@
 BOOLVAL in_number = 0;
 
 while (start < end) {
-INTVAL c = s->encoding->decode(start);
+UINTVAL c = s->encoding->decode(start);
 
 if (s->type->is_digit(c)) {
 in_number = 1;
@@ -500,7 +500,7 @@
 INTVAL fake_exponent = 0;
 
 while (start < end) {
-INTVAL c = s->encoding->decode(start);
+UINTVAL c = s->encoding->decode(start);
 
 if (s->type->is_digit(c)) {
 if (in_exp) {
--- ../parrot/chartypes/unicode.c   Tue Jan 15 20:02:54 2002
+++ chartypes/unicode.c Mon Jan 21 20:06:09 2002
@@ -23,12 +23,12 @@
 }
 
 static BOOLVAL
-unicode_is_digit(INTVAL c) {
+unicode_is_digit(UINTVAL c) {
 return (BOOLVAL)(isdigit(c) ? 1 : 0); /* FIXME - Other code points are also digits */
 }
 
-static INTVAL
-unicode_get_digit(INTVAL c) {
+static UINTVAL
+unicode_get_digit(UINTVAL c) {
 return c - '0'; /* FIXME - many more digits than this... */
 }
 
--- ../parrot/chartypes/usascii.c   Tue Jan 15 20:02:54 2002
+++ chartypes/usascii.c Mon Jan 21 20:10:49 2002
@@ -12,9 +12,9 @@
 
 #include "parrot/parrot.h"
 
-static INTVAL
-usascii_transcode_from_unicode(INTVAL c) {
-if (c < 0 || c > 127) {
+static UINTVAL
+usascii_transcode_from_unicode(UINTVAL c) {
+if (c > 127) {
 internal_exception(INVALID_CHARACTER, "Invalid character for US-ASCII");
 }
 return c;
@@ -30,8 +30,8 @@
 }
 }
 
-static INTVAL
-usascii_transcode_to_unicode(INTVAL c) {
+static UINTVAL
+usascii_transcode_to_unicode(UINTVAL c) {
 return c;
 }
 
@@ -46,13 +46,13 @@
 }
 
 static BOOLVAL
-usascii_is_digit(INTVAL c) {
-return (BOOLVAL)(isdigit(c) ? 1 : 0);
+usascii_is_digit(UINTVAL c) {
+return (BOOLVAL)(isdigit((int) c) ? 1 : 0);
 }
 
 static INTVAL
-usascii_get_digit(INTVAL c) {
-return c - '0';
+usascii_get_digit(UINTVAL c) {
+return ((INTVAL) c) - '0';
 }
 
 const CHARTYPE usascii_chartype = {



[PATCH] MANIFEST.SKIP

2002-01-21 Thread Nicholas Clark

This patch (context diffs mean that it's atop the Term::ReadLine patch)
adds a check for unexpected files not in the MANIFEST to Configure.pl

I'm not certain that putting the test in Configure.pl is the right place
for it, but I do believe that having an accurate MANIFEST.SKIP and the
ability to run 

perl -MExtUtils::Manifest -e ExtUtils::Manifest::fullcheck

(possibly as a Makefile target) is useful.

Currently:

Not in MANIFEST: include/parrot/rxstacks.h
Not in MANIFEST: rxstacks.c

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- /mnt/six/parrot/parrot_readline++/Configure.pl~ Mon Jan 21 17:44:03 2002
+++ Configure.pl	Mon Jan 21 19:48:37 2002
@@ -11,7 +11,7 @@
 
 use Config;
 use Getopt::Long;
-use ExtUtils::Manifest qw(manicheck);
+use ExtUtils::Manifest qw(fullcheck);
 use File::Copy;
 use Term::ReadLine; # The stub is present from earlier than 5.004
 use Parrot::BuildUtil;
@@ -810,11 +810,10 @@
 #
 
 sub check_manifest {
-print "\n";
 
-my(@missing)=manicheck();
+my($missing,$extra)=fullcheck();
 
-if(@missing) {
+if(@$missing) {
 print <<"END";
 
 Ack, some files were missing!  I can't continue running
@@ -838,6 +837,7 @@
 if ($term) {
 my $type = $term->ReadLine;
 print <<"END";
+
 Okay, we found everything.  Next you'll need to answer
 a few questions about your system.  You have
 ${ type} installed, so I'll use that to let
--- /dev/null   Mon Jul 16 22:57:44 2001
+++ MANIFEST.SKIP   Mon Jan 21 20:13:26 2002
@@ -0,0 +1,52 @@
+\.o$
+^\.cvsignore$
+/\.cvsignore$
+^include/parrot/config\.h$
+^include/parrot/platform\.h$
+^Makefile$
+/Makefile$
+^Parrot/Types\.pm$
+^Parrot/Config\.pm$
+^platform\.c$
+^config\.opt$
+
+^vtable\.ops$
+^include/parrot/vtable\.h$
+^include/parrot/jit_struct\.h$
+^include/parrot/oplib/core_ops\.h$
+^include/parrot/oplib/core_ops_prederef\.h$
+
+^core_ops\.c$
+^core_ops_prederef\.c$
+^vtable_ops\.c$
+
+^Parrot/Jit\.pm$
+^Parrot/PMC\.pm$
+^Parrot/OpLib/core\.pm$
+
+^classes/default\.h$
+^classes/default\.c$
+^classes/intqueue\.h$
+^classes/intqueue\.c$
+^classes/parrotpointer\.h$
+^classes/parrotpointer\.c$
+^classes/perlarray\.h$
+^classes/perlarray\.c$
+^classes/perlhash\.h$
+^classes/perlhash\.c$
+^classes/perlint\.h$
+^classes/perlint\.c$
+^classes/perlnum\.h$
+^classes/perlnum\.c$
+^classes/perlstring\.h$
+^classes/perlstring\.c$
+^classes/perlundef\.h$
+^classes/perlundef\.c$
+
+^docs/packfile-c\.pod$
+^docs/packfile-perl\.pod$
+^docs/core_ops\.pod$
+
+^test_parrot$
+^pdump$
+^blib/
--- ../parrot/MANIFEST  Mon Jan 21 16:42:17 2002
+++ MANIFEST	Mon Jan 21 19:34:16 2002
@@ -3,6 +3,7 @@
 Configure.pl
 KNOWN_ISSUES
 MANIFEST
+MANIFEST.SKIP
 Makefile.in
 NEWS
 Parrot/Assembler.pm
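The check `fullcheck` performs can be sketched in C: a file counts as "Not in MANIFEST" when it is neither listed in MANIFEST nor matched by any MANIFEST.SKIP pattern. This is an illustration only, assuming in-memory file lists and using POSIX `<regex.h>` extended regexes; ExtUtils::Manifest itself applies Perl regexes, which agree with ERE for simple patterns like the ones in this patch. The function names are hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* 1 if path matches any of the (POSIX extended) skip patterns. */
static int matches_any(const char *path, const char *const *patterns, size_t npat)
{
    for (size_t i = 0; i < npat; i++) {
        regex_t re;
        if (regcomp(&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
            continue;                  /* ignore patterns that fail to compile */
        int hit = regexec(&re, path, 0, NULL, 0) == 0;
        regfree(&re);
        if (hit)
            return 1;
    }
    return 0;
}

/* 1 if path appears verbatim in the MANIFEST list. */
static int listed(const char *path, const char *const *manifest, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(path, manifest[i]) == 0)
            return 1;
    return 0;
}

/* Number of files that are neither listed nor skipped. */
static size_t count_unlisted(const char *const *files, size_t nfiles,
                             const char *const *manifest, size_t nman,
                             const char *const *skip, size_t nskip)
{
    size_t extra = 0;
    for (size_t i = 0; i < nfiles; i++)
        if (!listed(files[i], manifest, nman) && !matches_any(files[i], skip, nskip))
            extra++;
    return extra;
}
```

With MANIFEST listing only `string.c` and the skip patterns `\.o$` and `^core_ops\.c$`, the files `string.c`, `rxstacks.c`, `core_ops.c` and `rx.o` leave exactly one unlisted file, `rxstacks.c`.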



[PATCH] warnings in test_main.c

2002-01-21 Thread Nicholas Clark

Before:

cc -Wall -Wstrict-prototypes -Wmissing-prototypes -Winline -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Winline -W -Wsign-compare -Wno-unused   -I./include  -DHAS_JIT -DI386 -o test_main.o -c test_main.c
test_main.c: In function `main':
test_main.c:230: warning: passing arg 4 of `PackFile_unpack' as unsigned due to prototype
test_main.c:249: warning: declaration of `time' shadows global declaration

After:

cc -Wall -Wstrict-prototypes -Wmissing-prototypes -Winline -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Winline -W -Wsign-compare -Wno-unused   -I./include  -DHAS_JIT -DI386 -o test_main.o -c test_main.c

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- test_main.c.orig	Mon Jan 14 20:32:55 2002
+++ test_main.c Mon Jan 21 17:58:38 2002
@@ -227,7 +227,7 @@
 
 pf = PackFile_new();
 if( !PackFile_unpack(interpreter, pf, (char *)program_code, 
- (opcode_t)program_size) ) {
+ (size_t)program_size) ) {
 printf( "Can't unpack.\n" );
 return 1;
 }
@@ -246,7 +246,7 @@
 unsigned int j;
 int op_count   = 0;
 int call_count = 0;
-FLOATVAL time = 0.0;
+FLOATVAL sum_time = 0.0;
 
 printf("Operation profile:\n\n");
 
@@ -257,7 +257,7 @@
 if(interpreter->profile[j].numcalls > 0) {
 op_count++;
 call_count += interpreter->profile[j].numcalls;
-time += interpreter->profile[j].time;
+sum_time += interpreter->profile[j].time;
 
 printf("  %5d  %-12s  %12ld  %5.6f  %5.6f\n", j, interpreter->op_info_table[j].full_name,
@@ -274,8 +274,8 @@
 op_count,
 "",
 call_count,
-time,
-time / (FLOATVAL)call_count
+sum_time,
+sum_time / (FLOATVAL)call_count
 );
 }
 }
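A small sketch of the renaming, assuming a simplified stand-in for `interpreter->profile[]` (the struct and function names are hypothetical). The `-Wshadow` warning fires because a local called `time` hides a global declaration of the same name, normally the `time()` function from `<time.h>`; a distinct name such as `sum_time` removes the warning without changing the arithmetic.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry of interpreter->profile[]. */
typedef struct {
    long   numcalls;
    double time;      /* a struct member does not shadow the global time() */
} op_profile;

/* Totals the per-op times. The accumulator is named sum_time, as in the
 * patch: a local named "time" would hide the global time() declaration
 * and trigger gcc's -Wshadow. */
static double profile_total(const op_profile *prof, size_t n, long *call_count)
{
    double sum_time = 0.0;
    long calls = 0;
    for (size_t j = 0; j < n; j++) {
        if (prof[j].numcalls > 0) {
            calls += prof[j].numcalls;
            sum_time += prof[j].time;
        }
    }
    *call_count = calls;
    return sum_time;
}
```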



Re: [maybe PATCH] use Term::ReadLine where possible

2002-01-21 Thread Nicholas Clark

On Mon, Jan 21, 2002 at 05:52:52PM +, Nicholas Clark wrote:
> I think that this is a good idea, but there may be arguments against it.

If it's a good idea it needs this correction

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- Configure.pl~   Mon Jan 21 17:44:03 2002
+++ Configure.pl	Mon Jan 21 20:05:37 2002
@@ -716,7 +716,8 @@
 # Term::ReadLine::Perl does a sideways scrolling single line like ksh.
 print "$message [$c{$field}]\n";
 $input = $term->readline("", $c{$field});
-$term->addhistory($input) if /\S/ and !$term->Features->{autohistory};
+$term->addhistory($input)
+if $input =~ /\S/ and !$term->Features->{autohistory};
 } else {
 print "$message [$c{$field}] ";
 chomp($input=<STDIN>);



Re: on parrot strings

2002-01-21 Thread Bryan C. Warnock

On Monday 21 January 2002 17:11, Russ Allbery wrote:
> No, pretty much all of the time.  There are differences between proper
> nouns and common nouns, but those are differences routinely quashed as a
> typesetting decision; if you write both proper nouns and common nouns in
> all caps as part of a headline, the lack of distinction is not considered
> a misspelling.  Similarly, if you capitalize the common noun because it
> occurs at the beginning of the sentence, that doesn't transform its
> meaning.

That doesn't mitigate the fact that they are different words.  Sure, English 
is forgiving, as it's filled with heteronyms and homographs.  But it's all 
moot because regexes are character-oriented, not word-oriented.  

Given that they're character-oriented, we only need to provide character 
transformations between upper, lower, and title case.  But is that the 
dividing line?

>
> Whereas adding or removing an accent is always considered a misspelling,
> at least in some languages.  It's like adding or removing random letters
> from the word.

No, it's substituting letters in a word.  It's adding or removing random 
characters from the string representation of the word.

>
> re'sume' and resume are two different words.  It so happens that in
> English re'sume' is a variant spelling for one meaning of resume.  I don't
> believe that regexes should try to automatically pick up variant
> spellings.  Should the regex /aerie/ match /eyrie/?  That makes as much
> sense as a search for /resume/ matching /re'sume'/.

Variant spellings imply word-oriented searches.  We're talking about 
character-oriented transformations, and the question is whether or not 
there's enough justification - which I feel won't come from grammatical 
rationales, but from the 7-bit ASCII storage of words with accents - to 
provide a transformation from a base letter with accents to just the base 
letter.  
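To make the proposed transformation concrete, here is a sketch of an accent-stripping comparison restricted to a few ISO-8859-1 lowercase letters (the function names are hypothetical, and a Unicode-aware version would instead decompose characters and drop combining marks). It is the kind of matching a diacritic-insensitive modifier would perform, and it also illustrates the objection raised earlier in the thread: it treats "schön" and "schon" as the same string.

```c
/* Map some ISO-8859-1 lowercase accented letters to their base letter;
 * every other byte is returned unchanged. */
static unsigned char latin1_base_letter(unsigned char c)
{
    if (c >= 0xE0 && c <= 0xE5) return 'a';                /* à á â ã ä å */
    if (c == 0xE7) return 'c';                             /* ç */
    if (c >= 0xE8 && c <= 0xEB) return 'e';                /* è é ê ë */
    if (c >= 0xEC && c <= 0xEF) return 'i';                /* ì í î ï */
    if (c == 0xF1) return 'n';                             /* ñ */
    if ((c >= 0xF2 && c <= 0xF6) || c == 0xF8) return 'o'; /* ò ó ô õ ö ø */
    if (c >= 0xF9 && c <= 0xFC) return 'u';                /* ù ú û ü */
    if (c == 0xFD || c == 0xFF) return 'y';                /* ý ÿ */
    return c;
}

/* Compare two NUL-terminated Latin-1 strings, ignoring the accents above. */
static int latin1_eq_ignoring_accents(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p != '\0' && *q != '\0') {
        if (latin1_base_letter(*p) != latin1_base_letter(*q))
            return 0;
        p++;
        q++;
    }
    return *p == *q;   /* equal only if both strings end together */
}
```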

Do you feel that altering accented letters to better represent them within 
the facilities provided isn't done, or is wrong?  I'm not sure what 
you're typing as your example word, and whether or not it's getting munged 
in the meantime, but "résumé"  (r, e accent, s, u, m, e accent) is coming 
across "re'sume'" (r, e, apostrophe, s, u, m, e, apostrophe).  (The incoming 
message was encoded ISO-8859-1, so presumably it should have preserved 
character 233, which is what I'm sending out.)

This isn't a ridiculous question.  Personally, I don't think that we should. 
The facilities are quickly coming into place to be able to do proper 
character encodings, and I think that we should lead from the front and 
encourage folks to be proper - not only in their searches, but in their text 
production. 


-- 
Bryan C. Warnock
[EMAIL PROTECTED]



[PATCH] quieten many pmc warnings

2002-01-21 Thread Nicholas Clark

This eliminates many gcc warnings from pmc code by
1: changing index to idx
2: including the pmc's own header file so as to give declarations for its
   functions
3: moving the declarations of the global init functions to global_setup.h so
   that the pmc files see a declaration for their own init function (which
   otherwise gcc will warn about, on the zealous warnings we use)
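A sketch of points 1 to 3 in a single file, using hypothetical names: the first part stands in for a generated header, the second for the generated C file that now includes it. With the prototype in scope before each definition, `-Wmissing-prototypes` has nothing to report, and any drift between header and body becomes a compile error rather than a silent mismatch.

```c
/* ---- stand-in for a generated header (hypothetical names) ---- */
#ifndef EXAMPLE_PMC_H_GUARD
#define EXAMPLE_PMC_H_GUARD
void Parrot_Example_class_init(void);
long Parrot_Example_get_integer_index(long idx);
#endif

/* ---- stand-in for the generated C file, which includes its own header ---- */
static int example_class_ready = 0;

/* The prototype above is already in scope, so -Wmissing-prototypes is
 * satisfied and the compiler checks that declaration and definition agree. */
void Parrot_Example_class_init(void)
{
    example_class_ready = 1;
}

/* The parameter is "idx", not "index": where <string.h> or <strings.h>
 * declares the BSD index() function, a parameter named index can trigger
 * -Wshadow. */
long Parrot_Example_get_integer_index(long idx)
{
    return idx * 2;   /* placeholder body for the sketch */
}
```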

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- ./include/parrot/global_setup.h.orig	Mon Dec 31 15:58:28 2001
+++ ./include/parrot/global_setup.h Mon Jan 21 21:32:03 2002
@@ -14,6 +14,16 @@
 #if !defined(PARROT_GLOBAL_SETUP_H_GUARD)
 #define PARROT_GLOBAL_SETUP_H_GUARD
 
+/* Needed because this might get compiled before pmcs have been built */
+void Parrot_PerlUndef_class_init(void);
+void Parrot_PerlInt_class_init(void);
+void Parrot_PerlNum_class_init(void);
+void Parrot_PerlString_class_init(void);
+void Parrot_PerlArray_class_init(void);
+void Parrot_PerlHash_class_init(void);
+void Parrot_ParrotPointer_class_init(void);
+void Parrot_IntQueue_class_init(void);
+
 void
 init_world(void);
 
--- ./global_setup.c.orig   Mon Jan 14 20:32:52 2002
+++ ./global_setup.c	Mon Jan 21 21:31:50 2002
@@ -14,16 +14,6 @@
 #define INSIDE_GLOBAL_SETUP
 #include "parrot/parrot.h"
 
-/* Needed because this might get compiled before pmcs have been built */
-void Parrot_PerlUndef_class_init(void);
-void Parrot_PerlInt_class_init(void);
-void Parrot_PerlNum_class_init(void);
-void Parrot_PerlString_class_init(void);
-void Parrot_PerlArray_class_init(void);
-void Parrot_PerlHash_class_init(void);
-void Parrot_ParrotPointer_class_init(void);
-void Parrot_IntQueue_class_init(void);
-
 void
 init_world(void) {
 string_init(); /* Set up the string subsystem */
--- ./classes/pmc2c.pl.orig Fri Jan  4 02:29:18 2002
+++ ./classes/pmc2c.pl  Mon Jan 21 21:21:25 2002
@@ -185,7 +185,10 @@
   my @methods;
 
   my $OUT = '';
-  my $HOUT = '';
+  my $HOUT = <<"EOC";
+ /* Do not edit - automatically generated from '$pmcfile' by $0 */
+
+EOC
   my %defaulted;
 
   while ($classblock =~ s/($signature_re)//) {
@@ -228,9 +231,12 @@
 
   my $includes = '';
   foreach my $class (keys %visible_supers) {
-  next if $class eq $classname;
+  # No, include yourself to check your headers match your bodies
+  # (and gcc -W... is happy then)
+  # next if $class eq $classname;
   $includes .= qq(#include "\L$class.h"\n);
   }
+
 
   $OUT = cache.int_val = value->vtable->get_integer(INTERP,value);
 }
 
-void set_integer_index (INTVAL value, INTVAL index) {
+void set_integer_index (INTVAL value, INTVAL idx) {
 }
 
 void set_number (PMC * value) {
@@ -123,7 +123,7 @@
SELF->cache.num_val = (FLOATVAL)value->cache.int_val;
 }
 
-void set_number_index (FLOATVAL value, INTVAL index) {
+void set_number_index (FLOATVAL value, INTVAL idx) {
 }
 
 void set_string (PMC * value) {
@@ -148,7 +148,7 @@
string_copy(INTERP, (STRING*)value->cache.struct_val);
 }
 
-void set_string_index (STRING* value, INTVAL index) {
+void set_string_index (STRING* value, INTVAL idx) {
 }
 
 void set_value (void* value) {
--- ./classes/perlnum.pmc.orig  Mon Jan 14 20:32:57 2002
+++ ./classes/perlnum.pmc   Mon Jan 21 21:23:31 2002
@@ -50,14 +50,14 @@
return (INTVAL)SELF->cache.num_val;
 }
 
-INTVAL get_integer_index (INTVAL index) {
+INTVAL get_integer_index (INTVAL idx) {
 }
 
 FLOATVAL get_number () {
 return SELF->cache.num_val;
 }
 
-FLOATVAL get_number_index (INTVAL index) {
+FLOATVAL get_number_index (INTVAL idx) {
 }
 
 STRING* get_string () {
@@ -73,7 +73,7 @@
return s;
 }
 
-STRING* get_string_index (INTVAL index) {
+STRING* get_string_index (INTVAL idx) {
 }
 
 BOOLVAL get_bool () {
@@ -108,7 +108,7 @@
SELF->cache.int_val = value->cache.int_val;
 }
 
-void set_integer_index (INTVAL value, INTVAL index) {
+void set_integer_index (INTVAL value, INTVAL idx) {
 }
 
 void set_number (PMC * value) {
@@ -127,7 +127,7 @@
SELF->cache.num_val = value->cache.num_val;
 }
 
-void set_number_index (FLOATVAL value, INTVAL index) {
+void set_number_index (FLOATVAL value, INTVAL idx) {
 }
 
 void set_string (PMC * value) {
@@ -155,7 +155,7 @@
SELF->cache.struct_val = value->cache.struct_val;
 }
 
-void set_string_index (STRING* value, INTVAL index) {
+void set_string_index (STRING* value, INTVAL idx) {
 }
 
 void set_value (void* value) {
--- ./classes/perlint.pmc.orig  Mon Jan 14 20:32:57 2002
+++ ./classes/perlint.pmc   Mon Jan 21 21:23:42 2002
@@ -50,14 +50,14 @@
 return SELF->cache.int_val;
 }
 
-  

[PATCH] tidy up JIT temporaries

2002-01-21 Thread Nicholas Clark

On Mon, Jan 21, 2002 at 09:00:48PM +, Nicholas Clark wrote:
> I'm not certain that putting the test in Configure.pl is the right place
> for it, but I do believe that having an accurate MANIFEST.SKIP and the
> ability to run 
> 
> perl -MExtUtils::Manifest -e ExtUtils::Manifest::fullcheck
> 
> (possibly as a Makefile target) is useful.

If MANIFEST.SKIP is thought worthy, then the appended piece of tidying up is
a good idea.

Nicholas Clark
-- 
ETAXMANUNHAPPY http://www.ccl4.org/~nick/CV.html

--- Parrot/Jit/i386Generic.pm~  Sun Jan 20 20:52:23 2002
+++ Parrot/Jit/i386Generic.pm   Mon Jan 21 20:25:25 2002
@@ -110,6 +110,7 @@
 
 write_as($assembler,TMP_AS);
 assemble(TMP_AS, TMP_OBJ);
+unlink TMP_AS or warn "Could not unlink " . TMP_AS . ": $!";
 return disassemble(TMP_OBJ,\@special_arg,\@special,$ln);
 }
 



[PATCH] format warning in key.c

2002-01-21 Thread Nicholas Clark

We do mandate an ANSI conformant C compiler, don't we?

Appended patch cures these warnings:

key.c: In function `debug_key':
key.c:29: warning: int format, INTVAL arg (arg 3)
key.c:33: warning: int format, INTVAL arg (arg 3)
key.c:33: warning: int format, INTVAL arg (arg 4)
key.c:36: warning: int format, INTVAL arg (arg 3)
key.c:36: warning: int format, INTVAL arg (arg 4)


Nicholas Clark
-- 
ENOJOB http://www.ccl4.org/~nick/CV.html

--- key.c.orig  Mon Jan 14 20:32:54 2002
+++ key.c   Mon Jan 21 23:09:06 2002
@@ -26,14 +26,14 @@
 debug_key (struct Parrot_Interp* interpreter, KEY* key) {
   INTVAL i;
   fprintf(stderr," *** key %p\n",key);
-  fprintf(stderr," *** size %d\n",key->size);
+  fprintf(stderr," *** size " INTVAL_FMT "\n",key->size);
   for(i=0;i<key->size;i++) {
 INTVAL type = key->keys[i].type;
 if(type == enum_key_bucket) {
-  fprintf(stderr," *** Bucket %d type %d\n",i,type);
+  fprintf(stderr," *** Bucket " INTVAL_FMT " type " INTVAL_FMT "\n",i,type);
 }
 else if(type != enum_key_undef) {
-  fprintf(stderr," *** Other %d type %d\n",i,type);
+  fprintf(stderr," *** Other " INTVAL_FMT " type " INTVAL_FMT "\n",i,type);
 }
   }
 }
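The fix relies on adjacent string-literal concatenation: `INTVAL_FMT` expands to a string literal, so `" *** size " INTVAL_FMT "\n"` becomes a single format string whose conversion matches `INTVAL`. The sketch below assumes a C99 `int64_t` `INTVAL` and builds the macro from `PRId64` (both C99, so newer than the ANSI C baseline); Parrot's own `INTVAL_FMT` is chosen at Configure time and may differ.

```c
#include <inttypes.h>
#include <stdio.h>

/* Assumption: a C99 stand-in for what Configure would choose. */
typedef int64_t INTVAL;
#define INTVAL_FMT "%" PRId64

/* Adjacent string literals are joined at compile time, so the format
 * string always carries the conversion that matches INTVAL. */
static int format_bucket_line(char *buf, size_t n, INTVAL i, INTVAL type)
{
    return snprintf(buf, n, " *** Bucket " INTVAL_FMT " type " INTVAL_FMT "\n",
                    i, type);
}
```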



Re: [PATCH] quieten many pmc warnings

2002-01-21 Thread Nicholas Clark

Something Jarkko has just sent to p5p reminded me of a comment I thought of
but failed to include in the e-mail

On Mon, Jan 21, 2002 at 10:47:20PM +, Nicholas Clark wrote:
> +  # No, include yourself to check your headers match your bodies

There must be a decent Baron Munchausen quote to replace the above
(from the part of the film where they are visiting the king and queen
of the moon)

Nicholas Clark
-- 
ECOPIOUSFREETIME http://www.ccl4.org/~nick/CV.html



Re: Benchmarking regexes

2002-01-21 Thread Steve Fink

On Mon, Jan 14, 2002 at 01:49:44AM -0800, Brent Dax wrote:
> I wrote a _very_ simple benchmark program to compare Perl 5 and Parrot.
> Here's the result of a test run on my machine:
> 
> C:\brent\Visual Studio Projects\Perl 6\parrot\parrot>..\benchmark
> Benchmarking "bbcdefg" =~ /b[cde]*.f/...
>  perl: 0.03000 seconds for 10_000 iters
>parrot: 0.24100 seconds for 10_000 iters
> Best: perl, worst: parrot. Spread of 0.21100.
> 
> The program is attached; it requires my latest regex patch to work.  You
> may need to tr{\\}{/} in a few places to get it to work on Unix systems.

Are you compiling with optimization? I have my own implementation I've
been toying with, and the first time I benchmarked it, it was pretty
much identical to yours (a little surprising, considering I was
benchmarking a totally different expression!) Then I noticed that I
had compiled it without optimization and tried again with -O3, and the
gap narrowed significantly.

With mine, I am currently seeing:

Benchmarking "xxabbbxx" =~ /ab+a*b/...
 perl: 1.20323 seconds for 500_000 iters
   parrot: 2.87138 seconds for 500_000 iters

Mine doesn't yet handle character classes, so I can't do a direct
comparison. If you want, you can send me an rx.ops implementation of
/ab+a*b/ and I'll report the timings of all three. (This isn't a very
fair benchmark, though, because perl5's optimizations come into play
with this one, and neither of our engines has a "scan for exact
string" op that would let us emulate the optimized expression.)

I notice that string_ord() is taking up a pretty big chunk of time.
Which isn't too surprising, considering that string_index() is

return s->encoding->decode(s->encoding->skip_forward(s->bufstart, idx))

which is more levels of indirection than you can shake a stick at. And
that makes me wonder if we can ever compete fairly with perl5 without
implementing a binary buffer matching mode. Seems like we're always
paying a penalty for doing "proper" string matching by going through
all these levels of encoding.
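A sketch of the two access paths being compared, using a hypothetical, simplified encoding table rather than Parrot's real `ENCODING` structure. The generic path pays two indirect calls per character; for a variable-width encoding such as UTF-8, `skip_forward` would also have to walk `idx` characters. The "binary buffer" path is a plain array index.

```c
#include <stddef.h>

typedef unsigned long UINTVAL;   /* assumed stand-in */

/* Hypothetical, simplified encoding vtable. */
typedef struct {
    UINTVAL     (*decode)(const void *ptr);
    const void *(*skip_forward)(const void *ptr, UINTVAL n);
} encoding_sketch;

static UINTVAL singlebyte_decode(const void *ptr)
{
    return *(const unsigned char *)ptr;
}

static const void *singlebyte_skip_forward(const void *ptr, UINTVAL n)
{
    return (const unsigned char *)ptr + n;
}

static const encoding_sketch singlebyte = { singlebyte_decode, singlebyte_skip_forward };

/* Generic path: two indirect calls for every character fetched. */
static UINTVAL ord_via_encoding(const encoding_sketch *enc, const void *buf, UINTVAL idx)
{
    return enc->decode(enc->skip_forward(buf, idx));
}

/* "Binary buffer" path: one array access, no indirection. */
static UINTVAL ord_binary(const unsigned char *buf, size_t idx)
{
    return buf[idx];
}
```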

My RE engine is still pretty rudimentary, but I'll mail a patch to
anyone who wants to take a look at it. The core really isn't much
different from Brent's rx stuff; I think his is slightly more
explicit. The internal wiring is likely to be rather different,
though.



Re: [PATCH] format warning in key.c

2002-01-21 Thread Steve Fink

All of your last several patches look good to me. Didn't Dan give you
commit rights yet? I'm pretty sure he intended to. Dan was also going
to have a discussion of commit policy -- when should we just commit,
and when should we discuss first -- as soon as he gets more settled,
but my vote would be to commit all these cleanup patches. (Including
the unsigned characters but signed digits one.)



Re: [PATCH] format warning in key.c

2002-01-21 Thread Dan Sugalski

At 11:10 PM + 1/21/02, Nicholas Clark wrote:
>We do mandate an ANSI conformant C compiler, don't we?

Yep. If we haven't given you commit rights, go over to dev.perl.org 
and get an account. Then mail me the account name and we'll fix that.
-- 

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



Re: [PATCH] format warning in key.c

2002-01-21 Thread Dan Sugalski

At 3:56 PM -0800 1/21/02, Steve Fink wrote:
>All of your last several patches look good to me. Didn't Dan give you
>commit rights yet? I'm pretty sure he intended to. Dan was also going
>to have a discussion of commit policy -- when should we just commit,
>and when should we discuss first -- as soon as he gets more settled,
>but my vote would be to commit all these cleanup patches. (Including
>the unsigned characters but signed digits one.)

All patches that clean up warnings, style gaffes, and add correct 
comments can just go in. Commits in areas you (the generic you, here) 
have some responsibility for (Brent with the RE code, Jeff Goff for 
PMC stuff, Melvin for IO, for example) can also go in if you're 
comfortable with them. The rest use your judgement cautiously, and if 
you're not sure pop a note to the list and we can go from there.
-- 

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



Re: [PATCH] format warning in key.c [APPLIED]

2002-01-21 Thread Dan Sugalski

At 11:10 PM + 1/21/02, Nicholas Clark wrote:
>We do mandate an ANSI conformant C compiler, don't we?
>
>Appended patch cures these warnings:

Oh, and applied. Thanks.
-- 

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



Re: [PATCH] format warning in key.c

2002-01-21 Thread Bryan C. Warnock

On Monday 21 January 2002 19:06, Dan Sugalski wrote:
> Commits in areas you (the generic you, here)
> have some responsibility for (Brent with the RE code, Jeff Goff for
> PMC stuff, Melvin for IO, for example) can also go in if you're
> comfortable with them. 

That should probably be amended with "only" in there somewhere.  Perhaps in 
multiple places.  (Commits in one area that are depending on or dictating 
design in another should probably get some sort of feedback, too.)


-- 
Bryan C. Warnock
[EMAIL PROTECTED]



[APPLIED] Re: [PATCH] are characters unsigned?

2002-01-21 Thread Alex Gough

On Mon, 21 Jan 2002, Nicholas Clark wrote:

> I thought that it should be this
>
> INTVAL (*get_digit)(UINTVAL c);
>
> not this
>
> UINTVAL (*get_digit)(UINTVAL c);
>

It seems you thought both; I've made a small modification and applied
the patch, thanks.

Alex Gough




Re: [PATCH] warnings in test_main.c

2002-01-21 Thread Alex Gough

On Mon, 21 Jan 2002, Nicholas Clark wrote:

> Before:

lots.

> After:

less.


Applied, thanks.

Alex Gough




Re: [PATCH] are characters unsigned?

2002-01-21 Thread Melvin Smith

At 09:41 PM 1/21/2002 +, Nicholas Clark wrote:
>I am of the opinion that they are UINTVAL, not INTVAL. (and EOF being a
>negative value such as -1 is only needed for C stdio, and I seem to remember
>that Dan has strong opinions on C stdio, and what C can do with it)

Specifically, Dan has declared that Parrot shall not include stdio by default.
This doesn't stop us from adding a stdio wrapper layer later.

I did see someone mention that they thought "miniparrot" or whatever might
require us to use a stdio wrapper, but I'm not convinced that is the case
unless that system's _only_ API is stdio. Are there any like that? I know 
of none.

I can speak only for the systems I know (mostly UNIXish), and there the
low-level file calls don't indicate EOF by returning an "EOF" char anyway;
they indicate it by returning 0 bytes on a read attempt. STDIO is the library
that implements the (int)EOF char.

If we implement getc/getchar-type calls around read, then we probably
have to implement an EOF value, but there is no reason we can't cast
the unsigned to signed -1, right?
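A minimal sketch of a getc-style call built on POSIX `read()`, assuming a hypothetical `pio_getc()` and `PIO_EOF` (neither is an existing Parrot name). Each byte comes back as an unsigned value from 0 to 255, widened to a signed type, so the negative sentinel can never collide with data; this is the same trick stdio uses when `getc()` returns `int`. Interrupted reads (`EINTR`) are not retried here.

```c
#include <unistd.h>

#define PIO_EOF (-1)   /* hypothetical sentinel */

/* read() returns 0 bytes at end of file and -1 on error; only those
 * cases produce the negative sentinel. */
static long pio_getc(int fd)
{
    unsigned char ch;
    ssize_t n = read(fd, &ch, 1);
    if (n == 1)
        return (long)ch;   /* 0..255, never negative */
    return PIO_EOF;
}
```

A 0xFF byte therefore reads back as 255, distinct from `PIO_EOF`.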

-Melvin




Re: [PATCH] quieten many pmc warnings

2002-01-21 Thread Alex Gough

On Mon, 21 Jan 2002, Nicholas Clark wrote:

> This eliminates many gcc warnings from pmc code by

Applied, thanks.

Alex Gough




String/null terminations

2002-01-21 Thread Melvin Smith

While a few people are active, can someone "re-clue" me in on the intentions
of string handling? I'd like to stick a couple of calls in the string lib
to:
1) Terminate a string's current buffer if there is room
2) Create a local or alloced buffer with a null terminated string.

These calls would only be used when calling code that expects C strings.

Otherwise all low-level code has to do its own copying/dinking with the buffers.
I'll submit a patch, but since String stuff isn't my area I'd rather whoever
is maintaining it let me know how they want to handle it.
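A minimal sketch of the two proposed calls, using a hypothetical counted-string struct rather than Parrot's `STRING` (whose fields and allocation rules may differ). The first writes a terminator only when the buffer has a spare byte; the second always produces a separate `malloc`ed copy. Either way, a string containing embedded NUL bytes will look truncated to C code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified counted string. */
typedef struct {
    char  *bufstart;   /* start of buffer */
    size_t buflen;     /* bytes allocated */
    size_t bufused;    /* bytes holding string data */
} cstring_sketch;

/* Call 1: NUL-terminate in place when a spare byte exists.
 * Returns a C-string view of the buffer, or NULL if it is full. */
static const char *string_terminate(cstring_sketch *s)
{
    if (s->bufused >= s->buflen)
        return NULL;
    s->bufstart[s->bufused] = '\0';
    return s->bufstart;
}

/* Call 2: allocate a separate NUL-terminated copy; the caller frees it. */
static char *string_to_cstring(const cstring_sketch *s)
{
    char *copy = malloc(s->bufused + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s->bufstart, s->bufused);
    copy[s->bufused] = '\0';
    return copy;
}
```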

-Melvin




RE: Benchmarking regexes

2002-01-21 Thread Brent Dax

Steve Fink:
# On Mon, Jan 14, 2002 at 01:49:44AM -0800, Brent Dax wrote:
# > I wrote a _very_ simple benchmark program to compare Perl 5 and Parrot.
# > Here's the result of a test run on my machine:
# >
# > C:\brent\Visual Studio Projects\Perl 6\parrot\parrot>..\benchmark
# > Benchmarking "bbcdefg" =~ /b[cde]*.f/...
# >  perl: 0.03000 seconds for 10_000 iters
# >parrot: 0.24100 seconds for 10_000 iters
# > Best: perl, worst: parrot. Spread of 0.21100.
# >
# > The program is attached; it requires my latest regex patch to work.  You
# > may need to tr{\\}{/} in a few places to get it to work on Unix systems.
#
# Are you compiling with optimization? I have my own implementation I've
# been toying with, and the first time I benchmarked it, it was pretty
# much identical to yours (a little surprising, considering I was
# benchmarking a totally different expression!) Then I noticed that I
# had compiled it without optimization and tried again with -O3, and the
# gap narrowed significantly.

I tried it once and did see the gap narrow some, but I keep forgetting
to re-enable it as I modify things and rebuild.  BTW, it's probably
better to use -O, which will let the compiler choose the best
optimization level.  -O3 forces it to optimize to level 3 or give up
completely.

# With mine, I am currently seeing:
#
# Benchmarking "xxabbbxx" =~ /ab+a*b/...
#  perl: 1.20323 seconds for 500_000 iters
#parrot: 2.87138 seconds for 500_000 iters
#
# Mine doesn't yet handle character classes, so I can't do a direct
# comparison.

What a shame.  Character classes are the funnest part of it!  ;^)

(To be fair, rx_oneof sat empty for a very long time.  It's the hardest
matching op to implement.)

# If you want, you can send me an rx.ops implementation of
# /ab+a*b/ and I'll report the timings of all three. (This isn't a very
# fair benchmark, though, because perl5's optimizations come into play
# with this one, and neither of our engines has a "scan for exact
# string" op that would let us emulate the optimized expression.)

Once Parrot gets an index() op based on a fast string search algorithm,
that will become a non-issue.  Also, I seem to remember that somebody
was at least trying to figure out what would be necessary to disable
regex optimizations in Perl 5.
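For illustration, here is a sketch in C of one well-known fast search algorithm such an index() op could use: Boyer-Moore-Horspool. It is not Parrot code, the function name is hypothetical, and it compares raw bytes, so it assumes a single-byte encoding (or a binary matching mode):

```c
#include <limits.h>
#include <stddef.h>

/* Boyer-Moore-Horspool search (hypothetical name). Returns the byte offset
 * of the first occurrence of needle in hay, or -1 if it does not occur. */
long horspool_index(const unsigned char *hay, size_t hlen,
                    const unsigned char *needle, size_t nlen)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    if (nlen == 0)
        return 0;
    if (nlen > hlen)
        return -1;

    /* Bytes absent from the needle let us skip a whole needle length. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = nlen;
    /* Bytes in the needle (except its last byte): shift so the rightmost
     * occurrence lines up with the last byte of the current window. */
    for (i = 0; i + 1 < nlen; i++)
        shift[needle[i]] = nlen - 1 - i;

    pos = 0;
    while (pos <= hlen - nlen) {
        /* Compare the window right to left. */
        size_t j = nlen - 1;
        while (hay[pos + j] == needle[j]) {
            if (j == 0)
                return (long)pos;
            j--;
        }
        /* Shift based on the window's last byte. */
        pos += shift[hay[pos + nlen - 1]];
    }
    return -1;
}
```

For example, searching "xxabbbxx" for "ab" returns 2. The skip table lets a mismatch advance the window by up to the needle's length, which is where the gain over a naive scan comes from as needles get longer.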

Untested implementation of {"xxabbbxx" =~ /ab+a*b/ for (I0 = 500_000; I0; I0--)}:

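set I0, 500000
set S0, "xxabbbxx"
rx_allocinfo P0, S0
# Start time
time N0
print N0
print "\n"
$top:
bsr RX_0
rx_clearinfo P0, S0
dec I0
if I0, $top

# End time
time N0
print N0
print "\n"
rx_freeinfo P0
end

# The regex /ab+a*b/
RX_0:
rx_setprops "", 3
branch $start
$advance:
# No match at this start position: move it forward, or fail
rx_advance P0, $fail
$start:
# Literal ab covers the a and the mandatory first b of b+
rx_literal P0, "ab", $advance
rx_pushmark P0
$top1:
# Rest of b+: greedily match b, saving each index for backtracking
rx_literal P0, "b", $next1
rx_pushindex P0
branch $top1
$back1:
rx_popindex P0, $advance
$next1:
rx_pushmark P0
$top2:
# a*: greedily match a, saving each index for backtracking
rx_literal P0, "a", $next2
rx_pushindex P0
branch $top2
$back2:
rx_popindex P0, $back1
$next2:
# Final b; on failure, give back an a (and then a b)
rx_literal P0, "b", $back2
rx_success P0
ret
$fail:
rx_fail P0
ret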
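To make the control flow of the untested code above easier to follow, here is a rough C equivalent of the same backtracking search for /ab+a*b/. It is a hand-written sketch, not generated from or taken from rx.ops, and the function names are hypothetical: the outer loop plays the role of rx_advance, and the counting-down loops stand in for pushing indices and popping them back on failure.

```c
#include <stddef.h>

/* Try to match /ab+a*b/ starting exactly at s[start]. On success, store the
 * end offset (one past the last matched char) in *end and return 1. */
static int match_at(const char *s, size_t len, size_t start, size_t *end)
{
    size_t pos = start, nb, na, k, j;

    /* Literal "ab": the a plus the one mandatory b of b+. */
    if (len - start < 2 || s[pos] != 'a' || s[pos + 1] != 'b')
        return 0;
    pos += 2;

    /* Greedy: count how many more b's follow. */
    for (nb = 0; pos + nb < len && s[pos + nb] == 'b'; nb++)
        ;
    /* Try nb, nb-1, ..., 0 extra b's (like popping saved indices). */
    for (k = nb + 1; k-- > 0; ) {
        size_t p1 = pos + k;
        /* Greedy a*. */
        for (na = 0; p1 + na < len && s[p1 + na] == 'a'; na++)
            ;
        /* Try na, na-1, ..., 0 a's, then require the final b. */
        for (j = na + 1; j-- > 0; ) {
            size_t p2 = p1 + j;
            if (p2 < len && s[p2] == 'b') {
                *end = p2 + 1;
                return 1;
            }
        }
    }
    return 0;
}

/* Unanchored search: advance the start position until a match is found.
 * Returns the start offset, or -1 if there is no match. */
long match_ab_plus_a_star_b(const char *s, size_t len, size_t *end)
{
    size_t start;
    for (start = 0; start < len; start++)
        if (match_at(s, len, start, end))
            return (long)start;
    return -1;
}
```

On "xxabbbxx" this returns 2 with *end set to 6, i.e. it matches "abbb": the greedy loop first takes both extra b's, the final b then fails against 'x', and the search gives back one b.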

# I notice that string_ord() is taking up a pretty big chunk of time.
# Which isn't too surprising, considering that string_index() is
#
#     return s->encoding->decode(s->encoding->skip_forward(s->bufstart, idx));

It could also be that we're calling it so damn much.  Even a function
that's just {return a+b;} will take a lot of time if it's called
eleventy jillion times.
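One reason per-character calls add up: with a variable-width encoding such as UTF-8, skipping forward idx characters from bufstart costs O(idx), so calling a string_index()-style function for every position in a scan does quadratic work overall. The sketch below is hypothetical (the vtable and function names only echo the quoted skip_forward/decode chain; they are not Parrot's definitions) and contrasts that access pattern with a cursor that advances one character at a time:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding vtable, loosely modelled on the quoted
 * decode(skip_forward(...)) chain; not Parrot's actual struct. */
typedef struct {
    const unsigned char *(*skip_forward)(const unsigned char *p, size_t n);
    uint32_t (*decode)(const unsigned char *p);
} sketch_encoding;

/* UTF-8 skip: must inspect each lead byte, so skipping n chars is O(n).
 * Assumes well-formed UTF-8. */
static const unsigned char *utf8_skip_forward(const unsigned char *p, size_t n)
{
    while (n-- > 0) {
        unsigned char c = *p;
        p += (c < 0x80) ? 1 : (c < 0xE0) ? 2 : (c < 0xF0) ? 3 : 4;
    }
    return p;
}

/* Decode one well-formed UTF-8 character into a code point. */
static uint32_t utf8_decode(const unsigned char *p)
{
    if (p[0] < 0x80)
        return p[0];
    if (p[0] < 0xE0)
        return ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    if (p[0] < 0xF0)
        return ((uint32_t)(p[0] & 0x0F) << 12)
             | ((uint32_t)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
    return ((uint32_t)(p[0] & 0x07) << 18) | ((uint32_t)(p[1] & 0x3F) << 12)
         | ((uint32_t)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
}

static const sketch_encoding utf8_encoding = { utf8_skip_forward, utf8_decode };

/* Random access: every call re-skips from the start of the buffer. */
uint32_t sketch_string_index(const sketch_encoding *enc,
                             const unsigned char *buf, size_t idx)
{
    return enc->decode(enc->skip_forward(buf, idx));
}

/* Sum all code points via random access: O(n^2) skipping in total. */
uint64_t sum_by_index(const sketch_encoding *enc,
                      const unsigned char *buf, size_t nchars)
{
    uint64_t total = 0;
    size_t i;
    for (i = 0; i < nchars; i++)
        total += sketch_string_index(enc, buf, i);
    return total;
}

/* Same sum via a cursor that moves one character per step: O(n). */
uint64_t sum_by_cursor(const sketch_encoding *enc,
                       const unsigned char *buf, size_t nchars)
{
    uint64_t total = 0;
    const unsigned char *p = buf;
    size_t i;
    for (i = 0; i < nchars; i++) {
        total += enc->decode(p);
        p = enc->skip_forward(p, 1);
    }
    return total;
}
```

Both sum functions return the same value; only the cursor version avoids re-walking the prefix on every call. The sketch says nothing about which encoding Parrot actually uses; it only illustrates why the access pattern matters.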

# which is more levels of indirection than you can shake a stick at. And
# that makes me wonder if we can ever compete fairly with perl5 without
# implementing a binary buffer matching mode. Seems like we're always
# paying a penalty for doing "proper" string matching by going through
# all these levels of encoding.

I really don't want to start mucking around in string internals.  OTOH,
I'm planning on forcing everything to UTF-32 Normalization Form KC, so it
may not be too big of a problem.
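A sketch of why a fixed-width representation helps, assuming (hypothetically) that a normalized UTF-32 string is stored as a plain array of uint32_t code points. Indexing becomes an array access, and substring comparison can use memcmp() on whole code units with no per-character decode call. Normalization to NFKC is a separate step and is not shown; the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-width access: character idx is simply element idx. O(1). */
uint32_t utf32_index(const uint32_t *buf, size_t idx)
{
    return buf[idx];
}

/* Naive substring search over code points. memcmp() compares whole
 * 32-bit code units, so no decoding is needed. Returns the character
 * index of the first match, or -1 if none. */
long utf32_find(const uint32_t *hay, size_t hlen,
                const uint32_t *needle, size_t nlen)
{
    size_t i;
    if (nlen > hlen)
        return -1;
    for (i = 0; i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen * sizeof *needle) == 0)
            return (long)i;
    return -1;
}
```

The trade-off is memory: four bytes per character regardless of content.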

# My RE engine is still pretty rudimentary, but I'll mail a patch to
# anyone who wants to take a look at it. The core really isn't much
# different from Brent's rx stuff; I think his is slightly more
# explicit. The internal wiring is likely to be rather different,
# though.

Send me a copy.  There's sure to be at least a few things in it that are
better implemented, if not the whole thing.

--Brent Dax
[EMAIL PROTECTED]
Parrot Configure pumpking and regex hacker

 . hawt sysadmin chx0rs
 This is sad. I know of *a* hawt sysamin chx0r.
 I know more than a few.
 obra: There