Re: rx.ops

2003-08-03 Thread Luke Palmer
Brent Dax wrote:
 Honestly, though, I'm no longer sure the full regex engine is a good idea.
 A fast index op, a fast ord op, a character class op, and the intstack is
 really all that's needed to make a regex engine from plain Parrot opcodes.

I agree with you on one level.  That is enough to make a regex engine.

However -- we don't want a regex engine, we want a pattern engine.
Grammars are a lot more complicated than regexes[1], and there are
many better ways to parse them.

One of the small goals of Perl 6 is to eventually get people to use it
instead of yacc.  Which means it's got to be competitive in speed[2].
And, I will keep grinding this into people's heads until they finally
decide to listen, Recursive Descent Will Not Be Competitive In Speed!
No matter how well it's implemented.  And of the parsing algorithms
available to us, recursive descent is the only one which requires only
an intstack.  All the others need at least a character table.

Okay, now I'll stop ranting and start thinking.

I think a predictive parser would be pretty well suited.  It's got a
couple of advantages over recursive descent which make it much faster,
but it's sufficiently compatible that it's possible to switch over to
recursive descent when things get too complicated for the algorithm
(which they will, once in a while).  You need a quick token lookup
table to implement that well (things can be tokenized on-the-fly in
top-down parsing, so that you don't need a discrete set).
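
To make the "predictive" idea concrete, here is a minimal sketch of a
table-driven predictive parser for a toy grammar (balanced parentheses,
S -> '(' S ')' S | empty). It is not Parrot code: the names
predict_S and predictive_parse, the fixed stack size, and the grammar
itself are assumptions chosen for illustration. The point it shows is
that one character of lookahead selects the production, so the parser
never backtracks.

```c
#include <assert.h>
#include <stddef.h>

/* Symbols of the toy grammar  S -> '(' S ')' S | (empty). */
enum { SYM_S, SYM_LPAREN, SYM_RPAREN };

#define STACK_MAX 256   /* hypothetical fixed parse-stack size */

/* The "prediction table" for S, indexed by the lookahead character:
   1 selects S -> ( S ) S, 0 selects S -> (empty), -1 is a syntax error. */
static int predict_S(char lookahead)
{
    switch (lookahead) {
    case '(':  return 1;
    case ')':
    case '\0': return 0;
    default:   return -1;
    }
}

/* Returns 1 if input is a balanced-parenthesis string, 0 otherwise
   (including when nesting needs more than STACK_MAX stack slots). */
int predictive_parse(const char *input)
{
    int stack[STACK_MAX];
    size_t sp = 0;
    const char *p = input;

    assert(input != NULL);
    stack[sp++] = SYM_S;
    while (sp > 0) {
        int top = stack[--sp];
        if (top == SYM_S) {
            int rule = predict_S(*p);
            if (rule < 0)
                return 0;
            if (rule == 1) {
                if (sp + 4 > STACK_MAX)
                    return 0;
                /* Push the right-hand side in reverse so '(' is matched first. */
                stack[sp++] = SYM_S;
                stack[sp++] = SYM_RPAREN;
                stack[sp++] = SYM_S;
                stack[sp++] = SYM_LPAREN;
            }
            /* rule 0 (empty production): push nothing. */
        } else {
            /* Terminal on top of the stack: it must match the input. */
            char expected = (top == SYM_LPAREN) ? '(' : ')';
            if (*p != expected)
                return 0;
            p++;
        }
    }
    return *p == '\0';
}
```

A real grammar would replace predict_S with a table indexed by
(nonterminal, token), and could fall back to recursive descent for the
rules where a single token of lookahead is not enough.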

Bottom-up methods seem appealing because they're so fast.  But they're
not sufficiently versatile for what Perl needs.

I'm tired, so I'll put some more thought into this later.  But this is
definitely something that needs to be approached with lateral
thinking, and we've got to end up with something good (but not
necessarily the first, er, third time :-).  After all, Perl's still
got to be good at its claim to fame: text processing.

Luke


[1] I'm speaking in the usual case here.  That is, regexes (of the
Perl variety) can do everything grammars can, but grammars do the
advanced stuff a whole lot more often.

[2] Not faster, because that would be pretty impossible.  But not
agonizingly slow for complex things.


 --Brent Dax [EMAIL PROTECTED]
 Perl and Parrot hacker


generic code generator? [was: subroutines and python status]

2003-08-03 Thread Michal Wallace
On Fri, 1 Aug 2003, K Stol wrote:

  From: Leon Brocard [EMAIL PROTECTED]
...
  I don't like things becoming dead-ends. How much work do you think
  it'd be to extend it some more and update it to latest Lua?
...
 2: I misdesigned the code generator; that is, at the point where I
 couldn't start over, it was too late, the code generator was too big
 already (it was unmaintainable). But because I had a time schedule,
 I kept it this way (the product itself wasn't the most important
 thing, I was writing an undergraduate report for the last semester
 of my education (for the record: the project served me well, I
 finished this education))

  Would it be worth checking this into parrot CVS?

 Only if the thing would be working, otherwise it would only be a
 source of confusion and frustration.  Now I'm just thinking very
 hard to decide if I've got enough spare time to rewrite the code
 generator

Hmm. I've only messed around with Lua for a few hours
though, and it was several months ago, but the Lua 
language seems to be pretty similar to python.

Really, there's a ton of overlap between the various
high level languages that parrot wants to support.
Maybe we could put together a generic code generator
that everyone could use? Obviously, it would have to
be set up so you could override the parts for each
language, but it shouldn't be too terribly hard.

What do you think? Want to try squishing pirate/python 
and pirate/lua together? :)

Sincerely,
 
Michal J Wallace
Sabren Enterprises, Inc.
-
contact: [EMAIL PROTECTED]
hosting: http://www.cornerhost.com/
my site: http://www.withoutane.com/
--




Re: generic code generator? [was: subroutines and python status]

2003-08-03 Thread Michal Wallace
On Sun, 3 Aug 2003, K Stol wrote:

 At this moment, I'm looking at a new version of Lua. The previous
 'pirate' compiled (well, sort of :-) Lua 4. Lua 5 has some features,
 such as coroutines (if I remember correctly) and all kinds of neat
 stuff for which Parrot has built-in support (and it dropped one or
 more features from Lua 4). I think I'll try to create a parser for Lua
 5, and to recreate a Lua/Parrot compiler (should go a lot easier now
 that I had the time to think about the errors I made).

Cool. :) I'm just now reading through your report. 


  Really, there's a ton of overlap between the various
  high level languages that parrot wants to support.
  Maybe we could put together a generic code generator
  that everyone could use? Obviously, it would have to
  be set up so you could override the parts for each
  language, but it shouldn't be too terribly hard.
 
 Sounds like quite a challenge, but a good idea, and I think worth a try.

  What do you think? Want to try squishing pirate/python
  and pirate/lua together? :)
 
 Yeah, I like the idea. Let's try this out.

Great! I figure since you've already got lua 4
working, we can leverage what you've already
got and then just add the new features for 
python and lua 5.

If you're still around, want to meet up online real
quick? I'm logged in as sabren in #parrot on 
irc.infobot.org

Sincerely,
 
Michal J Wallace
Sabren Enterprises, Inc.
-
contact: [EMAIL PROTECTED]
hosting: http://www.cornerhost.com/
my site: http://www.withoutane.com/
--




Re: subs.pod

2003-08-03 Thread Leopold Toetsch
Vladimir Lipskiy [EMAIL PROTECTED] wrote:

 What are -, X, and  (whitespace) supposed to mean there?

X means "is in context". Sorry if that is misleading; I'll update
the pod.

 Why is Eval not there? Does it have no context?

It's not specified yet how eval fits into the picture. It's currently
different, because it runs a different code segment. We don't have
general support for multiple code segments yet.

 If items in the interpreter context are changed between creation of the
 subroutine/return continuation and its invocation, the C<updatecc> opcode
 should be used:

 What items? Items of the interpreter context or items of the Sub context
 mentioned above? Is there any difference between these? How do I know
 items are changed?

If you have something like:

newsub .Sub, .Continuation, _sub_label, ret_label
...
  loop:
...
invoke
  ret_label:
branch loop

and e.g. the interpreter's warning flags are changed between creation of
the return continuation and the subroutine call, the C<updatecc> opcode
updates the warnings in the return continuation, so that after returning
you have the very same interpreter context. C<updatecc> should leave the
return continuation in the same state as if you had a C<invokecc> inside
the loop, without the overhead of creating new continuation objects
every time.

When using the PIR .pcc_begin/.pcc_end directives, C<updatecc> gets
inserted automatically when needed.

 Thanks.

leo


Re: string.c questions

2003-08-03 Thread Leopold Toetsch
Benjamin Goldberg [EMAIL PROTECTED] wrote:

 Also, although we're told at the top of string.c to not look at
 s->bufstart or s->buflen, I'd like to know if we are allowed to
 assume/assert that for all strings, the following is true:

s->encoding->skip_forward( s->strstart, s->strlen ) ==
   (char*)s->bufstart + s->bufused

No. F<res_lea.c>, for example, keeps a reference count at bufstart. But
with s/bufstart/strstart/ the above equation should be true.

leo


String value semantics?

2003-08-03 Thread Luke Palmer
Is this supposed to happen?

% parrot -
.sub _main
$S0 = "Hello\n"
$S1 = $S0
substr $S1, 2, 2, ""
print $S0
print $S1
end
.end
(EOF)
Heo
Heo

Aren't strings supposed to follow value semantics?

Luke


JIT bug with restoretop

2003-08-03 Thread Luke Palmer
(or something)

The following program segfaults when run under JIT.

.sub _main
newsub P0, .Sub, _echo

$S0 = "abcdefghij"

savetop
restoretop

end
.end

.sub _echo
print P5
invoke P1
.end

(note that I never call _echo, but the newsub is required to produce
the bug)

Strangely, it doesn't segfault when $S0 is set to "abcdefghi", or
anything of that length.  Given that, the triggering length will likely
differ from system to system.

I'm running i686 Linux under gcc version 3.2.2.  I have a modified
i386/jit_emit.h so it will (supposedly) work under this gcc which
looks like:

Index: jit_emit.h
===================================================================
RCS file: /cvs/public/parrot/jit/i386/jit_emit.h,v
retrieving revision 1.76
diff -r1.76 jit_emit.h
11c11
< #if defined HAVE_COMPUTED_GOTO && defined __GNUC__
---
> #if defined HAVE_COMPUTED_GOTO && defined __GNUC__ && 0

This fix has worked fine with JIT until now, so I suspect the problem
is elsewhere.

Luke


Re: JIT bug with restoretop

2003-08-03 Thread Simon Glover

On 3 Aug 2003, Luke Palmer wrote:

 This fix has worked fine with JIT until now, so I suspect the problem
 is elsewhere.


 Bug confirmed here (although I need a slightly longer string to trigger
 it). Here's a stacktrace:

 ---

  Program received signal SIGSEGV, Segmentation fault.
0x0809c215 in Parrot_exec_add_text_rellocation (obj=0x50, nptr=0x820c94c
,
type=2, symbol=0x815ab88 interpre, disp=-4) at exec.c:233
233 new_relloc = mem_sys_realloc(obj->text_rellocation_table,
(gdb) bt
#0  0x0809c215 in Parrot_exec_add_text_rellocation (obj=0x50,
nptr=0x820c94c , type=2, symbol=0x815ab88 interpre, disp=-4)
at exec.c:233
#1  0x080a9dbc in Parrot_jit_begin (jit_info=0x8209960,
interpreter=0x819ed98)
at include/parrot/jit_emit.h:2520
#2  0x080a6919 in build_asm (interpreter=0x819ed98, pc=0x82098e8,
code_start=0x82098e8, code_end=0x8209920, objfile=0x0) at jit.c:1020
#3  0x0806169d in runops_jit (interpreter=0x819ed98, pc=0x82098e8)
at interpreter.c:438
#4  0x080619b9 in runops_int (interpreter=0x819ed98, offset=0)
at interpreter.c:591
#5  0x08061a1b in runops_ex (interpreter=0x819ed98, offset=0)
at interpreter.c:607
#6  0x08061b1d in runops (interpreter=0x819ed98, offset=0) at
interpreter.c:643
#7  0x080d5089 in Parrot_runcode (interpreter=0x819ed98, argc=1,
argv=0xb39c) at embed.c:377

 --

 Simon



Re: JIT bug with restoretop

2003-08-03 Thread Daniel Grunblatt
On Sunday 03 August 2003 15:27, Simon Glover wrote:
 On 3 Aug 2003, Luke Palmer wrote:
  This fix has worked fine with JIT until now, so I suspect the problem
  is elsewhere.

  Bug confirmed here (although I need a slightly longer string to trigger
  it). Here's a stacktrace:

I couldn't reproduce it here, but from your bt I guessed that
jit_info->objfile was getting some non-NULL value while running under
JIT, because it was not initialized correctly, so I fixed that.
Could you confirm whether the bug's still there?


  Simon

Daniel


Re: JIT bug with restoretop

2003-08-03 Thread Simon Glover

On Sun, 3 Aug 2003, Daniel Grunblatt wrote:

 On Sunday 03 August 2003 15:27, Simon Glover wrote:
  On 3 Aug 2003, Luke Palmer wrote:
   This fix has worked fine with JIT until now, so I suspect the problem
   is elsewhere.
 
   Bug confirmed here (although I need a slightly longer string to trigger
   it). Here's a stacktrace:

 I couldn't reproduce it here, but from your bt I guessed that
 jit_info->objfile was getting some non-NULL value while running under
 JIT, because it was not initialized correctly, so I fixed that.
 Could you confirm whether the bug's still there?

 No, that seems to have fixed it.

 Simon



Re: generic code generator? [was: subroutines and python status]

2003-08-03 Thread Stephen Thorne
On Sun, 3 Aug 2003 19:25, Michal Wallace wrote:
 On Fri, 1 Aug 2003, K Stol wrote:
 Really, there's a ton of overlap between the various
 high level languages that parrot wants to support.
 Maybe we could put together a generic code generator
 that everyone could use? Obviously, it would have to
 be set up so you could override the parts for each
 language, but it shouldn't be too terribly hard.

 What do you think? Want to try squishing pirate/python
 and pirate/lua together? :)

A nice high-level code generator would be in my interests as well, seeing
as I'm currently working on php/parrot and I've got 'hello world' standard
imcc code generation going. I'd really like to be able to save a lot of the
low-level work.

With regard to my own project, would it be appropriate to ask for parrot CVS
access in order to publish the php compiler in the parrot source tree? One of
the files is under the Zend license, being a direct derivation from
zend_language_scanner.y. Are there any licensing restrictions on what goes
into perl cvs?

Stephen.


pdd03_calling_conventions.pod questions

2003-08-03 Thread Vladimir Lipskiy
Q1: Suppose I have the following call into a sub named foo:

foo($var1, $var2, $var3);

What should I set in I1? Is it 3?

And here:

foo($var1, @arr2, %hash3);

Is it still 3, since these aren't gonna be flattened?

Q2: I'm calling without prototyping

foo($var1, $var2, $var3, ... , $var23);

Here, what should I place in I2?  Is it 11 (as we have P5-P15) or
23 (considering the P3 register)?

Thanks (~: he's a boxer, that's why he has a borken nose



Re: string.c questions

2003-08-03 Thread Benjamin Goldberg


Luke Palmer wrote:
 
 Benjamin Goldberg writes:
  Actually, these are mostly questions about the string_str_index
  function.
 
 Uh oh...
 
  I've some questions about bufstart, strstart, bufused, strlen and
  encoding->characters?
 
  In string_str_index_multibyte, the lastmatch variable is calculated as:
 
  const void* const lastmatch =
 str->encoding->skip_backward((char*)str->strstart + str->strlen,
find->encoding->characters(find, find->strlen));
 
  There seems to be quite a bit of confusion on this line about bytes and
  characters... the goal here seems to be to find a pointer to the last
  place where it would be possible to begin a match.
 
 Yep.
 
 You're right, there is a bit of confusion about characters and bytes
 in this statement -- mostly because I'm confused about characters and
 bytes in Parrot.  So... str->strlen is the number of *characters* in
 the string?  Hmm.. that changes things.
 
 Maybe someone else should fix this -- who knows what they're doing :-)
 Do we have tests for multibyte string operations in the test suite?
 
  What's with find and ->characters?  Shouldn't find->strlen be
  sufficient, without all that other stuff around it?  Next...
 
 If find->strlen represents the number of characters as you say, then
 yes.
 
  If these weren't multibyte strings, then this would be (str->strstart +
  str->strlen - find->strlen), right?  Or, translating that literally (and
  doing the subtraction first):
 
  const void* const lastmatch = str->encoding->skip_forward(
 str->strstart, str->strlen - find->strlen );
 
 Yeah, the thing about that is, for strings in UTF formats,
 skip_forward is a linear time operation, which is pretty expensive
 when there's a lot of data.  That's why I used pointers in this
 function instead of string_index as the previous implementation did.

Except, of course, that the pointer arithmetic version was wrong :(


  Or, if we can do that trick for finding the end of a string:
 
  const void* const lastmatch = str->encoding->skip_backward(
 (char*)str->bufstart + str->bufused, find->strlen );
 
  Similarly, the lastfind variable should either be:
 
  const void* const lastfind = find->encoding->skip_forward(
 find->strlen );
 
 skip_forward takes 2 args, I assume you mean:
 
 const void* const lastfind = find->encoding->skip_forward(
 find, find->strlen);

Actually, I think that I meant:

 const void* const lastfind = find->encoding->skip_forward(
find->strstart, find->strlen );

Since I assume that the functions of encoding objects operate on pointers
into a buffer's data area, *not* on STRING* objects.

 Again, that's linear time.  But usually the string to find won't be
 that long, so it's not so important in this case.  But your shortcut
 would still be faster.
 
  Or:
 
  const void* const lastfind = (char*)find->bufstart + find->bufused;
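
The byte/character confusion discussed above can be sketched outside of
Parrot. The following is a rough illustration for UTF-8 only; the
function names are hypothetical and are not the encoding vtable entries
from string.c. It assumes valid UTF-8 and that the needle is no longer
(in characters) than the haystack. The end pointer is computed from a
byte length, while the skip counts are in characters, which is exactly
the distinction the original lastmatch line blurred.

```c
#include <assert.h>
#include <stddef.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Advance p by nchars characters; linear in the bytes skipped. */
const char *utf8_skip_forward(const char *p, size_t nchars)
{
    while (nchars-- > 0) {
        p++;
        while (is_continuation((unsigned char)*p))
            p++;
    }
    return p;
}

/* Move p back by nchars characters; the caller guarantees they exist. */
const char *utf8_skip_backward(const char *p, size_t nchars)
{
    while (nchars-- > 0) {
        p--;
        while (is_continuation((unsigned char)*p))
            p--;
    }
    return p;
}

/* Byte offset of the last place a match could begin, found from the end.
   byte_len is a length in BYTES, find_chars a length in CHARACTERS. */
size_t last_match_offset_backward(const char *strstart, size_t byte_len,
                                  size_t find_chars)
{
    const char *end = strstart + byte_len;
    return (size_t)(utf8_skip_backward(end, find_chars) - strstart);
}

/* The same offset found from the front, using only character counts. */
size_t last_match_offset_forward(const char *strstart, size_t str_chars,
                                 size_t find_chars)
{
    assert(find_chars <= str_chars);
    return (size_t)(utf8_skip_forward(strstart, str_chars - find_chars)
                    - strstart);
}
```

The backward version touches only about as many bytes as the needle has
characters, whereas the forward version walks most of the haystack,
which is the cost Luke was trying to avoid.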

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: string.c questions

2003-08-03 Thread Benjamin Goldberg


Leopold Toetsch wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] wrote:
 
  Also, although we're told at the top of string.c to not look at
  s->bufstart or s->buflen, I'd like to know if we are allowed to
  assume/assert that for all strings, the following is true:
 
 s->encoding->skip_forward( s->strstart, s->strlen ) ==
(char*)s->bufstart + s->bufused
 
 No. F<res_lea.c>, for example, keeps a reference count at bufstart. But
 with s/bufstart/strstart/ the above equation should be true.

To what does 'bufused' refer?  The number of bytes from where to where?

I *thought* that it was from bufstart to the end of the string... no?

And where is all this documented?

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Ultra bootstrapping :)

2003-08-03 Thread Benjamin Goldberg

Considering that parrot is now emitting an executable (on some
platforms)... and IIRC, C will be one of the languages we plan to have
parrot support for... will parrot be able to compile itself? :)

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Infant mortality

2003-08-03 Thread Benjamin Goldberg
I was recently reading the following:

   http://www.parrotcode.org/docs/dev/infant.dev.html

It's missing some things.  One of which is the (currently used?) way of
preventing infant mortality: anchor right away, or else turn off DoD
until the new object isn't needed.

This document doesn't mention another technique, which was mentioned
recently:

   http://groups.google.com/groups?selm=m2k7asg87i.fsf%40helium.physik.uni-kl.de

, at the "use a linked list of frames" part.


Another similar idea (one which I thought of myself, so feel free to
shoot it down!) is to use a generational system, with the current
generation as a value on the C stack, passed as an argument after the
interpreter.  That is, something like:

foo(struct ParrotInterp *interpreter, int generation, ...)
  {
PMC * temp = bar(interpreter, generation);
baz(interpreter, generation+1);
  }

Because inside baz(), generation is a higher value than it was when temp
was created, a DOD run inside of baz() won't kill temp.

During a DOD run, any PMC with a generation less than or equal to the
current generation is considered live.  Any PMC with a generation
greater than the current generation gets its generation set to 0.

Like the linked list scheme, this works through longjmps and recursive
run_cores, and it's much simpler for the user, too: just add one to the
generation to prevent all temporaries in scope from being freed.

It similarly has the drawback of altering the signature of every parrot
function.

There's another drawback I can think of... consider:

foo(struct ParrotInterp *interpreter, int generation, ...)
  {
PMC * temp = bar(interpreter, generation);
baz(interpreter, generation+1);
qux(interpreter, generation+1);
  }

If baz creates a temporary object and returns, then qux performs a DOD,
baz's (dead) object won't get cleaned up.

This could be solved by keeping a stack of newly created objects, and
providing some sort of generational_dod_helper() function, which would
do something like:
   while( neonates && neonates->top->generation > current_generation ) {
  neonates->top->generation = 0;
  neonates = neonates->next;
   }
, and calling that in foo between baz and qux.  (And maybe sometimes at
toplevel, between opcodes... at times when the generation count in a
normal generation count scheme (with a global counter) would be
incremented.)  You lose a bit of simplicity by having to call this
function occasionally, but it can save a bit of memory.
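
A self-contained sketch of the scheme, replaying the foo/baz/qux
scenario above. Everything here is a stand-in: Obj is not a real PMC,
register_neonate, is_protected and replay_scenario are hypothetical
names, and treating generation 0 as "no infant protection, subject to
ordinary reachability" is an assumption about what resetting to 0 is
meant to achieve.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Obj { int generation; } Obj;      /* stand-in for a PMC */
typedef struct Neonate {                         /* stack of new objects */
    Obj *top;
    struct Neonate *next;
} Neonate;

/* Stamp a new object with its creator's generation and push it. */
void register_neonate(Neonate **neonates, Neonate *node, Obj *obj,
                      int generation)
{
    assert(generation > 0);          /* 0 is reserved for "unprotected" */
    obj->generation = generation;
    node->top = obj;
    node->next = *neonates;
    *neonates = node;
}

/* Would a DOD run at current_generation keep obj alive as an infant? */
int is_protected(const Obj *obj, int current_generation)
{
    return obj->generation != 0 && obj->generation <= current_generation;
}

/* Drop protection from objects created by callees that have returned. */
void generational_dod_helper(Neonate **neonates, int current_generation)
{
    while (*neonates && (*neonates)->top->generation > current_generation) {
        (*neonates)->top->generation = 0;
        *neonates = (*neonates)->next;
    }
}

/* Replays foo(): bar() makes temp, baz() makes a temporary and returns,
   then a DOD run happens inside qux().  Returns a bitmask: bit 0 is set
   if temp is protected, bit 1 if baz's dead temporary is protected. */
int replay_scenario(int call_helper)
{
    Neonate *neonates = NULL;
    Neonate n1, n2;
    Obj temp, dead;
    int generation = 1;                                   /* foo's own */

    register_neonate(&neonates, &n1, &temp, generation);      /* bar() */
    register_neonate(&neonates, &n2, &dead, generation + 1);  /* baz() */
    if (call_helper)
        generational_dod_helper(&neonates, generation);
    /* DOD run inside qux(), which runs at generation + 1. */
    return is_protected(&temp, generation + 1)
         | (is_protected(&dead, generation + 1) << 1);
}
```

Without the helper call, replay_scenario reports both objects as
protected, which is the leak described above; with it, only temp stays
protected.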

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


double checking: in vs on?

2003-08-03 Thread Michal Wallace

Hey all,

Python objects can have things in them:

  foo["x"] = "in"

... and they can also have things on them:

  foo.x = "on"

I noticed Lua treats these as the same thing
and got curious about the distinction in IMCC.

Coding it this way seems to work, but I'm
not sure I really understood the docs, so I'm
just double checking. Do I have the semantics
right here?

## in_vs_on.imc ###

P0 = new PerlHash
P1 = new PerlString

## foo["x"] = "in"
P0["x"] = "in"

## foo.x = "on"
P1 = "on"
setprop P0, "x", P1


S1 = P0["x"]
print S1   # foo["x"]

print " vs "

getprop P2, "x", P0
print P2   # foo.x

print "\n"
end

## outputs: "in vs on\n" ##


And in the PMC vtable, it maps this way:

  in = get_*_keyed, set_*_keyed, delete_keyed_* 
  on = getprop / setprop / delprop

Is that right?

( Any reason it's not del_*_keyed? :) )
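
For readers who want the in/on split in more familiar terms, here is a
small C sketch of an object that keeps two separate stores under the
same key. It is only an analogy: Object, Table, object_set_keyed,
object_setprop and demo_lookup are hypothetical names, and nothing here
reflects how Parrot PMCs actually store keyed values or properties.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8   /* hypothetical fixed capacity per store */

typedef struct { const char *key; const char *value; } Entry;
typedef struct { Entry items[MAX_ENTRIES]; size_t count; } Table;

typedef struct {
    Table keyed;   /* "in": the container's contents, foo["x"] */
    Table props;   /* "on": properties attached to the object, foo.x */
} Object;

/* Insert or overwrite key in t; returns 0 if the table is full. */
static int table_set(Table *t, const char *key, const char *value)
{
    size_t i;
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->items[i].key, key) == 0) {
            t->items[i].value = value;
            return 1;
        }
    }
    if (t->count == MAX_ENTRIES)
        return 0;
    t->items[t->count].key = key;
    t->items[t->count].value = value;
    t->count++;
    return 1;
}

/* Look up key in t; returns NULL if it is absent. */
static const char *table_get(const Table *t, const char *key)
{
    size_t i;
    for (i = 0; i < t->count; i++)
        if (strcmp(t->items[i].key, key) == 0)
            return t->items[i].value;
    return NULL;
}

int object_set_keyed(Object *o, const char *k, const char *v)
{ return table_set(&o->keyed, k, v); }

const char *object_get_keyed(const Object *o, const char *k)
{ return table_get(&o->keyed, k); }

int object_setprop(Object *o, const char *k, const char *v)
{ return table_set(&o->props, k, v); }

const char *object_getprop(const Object *o, const char *k)
{ return table_get(&o->props, k); }

/* Stores "in" under foo["x"] and "on" under foo.x, then reads one back. */
const char *demo_lookup(int use_prop)
{
    static Object foo;   /* zero-initialized: both stores start empty */
    int ok = object_set_keyed(&foo, "x", "in")
          && object_setprop(&foo, "x", "on");
    assert(ok);
    return use_prop ? object_getprop(&foo, "x")
                    : object_get_keyed(&foo, "x");
}
```

The same key "x" yields "in" through the keyed accessors and "on"
through the property accessors, mirroring the in_vs_on.imc output.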

Sincerely,
 
Michal J Wallace
Sabren Enterprises, Inc.
-
contact: [EMAIL PROTECTED]
hosting: http://www.cornerhost.com/
my site: http://www.withoutane.com/
--