Re: rx.ops
Brent Dax wrote:
> Honestly, though, I'm no longer sure the full regex engine is a good idea. A fast "index" op, a fast "ord" op, a character class op, and the intstack is really all that's needed to make a regex engine from plain Parrot opcodes.

I agree with you on one level. That is enough to make a regex engine.

However -- we don't want a regex engine, we want a pattern engine. Grammars are a lot more complicated than regexes[1], and there are many better ways to parse them. One of the small goals of Perl 6 is to eventually get people to use it instead of yacc. Which means it's got to be competitive in speed[2].

And, I will keep grinding this into people's heads until they finally decide to listen: Recursive Descent Will Not Be Competitive In Speed! No matter how well it's implemented. And of the parsing algorithms available to us, recursive descent is the only one which requires only an intstack. All the others need at least a character table.

Okay, now I'll stop ranting and start thinking.

I think a predictive parser would be pretty well suited. It's got a couple of advantages over recursive descent which make it much faster, but it's sufficiently compatible that it's possible to switch over to recursive descent when things get too complicated for the algorithm (which they will, once in a while). You need a quick token lookup table to implement that well (things can be tokenized on-the-fly in top-down parsing, so that you don't need a discrete set).

Bottom-up methods seem appealing because they're so fast. But they're not sufficiently versatile for what Perl needs.

I'm tired, so I'll put some more thought into this later. But this is definitely something that needs to be approached with lateral thinking, and we've got to end up with something good (but not necessarily the first, er, third time :-). After all, Perl's still got to be good at its claim to fame: text processing.

Luke

[1] I'm speaking in the usual case here. That is, regexes (of the Perl variety) can do everything grammars can, but grammars do the advanced stuff a whole lot more often.

[2] Not faster, because that would be pretty impossible. But not agonizingly slow for complex things.

> --Brent Dax [EMAIL PROTECTED]
> Perl and Parrot hacker
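
For readers who want to see what "predictive" means in practice, here is a minimal sketch of a table-driven LL(1) recognizer in C. It is only an illustration under stated assumptions: the toy grammar (S -> '(' S ')' S | empty), the names char_table and ll1_accepts, and the fixed stack size are all made up for this example and are not taken from Parrot or Perl 6. The point is that the parser picks a production by looking at one token of lookahead, using an explicit symbol stack plus a character table, and never backtracks.

```c
#include <assert.h>
#include <stddef.h>

/* Token classes. T_OTHER is 0 so unlisted bytes default to it. */
enum token { T_OTHER = 0, T_LPAREN, T_RPAREN, T_END };

/* Character table: classifies an input byte in O(1). */
static const unsigned char char_table[256] = {
    ['\0'] = T_END,
    ['(']  = T_LPAREN,
    [')']  = T_RPAREN
};

/* Toy grammar (hypothetical):  S -> '(' S ')' S | empty
 * Returns 1 if input is a string of balanced parentheses, else 0. */
int ll1_accepts(const char *input)
{
    char stack[128];            /* explicit symbol stack */
    size_t sp = 0;
    const char *p = input;

    stack[sp++] = '$';          /* end-of-input marker */
    stack[sp++] = 'S';          /* start symbol */

    while (sp > 0) {
        char top = stack[--sp];
        int tok = char_table[(unsigned char)*p];

        if (top == 'S') {
            /* Parse-table row for S, selected by the lookahead token. */
            if (tok == T_LPAREN) {
                /* S -> ( S ) S : push the right-hand side in reverse. */
                if (sp + 4 > sizeof stack)
                    return 0;   /* nesting too deep for this sketch */
                stack[sp++] = 'S';
                stack[sp++] = ')';
                stack[sp++] = 'S';
                stack[sp++] = '(';
                assert(sp <= sizeof stack);
            } else if (tok != T_RPAREN && tok != T_END) {
                return 0;       /* no table entry: syntax error */
            }
            /* otherwise S -> empty: nothing to push */
        } else if (top == '$') {
            return tok == T_END;
        } else {
            /* Terminal on the stack must match the input exactly. */
            if (*p != top)
                return 0;
            p++;
        }
    }
    return 0;
}
```

Calling ll1_accepts("(()())") takes one pass over the input with no backtracking, which is where the speed advantage over naive recursive descent comes from.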
generic code generator? [was: subroutines and python status]
On Fri, 1 Aug 2003, K Stol wrote:

> From: Leon Brocard [EMAIL PROTECTED]
> ...
> > I don't like things becoming dead-ends. How much work do you think it'd be to extend it some more and update it to latest Lua?
> ...
> 2: I misdesigned the code generator; that is, at the point where I couldn't start over, it was too late, the code generator was too big already (it was unmaintainable). But because I had a time schedule, I kept it this way (the product itself wasn't the most important thing, I was writing an undergraduate report for the last semester of my education (for the record: the project served me well, I finished this education))
>
> > Would it be worth checking this into parrot CVS?
>
> Only if the thing would be working, otherwise it would only be a source of confusion and frustration.
>
> Now I'm just thinking very hard to decide if I've got enough spare time to rewrite the code generator

Hmm. I've only messed around with Lua for a few hours though, and it was several months ago, but the Lua language seems to be pretty similar to python.

Really, there's a ton of overlap between the various high level languages that parrot wants to support. Maybe we could put together a generic code generator that everyone could use? Obviously, it would have to be set up so you could override the parts for each language, but it shouldn't be too terribly hard.

What do you think? Want to try squishing pirate/python and pirate/lua together? :)

Sincerely,

Michal J Wallace
Sabren Enterprises, Inc.
contact: [EMAIL PROTECTED]
hosting: http://www.cornerhost.com/
my site: http://www.withoutane.com/
Re: generic code generator? [was: subroutines and python status]
On Sun, 3 Aug 2003, K Stol wrote:

> At this moment, I'm looking at a new version of Lua, the previous 'pirate' compiled (well, sort of :-) Lua 4. Lua 5 has some features, such as coroutines (If I remembered well) and all kinds of neat stuff for which Parrot has built-in support (and it dropped some/a feature(s) from Lua 4). I think I'll try to create a parser for Lua 5, and to recreate a Lua/Parrot compiler (should go a lot easier now that I had the time to think about the errors I made).

Cool. :) I'm just now reading through your report.

> > Really, there's a ton of overlap between the various high level languages that parrot wants to support. Maybe we could put together a generic code generator that everyone could use? Obviously, it would have to be set up so you could override the parts for each language, but it shouldn't be too terribly hard.
>
> Sounds like quite a challenge, but a good idea, and I think worth a try.
>
> > What do you think? Want to try squishing pirate/python and pirate/lua together? :)
>
> Yeah, I like the idea. Let's try this out.

Great! I figure since you've already got lua 4 working, we can leverage what you've already got and then just add the new features for python and lua 5.

If you're still around, want to meet up online real quick? I'm logged in as sabren in #parrot on irc.infobot.org

Sincerely,

Michal J Wallace
Sabren Enterprises, Inc.
contact: [EMAIL PROTECTED]
hosting: http://www.cornerhost.com/
my site: http://www.withoutane.com/
Re: subs.pod
Vladimir Lipskiy [EMAIL PROTECTED] wrote:

> What are "-", "X", and " " (whitespace) supposed to mean there?

X means "is in context". Sorry if that is misleading, I'll update the pod.

> Why is Eval not there? Does it have no context?

It's not specified yet how eval fits into the picture. It's currently different, because it runs a different code segment. We don't have general support for multiple code segments yet.

> > If items in the interpreter context are changed between creation of the subroutine/return continuation and its invocation, the C<updatecc> opcode should be used:
>
> What items? Items of the interpreter context or items of the Sub context mentioned above? Is there any difference between these? How do I know items are changed?

If you have something like:

      newsub .Sub, .Continuation, _sub_label, ret_label
      ...
    loop:
      ...
      invoke
    ret_label:
      branch loop

and e.g. the interpreter's warning flags are changed between the creation of the return continuation and the subroutine call, the C<updatecc> opcode updates the warnings in the return continuation, so that after returning you have the very same interpreter context.

The C<updatecc> should finally set the same state of the return continuation as if you had a C<invokecc> inside the loop, without the overhead of creating new continuation objects every time.

When using the PIR .pcc_begin/.pcc_end directives, C<updatecc> gets inserted automatically when needed.

> Thanks.

leo
Re: string.c questions
Benjamin Goldberg [EMAIL PROTECTED] wrote:

> Also, although we're told at the top of string.c to not look at s->bufstart or s->buflen, I'd like to know if we are allowed to assume/assert that for all strings, the following is true:
>
>     s->encoding->skip_forward( s->strstart, s->strlen ) == (char*)s->bufstart + s->bufused

No. F<res_lea.c> e.g. is using a reference count at bufstart. But with s/bufstart/strstart/ the above equation should be true.

leo
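
To make the distinction concrete, here is a toy model in C of a string whose allocation starts with a header. Everything here is hypothetical (toy_buffer, toy_string, toy_init and the two check functions are invented for illustration; the real Parrot STRING layout is not reproduced), and it assumes, following Leo's s/bufstart/strstart/ remark, that bufused counts bytes starting at strstart. It uses a one-byte-per-character encoding so skip_forward is plain pointer addition.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DATA_MAX 32

/* Hypothetical allocation with a header in front of the string data,
 * loosely modelled on "a reference count at bufstart". */
typedef struct toy_buffer {
    size_t refcount;
    char   data[TOY_DATA_MAX];
} toy_buffer;

typedef struct toy_string {
    void  *bufstart;  /* start of the allocation, header included */
    char  *strstart;  /* first byte of string data */
    size_t bufused;   /* bytes of string data (assumed: from strstart) */
    size_t strlen;    /* number of characters */
} toy_string;

/* skip_forward for a one-byte-per-character encoding. */
static const char *skip_forward(const char *p, size_t nchars)
{
    return p + nchars;
}

/* Point s at buf and copy text into it.
 * Returns 0 on success, -1 if text does not fit. */
int toy_init(toy_string *s, toy_buffer *buf, const char *text)
{
    size_t n = 0;

    assert(s != NULL && buf != NULL && text != NULL);
    while (text[n] != '\0') {
        if (n == TOY_DATA_MAX)
            return -1;
        buf->data[n] = text[n];
        n++;
    }
    buf->refcount = 1;
    s->bufstart = buf;
    s->strstart = buf->data;
    s->bufused  = n;
    s->strlen   = n;
    return 0;
}

/* The equation with strstart on the right-hand side. */
int end_matches_strstart(const toy_string *s)
{
    return skip_forward(s->strstart, s->strlen)
        == s->strstart + s->bufused;
}

/* The equation as originally proposed, with bufstart. */
int end_matches_bufstart(const toy_string *s)
{
    return skip_forward(s->strstart, s->strlen)
        == (const char *)s->bufstart + s->bufused;
}
```

In this model the strstart form holds and the bufstart form fails as soon as a header sits in front of the data, which is the situation Leo points at.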
String value semantics?
Is this supposed to happen?

    % parrot -
    .sub _main
        $S0 = "Hello\n"
        $S1 = $S0
        substr $S1, 2, 2, ""
        print $S0
        print $S1
        end
    .end
    (EOF)
    Heo
    Heo

Aren't strings supposed to follow value semantics?

Luke
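
As a side illustration of the question (not a diagnosis of Parrot itself), the difference between the two behaviours can be sketched in C. This assumes, purely for the sake of the example, that the assignment copies only a reference to the string; toy_str and the helper names are hypothetical. An in-place edit through an alias changes the original, while an edit on a copy does not.

```c
#include <assert.h>
#include <string.h>

typedef struct toy_str {
    char buf[16];
} toy_str;

/* Remove len bytes starting at offset, editing the string in place. */
void toy_remove(toy_str *s, size_t offset, size_t len)
{
    size_t n = strlen(s->buf);

    assert(n < sizeof s->buf);
    if (offset > n)
        return;
    if (len > n - offset)
        len = n - offset;
    /* Shift the tail (including the terminating NUL) down. */
    memmove(s->buf + offset, s->buf + offset + len, n - offset - len + 1);
}

/* Reference semantics: s1 is just another name for s0.
 * Returns 1 if the edit through s1 changed s0. */
int alias_edit_changes_original(void)
{
    toy_str s0 = { "Hello" };
    toy_str *s1 = &s0;

    toy_remove(s1, 2, 2);
    return strcmp(s0.buf, "Heo") == 0;
}

/* Value semantics: s1 is an independent copy of s0.
 * Returns 1 if the edit on s1 changed s0. */
int copy_edit_changes_original(void)
{
    toy_str s0 = { "Hello" };
    toy_str s1 = s0;

    toy_remove(&s1, 2, 2);
    return strcmp(s0.buf, "Hello") != 0;
}
```

The output in the report above matches the first function's behaviour; value semantics would correspond to the second.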
JIT bug with restoretop
(or something)

The following program segfaults when run under JIT.

    .sub _main
        newsub P0, .Sub, _echo
        $S0 = "abcdefghij"
        savetop
        restoretop
        end
    .end
    .sub _echo
        print P5
        invoke P1
    .end

(note that I never call _echo, but the newsub is required to produce the bug)

Strangely, it doesn't segfault when $S0 is set to "abcdefghi", or anything of that length. In that case, it will likely differ from system to system.

I'm running i686 Linux under gcc version 3.2.2. I have a modified i386/jit_emit.h so it will (supposedly) work under this gcc which looks like:

    Index: jit_emit.h
    ===================================================================
    RCS file: /cvs/public/parrot/jit/i386/jit_emit.h,v
    retrieving revision 1.76
    diff -r1.76 jit_emit.h
    11c11
    < #if defined HAVE_COMPUTED_GOTO && defined __GNUC__
    ---
    > #if defined HAVE_COMPUTED_GOTO && defined __GNUC__ && 0

This fix has worked fine with JIT until now, so I suspect the problem is elsewhere.

Luke
Re: JIT bug with restoretop
On 3 Aug 2003, Luke Palmer wrote:

> This fix has worked fine with JIT until now, so I suspect the problem is elsewhere.

Bug confirmed here (although I need a slightly longer string to trigger it). Here's a stacktrace:

---
Program received signal SIGSEGV, Segmentation fault.
0x0809c215 in Parrot_exec_add_text_rellocation (obj=0x50, nptr=0x820c94c "", type=2, symbol=0x815ab88 "interpre", disp=-4) at exec.c:233
233         new_relloc = mem_sys_realloc(obj->text_rellocation_table,
(gdb) bt
#0  0x0809c215 in Parrot_exec_add_text_rellocation (obj=0x50, nptr=0x820c94c "", type=2, symbol=0x815ab88 "interpre", disp=-4) at exec.c:233
#1  0x080a9dbc in Parrot_jit_begin (jit_info=0x8209960, interpreter=0x819ed98) at include/parrot/jit_emit.h:2520
#2  0x080a6919 in build_asm (interpreter=0x819ed98, pc=0x82098e8, code_start=0x82098e8, code_end=0x8209920, objfile=0x0) at jit.c:1020
#3  0x0806169d in runops_jit (interpreter=0x819ed98, pc=0x82098e8) at interpreter.c:438
#4  0x080619b9 in runops_int (interpreter=0x819ed98, offset=0) at interpreter.c:591
#5  0x08061a1b in runops_ex (interpreter=0x819ed98, offset=0) at interpreter.c:607
#6  0x08061b1d in runops (interpreter=0x819ed98, offset=0) at interpreter.c:643
#7  0x080d5089 in Parrot_runcode (interpreter=0x819ed98, argc=1, argv=0xb39c) at embed.c:377
--

Simon
Re: JIT bug with restoretop
On Sunday 03 August 2003 15:27, Simon Glover wrote:

> On 3 Aug 2003, Luke Palmer wrote:
>
> > This fix has worked fine with JIT until now, so I suspect the problem is elsewhere.
>
> Bug confirmed here (although I need a slightly longer string to trigger it). Here's a stacktrace:

I couldn't reproduce it here, but from your bt I supposed that jit_info->objfile was getting something (!= NULL) while running under jit, as it was not initialized correctly, so I fixed that. Could you confirm if the bug's still there?

> Simon

Daniel
Re: JIT bug with restoretop
On Sun, 3 Aug 2003, Daniel Grunblatt wrote:

> On Sunday 03 August 2003 15:27, Simon Glover wrote:
>
> > On 3 Aug 2003, Luke Palmer wrote:
> >
> > > This fix has worked fine with JIT until now, so I suspect the problem is elsewhere.
> >
> > Bug confirmed here (although I need a slightly longer string to trigger it). Here's a stacktrace:
>
> I couldn't reproduce it here, but from your bt I supposed that jit_info->objfile was getting something (!= NULL) while running under jit, as it was not initialized correctly, so I fixed that. Could you confirm if the bug's still there?

No, that seems to have fixed it.

Simon
Re: generic code generator? [was: subroutines and python status]
On Sun, 3 Aug 2003 19:25, Michal Wallace wrote:

> On Fri, 1 Aug 2003, K Stol wrote:
>
> Really, there's a ton of overlap between the various high level languages that parrot wants to support. Maybe we could put together a generic code generator that everyone could use? Obviously, it would have to be set up so you could override the parts for each language, but it shouldn't be too terribly hard.
>
> What do you think? Want to try squishing pirate/python and pirate/lua together? :)

A nice high level code generator would be in my interests as well, seeing as I'm currently working on php/parrot and I've got 'hello world' standard imcc code generation going. I'd really like to be able to save a lot of the low level work.

With regards to my own project, would it be appropriate to ask for parrot CVS access in order to publish the php compiler in the parrot source tree? One of the files is under the Zend license, being a direct derivation from zend_language_scanner.y. Are there any licensing restrictions about what goes into perl cvs?

Stephen.
pdd03_calling_conventions.pod questions
Q1: Suppose I have the following call into a sub named foo:

    foo($var1, $var2, $var3);

What should I set in I1? Is it 3? And here:

    foo($var1, @arr2, %hash3);

Is it still 3, since these aren't gonna be flattened?

Q2: I'm calling without prototyping

    foo($var1, $var2, $var3, ... , $var23);

Here, what should I place in I2? Is it 11 (as we have P5-P15) or 23 (considering the P3 register)?

Thanks (~:

he's a boxer, that's why he has a borken nose
Re: string.c questions
Luke Palmer wrote:

> Benjamin Goldberg writes:
> > Actually, these are mostly questions about the string_str_index function.
>
> Uh oh...
>
> > I've some questions about bufstart, strstart, bufused, strlen and encoding->characters?
> >
> > In string_str_index_multibyte, the lastmatch variable is calculated as:
> >
> >     const void* const lastmatch = str->encoding->skip_backward((char*)str->strstart + str->strlen, find->encoding->characters(find, find->strlen));
> >
> > There seems to be quite a bit of confusion on this line about bytes and characters... the goal here seems to be to find a pointer to the last place where it would be possible to begin a match.
>
> Yep. You're right, there is a bit of confusion about characters and bytes in this statement -- mostly because I'm confused about characters and bytes in Parrot. So... str->strlen is the number of *characters* in the string? Hmm.. that changes things. Maybe someone else should fix this -- who knows what they're doing :-)

Do we have tests for multibyte string operations in the test suite?

> > What's with find and ->characters? Shouldn't find->strlen be sufficient, without all that other stuff around it? Next...
>
> If find->strlen represents the number of characters as you say, then yes.
>
> > If these weren't multibyte strings, then this would be (str->strstart + str->strlen - find->strlen), right? Or, translating that literally (and doing the subtraction first):
> >
> >     const void* const lastmatch = str->encoding->skip_forward( str->strstart, str->strlen - find->strlen );
>
> Yeah, the thing about that is, for strings in UTF formats, skip_forward is a linear time operation, which is pretty expensive when there's a lot of data. That's why I used pointers in this function instead of string_index as the previous implementation did.

Except, of course, that the pointer arithmetic version was wrong :(

> > Or, if we can do that trick for finding the end of a string:
> >
> >     const void* const lastmatch = str->encoding->skip_backward( (char*)str->bufstart + str->bufused, find->strlen );
> >
> > Similarly, the lastfind variable should either be:
> >
> >     const void* const lastfind = find->encoding->skip_forward( find->strlen );
>
> skip_forward takes 2 args, I assume you mean:
>
>     const void* const lastfind = find->encoding->skip_forward( find, find->strlen);

Actually, I think that I meant:

    const void* const lastfind = find->encoding->skip_forward( find->strstart, find->strlen );

Since I assume that the functions of encoding objects operate on pointers into a buffer's data area, *not* on STRING* objects.

> Again, that's linear time. But usually the string to find won't be that long, so it's not so important in this case. But your shortcut would still be faster.
>
> > Or:
> >
> >     const void* const lastfind = (char*)find->bufstart + find->bufused;

--
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca );{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED] ]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}
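
The bytes-versus-characters issue in this thread can be sketched in a few lines of C. This is a minimal illustration, not Parrot's code: utf8_skip_backward is a hypothetical stand-in for an encoding's skip_backward on UTF-8 data, and last_match_offset is an invented helper. It finds the last position where a needle could begin by starting from the end of the haystack measured in bytes and stepping back a number of characters, which costs time proportional to the needle rather than to the haystack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for encoding->skip_backward on UTF-8 data:
 * step back n characters from ptr. UTF-8 continuation bytes have the
 * bit pattern 10xxxxxx, so keep stepping while we are on one.
 * The caller must ensure at least n characters precede ptr. */
static const unsigned char *
utf8_skip_backward(const unsigned char *ptr, size_t n)
{
    while (n-- > 0) {
        do {
            ptr--;
        } while ((*ptr & 0xC0) == 0x80);
    }
    return ptr;
}

/* Byte offset of the last position at which a needle of needle_chars
 * characters could begin, or -1 if the needle is longer than the
 * haystack. hay_bytes is a byte count, hay_chars a character count. */
long
last_match_offset(const unsigned char *hay, size_t hay_bytes,
                  size_t hay_chars, size_t needle_chars)
{
    const unsigned char *end = hay + hay_bytes;   /* end, in BYTES */

    assert(hay_chars <= hay_bytes);
    if (needle_chars > hay_chars)
        return -1;
    return (long)(utf8_skip_backward(end, needle_chars) - hay);
}
```

For the 6-byte, 5-character string "h\xC3\xA9llo" and a 2-character needle, last_match_offset returns byte offset 4. Starting the backward skip from hay + hay_chars instead (the character count used as if it were a byte count) lands at offset 3 for the same input, one byte short.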
Re: string.c questions
Leopold Toetsch wrote:

> Benjamin Goldberg [EMAIL PROTECTED] wrote:
>
> > Also, although we're told at the top of string.c to not look at s->bufstart or s->buflen, I'd like to know if we are allowed to assume/assert that for all strings, the following is true:
> >
> >     s->encoding->skip_forward( s->strstart, s->strlen ) == (char*)s->bufstart + s->bufused
>
> No. F<res_lea.c> e.g. is using a reference count at bufstart. But with s/bufstart/strstart/ the above equation should be true.

To what does 'bufused' refer? The number of bytes from where to where? I *thought* that it was from bufstart to the end of the string... no?

And where is all this documented?
Ultra bootstrapping :)
Considering that parrot is now emitting an executable (on some platforms)... and IIRC, C will be one of the languages we plan to have parrot support for... will parrot be able to compile itself? :)
Infant mortality
I was recently reading the following:

    http://www.parrotcode.org/docs/dev/infant.dev.html

It's missing some things. One of which is the (currently used?) way of preventing infant mortality: anchor right away, or else turn off DoD until the new object isn't needed.

This document also doesn't mention another technique, which was mentioned recently, in the "use a linked list of frames" part of:

    http://groups.google.com/groups?selm=m2k7asg87i.fsf%40helium.physik.uni-kl.de

Another similar idea (one which I thought of myself, so feel free to shoot it down!) is to use a generational system, with the "current generation" as a value on the C stack, passed as an argument after the interpreter. That is, something like:

    foo(struct ParrotInterp *interpreter, int generation, ...)
    {
        PMC * temp = bar(interpreter, generation);
        baz(interpreter, generation+1);
    }

Because inside baz(), generation is a higher value than it was when temp was created, a DOD run inside of baz() won't kill temp.

During a DOD run, any PMC with a generation less than or equal to the current generation is considered live. Any PMC with a generation greater than the current generation gets its generation set to 0.

Like the linked list scheme, this works through longjmps and recursive run_cores, and it's much simpler for the user, too: just add one to the generation to prevent all temporaries in scope from being freed.

It similarly has the drawback of altering the signature of every parrot function.

There's another drawback I can think of... consider:

    foo(struct ParrotInterp *interpreter, int generation, ...)
    {
        PMC * temp = bar(interpreter, generation);
        baz(interpreter, generation+1);
        qux(interpreter, generation+1);
    }

If baz creates a temporary object and returns, then qux performs a DOD, baz's (dead) object won't get cleaned up.

This could be solved by keeping a stack of newly created objects, and providing some sort of generational_dod_helper() function, which would do something like:

    while( neonates && neonates->top->generation > current_generation ) {
        neonates->top->generation = 0;
        neonates = neonates->next;
    }

and calling that in foo between baz and qux. (And maybe sometimes at toplevel, between opcodes... at times when the generation count in a normal generation count scheme (with a global counter) would be incremented.)

You lose a bit of simplicity by having to call this function occasionally, but it can save a bit of memory.
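
The foo/baz/qux scenario above can be played through with a small self-contained C model. This is a sketch under assumptions, not Parrot code: toy_pmc, toy_new and toy_protected are hypothetical names, the neonate list is simplified to link the objects directly (no separate "top" field), and generation 0 is assumed to mean "no longer protected as a temporary, so ordinary reachability decides".

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_pmc {
    int generation;         /* 0 = not protected (assumption) */
    struct toy_pmc *next;   /* link in the neonate stack */
} toy_pmc;

/* Stack of newly created objects, newest first. */
static toy_pmc *neonates = NULL;

/* Record a freshly created object at the creator's generation. */
void toy_new(toy_pmc *obj, int generation)
{
    assert(obj != NULL && generation > 0);
    obj->generation = generation;
    obj->next = neonates;
    neonates = obj;
}

/* Would a DOD run at current_generation keep obj alive purely
 * because of its generation? */
int toy_protected(const toy_pmc *obj, int current_generation)
{
    return obj->generation > 0 && obj->generation <= current_generation;
}

/* Unprotect neonates created by callees that have already returned:
 * anything newer than the caller's own generation. */
void generational_dod_helper(int current_generation)
{
    while (neonates && neonates->generation > current_generation) {
        neonates->generation = 0;
        neonates = neonates->next;
    }
}
```

Usage follows the message: foo creates temp at generation 1, baz creates an object at generation 2 and returns, foo calls generational_dod_helper(1), and a later DOD inside qux (generation 2) still protects temp but no longer protects baz's leftover object.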
double checking: in vs on?
Hey all,

Python objects can have things "in" them:

    foo["x"] = "in"

... and it can also have things "on" them:

    foo.x = "on"

I noticed lua treats these as the same thing and got curious about the distinction in IMCC. Coding it this way seems to work, but I'm not sure I really understood the docs, so I'm just double checking. Do I have the semantics right here?

    ## in_vs_on.imc ###
    P0 = new PerlHash
    P1 = new PerlString

    ## foo["x"] = "in"
    P0["x"] = "in"

    ## foo.x = "on"
    P1 = "on"
    setprop P0, "x", P1

    S1 = P0["x"]
    print S1    # foo["x"]
    print " vs "
    getprop P2, "x", P0
    print P2    # foo.x
    print "\n"
    end
    ## outputs: "in vs on\n" ##

And in the PMC vtable, it maps this way:

    in = get_*_keyed, set_*_keyed, delete_keyed_*
    on = getprop / setprop / delprop

Is that right? ( Any reason it's not del_*_keyed? :) )

Sincerely,

Michal J Wallace
Sabren Enterprises, Inc.
contact: [EMAIL PROTECTED]
hosting: http://www.cornerhost.com/
my site: http://www.withoutane.com/