Re: This week's summary
On Wed, 22 Sep 2004 21:11:02 +0100, The Perl 6 Summarizer [EMAIL PROTECTED] wrote:

    The Perl 6 Summary for the week ending 2004-09-17

    Another week, another summary, and I'm running late. So:

    This week in perl6-compiler

    Bootstrapping the grammar
        Uri Guttman had some thoughts on bootstrapping Perl 6's grammar. He hoped
        that his suggested approach would enable lots of people to work on the
        thing at once without necessarily getting in each other's way. Adam
        Turoff pointed everyone at a detailed description of how Squeak (a free
        Smalltalk) got bootstrapped.
        http://xrl.us/c6kp

This link doesn't seem to be working, and www.perl6.org doesn't have the archives of perl6-compiler online yet. Does anyone have a link to the archives that works?
Re: Please rename 'but' to 'has'.
At 09:45 AM 04-26-2002 -0700, Larry Wall wrote:

    Tim Bunce writes:
    : For perl at least I thought Larry has said that you'll be able to
    : create new ops but only give them the same precedence as any one
    : of the existing ops.

    Close, but not quite. What I think I said was that you can't specify a raw
    precedence--you can only specify a precedence relative to an existing
    operator. That way it doesn't matter what the initial precedence
    assignments are. We can always change them internally.

    : Why not use a 16 bit int and specify that languages should use
    : default precedence levels spread through the range but keeping the
    : bottom 8 bits all zero. That gives 255 levels between '3' and '4'.
    : Seems like enough to me!
    :
    : Floating point seems like over-egging the omelette.

    It's also under-egging the omelette, and not just because you eventually
    run out of bits. I don't think either integer or floating point is the
    best solution, because in either case you have to remember separately how
    many levels of derivation from the standard precedence levels you are, so
    you know which bit to flip, or which increment to add or subtract from the
    floater.

[snip]

So you'd have something like:

    sub operator:mult($a, $b) is looser('*') is inline {...}
    sub operator:add($a, $b)  is tighter('+') is inline {...}
    sub operator:div($a, $b)  is looser('/') is inline {...}

Assuming default Perl 5 precedences for *, +, and /, you would have the precedence strings for *, +, /, mult, add, and div be S, R, S, S2, S1, S2 respectively? So mult and div would have the same precedences? Hmmm.

What problems would be caused by:

    sub operator:radd($a, $b) is tighter('+') is inline is rightassociative {...}
    sub operator:ladd($a, $b) is tighter('+') is inline is leftassociative {...}

Right now, all the operator precedence levels in Perl 5 have either right, left, or no associativity, but they do not mix right and left associative operators. Will that be allowed in Perl 6?

Larry
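Larry's point that only *relative* precedence is specified can be sketched with a small table that keeps operators in order; the class and method names here are hypothetical illustrations, not anything from the actual Perl 6 design:

```python
class PrecedenceTable:
    """Operators ordered from loosest to tightest binding.

    Absolute levels are never exposed: users only declare that a new
    operator binds tighter or looser than an existing one, so the
    internal numbering can be reshuffled freely, exactly as Larry says.
    """
    def __init__(self, ops):
        self.ops = list(ops)  # loosest ... tightest

    def tighter(self, new_op, than):
        # insert just above the reference operator
        self.ops.insert(self.ops.index(than) + 1, new_op)

    def looser(self, new_op, than):
        # insert just below the reference operator
        self.ops.insert(self.ops.index(than), new_op)

    def binds_tighter(self, a, b):
        return self.ops.index(a) > self.ops.index(b)

table = PrecedenceTable(['+', '*'])  # '*' binds tighter than '+'
table.tighter('add', '+')            # like: is tighter('+')
table.looser('mult', '*')            # like: is looser('*')
table.looser('div', '*')
```

Note that a list forces a total order, so mult and div land in distinct (adjacent) slots here; the post's question about the two sharing a precedence level would instead need equivalence classes of operators per level.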
Re: scheme-pairs?
At 11:32 AM 01-24-2002 -0500, Dan Sugalski wrote:

    At 4:19 PM + 1/24/02, Dave Mitchell wrote:

        Dan Sugalski [EMAIL PROTECTED] wrote:
            That was my biggest objection. I like the thought of having a
            scheme pair data type. The interpreter should see it, and it
            should be accessed, as a restricted array, one with only two
            entries.

        Is this then the same datatype as a Perl6 pair (cf '=>' op in Apo 3)??

    Good point. It probably is, yes. (Though there may be potential
    differences--depends on whether the scheme pair can only have scalars on
    each side, or should allow other things)

In Scheme, at least, pairs can contain any data on either side. The notation for a pair is (value . value), and standard list notation (a b c d e f g) is simply syntactic sugar for (a . (b . (c . (d . (e . (f . (g . '()))))))). Although only the cdr of these pairs contains pairs, in a list like ((a a) (b b)) (also written as ((a . (a . '())) . ((b . (b . '())) . '())), both the car and cdr of the outermost pair contain pairs.
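The nesting described above can be sketched with two-element tuples standing in for Scheme pairs (a simplified model; real Scheme pairs are mutable cells):

```python
NIL = None  # stands in for '()

def cons(car, cdr):
    """A Scheme pair: any data on either side."""
    return (car, cdr)

def from_list(items):
    """Build (a . (b . ... . '())) from a Python list."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

# (a b c) is sugar for (a . (b . (c . '())))
flat = from_list(['a', 'b', 'c'])

# ((a a) (b b)): both the car and the cdr of the outer pair contain pairs
nested = from_list([from_list(['a', 'a']), from_list(['b', 'b'])])
car, cdr = nested
```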
Re: RFC: Bytecode file format
At 03:10 PM 09-14-2001 -0500, Brian Wheeler wrote:

    I've been thinking a lot about the bytecode file format lately. It's going
    to get really gross really fast when we start adding other (optional)
    sections to the code. So, with that in mind, here's what I propose:
    [snip]
    What do you guys think?

Have you taken a look at the old Amiga IFF format? It consisted mainly of chunks identified by a 32-bit type code and a chunk-length code. While most implementations were for specific multi-media applications (chunks defining sound formats, chunks defining image formats, etc.), the standard itself was data-neutral. I believe that Microsoft is using a derivative of that format for some of its files, and I think that TIFF files are another instantiation. It may be worth looking at to avoid re-inventing wheels.

Brian
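As a rough illustration of the chunk idea (not the actual IFF spec -- real IFF wraps everything in a FORM container, which this sketch omits; the even-length padding rule is kept):

```python
import io
import struct

def write_chunk(stream, type_code, data):
    """A chunk: 4-byte ASCII type code, 32-bit big-endian length, payload."""
    assert len(type_code) == 4
    stream.write(type_code + struct.pack(">I", len(data)) + data)
    if len(data) % 2:          # IFF pads payloads to even length
        stream.write(b"\x00")

def read_chunks(stream):
    chunks = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break
        type_code, length = struct.unpack(">4sI", header)
        data = stream.read(length)
        if length % 2:         # skip the pad byte
            stream.read(1)
        chunks.append((type_code, data))
    return chunks

buf = io.BytesIO()
write_chunk(buf, b"CODE", b"\x01\x02\x03")  # hypothetical bytecode section
write_chunk(buf, b"DBUG", b"line table")    # hypothetical optional section
buf.seek(0)
sections = dict(read_chunks(buf))
```

The point of the format is that a reader can skip any chunk whose type code it does not recognize, which is what makes optional sections cheap to add.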
Re: Using int32_t instead of IV for code
At 04:55 PM 09-13-2001 -0400, Andy Dougherty wrote:

    In perl.perl6.internals, you wrote:

        The attached patch makes all bytecode have a type of int32_t rather
        than IV; it also contains the other stuff I needed to get the tests
        running on my Alpha (modifications to config.h.in and register.c).

    I think this is a bad idea. There simply is no guarantee that there's a
    native integral type with 32 bits. And having an int32_t type that *isn't*
    32 bits is just plain confusing. Just ask anyone who's gotten burnt by
    perl5's I32, which has the exact same problem.

Well, since bytecode is defined to be 32-bit, it makes sense to define it as an int32_t type and have the definition of an int32_t be platform-specific.
Re: Math functions? (Particularly transcendental ones)
Dan Sugalski [EMAIL PROTECTED] writes:

    At 07:43 PM 9/8/2001 -0700, Wizard wrote:

        Questions regarding Bitwise operators:
        =item rol tx, ty, tz *
        ...
        =item ror tx, ty, tz *
        Are these with or without carry?

    That's a good question. Now that we have a list of bitwise ops, we can
    decide how they work.

        What happens when you rotate/shift/bit-or a float? Or a
        bigint/bigfloat? Or a string?

    Important questions, and we can hammer something out now that we know what
    they are.

I'd like to suggest that the shift and roll/rotate ops take a 4th parameter, that being the word size in bits. For bigints and arbitrary-length bit-vectors, the size of a word to rotate or shift could be taken as infinite, which probably isn't what is wanted. It would also make simpler such operations as might come up in some cryptographic routines, like "rotate the upper 64 bits left 3 bits", which would be encoded as (assuming rotate_l dest, source, roll-amount, wordsize):

    rotate_l P1, P1, 64, 128
    rotate_l P1, P1, 3, 64
    rotate_r P1, P1, 64, 128

Just my 2 centums.
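The semantics of a rotate with an explicit word size can be sketched directly, treating the value as a single word of the given size (rotate_l/rotate_r here are illustrative names matching the proposed op, not real Parrot opcodes):

```python
def rotate_l(value, amount, wordsize):
    """Rotate `value` left by `amount` bits within a `wordsize`-bit word."""
    mask = (1 << wordsize) - 1
    value &= mask
    amount %= wordsize
    return ((value << amount) | (value >> (wordsize - amount))) & mask

def rotate_r(value, amount, wordsize):
    # Rotating right by n is rotating left by (wordsize - n).
    return rotate_l(value, wordsize - (amount % wordsize), wordsize)

# The high bit wraps around to the low end:
assert rotate_l(0b1000, 1, 4) == 0b0001

# "Rotate the upper 64 bits left 3 bits" as the quoted three-op sequence,
# starting from a 128-bit value with only bit 127 set:
v = 1 << 127
v = rotate_l(v, 64, 128)   # swap halves: bit 127 -> bit 63
v = rotate_l(v, 3, 64)     # rotate within the (now low) 64-bit half
v = rotate_r(v, 64, 128)   # swap the halves back
```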
Re: Math functions? (Particularly transcendental ones)
Dan Sugalski [EMAIL PROTECTED] writes:

    Okay, I'm whipping together the fancy math section of the interpreter
    assembly language. I've got:
    [snip]
    Can anyone think of things I've forgotten? It's been a while since I've
    done numeric work.

Uri mentioned exp(x) = e^x, but I think if you are going to include log2, log10, log, etc., you should also include ln.
Re: pads and lexicals
At 10:45 AM 09-06-2001 -0400, Ken Fox wrote:

    Dave Mitchell wrote:

        So how does that all work then? What does the parrot assembler for
        foo($x+1, $x+2, ..., $x+65) look like?

    The arg list will be on the stack. Parrot just allocates new PMCs and
    pushes the PMC on the stack. I assume it will look something like

        new_pmc pmc_register[0]
        add pmc_register[0], $x, 1
        push pmc_register[0]
        new_pmc pmc_register[0]
        add pmc_register[0], $x, 2
        push pmc_register[0]
        ...
        call foo, 65

Hmmm, I assumed it would be something like:

    load $x, P0     ;; load $x into PMC register 0
    new P2          ;; Create a new PMC in register 2
    push p0, p2     ;; Make P2 be ($x)
    add p0, #1, p1  ;; Add 1 to $x, store in PMC register 1
    push p1, p2     ;; Make P2 be ($x, $x+1)
    add p0, #2, p1  ;; Add 2 to $x, store in PMC register 1
    push p1, p2     ;; Make P2 be ($x, $x+1, $x+2)
    ...
    call foo, p2    ;; Call foo($x, $x+1, ..., $x+65)

Although this would be premature optimization, since I see this idiom being used a lot, it may be useful to have some special-purpose ops to handle creating arg-lists, like a "new_array size, register" op that would create a new PMC containing a pre-sized array (thus eliminating repeatedly growing the array with the push ops), or a "push5 destreg, reg1, reg2, reg3, reg4, reg5" op (and corresponding pushN ops for N=2 to 31) that push the specified registers (in order) onto the destreg.

    Hmm. It didn't occur to me that raw values might go on the call stack. Is
    the call stack going to store PMCs only? That would simplify things a lot.

If ops and functions should be able to be used interchangeably, I wouldn't expect any function arguments to be stored on the stack, but passed via registers (or lists referenced in registers).

- Ken
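The second sequence can be mimicked with a toy register machine (registers modeled as a dict, plain integers standing in for PMCs -- note that with real PMC *references*, re-using P1 for every element would alias the same object, which is presumably why the first sequence allocates a fresh PMC per argument):

```python
regs = {}
calls = []

def load(value, dest):    # load $x, P0
    regs[dest] = value

def new_list(dest):       # new P2
    regs[dest] = []

def push(src, dest):      # push p1, p2
    regs[dest].append(regs[src])

def add(src, imm, dest):  # add p0, #1, p1
    regs[dest] = regs[src] + imm

def call(name, argreg):   # call foo, p2
    calls.append((name, list(regs[argreg])))

x = 10
load(x, "P0")
new_list("P2")
push("P0", "P2")          # ($x)
for i in range(1, 4):     # ($x, $x+1, ..., $x+3), abbreviated from 65
    add("P0", i, "P1")
    push("P1", "P2")
call("foo", "P2")
```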
Re: More character matching bits
Dan Sugalski [EMAIL PROTECTED] writes:

    We probably also ought to answer the question "How accommodating to
    non-latin writing systems are we going to be?" It's an uncomfortable
    question, but one that needs asking. Answering by Larry, probably, but
    definitely asking. Perl's not really language-neutral now (If you think
    so, go wave locales at Jarkko and see what happens... :) but all our
    biases are sort of implicit and un (or under) stated. I'd rather they be
    explicit, though I know that's got problems in and of itself.

Perl came from ASCII-centric roots, so it's likely that most of our biases are ASCII-centric. And for a couple of reasons, it's going to be hard to deal with that:

1. Backwards compatibility with existing Perl practice, and
2. To do language-neutral right is -really- hard; look at locales and Unicode as examples.

As such, instead of trying to make Perl work for all languages out of the box, why not make Perl's language handling extensible from within the language, and have it be as language-free as possible (except for backwards compatibility stuff) out of the box. Examples of what we can do:

I. Make ranges work on Unicode code-points (if they don't already).

II. Make POSIX-style character classes (e.g. [:space:]) user-definable and modifiable. That way, a Unicode::Japanese module could do something like:

    [:hiragana:] = /[\x{3041}-\x{3094}]/;
    [:katakana:] = /[\x{30A1}-\x{30F4}]/;
    [:kana:] = [:hiragana:] + [:katakana:];

and then each of those three classes could be used in REs when needed.

III. Allow for character equivalence tables to be user-definable. This would allow for the /i behavior of REs to be generalized.
As an example, consider the following code:

    $kanainsensitive = td/[:hiragana:]/[:katakana:]/;
    if ($japanesetext =~ m/$japanesepattern/i{$kanainsensitive}) {
        print "$japanesetext matched $japanesepattern\n";
    }

The new td// construct would create a character equivalence table that could be used with a generalized /i option to indicate that hiragana and katakana should be treated equivalently. A more sophisticated example could be:

    $vowelsoptional = td/aeiouAEIOU//;

which would make vowels equivalent to no characters at all. For certain applications, it would be useful to allow matches of more than one character:

    $kanainsensitive += td/\x{304C}\x{3042}/\x{30AC}\x{30FC}/r
                      + td/\x{304D}\x{3044}/\x{30AD}\x{30FC}/r
                      + ... ;

In this case, it represents the fact that long vowels are represented by one form in hiragana (HIRAGANA LETTER KA + HIRAGANA LETTER A), and a different form in katakana (KATAKANA LETTER KA + KATAKANA-HIRAGANA PROLONGED SOUND MARK). I used a /r there to indicate that the two parts of the td/// are regular expressions which are designed to be treated as equivalent. That would allow both of those lines above to be written:

    $kanainsensitive += td/([\x{304C}\x{304D}])\x{3042}/\1\x{30FC}/r;

It would also allow people to deal with combining forms, although there are probably better ways than this.

IV. Make the character class switches be redefinable, but default to the current set. That would allow someone who is doing lots of work in Japanese to be able to use \w to mean kanji, hiragana, and katakana instead of the default of [0-9A-Za-z_].

There are probably lots of things I overlooked, but if it can be done cheaply, abstracting out the existing biases and making them user-expandable/definable would probably go a long way towards getting rid of language bias.

Dan

--"it's like this"---
Dan Sugalski even samurai
[EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
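A generalized /i backed by an equivalence table can be approximated today by folding both pattern and text through a translation map. This is a simplified stand-in for the proposed td/// tables: it handles only the one-to-one hiragana/katakana case, not the multi-character /r forms, and assumes the pattern contains the kana literally rather than inside character classes:

```python
import re

# Hiragana U+3041..U+3094 maps to katakana U+30A1..U+30F4 at a fixed
# offset of 0x60, so the one-to-one fold is a simple translation table.
HIRAGANA = "".join(chr(c) for c in range(0x3041, 0x3095))
KATAKANA = "".join(chr(c) for c in range(0x30A1, 0x30F5))
KANA_FOLD = str.maketrans(HIRAGANA, KATAKANA)

def kana_insensitive_search(pattern, text):
    """Match treating hiragana and katakana as equivalent,
    analogous to how m//i treats UPPERCASE and lowercase."""
    return re.search(pattern.translate(KANA_FOLD), text.translate(KANA_FOLD))

# "ひらがな" (hiragana) matches the katakana pattern "ヒラガナ":
assert kana_insensitive_search("ヒラガナ", "ひらがな")
```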
Re: More character matching bits
At 01:14 PM 06-11-2001 -0700, Russ Allbery wrote:

    Dan Sugalski [EMAIL PROTECTED] writes:

        At 01:05 PM 6/11/2001 -0700, Russ Allbery wrote:

            Dan Sugalski [EMAIL PROTECTED] writes:

                Should perl's regexes and other character comparison bits have
                an option to consider different characters for the same thing
                as identical beasts? I'm thinking in particular of the
                Katakana/Hiragana bits of japanese, but other languages may
                have the same concepts.

            I think canonicalization gets you that if that's what you want.

        I don't think canonicalization should do this. (I really hope not)
        This isn't really a canonicalization matter--words written with one
        character set aren't (AFAIK) the same as words written with the other,
        and which alphabet you use matters. (Which sort of argues against
        being able to do this, I suppose...)

    I guess I don't know what the definition of "the same thing" you're using
    here is.

I thought Dan was talking about something equivalent to the m//i functionality. Would it, or should it, be possible to tell m// to treat katakana characters the same as hiragana characters, in much the same way as m//i treats UPPERCASE the same as lowercase? Canonicalization won't get you that. My feeling is that the hooks should be there, but the specific equivalence mappings should be in the library, not the core.
Re: Should we care much about this Unicode-ish criticism?
Nick Ing-Simmons [EMAIL PROTECTED] writes:

    Dan Sugalski [EMAIL PROTECTED] writes:

        It does bring up a deeper issue, however. Unicode is, at the moment,
        apparently inadequate to represent at least some part of the asian
        languages. Are the encodings currently in use less inadequate? I've
        been assuming that an Anything-to-Unicode translation will be
        lossless, but this makes me wonder whether that assumption is correct.

    One reason perl5.7.1+'s Encode does not do asian encodings yet is that the
    tables I have found so far (mainly Unicode 3.0 based) are lossy.

Er, are the Unicode tables going to be embedded in /usr/bin/perl6? That doesn't give me a warm, cozy feeling about Perl 6 support of Unicode. I think it's great that Perl internals will be able to handle arbitrary strings of Unicode characters (using some version of UTF-*), but may I suggest that anything that relies on the properties of characters (case, conversions, combining, visibility, etc.) require explicit library support? We'd lose some things, like normalization, but we wouldn't have to carry around huge tables, either.

-- Nick Ing-Simmons who is looking for a new job see http://www.ni-s.u-net.com/
Please shoot down this GC idea...
Why won't this work?

As I see it, we can't guarantee that DESTROYable objects will be DESTROYed immediately when they become garbage without a full ref-counting scheme. A full ref-counting scheme is potentially expensive. Even full ref-counting schemes can't guarantee proper and timely destruction in the face of circular data structures, which ref-counting schemes leak. Partial ref-counting is very difficult to get right, and is likely to be even more expensive than full ref-counting.

I haven't seen another possible problem with DESTROY-by-GC brought up: non-refcounting GCs can be fast because they don't have to look at the garbage, only the non-garbage. If we want the GC to DESTROY garbage that needs to be DESTROYed, it will have to look at the garbage to find the DESTROYable garbage -- which negates the advantage of just looking at non-garbage.

So, here's an idea:

1. Maintain a list of DESTROYable objects. This list is automagically maintained by bless and DESTROY.

2. If the compiler can determine that an object is DESTROYable and garbage, the compiler can automatically insert a call to DESTROY at the appropriate place. E.g.:

    {
        $fh = new Destroyable;
        $fh->methodcalls();
    }

could be transformed to:

    {
        $fh = new Destroyable;
        $fh->methodcalls();
        $fh->DESTROY();
    }

This step may not always be possible -- can the compiler determine that $fh->methodcalls doesn't do anything to keep $fh alive? If not, it can't do this step.

3. After finding live objects, the GC would walk the DESTROYable list looking for objects not found alive. If/when it finds them, it DESTROYs them. It needs to do this before it rewrites over the reclaimed space, so that the data necessary for the DESTROY is still available.

I feel that the number of objects that need to be DESTROYed will likely be small compared to the total number of Perl objects, so the DESTROYables list will be relatively small and fast to walk.
The automagic detection of when an object can be DESTROYed (if possible) should also help in keeping the DESTROYables list short. I'm sure this idea has flaws. But it's an idea. Tell me what I'm missing.
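Step 3 can be sketched with a toy mark phase plus a walk of the DESTROYables list (hypothetical class and method names; nothing here reflects actual Parrot internals):

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing references to other objects
        self.destroyed = False

class Heap:
    def __init__(self):
        self.objects = []
        self.destroyables = []  # maintained by "bless"/"DESTROY" in the idea

    def alloc(self, name, destroyable=False):
        obj = Obj(name)
        self.objects.append(obj)
        if destroyable:
            self.destroyables.append(obj)
        return obj

    def collect(self, roots):
        # Mark: trace only the live objects, never the garbage.
        live = set()
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if id(obj) not in live:
                live.add(id(obj))
                stack.extend(obj.refs)
        # Walk the (short) DESTROYables list before reclaiming space,
        # so DESTROY still sees intact data.
        for obj in self.destroyables:
            if id(obj) not in live:
                obj.destroyed = True  # stands in for calling obj.DESTROY()
        self.destroyables = [o for o in self.destroyables if not o.destroyed]
        # Sweep: keep only the live objects.
        self.objects = [o for o in self.objects if id(o) in live]

heap = Heap()
root = heap.alloc("root")
kept = heap.alloc("kept filehandle", destroyable=True)
root.refs.append(kept)
dead = heap.alloc("dead filehandle", destroyable=True)
heap.collect([root])
```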
Re: Garbage collection (was Re: JWZ on s/Java/Perl/)
At 01:45 PM 02-12-2001 -0300, Branden wrote:

    I think having both copying-GC and refcounting-GC is a good idea. I may be
    saying a stupid thing, since I'm not a GC expert, but I think objects that
    rely on having their destructors called the soonest possible for resource
    cleanup could use a refcount-GC, while other objects that don't need that
    could use a copy-GC. I really don't know if this is really feasible, it's
    only an idea now. I also note that objects that are associated to
    resources aren't typically the ones that get shared much in Perl, so using
    refcount for them wouldn't be very expensive... Am I too wrong here?

It's... complicated... Here's an example of where things could go wrong:

    sub foo {
        my $destroyme1 = new SomeClass;
        my $destroyme2 = new SomeClass;
        my @processme1;
        my @processme2;
        ...
        push @processme1, $destroyme1;
        push @processme2, $destroyme2;
        ...
        return \@processme2;
    }

At the end of foo(), $destroyme1 and @processme1 are dead, but $destroyme2 is alive. If $destroyme1 and $destroyme2 are ref-counted, but @processme1 and @processme2 are not, then at the end of foo(), both objects will have ref-counts of 1 ($destroyme1 because of the ref from @processme1, which is a spurious ref-count; $destroyme2 because of the ref from @processme2, which is valid). $destroyme1 won't be destroyed until @processme1 is finalized, presumably by the GC, which could take a long time. That ref-count from @processme1 is necessary because if @processme1 escapes scope (like @processme2 did) then $destroyme1 is still alive, and can't be finalized.

Going with full ref-counts solves the problem, because when @processme1 goes out of scope, its ref-count drops to 0, and it gets finalized immediately, thus dropping $destroyme1 to 0, and it gets finalized. But with @processme2, its refcount drops from 2 to 1, so it survives and so does $destroyme2.
Full ref-counting has a potentially large overhead for values that don't require finalization, which is likely the majority of our data. Going with partial ref-counts solves the simple case where the object is only referred to by objects with ref-counts, but could allow some objects' finalization to be delayed until the GC kicks in. Going with no ref-counts doesn't have the overhead of full ref-counting, but unless some other mechanism (as yet undescribed) helps, finalization of all objects could be delayed until GC.

- Branden
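The full-ref-count path can be simulated to show why it gives timely finalization through containers (a toy model, with explicit incref/decref standing in for what the interpreter would do automatically; the variable names follow the foo() example above):

```python
finalized = []

class RC:
    """A toy ref-counted value; `children` are refs this value owns."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.children = []

    def incref(self):
        self.count += 1

    def decref(self):
        self.count -= 1
        if self.count == 0:
            finalized.append(self.name)   # DESTROY happens right here
            for child in self.children:
                child.decref()

def link(container, value):
    container.children.append(value)
    value.incref()

# sub foo: two destroyables, one escaping via the returned array ref
destroyme1, destroyme2 = RC("destroyme1"), RC("destroyme2")
processme1, processme2 = RC("processme1"), RC("processme2")
for v in (destroyme1, destroyme2, processme1, processme2):
    v.incref()                            # the lexical holds one ref
link(processme1, destroyme1)              # push @processme1, $destroyme1
link(processme2, destroyme2)              # push @processme2, $destroyme2
retval = processme2
retval.incref()                           # \@processme2 escapes

# end of foo(): the lexicals go out of scope
for v in (destroyme1, destroyme2, processme1, processme2):
    v.decref()
```

At scope exit, processme1 drops to 0 and is finalized at once, which immediately drops destroyme1 to 0 as well; processme2 survives on the escaped reference, keeping destroyme2 alive.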
Re: Another approach to vtables
At 01:14 PM 02-07-2001 -0500, Dan Sugalski wrote:

    At 01:35 PM 2/7/2001 -0200, Branden wrote:

        As far as I know (and I could be _very_ wrong), the primary objectives
        of vtables are:
        1. Allowing extensible datatypes to be created by extensions and used
        in Perl.

    Secondarily, yes.

        2. Making the implementation of `tie' and `overload' more efficient
        ('cause it's very slow in Perl 5).

    No, not at all. This isn't really a consideration as such. (The vtable
    functions as designed are inadequate for most overloading, for example)

Hmm, I seem to remember vtables were being cited as a cure for lots of ills (perhaps combined with other aspects, like "make Perl nearly as fast as C"). The vtables were implied (or possibly outright stated) as giving the low-level core a more object-oriented structure: as you state below, branching and conditionals in the runtime can be eliminated by the values knowing how to operate on themselves. It was also implied (or outright claimed) that different objects/classes/packages/whatever could have class-specific vtables, defined at run-time, that would be used to handle the class-specific implementation details. I'm not sure what that could refer to except ties and overloading; class-specific methods wouldn't go in the vtable. There was some discussion that allowing the vtables to refer to functions written in Perl would be a good idea, as it would allow extensions to be written in Perl -- which is a good thing.

I had gotten the impression that the Perl code sequence:

    $a = $b + $c;

would generate the same op-code sequence regardless of the types of $a, $b, and $c, and the vtables would do all the magic behind the scenes, calling tied or overloaded versions of the base functions if so defined for $a, $b, or $c. Now I seem to be hearing that this is not the case, that variable ties and overloads are at a much higher level, never touching the vtables.
It now seems that the vtables will exist only for built-in types, and be inaccessible for user-defined types (unless those types are defined by the Perl 6 equivalent of XS, for example). This almost seems to be defaulting on the promise of vtables I thought was made.
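The impression described above -- one opcode, with all per-type behavior living in the value's vtable -- can be sketched like this (illustrative only; this is not the actual Parrot vtable layout):

```python
class Value:
    def __init__(self, vtable, data):
        self.vtable = vtable
        self.data = data

# Each vtable is just a dict of function pointers keyed by op name.
int_vtable = {"add": lambda a, b: Value(int_vtable, a.data + b.data)}
str_vtable = {"add": lambda a, b: Value(str_vtable, a.data + b.data)}  # concat

def op_add(dest, a, b):
    """One generic opcode: dispatch through the left operand's vtable,
    with no type branches in the op itself."""
    result = a.vtable["add"](a, b)
    dest.vtable, dest.data = result.vtable, result.data

# A tied/overloaded type would need only a different vtable, not new opcodes:
log = []
def logged_add(a, b):
    log.append("add called")
    return Value(int_vtable, a.data + b.data)
logged_vtable = {"add": logged_add}

c = Value(int_vtable, 0)
op_add(c, Value(int_vtable, 2), Value(int_vtable, 3))     # plain add
op_add(c, Value(logged_vtable, 2), Value(int_vtable, 3))  # "overloaded" add
```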
RE: Meta-design
At 03:54 PM 12-06-2000 -0500, Sam Tregar wrote:

    On Wed, 6 Dec 2000, Dan Sugalski wrote:

        Non-refcounting GC schemes are more expensive when they collect, but
        less expensive otherwise, and it apparently is a win for the
        non-refcount schemes.

    Which is why GC is intimately tied to DESTROY consideration in terms of
    Perl. If we intend to honor predictable DESTROY timing, and I think we
    should, then we will need to reference count. No ifs, elses or
    alternations. Anyone care to refute?

This is not a complete refutation, but... It seems to me that there are three types of thingies[1] we are concerned about, conceptually:

A) Thingies with no DESTROY considerations, which don't need refcounts.

B) Thingies with DESTROY methods, but which aren't timing-sensitive. They can be destroyed anytime after they die. These don't really need refcounts either.

C) Thingies with DESTROY methods which need to be DESTROYed as soon as they die. These would seem to need refcounts.

I think that distinguishing between B and C is a syntax issue out of scope here. Although B could be lumped with A if we could tell B and C apart, I'll assume that we must lump B and C together.

If we could refcount only C for destruction, and let the GC-of-your-choice handle the actual memory reclamation, then the expense of refcounting should only affect C thingies. I am uncertain what the ratio of C thingies to A thingies is, so I can't judge how big a win it is. Theoretically, a non-refcount GC should never find any C thingies that still have a refcount > 0, so the non-refcount GC shouldn't have to worry about them.

    If we're going to be ref-counting anyway then the performance gain of a
    non-refcounting GC, avoiding counting, is basically moot. If we're
    ref-counting for DESTROY timing then we may as well use that data in the
    GC.

But we only care about the ref-count for DESTROY timing. If we can avoid counting for DESTROY-timing-insensitive thingies, we may still have a net performance gain.
I'm not some kind of ref-count true-believer -- if you think we should put this discussion off to a later date then I'm cool. I'm just spoiling for some Perl 6 work to do and this area seemed ripe for critical development.

-sam
Re: Opcodes (was Re: The external interface for the parser piece)
At 05:59 PM 11-30-2000 +, Nicholas Clark wrote:

    On Thu, Nov 30, 2000 at 12:46:26PM -0500, Dan Sugalski wrote:

    (Note, Dan was writing about "$a=1.2; $b=3; $c = $a + $b")

        $a=1; $b =3; $c = $a + $b

        If they don't exist already, then something like:

            newscalar a, num, 1.2
            newscalar b, int, 3
            newscalar c, num, 0
            add t3, a, b

    and $c ends up a num? Why that line "newscalar c, num, 0"? It looks to me
    like add needs to be polymorphic and work out the best compromise for the
    type of scalar to create based on the integer/num/complex/oddball types of
    its two operands.

I think the "add t3, a, b" was a typo, and should be "add c, a, b".

Another way of looking at it, assuming that the Perl 6 interpreter is stack-based, not register-based, is that the sequence would get converted into something like this:

    push num 1.2    ;; literal can be precomputed at compile time
    dup
    newscalar a     ;; get value from top of stack
    push int 3      ;; literal can be precomputed at compile time
    dup
    newscalar b
    push a
    push b
    add
    newscalar c

The "add" op would, in C code, do something like:

    void add() {
        P6Scalar *addend;
        P6Scalar *adder;

        addend = pop();
        adder = pop();
        push(addend->vtable->add(addend, adder));
    }

It would be up to addend->vtable->add() to figure out how to do the actual addition, and what type to return.

    But that probably doesn't help much. Let me throw together something more
    detailed and we'll see where we go from there. Hopefully it will cover the
    above case too.

Nicholas Clark
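The stack-based sequence can be sketched as a tiny evaluator. The type-promotion rule is a guess at the "best compromise" the post asks for (int + num gives num), and the op sequence is reproduced faithfully, including the dup'd copies it leaves behind on the stack:

```python
stack = []
pad = {}    # named scalars, standing in for a, b, c

def push(value):
    stack.append(value)

def dup():
    stack.append(stack[-1])

def newscalar(name):
    """Pop the top of stack into a named scalar."""
    pad[name] = stack.pop()

def add():
    """Polymorphic add: the result type is the 'best compromise'
    of the operand types -- float wins over int."""
    b, a = stack.pop(), stack.pop()
    result = a + b
    if isinstance(a, float) or isinstance(b, float):
        result = float(result)
    push(result)

# $a = 1.2; $b = 3; $c = $a + $b, following the quoted sequence:
push(1.2); dup(); newscalar("a")
push(3);   dup(); newscalar("b")
push(pad["a"])
push(pad["b"])
add()
newscalar("c")
```

As in the quoted C, all the type logic lives in one place (here the `add` op, there `addend->vtable->add()`), so the compiled opcode sequence never branches on operand types.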
Re: A tentative list of vtable functions
At 04:43 PM 8/31/00 -0400, Dan Sugalski wrote:

    Okay, here's a list of functions I think should go into variable vtables.
    Functions marked with a * will take an optional type offset so we can
    handle asking for various permutations of the basic type.

Perhaps I'm missing something... Is this for scalars alone? I see no arrays/hashes here.

    type
    name
    get_bool
    get_string *
    get_int *
    get_float *
    get_value
    set_string *
    set_int *
    set_float *
    set_value
    add *
    subtract *
    multiply *
    divide *
    modulus *
    clone (returns a new copy of the thing in question)
    new (creates a new thing)
    concatenate
    is_equal (true if this thing is equal to the parameter thing)
    is_same (True if this thing is the same thing as the parameter thing)
    logical_or
    logical_and
    logical_not
    bind (For =~)
    repeat (For x)

    Anyone got anything to add before I throw together the base vtable RFC?

    Dan

    --"it's like this"---
    Dan Sugalski even samurai
    [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: RFC 127 (v1) Sane resolution to large function returns
At 11:26 AM 8/23/00 -0700, Larry Wall wrote:

    I expect that we'll get more compile-time benefit from

        my HASH sub foo { ... }
        %bar = foo();

So how would you fill in the type in:

    my TYPE sub foo {
        ...
        if (wanthash())   { return %bar;  }
        if (wantarray())  { return @baz;  }
        if (wantscalar()) { return $quux; }
    }

    $scalar = foo();
    @array  = foo();
    %hash   = foo();

Larry