Re: closures
Hi John, > I thought Chicken's approach seemed novel, but I don't understand it Do you have a link? > well enough to explain it. From what I gather, they use the C stack as > the "new" heap and collapse it when the garbage collector copies. A copying garbage collector is a bad thing anyway ;-) Cheers, - Alex -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: closures
I thought Chicken's approach seemed novel, but I don't understand it well enough to explain it. From what I gather, they use the C stack as the "new" heap and collapse it when the garbage collector copies. John On 30 Oct 2008, at 3:56 AM, Alexander Burger wrote: > >> class continuations like scheme does. Have you thought about using >> the picoLisp heap instead of the traditional C stack? > > Yes, I experimented with that, but did not find a better solution. The > hardware stack seems optimal for certain things. On the other hand, it > is always worth to explore other possibilities. > > Cheers, > - Alex > -- > UNSUBSCRIBE: mailto:[EMAIL PROTECTED] -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: closures
Hi Tomas, > > x010 Number > > x100 Symbol > > x000 Cell > > A cell consists of two pointers, isn't it 64 bits on 32 bit picoLisp2 > then? Yes. The above patterns show the format of the pointers (in 32 bits). > Same for miniPicoLisp, isn't a cell 128 bits big when compiled on 64 > bit platform? Right. > What the "raw data" mean in doc/structures there? I am puzzled by > "bin". That's what I called the "convoluted structures" ;-) 'bin' is a word (32 or 64 bits) completely without tag bits. It can be used totally for binary data, in places where the type is clear from the context. It is the same as the 'DIG' data in the CAR parts of bignums in picoLisp[23]. I just did not want to call it 'DIG' in miniPicoLisp, because it does not represent digits there, but only the packed characters of symbol names. 'txt', the other "raw data" term, is a little tricky. As you see, it uses the least significant bit as a tag bit, which is normally not allowed, because this bit is reserved for the garbage collector. However, 'txt' is used exclusively for short symbol names, completely fitting into the CAR part of a symbol cell, and the garbage collector only uses bit zero in the CDR part of each cell. In this way, symbols with up to 5 characters fit into a single pointer on a 32 bit machine (as lowercase characters, space and a few punctuations occupy 6 bits in miniPicoLisp). > Why is the size of things documented as 8 bits for miniPicoLisp? This was just to keep it unspecified, as opposed to picoLisp2 which is restricted to 32, and picoLisp3 to 64 bits. > > That is, a number is still indicated by an AND with 2, or an atom with > > 6. Note, however, that you cannot directly check for a symbol here, > > because a number may also have bit three on. To determine if a given > > datum is a symbol, it must first be asserted that it is not a number. > > Can't you just say something like: > > num if (X & 3) == 2 > sym if (X & 7) == 4 > cell if (X & 7) == 0 Yes, this would be the same, because bit zero is always 0 in normal (i.e. non-gc) operation. But the comparison with '==' is unnecessary and would generate an additional "compare" instruction, while (X & 2) or (X & 3) which is the same as (X & 2) != 0 generates a single AND or TEST instruction. But for the assertion of a symbol, as I wrote above, a double check is necessary in miniPicoLisp and picoLisp3: (X & 6) == 4 # On 32bit miniPicoLisp (X & 12) == 8 # On 64 bit miniPicoLisp and even (X & 14) == 8 # On picoLisp3 > >cnt S010 > >big S100 > >sym 1000 > >cell > > > > We have an additional tag bit, and use it to differentiate between short > > numbers and bignums. The 'S' bit in each type is the sign bit. So a > > number can be identified by ANDing with 6, a symbol with 8, and so on. > > No, you can't determine symbol by anding with 8 because that would > clash with the sign bit of cnt and big? I am really puzzled now;-) Is Totally correct. This is what I wanted to say with: > > because a number may also have bit three on. To determine if a given > > datum is a symbol, it must first be asserted that it is not a number. Practically, there are two possibilities: 1. You know from the context that the data item is not a number. The most typical example is EVAL (e.g. in miniPicoLisp): #define EVAL(x) (isNum(x)? x : isSym(x)? val(x) : evList(x)) "If x is a number, return it. Otherwise, if it is a symbol ..." So here it is sufficient to AND with 4 or 8 to find out if we have a symbol. For other cases you have to do the full double check with AND and CMP. These cases are not so frequent, because most functions will check for a number type first to either handle it or generate an error message. > it some trick with "symbol names are simply combined big and short > numbers"? > > I would say: > > cnt if X & 2 > big if X & 4 > sym if (X & f) == 8 > cell if (X & f) == 0 Correct (if this was for picoLisp3, the many versions get confusing ;-) > > The implementation of 6-and-a-half bits for ASCII characters in > > miniPicoLisp does not allow for UTF-8 support or external symbol > > encodings. > > This is serious. However, could not be miniPicoLisp more clever and > use as many bits as possible (like picoLisp2 and 3 do) instead of > being limited to 8 bits? Yes, we could use a combination with the other systems. This would probably make sense for 64bit miniPicoLisp, as the 'txt' data item has 63bits instead of just 31. > class continuations like scheme does. Have you thought about using > the picoLisp heap instead of the traditional C stack? Yes,
Re: closures
Hi Alex, > To explain this, I'd like to refer to "doc/structures" in each model. > "doc/structures" is the foundation of each implementation, like a set of That seems like long term assignment;-) > The primary data types of picoLisp2 are: > > x010 Number > x100 Symbol > x000 Cell A cell consists of two pointers, isn't it 64 bits on 32 bit picoLisp2 then? > For miniPicoLisp it is (the number of 'x's is reduced): > > num xx10 > sym x100 > cell x000 Same for miniPicoLisp, isn't a cell 128 bits big when compiled on 64 bit platform? What the "raw data" mean in doc/structures there? I am puzzled by "bin". Why is the size of things documented as 8 bits for miniPicoLisp? > That is, a number is still indicated by an AND with 2, or an atom with > 6. Note, however, that you cannot directly check for a symbol here, > because a number may also have bit three on. To determine if a given > datum is a symbol, it must first be asserted that it is not a number. Can't you just say something like: num if (X & 3) == 2 sym if (X & 7) == 4 cell if (X & 7) == 0 > This design gives an additional bit for the number's value, at the > expense of a possibly more time-consuming check. I don't follow the reason really, the check can be pretty simple. > Finally, for the encoding of symbol names, rather convoluted structures > are used which are too involved to describe in this mail. Perhaps an > intensive study of "doc/structures" and the sources could give some > insight. In picoLisp2, symbol names are simply combined big and short > numbers. Yes, symbol names and string are big mystery to me so far:-) > Now, for picoLisp3, which is guaranteed to have a word size of 64 bits: > >cnt S010 >big S100 >sym 1000 >cell > > We have an additional tag bit, and use it to differentiate between short > numbers and bignums. The 'S' bit in each type is the sign bit. So a > number can be identified by ANDing with 6, a symbol with 8, and so on. No, you can't determine symbol by anding with 8 because that would clash with the sign bit of cnt and big? I am really puzzled now;-) Is it some trick with "symbol names are simply combined big and short numbers"? I would say: cnt if X & 2 big if X & 4 sym if (X & f) == 8 cell if (X & f) == 0 > So, to bring it to the point: The above implementation was chosen simply > to save space. To my experience, optimizing space consumptions is far > more important than short sighted code optimizations or structural > design decisions (e.g. for a compiler). If the code is slow, you might > have a system that is half as fast as the other. So what? But if you run > out of space because you'd need twice as much, performance will go down > dramatically because of cache misses, swapping and trashing. Yes. > Even if you decide to live with (62 bit) short numbers only, still the > limitation of current C compilers does not allow to directly implement > the mul/div operation with an intermediate double-word result (as used > in '*/'). So you'd have to resort to half-word twiddling or assembly > language here too. This should not be a problem. > The implementation of 6-and-a-half bits for ASCII characters in > miniPicoLisp does not allow for UTF-8 support or external symbol > encodings. This is serious. However, could not be miniPicoLisp more clever and use as many bits as possible (like picoLisp2 and 3 do) instead of being limited to 8 bits? Other people on the mailing list discussed other languages like python etc. in the thread about asynchronous I/O. I think that the frameworks they used do that quite easily because the underlying interpreter does not use C stack and so is built to support first class continuations like scheme does. Have you thought about using the picoLisp heap instead of the traditional C stack? Thanks for explanations, Tomas -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: closures
Hi Tomas, > so what is the reason for optimizing picolisp for 32/64 bits > specificly? Is miniPicoLisp so inefficient? To explain this, I'd like to refer to "doc/structures" in each model. "doc/structures" is the foundation of each implementation, like a set of axioms. Almost everything else follows from them. In the following, I'll compare the pointer tag patterns in each implementation. Note that in all cases the least significant bit is not used as a tag bit; it is reserved as a mark bit for garbage collection. The primary data types of picoLisp2 are: x010 Number x100 Symbol x000 Cell You can see that a bitwise AND with 2 will indicate a number, 4 a symbol, 6 an atom, and so on. For miniPicoLisp it is (the number of 'x's is reduced): num xx10 sym x100 cell x000 That is, a number is still indicated by an AND with 2, or an atom with 6. Note, however, that you cannot directly check for a symbol here, because a number may also have bit three on. To determine if a given datum is a symbol, it must first be asserted that it is not a number. This design gives an additional bit for the number's value, at the expense of a possibly more time-consuming check. The reason is that miniPicoLisp has only short numbers, and especially with a 32 bit word size each bit is very precious. Finally, for the encoding of symbol names, rather convoluted structures are used which are too involved to describe in this mail. Perhaps an intensive study of "doc/structures" and the sources could give some insight. In picoLisp2, symbol names are simply combined big and short numbers. Now, for picoLisp3, which is guaranteed to have a word size of 64 bits: cnt S010 big S100 sym 1000 cell We have an additional tag bit, and use it to differentiate between short numbers and bignums. The 'S' bit in each type is the sign bit. So a number can be identified by ANDing with 6, a symbol with 8, and so on. In summary, picoLisp2 has only bignums, miniPicoLisp has only short numbers, and picoLisp2 uses a combination of both. Why is this so critical? For a 64bit system, it would be a huge waste of space without a short number type. Take a list (1 2 3) as an example. If there are only bignums as in picoLisp2, it would look like +-+-+ +-+-+ +-+-+ | | | ---+---> | | | ---+---> | | | / | +--+--+-+ +--+--+-+ +--+--+-+ | | | V V V +-+-+ +-+-+ +-+-+ | 1 | / | | 2 | / | | 3 | / | +-+-+ +-+-+ +-+-+ The list would occupy 6 cells in total, 48 bytes on a 32bit machine, but 96 bytes on a 64bit machine. With short numbers, this is reduced to half: +-+-+ +-+-+ +-+-+ | 1 | ---+>| 2 | ---+>| 3 | / | +-+-+ +-+-+ +-+-+ The same applies to symbol names (and thus also to strings). In picoLisp3, a symbol with a name of up to 7 characters fits into a single cell: Symbol | V +-+-+ |'abc'| VAL | +-+-+ The name "abc" is stored as a short number directly in the symbol's tail. Otherwise (as in picoLisp2) it would look like Symbol | V +-+-+ | | | VAL | +-+-+ | V +-+-+ |'abc'| / | +-+-+ here, too, we would need twice as much space. On picoLisp3, this can even be combined. The whole name of a symbol with up to 15 characters fits into a single cell, 8 into the bignum's digit part, and 7 into the final short number: Symbol | V +-+-+ | | | VAL | +-+-+ | V +-+-+ | 8 | 7 | +-+-+ So, to bring it to the point: The above implementation was chosen simply to save space. To my experience, optimizing space consumptions is far more important than short sighted code optimizations or structural design decisions (e.g. for a compiler). If the code is slow, you might have a system that is half as fast as the other. So what? But if you run out of space because you'd need twice as much, performance will go down dramatically because of cache misses, swapping and trashing. > Or what features of picoLisp cannot be implemented in miniPicoLisp? Even if you decide to live with (62 bi
Re: closures
Hi Alex, > The two implementations have almost nothing in common. Just that > 'mini' compiles also on 64 bits, because its model is independent of > the machine's word size. so what is the reason for optimizing picolisp for 32/64 bits specificly? Is miniPicoLisp so inefficient? Or what features of picoLisp cannot be implemented in miniPicoLisp? Cheers, Tomas -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: closures
Hi Tomas, > When you work on the 64 bit version, you obviously implemented > miniPicoLisp "prototype" and now building it in assembler. How and The two implementations have almost nothing in common. Just that 'mini' compiles also on 64 bits, because its model is independent of the machine's word size. picoLisp-3 is closer to picoLisp-2, and I often look at the existing C code while implementing the assembly version. Some things are different or more complicated (notably the bignum arithmetics, as they now involve also short numbers), while other things are easier (e.g. the design of some internal interpreter mechanisms). > when do you decide what should be implemented in C (or asm) and what > in picoLisp? Do you first implement a minimal neccessary core in > C/asm and a few things in picoLisp and then later reimplement some > picolisp code in C/asm for efficiency reasons? If everything works out as I plan, there will be zero lines of C code. It will, however, link to the C runtime libraries (for standard I/O, networking etc.). I'll implement everything directly in assembler if possible. In the past I often wrote some function in Lisp first, and later decided to rewrite it in C. Perhaps it will be easier to do some things in Lisp, especially if they are not time critical. Let's see. At the moment I'm still fighting with the bignums ;-) Cheers, - Alex -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: closures
Hi Alex, >> Have you measured and/or noticed impact on performance of these >> micro-optimizations? I mean, are they really worth it? > > Very probably not ... it is similar to the situation with 'do' ;-) When you work on the 64 bit version, you obviously implemented miniPicoLisp "prototype" and now building it in assembler. How and when do you decide what should be implemented in C (or asm) and what in picoLisp? Do you first implement a minimal neccessary core in C/asm and a few things in picoLisp and then later reimplement some picolisp code in C/asm for efficiency reasons? > Still I think it is good to have certain convenience functions that > don't evaluate their arguments. Not having to quote their arguments > makes their use a little shorter and more readable. No big deal. Probably. That makes them convenient for manual typing but not so much for program "transformations" like the original closure example. Cheers, Tomas -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: closures
Hi Tomas, > ... (zero A) ... > .. > Have you measured and/or noticed impact on performance of these > micro-optimizations? I mean, are they really worth it? Very probably not ... it is similar to the situation with 'do' ;-) Still I think it is good to have certain convenience functions that don't evaluate their arguments. Not having to quote their arguments makes their use a little shorter and more readable. No big deal. This applies to many other functions, too: Do we really need 'if' when we have 'cond'? Cheers, - Alex -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: closures
Hi Alex, >(let @S '((I . 0)) > (def 'count (curry (@S) () (job '@S (inc 'I > (def 'reset (curry (@S) () (job '@S (zero I ) > >(let @S (list (cons 'I 0)) >(let I (cons 0) I like it this way, thanks. I also found that once the things inside the closures get complicated, it might be worth using objects to get better code factoring: (class +Counter) # i (dm T () (=: i 0)) (dm count> () (inc (:: i))) (dm reset> () (=: i 0)) (let @C (list (cons 'C (new '(+Counter (def 'count (curry (@C) () (job '@C (count> C (def 'reset (curry (@C) () (job '@C (reset> C ) Or using objects directly so that I can have many independent counters... It did the job for my understanding of closures in picoLisp, I hope:-) > Sometimes non-evaluating functions are a little more convenient, and > at least theoretically more efficient. > > (zero A) occupies two cells. > > (zero 'A) would be three, and that's the same as (setq A 0), so it > would not save any space. Have you measured and/or noticed impact on performance of these micro-optimizations? I mean, are they really worth it? Thanks, Tomas -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: closures
On Fri, Oct 24, 2008 at 05:54:10PM +0200, Alexander Burger wrote: > > Do you have anything on mind? > > Unfortunately not. I just had the feeling there must be other ways ;-) Perhaps I was too much in a hurry yesterday ... There is a kind of middle way between our two solutions: (let @S '((I . 0)) (def 'count (curry (@S) () (job '@S (inc 'I (def 'reset (curry (@S) () (job '@S (zero I ) This is not shorter than your initial solution, but perhaps a little bit more clear by letting 'curry' do the job of building the function. BTW, all three solutions have in common that they depend on a shared data structure (a cell (I . 0) or (0)). If the 'let' is going to be used within some other function (instead of the top level here), it should better be (let @S (list (cons 'I 0)) or (let I (cons 0) to use a locally encapsulated cell. Cheers, - Alex -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: closures
Hi Tomas, > > Perhaps there are other/better solutions? > Do you have anything on mind? Unfortunately not. I just had the feeling there must be other ways ;-) But perhaps using a cell is not such a bad idea ... > Why 'zero', 'one', 'on', 'off' and 'onOff' don't evaluate arguments? > ... > which would kind of keep it uniform/consistent. I don't think it > would be a problem writing (zero 'A 'B) instead of (zero A B). They are there for the same reason as, for example, 'setq'. Sometimes non-evaluating functions are a little more convenient, and at least theoretically more efficient. (zero A) occupies two cells. (zero 'A) would be three, and that's the same as (setq A 0), so it would not save any space. Cheers, - Alex -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: closures
Hi Alex, >(let I '(0) > (def 'count (curry (I) () (inc I))) > (def 'reset (curry (I) () (set I 0))) ) I see:-) > Perhaps there are other/better solutions? Do you have anything on mind? Why 'zero', 'one', 'on', 'off' and 'onOff' don't evaluate arguments? If they did, I could write: (let I '(0) (def 'count (curry (I) () (inc I))) (def 'reset (curry (I) () (zero I))) ) which would kind of keep it uniform/consistent. I don't think it would be a problem writing (zero 'A 'B) instead of (zero A B). Thank you, Tomas -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]
Re: closures
Hi Tomas, > (let @S '((I . 0)) >(def 'count (fill '(() (job '@S (inc 'I) >(def 'reset (fill '(() (job '@S (zero I)) > ... > http://www.software-lab.de/faq.html#closures discusses curry function > which allows closure "inside" a single function only. With a little trick, you could also use it here: (let I '(0) (def 'count (curry (I) () (inc I))) (def 'reset (curry (I) () (set I 0))) ) Perhaps there are other/better solutions? Cheers, - Alex -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]