Re: closures

2008-10-30 Thread Alexander Burger
Hi John,

> I thought Chicken's approach seemed novel, but I don't understand it  

Do you have a link?

> well enough to explain it. From what I gather, they use the C stack as  
> the "new" heap and collapse it when the garbage collector copies.

A copying garbage collector is a bad thing anyway ;-)

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]


Re: closures

2008-10-30 Thread John Duncan
I thought Chicken's approach seemed novel, but I don't understand it  
well enough to explain it. From what I gather, they use the C stack as  
the "new" heap and collapse it when the garbage collector copies.

John

On 30 Oct 2008, at 3:56 AM, Alexander Burger wrote:
>
>> class continuations like scheme does.  Have you thought about using
>> the picoLisp heap instead of the traditional C stack?
>
> Yes, I experimented with that, but did not find a better solution. The
> hardware stack seems optimal for certain things. On the other hand, it
> is always worth to explore other possibilities.
>
> Cheers,
> - Alex
> -- 
> UNSUBSCRIBE: mailto:[EMAIL PROTECTED]

-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]


Re: closures

2008-10-30 Thread Alexander Burger
Hi Tomas,

> >   x010 Number
> >   x100 Symbol
> >   x000 Cell
> 
> A cell consists of two pointers, isn't it 64 bits on 32 bit picoLisp2
> then?

Yes. The above patterns show the format of the pointers (in 32 bits).


> Same for miniPicoLisp, isn't a cell 128 bits big when compiled on 64
> bit platform?

Right.


> What the "raw data" mean in doc/structures there?  I am puzzled by
> "bin".

That's what I called the "convoluted structures" ;-)

'bin' is a word (32 or 64 bits) completely without tag bits. It can be
used totally for binary data, in places where the type is clear from the
context. It is the same as the 'DIG' data in the CAR parts of bignums in
picoLisp[23]. I just did not want to call it 'DIG' in miniPicoLisp,
because it does not represent digits there, but only the packed
characters of symbol names.

'txt', the other "raw data" term, is a little tricky. As you see, it
uses the least significant bit as a tag bit, which is normally not
allowed, because this bit is reserved for the garbage collector.
However, 'txt' is used exclusively for short symbol names, completely
fitting into the CAR part of a symbol cell, and the garbage collector
only uses bit zero in the CDR part of each cell. In this way, symbols
with up to 5 characters fit into a single pointer on a 32 bit machine
(as lowercase characters, space and a few punctuations occupy 6 bits in
miniPicoLisp).


> Why is the size of things documented as 8 bits for miniPicoLisp?

This was just to keep it unspecified, as opposed to picoLisp2 which is
restricted to 32, and picoLisp3 to 64 bits.


> > That is, a number is still indicated by an AND with 2, or an atom with
> > 6. Note, however, that you cannot directly check for a symbol here,
> > because a number may also have bit three on. To determine if a given
> > datum is a symbol, it must first be asserted that it is not a number.
> 
> Can't you just say something like:
> 
> num  if (X & 3) == 2
> sym  if (X & 7) == 4
> cell if (X & 7) == 0

Yes, this would be the same, because bit zero is always 0 in normal
(i.e. non-gc) operation. But the comparison with '==' is unnecessary and
would generate an additional "compare" instruction, while

   (X & 2)  or  (X & 3)

which is the same as

   (X & 2) != 0

generates a single AND or TEST instruction.

But for the assertion of a symbol, as I wrote above, a double check is
necessary in miniPicoLisp and picoLisp3:

   (X & 6) == 4  # On 32bit miniPicoLisp
   (X & 12) == 8  # On 64 bit miniPicoLisp

and even

   (X & 14) == 8  # On picoLisp3


> >cnt   S010
> >big   S100
> >sym   1000
> >cell  
> >
> > We have an additional tag bit, and use it to differentiate between short
> > numbers and bignums. The 'S' bit in each type is the sign bit. So a
> > number can be identified by ANDing with 6, a symbol with 8, and so on.
> 
> No, you can't determine symbol by anding with 8 because that would
> clash with the sign bit of cnt and big?  I am really puzzled now;-) Is

Totally correct. This is what I wanted to say with:

> > because a number may also have bit three on. To determine if a given
> > datum is a symbol, it must first be asserted that it is not a number.

Practically, there are two possibilities:

1. You know from the context that the data item is not a number. The
   most typical example is EVAL (e.g. in miniPicoLisp):

   #define EVAL(x) (isNum(x)? x : isSym(x)? val(x) : evList(x))

"If x is a number, return it. Otherwise, if it is a symbol ..."

So here it is sufficient to AND with 4 or 8 to find out if we have a
symbol. For other cases you have to do the full double check with AND
and CMP. These cases are not so frequent, because most functions will
check for a number type first to either handle it or generate an error
message.


> it some trick with "symbol names are simply combined big and short
> numbers"?
> 
> I would say:
> 
> cnt  if X & 2
> big  if X & 4
> sym  if (X & f) == 8
> cell if (X & f) == 0

Correct (if this was for picoLisp3, the many versions get confusing ;-)



> > The implementation of 6-and-a-half bits for ASCII characters in
> > miniPicoLisp does not allow for UTF-8 support or external symbol
> > encodings.
> 
> This is serious.  However, could not be miniPicoLisp more clever and
> use as many bits as possible (like picoLisp2 and 3 do) instead of
> being limited to 8 bits?

Yes, we could use a combination with the other systems. This would
probably make sense for 64bit miniPicoLisp, as the 'txt' data item has
63bits instead of just 31.


> class continuations like scheme does.  Have you thought about using
> the picoLisp heap instead of the traditional C stack?

Yes, 

Re: closures

2008-10-29 Thread Tomas Hlavaty
Hi Alex,

> To explain this, I'd like to refer to "doc/structures" in each model.
> "doc/structures" is the foundation of each implementation, like a set of

That seems like long term assignment;-)

> The primary data types of picoLisp2 are:
>
>   x010 Number
>   x100 Symbol
>   x000 Cell

A cell consists of two pointers, isn't it 64 bits on 32 bit picoLisp2
then?

> For miniPicoLisp it is (the number of 'x's is reduced):
>
>  num  xx10
>  sym  x100
>  cell x000

Same for miniPicoLisp, isn't a cell 128 bits big when compiled on 64
bit platform?

What the "raw data" mean in doc/structures there?  I am puzzled by
"bin".

Why is the size of things documented as 8 bits for miniPicoLisp?

> That is, a number is still indicated by an AND with 2, or an atom with
> 6. Note, however, that you cannot directly check for a symbol here,
> because a number may also have bit three on. To determine if a given
> datum is a symbol, it must first be asserted that it is not a number.

Can't you just say something like:

num  if (X & 3) == 2
sym  if (X & 7) == 4
cell if (X & 7) == 0

> This design gives an additional bit for the number's value, at the
> expense of a possibly more time-consuming check.

I don't follow the reason really, the check can be pretty simple.

> Finally, for the encoding of symbol names, rather convoluted structures
> are used which are too involved to describe in this mail. Perhaps an
> intensive study of "doc/structures" and the sources could give some
> insight. In picoLisp2, symbol names are simply combined big and short
> numbers.

Yes, symbol names and string are big mystery to me so far:-)

> Now, for picoLisp3, which is guaranteed to have a word size of 64 bits:
>
>cnt   S010
>big   S100
>sym   1000
>cell  
>
> We have an additional tag bit, and use it to differentiate between short
> numbers and bignums. The 'S' bit in each type is the sign bit. So a
> number can be identified by ANDing with 6, a symbol with 8, and so on.

No, you can't determine symbol by anding with 8 because that would
clash with the sign bit of cnt and big?  I am really puzzled now;-) Is
it some trick with "symbol names are simply combined big and short
numbers"?

I would say:

cnt  if X & 2
big  if X & 4
sym  if (X & f) == 8
cell if (X & f) == 0

> So, to bring it to the point: The above implementation was chosen simply
> to save space. To my experience, optimizing space consumptions is far
> more important than short sighted code optimizations or structural
> design decisions (e.g. for a compiler). If the code is slow, you might
> have a system that is half as fast as the other. So what? But if you run
> out of space because you'd need twice as much, performance will go down
> dramatically because of cache misses, swapping and trashing.

Yes.

> Even if you decide to live with (62 bit) short numbers only, still the
> limitation of current C compilers does not allow to directly implement
> the mul/div operation with an intermediate double-word result (as used
> in '*/'). So you'd have to resort to half-word twiddling or assembly
> language here too.

This should not be a problem.

> The implementation of 6-and-a-half bits for ASCII characters in
> miniPicoLisp does not allow for UTF-8 support or external symbol
> encodings.

This is serious.  However, could not be miniPicoLisp more clever and
use as many bits as possible (like picoLisp2 and 3 do) instead of
being limited to 8 bits?


Other people on the mailing list discussed other languages like python
etc. in the thread about asynchronous I/O.  I think that the
frameworks they used do that quite easily because the underlying
interpreter does not use C stack and so is built to support first
class continuations like scheme does.  Have you thought about using
the picoLisp heap instead of the traditional C stack?

Thanks for explanations,

Tomas
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]


Re: closures

2008-10-27 Thread Alexander Burger
Hi Tomas,

> so what is the reason for optimizing picolisp for 32/64 bits
> specificly?  Is miniPicoLisp so inefficient?

To explain this, I'd like to refer to "doc/structures" in each model.
"doc/structures" is the foundation of each implementation, like a set of
axioms. Almost everything else follows from them.

In the following, I'll compare the pointer tag patterns in each
implementation. Note that in all cases the least significant bit is not
used as a tag bit; it is reserved as a mark bit for garbage collection.


The primary data types of picoLisp2 are:

  x010 Number
  x100 Symbol
  x000 Cell

You can see that a bitwise AND with 2 will indicate a number, 4 a
symbol, 6 an atom, and so on.


For miniPicoLisp it is (the number of 'x's is reduced):

 num  xx10
 sym  x100
 cell x000

That is, a number is still indicated by an AND with 2, or an atom with
6. Note, however, that you cannot directly check for a symbol here,
because a number may also have bit three on. To determine if a given
datum is a symbol, it must first be asserted that it is not a number.

This design gives an additional bit for the number's value, at the
expense of a possibly more time-consuming check. The reason is that
miniPicoLisp has only short numbers, and especially with a 32 bit word
size each bit is very precious.

Finally, for the encoding of symbol names, rather convoluted structures
are used which are too involved to describe in this mail. Perhaps an
intensive study of "doc/structures" and the sources could give some
insight. In picoLisp2, symbol names are simply combined big and short
numbers.


Now, for picoLisp3, which is guaranteed to have a word size of 64 bits:

   cnt   S010
   big   S100
   sym   1000
   cell  

We have an additional tag bit, and use it to differentiate between short
numbers and bignums. The 'S' bit in each type is the sign bit. So a
number can be identified by ANDing with 6, a symbol with 8, and so on.


In summary, picoLisp2 has only bignums, miniPicoLisp has only short
numbers, and picoLisp2 uses a combination of both.


Why is this so critical? For a 64bit system, it would be a huge waste of
space without a short number type. Take a list (1 2 3) as an example. If
there are only bignums as in picoLisp2, it would look like

  +-+-+ +-+-+ +-+-+
  |  |  |  ---+---> |  |  |  ---+---> |  |  |  /  |
  +--+--+-+ +--+--+-+ +--+--+-+
 | | |
 V V V
  +-+-+ +-+-+ +-+-+
  |  1  |  /  | |  2  |  /  | |  3  |  /  |
  +-+-+ +-+-+ +-+-+

The list would occupy 6 cells in total, 48 bytes on a 32bit machine, but
96 bytes on a 64bit machine.

With short numbers, this is reduced to half:

  +-+-+ +-+-+ +-+-+
  |  1  |  ---+>|  2  |  ---+>|  3  |  /  |
  +-+-+ +-+-+ +-+-+


The same applies to symbol names (and thus also to strings). In
picoLisp3, a symbol with a name of up to 7 characters fits into a single
cell:

Symbol
|
V
  +-+-+
  |'abc'| VAL |
  +-+-+

The name "abc" is stored as a short number directly in the symbol's
tail. Otherwise (as in picoLisp2) it would look like

Symbol
|
V
  +-+-+
  |  |  | VAL |
  +-+-+
 |
 V
  +-+-+
  |'abc'|  /  |
  +-+-+

here, too, we would need twice as much space.


On picoLisp3, this can even be combined. The whole name of a symbol with
up to 15 characters fits into a single cell, 8 into the bignum's digit
part, and 7 into the final short number:

Symbol
|
V
  +-+-+
  |  |  | VAL |
  +-+-+
 |
 V
  +-+-+
  |  8  |  7  |
  +-+-+


So, to bring it to the point: The above implementation was chosen simply
to save space. To my experience, optimizing space consumptions is far
more important than short sighted code optimizations or structural
design decisions (e.g. for a compiler). If the code is slow, you might
have a system that is half as fast as the other. So what? But if you run
out of space because you'd need twice as much, performance will go down
dramatically because of cache misses, swapping and trashing.



> Or what features of picoLisp cannot be implemented in miniPicoLisp?

Even if you decide to live with (62 bi

Re: closures

2008-10-27 Thread Tomas Hlavaty
Hi Alex,

> The two implementations have almost nothing in common. Just that
> 'mini' compiles also on 64 bits, because its model is independent of
> the machine's word size.

so what is the reason for optimizing picolisp for 32/64 bits
specificly?  Is miniPicoLisp so inefficient?  Or what features of
picoLisp cannot be implemented in miniPicoLisp?

Cheers,

Tomas
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]


Re: closures

2008-10-26 Thread Alexander Burger
Hi Tomas,

> When you work on the 64 bit version, you obviously implemented
> miniPicoLisp "prototype" and now building it in assembler.  How and

The two implementations have almost nothing in common. Just that 'mini'
compiles also on 64 bits, because its model is independent of the
machine's word size.

picoLisp-3 is closer to picoLisp-2, and I often look at the existing C
code while implementing the assembly version. Some things are different
or more complicated (notably the bignum arithmetics, as they now involve
also short numbers), while other things are easier (e.g. the design of
some internal interpreter mechanisms).

> when do you decide what should be implemented in C (or asm) and what
> in picoLisp?  Do you first implement a minimal neccessary core in
> C/asm and a few things in picoLisp and then later reimplement some
> picolisp code in C/asm for efficiency reasons?

If everything works out as I plan, there will be zero lines of C code.
It will, however, link to the C runtime libraries (for standard I/O,
networking etc.).

I'll implement everything directly in assembler if possible. In the past
I often wrote some function in Lisp first, and later decided to rewrite
it in C. Perhaps it will be easier to do some things in Lisp, especially
if they are not time critical. Let's see. At the moment I'm still
fighting with the bignums ;-)

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]


Re: closures

2008-10-26 Thread Tomas Hlavaty
Hi Alex,

>> Have you measured and/or noticed impact on performance of these
>> micro-optimizations?  I mean, are they really worth it?
>
> Very probably not ... it is similar to the situation with 'do' ;-)

When you work on the 64 bit version, you obviously implemented
miniPicoLisp "prototype" and now building it in assembler.  How and
when do you decide what should be implemented in C (or asm) and what
in picoLisp?  Do you first implement a minimal neccessary core in
C/asm and a few things in picoLisp and then later reimplement some
picolisp code in C/asm for efficiency reasons?

> Still I think it is good to have certain convenience functions that
> don't evaluate their arguments. Not having to quote their arguments
> makes their use a little shorter and more readable. No big deal.

Probably.  That makes them convenient for manual typing but not so
much for program "transformations" like the original closure example.

Cheers,

Tomas
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]


Re: closures

2008-10-26 Thread Alexander Burger
Hi Tomas,

> ... (zero A) ...
> ..
> Have you measured and/or noticed impact on performance of these
> micro-optimizations?  I mean, are they really worth it?

Very probably not ... it is similar to the situation with 'do' ;-)

Still I think it is good to have certain convenience functions that
don't evaluate their arguments. Not having to quote their arguments
makes their use a little shorter and more readable. No big deal.

This applies to many other functions, too: Do we really need 'if' when
we have 'cond'?

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]


Re: closures

2008-10-26 Thread Tomas Hlavaty
Hi Alex,

>(let @S '((I . 0))
>   (def 'count (curry (@S) () (job '@S (inc 'I
>   (def 'reset (curry (@S) () (job '@S (zero I )
>
>(let @S (list (cons 'I 0))
>(let I (cons 0)

I like it this way, thanks.

I also found that once the things inside the closures get complicated,
it might be worth using objects to get better code factoring:

(class +Counter) # i
(dm T () (=: i 0))
(dm count> () (inc (:: i)))
(dm reset> () (=: i 0))

(let @C (list (cons 'C (new '(+Counter
   (def 'count (curry (@C) () (job '@C (count> C
   (def 'reset (curry (@C) () (job '@C (reset> C )

Or using objects directly so that I can have many independent
counters...  It did the job for my understanding of closures in
picoLisp, I hope:-)

> Sometimes non-evaluating functions are a little more convenient, and
> at least theoretically more efficient.
>
> (zero A) occupies two cells.
>
> (zero 'A) would be three, and that's the same as (setq A 0), so it
> would not save any space.

Have you measured and/or noticed impact on performance of these
micro-optimizations?  I mean, are they really worth it?

Thanks,

Tomas
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]


Re: closures

2008-10-24 Thread Alexander Burger
On Fri, Oct 24, 2008 at 05:54:10PM +0200, Alexander Burger wrote:
> > Do you have anything on mind?
> 
> Unfortunately not. I just had the feeling there must be other ways ;-)

Perhaps I was too much in a hurry yesterday ...


There is a kind of middle way between our two solutions:

   (let @S '((I . 0))
  (def 'count (curry (@S) () (job '@S (inc 'I
  (def 'reset (curry (@S) () (job '@S (zero I )

This is not shorter than your initial solution, but perhaps a little bit
more clear by letting 'curry' do the job of building the function.


BTW, all three solutions have in common that they depend on a shared
data structure (a cell (I . 0) or (0)). If the 'let' is going to be used
within some other function (instead of the top level here), it should
better be

   (let @S (list (cons 'I 0))

or

   (let I (cons 0)

to use a locally encapsulated cell.

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]


Re: closures

2008-10-24 Thread Alexander Burger
Hi Tomas,

> > Perhaps there are other/better solutions?
> Do you have anything on mind?

Unfortunately not. I just had the feeling there must be other ways ;-)
But perhaps using a cell is not such a bad idea ...


> Why 'zero', 'one', 'on', 'off' and 'onOff' don't evaluate arguments?
> ...
> which would kind of keep it uniform/consistent.  I don't think it
> would be a problem writing (zero 'A 'B) instead of (zero A B).

They are there for the same reason as, for example, 'setq'.

Sometimes non-evaluating functions are a little more convenient, and at
least theoretically more efficient.

(zero A) occupies two cells.

(zero 'A) would be three, and that's the same as (setq A 0), so it would
not save any space.

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]


Re: closures

2008-10-24 Thread Tomas Hlavaty
Hi Alex,

>(let I '(0)
>   (def 'count (curry (I) () (inc I)))
>   (def 'reset (curry (I) () (set I 0))) )

I see:-)

> Perhaps there are other/better solutions?

Do you have anything on mind?


Why 'zero', 'one', 'on', 'off' and 'onOff' don't evaluate arguments?
If they did, I could write:

(let I '(0)
   (def 'count (curry (I) () (inc I)))
   (def 'reset (curry (I) () (zero I))) )

which would kind of keep it uniform/consistent.  I don't think it
would be a problem writing (zero 'A 'B) instead of (zero A B).

Thank you,

Tomas
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]


Re: closures

2008-10-24 Thread Alexander Burger
Hi Tomas,

> (let @S '((I . 0))
>(def 'count (fill '(() (job '@S (inc 'I)
>(def 'reset (fill '(() (job '@S (zero I))
> ...
> http://www.software-lab.de/faq.html#closures discusses curry function
> which allows closure "inside" a single function only.

With a little trick, you could also use it here:

   (let I '(0)
  (def 'count (curry (I) () (inc I)))
  (def 'reset (curry (I) () (set I 0))) )

Perhaps there are other/better solutions?

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:[EMAIL PROTECTED]