Hi Edwin,

> so i guess this mean that symbols are stored like picolisp ints? CAR
> holding a digit (in the case of symbols, a single character) and the
> CDR pointing to the next cell, which holds a digit in its CAR, and so
> on?

Yes, it is similar to that.

As you observed, a cell is defined as a structure of two pointers. This
is just to keep the C compiler happy. In truth, the CAR and the CDR
contain either a pointer to another cell, or a plain binary value.

For reasons of efficiency, a binary value is usually not just a single
character, but a word-sized integer, or as many UTF-8 characters as fit
into the available space.

The interpreter determines at runtime, using flags in the lowest bits of
each "pointer", what kind of data is actually there. The flag layout
differs depending on the architecture (32 or 64 bits).
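
To make the "two pointers plus flags" idea concrete, here is a minimal
sketch in C. The names tag_of and cell_of are hypothetical helpers made
up for illustration, not taken from the PicoLisp sources, and the mask
of 7 assumes the three flag bits of the 32-bit layout.

```c
#include <assert.h>
#include <stdint.h>

/* The cell as the C compiler sees it: two pointer-sized slots. */
typedef struct cell {
    struct cell *car;
    struct cell *cdr;
} cell;

/* Hypothetical helper: isolate the low three bits, where the flags live. */
static unsigned tag_of(uintptr_t value) {
    return (unsigned)(value & 7u);
}

/* Hypothetical helper: clear the flags to get back the address of the
   cell. Assumes `value` really is a tagged pointer and not a plain
   binary value. */
static cell *cell_of(uintptr_t value) {
    assert(sizeof(cell) == 2 * sizeof(void *));
    return (cell *)(value & ~(uintptr_t)7);
}
```

The low bits are free for flags only because cells sit on aligned
addresses; the sketch takes that alignment for granted.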

I would recommend taking a look at the "doc/structures" and
"doc64/structures" files, which are a bit more detailed (but without any
explanation) than the description in "doc/ref.html#vm".

A PicoLisp value (something that can be stored in a CAR or CDR, in the
value or property of a cell, or passed to functions) is always such a
"pointer". In 32-bit systems, it is in fact always a true pointer (i.e.
pointing to a cell), while the 64-bit version also has short numbers
(marked by bit 1), where the pointer value _is_ the number.

Looking at the 32-bit version (doc/structures), we have three types of
pointers:

      xxxxxxxxxxxxxxxxxxxxxxxxxxxxx010 Number
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxx100 Symbol
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxx000 Cell

That is, if bit 1 is set, the interpreter knows it got a number. The
pattern 010 means that the pointer actually points to the cell at an
offset of 2

         Number
         |
         V
      +-----+-----+     +-----+-----+     +-----+-----+
      |'DIG'|  ---+---> |'DIG'|  ---+---> |'DIG'|  /  |
      +-----+-----+     +-----+-----+     +-----+-----+

so if the interpreter is interested in the numeric value, it can access
the first digit (a 32-bit chunk of data) at an offset of -2, and the
next cell of that number at an offset of +2.

The interpreter knows, of course, that it must not treat the CAR of such
a number cell as another pointer. Doing so would wreak havoc.
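
The -2/+2 arithmetic can be tried out with a small simulation. This is
only a sketch under stated assumptions: the 32-bit heap is modelled as a
byte array, a "pointer" is a byte offset into it, and all names (heap,
load32, num_digit, num_next, num_cells) are invented for illustration.
The terminating "/" of the diagram is represented by a plain 0, which is
a stand-in and not the real end marker.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated 32-bit heap: a cell is 8 bytes (CAR word, then CDR word). */
static unsigned char heap[64];

static uint32_t load32(uint32_t addr) {
    uint32_t w;
    memcpy(&w, heap + addr, sizeof w);
    return w;
}

static void store32(uint32_t addr, uint32_t w) {
    memcpy(heap + addr, &w, sizeof w);
}

/* Bit 1 set: the "pointer" refers to a number cell at an offset of 2. */
static int is_num(uint32_t p) { return (p & 2) != 0; }

/* The digit (CAR) sits at an offset of -2 from the number pointer. */
static uint32_t num_digit(uint32_t p) {
    assert(is_num(p));
    return load32(p - 2);
}

/* The link to the next cell (CDR) sits at an offset of +2. */
static uint32_t num_next(uint32_t p) {
    assert(is_num(p));
    return load32(p + 2);
}

/* Count the cells of a number by following CDRs while they are
   still number pointers. */
static int num_cells(uint32_t p) {
    int n = 0;
    while (is_num(p)) {
        n++;
        p = num_next(p);
    }
    return n;
}
```

Note that num_digit never inspects the flag bits of the digit itself:
the CAR of a number cell is raw data, exactly as described above.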

The other two data types, symbols and list cells, are combinations of
cells.

A symbol is recognized by an offset of 4, so the pointer to a symbol
actually points directly to the value cell, which is convenient and
efficient. The name of a symbol is in turn a number.


In the 64-bit version,

   cnt   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxS010
   big   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxS100
   sym   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1000
   cell  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0000

the interpreter knows that it got a number if bit 1 or bit 2 is set.
Thus, a number can be checked with an "AND 6". If bit 1 is set, then we
have a short number, and the upper 60 bits of the "pointer" hold the
numeric value. Otherwise, if bit 2 is set, then we have a pointer to a
bignum analogous to the 32-bit version, except that the offsets are -4
and +4 instead of -2 and +2. In both cases, the sign of the number is
stored in bit 3 (the 'S').
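
The number checks for the 64-bit layout might be sketched in C as
follows. The function names are hypothetical, and the sketch assumes
that the upper 60 bits hold the magnitude of a short number, with the
sign kept separately in bit 3.

```c
#include <assert.h>
#include <stdint.h>

/* "AND 6": bit 1 or bit 2 set means some kind of number. */
static int is_num(uint64_t v)   { return (v & 6) != 0; }

/* Bit 1 set: short number, the value lives in the "pointer" itself. */
static int is_short(uint64_t v) { return (v & 2) != 0; }

/* Bit 2 set and bit 1 clear: pointer to a bignum cell. */
static int is_big(uint64_t v)   { return (v & 6) == 4; }

/* Build a short number: magnitude in the upper 60 bits, sign in bit 3,
   tag in bit 1. Assumes the magnitude fits into 60 bits. */
static uint64_t make_short(int64_t n) {
    uint64_t mag = n < 0 ? (uint64_t)0 - (uint64_t)n : (uint64_t)n;
    assert(mag < ((uint64_t)1 << 60));
    return (mag << 4) | (n < 0 ? 8u : 0u) | 2u;
}

/* Recover the signed value of a short number. */
static int64_t short_value(uint64_t v) {
    assert(is_short(v));
    int64_t mag = (int64_t)(v >> 4);
    return (v & 8) ? -mag : mag;
}
```

For example, make_short(5) yields 0x52 and make_short(-5) yields 0x5A;
the two differ only in the sign bit.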

The rest (symbols and cells) is analogous to the 32-bit version.


In case of miniPicoLisp, the situation is again slightly different:

         num      xxxxxx10
         sym      xxxxx100
         cell     xxxxx000

miniPicoLisp is not aware of the word size (32 or 64 bits), so it wastes
a bit on a 64-bit architecture. It has no bignums, so the tag pattern 10
indicates a short number (30 or 62 bits). miniPicoLisp also does a lot
of magic to pack characters into symbol names, using a six-and-a-half
bit encoding and a special short word marker, but that's another issue.
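
A rough sketch of this tagging scheme in C, following the table above
literally. The helper names are hypothetical, and the sketch assumes a
two's-complement value shifted left by two bits, which is only one
plausible way to store the 30 or 62 bits; it also relies on an
arithmetic right shift for negative values, which is
implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Tag checks taken from the table: 10 = num, 100 = sym, 000 = cell. */
static int is_num(uintptr_t v)  { return (v & 2) != 0; }
static int is_sym(uintptr_t v)  { return (v & 6) == 4; }
static int is_cell(uintptr_t v) { return (v & 7) == 0; }

/* Hypothetical helper: pack a small integer above the two tag bits.
   The usable range is the word size minus two bits (30 or 62). */
static uintptr_t box(intptr_t n) {
    return ((uintptr_t)n << 2) | 2;
}

/* Hypothetical helper: drop the tag bits again. */
static intptr_t unbox(uintptr_t v) {
    assert(is_num(v));
    return (intptr_t)v >> 2;
}
```

Because only uintptr_t is used, the same code covers both word sizes
without knowing which one it is running on.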

Cheers,
- Alex