Hi Edwin,

> so i guess this mean that symbols are stored like picolisp ints? CAR
> holding a digit (in the case of symbols, a single character) and the
> CDR pointing to the next cell, which holds a digit in its CAR, and so
> on?

Yes, it is similar to that.

As you observed, a cell is defined as a structure of two pointers. This
is just to keep the C compiler happy. In truth, the CAR and the CDR
contain either a pointer to another cell, or a plain binary value.

A binary value is usually - for reasons of efficiency - not just a
single character, but a word-sized integer or as many UTF-8 characters
as fit into the available space.

The interpreter determines at runtime, using flags in the lowest bits of
each "pointer", what kind of data is actually there. This differs
depending on the architecture (32 or 64 bits).

I recommend taking a look at the "doc/structures" and
"doc64/structures" files, which are a bit more detailed (but without any
explanation) than the description in "doc/ref.html#vm".

A PicoLisp value (something that can be stored in a CAR or CDR, in the
value or property of a cell, or passed to functions) is always such a
"pointer". In 32-bit systems, it is in fact always a true pointer (i.e.
pointing to a cell), while the 64-bit version also has short numbers
(marked by bit 1), where the pointer value _is_ the number.

Looking at the 32-bit version (doc/structures), we have three types of
pointers:

   xxxxxxxxxxxxxxxxxxxxxxxxxxxxx010 Number
   xxxxxxxxxxxxxxxxxxxxxxxxxxxxx100 Symbol
   xxxxxxxxxxxxxxxxxxxxxxxxxxxxx000 Cell

That is, if bit 1 is set, the interpreter knows it got a number. The
pattern 010 means that the pointer actually points to the cell at an
offset of 2:

      Number
      |
      V
   +-----+-----+     +-----+-----+     +-----+-----+
   |'DIG'|  ---+---> |'DIG'|  ---+---> |'DIG'|  /  |
   +-----+-----+     +-----+-----+     +-----+-----+

So if the interpreter is interested in the numeric value, it can access
the first digit (a 32-bit chunk of data) at an offset of -2, and the
next cell of that number at an offset of +2.

For such a number, the interpreter of course knows that it must not
treat the CAR of such a cell as another pointer. That would create
havoc.

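To make the 32-bit tag checks and the -2/+2 offsets a bit more concrete,
here is a minimal C sketch. It is not taken from the PicoLisp sources:
all names (classify, first_digit, next_digits, count_digits,
build_demo_number) are hypothetical, the heap is simulated as an array
of 32-bit words with byte addresses so that it also runs on a 64-bit
host, and using 0 as the end marker of a number is an assumption made
only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical; only the tag bits and the -2/+2
   offsets come from the description above. */
enum kind { KIND_CELL, KIND_NUMBER, KIND_SYMBOL };

/* Classify a 32-bit "pointer" by its low tag bits. */
static enum kind classify(uint32_t p)
{
    if (p & 2u) return KIND_NUMBER;   /* ...010 */
    if (p & 4u) return KIND_SYMBOL;   /* ...100 */
    return KIND_CELL;                 /* ...000 */
}

/* Simulated heap: byte-addressed, each cell is 8 bytes (CAR, then CDR). */
static uint32_t heap[8];

/* Read the 32-bit word at a (word-aligned) byte address. */
static uint32_t load(uint32_t byte_addr)
{
    return heap[byte_addr / 4];
}

/* First digit of a number: the CAR, at offset -2 from the tagged pointer. */
static uint32_t first_digit(uint32_t num)
{
    assert(classify(num) == KIND_NUMBER);
    return load(num - 2);
}

/* Rest of a number: the CDR, at offset +2 from the tagged pointer. */
static uint32_t next_digits(uint32_t num)
{
    assert(classify(num) == KIND_NUMBER);
    return load(num + 2);
}

/* Walk the chain of digit cells until the (assumed) end marker. */
static int count_digits(uint32_t num)
{
    int n = 0;
    while (classify(num) == KIND_NUMBER) {
        n++;
        num = next_digits(num);
    }
    return n;
}

/* Build a two-digit number in the cells at byte addresses 8 and 16. */
static uint32_t build_demo_number(void)
{
    heap[2] = 0x11111111u;  /* CAR of cell at 8: first digit */
    heap[3] = 16u + 2u;     /* CDR of cell at 8: tagged pointer to next cell */
    heap[4] = 0x22222222u;  /* CAR of cell at 16: second digit */
    heap[5] = 0u;           /* CDR of cell at 16: end marker (assumption) */
    return 8u + 2u;         /* cell address plus the number tag */
}
```

Calling build_demo_number() yields a "pointer" with bit 1 set, and
first_digit() on it reads the word two bytes below that value, which is
exactly the CAR of the underlying cell.
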
The other two data types, symbols and list cells, are combinations of
cells. A symbol is recognized by an offset of 4, so the pointer to a
symbol actually points directly to the value cell, which is convenient
and efficient. The name of a symbol is in turn a number.

In the 64-bit version,

   cnt   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxS010
   big   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxS100
   sym   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1000
   cell  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0000

the interpreter knows that it got a number if bit 1 or bit 2 is set.
Thus, a number can be checked with an "AND 6". If it is bit 1, then we
have a short number and the upper 60 bits of the "pointer" hold the
numeric value. Otherwise, if bit 2 is set, then we have a pointer to a
bignum analogous to the 32-bit version, just that the offsets are -4 and
+4 instead of -2 and +2. In both cases, the sign of the number is stored
in bit 3 (the 'S').

The rest (symbols and cells) is analogous to the 32-bit version.

In case of miniPicoLisp, the situation is again slightly different:

   num   xxxxxx10
   sym   xxxxx100
   cell  xxxxx000

miniPicoLisp is not aware of the word size (32 or 64 bits), so it wastes
a bit on a 64-bit architecture. It has no bignums, so the tag pattern 10
indicates a short number (30 or 62 bits).

miniPicoLisp also does a lot of magic to pack characters into symbol
names, using a six-and-a-half bit encoding and a special short word
marker, but that's another issue.

Cheers,
- Alex