Re: First 64-bit release

2009-07-26 Thread Tomas Hlavaty
Hi Alex,

very interesting!

> I think I have to try to explain some of the reasons that drive
> me. When I program in assembly, I switch to a completely different
> mindset.
>
> ...

Reminds me a bit of a project where I learnt to quickly understand what
was going on inside a railway interlocking system by reading hex dumps
of communication packets.  It is not easy for outsiders to understand
something at such a low level.  But it can become second nature after
some time, and it has the advantage that you really understand what
goes on under the hood.

Thank you,

Tomas
-- 
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe


Re: First 64-bit release

2009-07-23 Thread Alexander Burger
Hi Tomas,

> Also, it would allow for macros/shortcuts to automate common patterns.

I think I have to try to explain some of the reasons that drive me. When
I program in assembly, I switch to a completely different mindset.

While writing a core function in assembly, I want it to be as close to
the optimum as possible. It does not matter how often I re-arrange the
code, or how long I need for that, because this function will be written
just once, but called very often later. There lies also the fun of it.

This is completely different from using Lisp to write applications.
There I want to be as productive as possible, to have an easier life,
and to be able to abstract as much as possible.

On the assembly level, I want to know each individual bit personally.
Try not to hide anything. I want to keep my mind within the data model,
as described e.g. in doc64/structures.

For example, I had a long fight with myself whether I should introduce
and use constants like 'BIG', 'CDR', or 'TAIL'. They are defined in
src64/defs.l:

   (equ BIG 4)    # Rest of a bignum + bignum tag
   (equ CDR 8)    # CDR part of a list cell
   (equ TAIL -8)  # Tail of a symbol

A completely straightforward, good and normal thing in daily
programming. It is used like this:

   ld E (E CDR)  # Take CDR

What is my problem with that? It hides the true nature of the
underlying data structures. It could instead be written as

   ld E (E 8)  # Take CDR

which results in the x86-64 code

   mov 8(%rbx), %rbx

Now when I use an opaque constant 'CDR' instead of '8', I easily forget
what goes on at the low level, and have the higher-level concept of a
CDR in mind. This makes it more difficult in some situations to stay in
control of the lower levels, and to recognize common patterns. If I use
'TAIL' instead of '-8', I easily forget how I am accessing the pointers
of a cell, how they are related to the pointers of neighboring cells
etc. Then I have to keep both concepts in mind at the same time, and
constantly switch between them. Being aware of the nature of constants
like 4 and -8 is necessary to connect them to the pointer tags in the
lowest four bits

   cnt   ... S010
   big   ... S100
   sym   ... 1000
   cell  ... 

and I need to constantly juggle with knowing that cnt is 2, big is 4
and so on.

For the same reasons I was reluctant to introduce macros like

   cnt A  # A short?

which could equally be written as

   test A 2  # A short?

or, in this case

   test B 2  # A short?
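
As a rough illustration of what these tests and offsets amount to, here is a small C sketch. It only assumes the values quoted in this mail (2 for cnt, 4 for big, 8 for sym, TAIL = -8). The names 'tag_of', 'tail_addr' and 'enum tag' are made up for the example, and treating every remaining bit pattern as a cell is an assumption, since the cell pattern is not spelled out in the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets as quoted from src64/defs.l in the mail. */
enum { BIG = 4, CDR = 8, TAIL = -8 };

/* Illustrative names only; they do not appear in the PicoLisp sources. */
enum tag { TAG_CNT, TAG_BIG, TAG_SYM, TAG_CELL };

/* Classify a tagged word by its low bits, in the spirit of 'test A 2'. */
enum tag tag_of(uint64_t w)
{
    if (w & 2) return TAG_CNT;   /* ... S010: short number */
    if (w & 4) return TAG_BIG;   /* ... S100: bignum */
    if (w & 8) return TAG_SYM;   /* ... 1000: symbol */
    return TAG_CELL;             /* assumption: everything else is a cell */
}

/* Address of a symbol's tail: the symbol pointer plus TAIL, i.e. minus 8. */
uint64_t tail_addr(uint64_t sym)
{
    assert(tag_of(sym) == TAG_SYM);
    return (uint64_t)((int64_t)sym + TAIL);
}
```

Written this way, the named constants and the raw numbers sit side by side, which is the double view the mail is concerned with.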

Could I explain what I mean? Though each higher-level abstraction makes
the code easier, more readable and (the important point) better
searchable, it takes me further and further away from the real model.

This becomes very obvious when I debug the code with 'gdb'. Then you see
only pointer structures, identified by the tag bits, and numeric
constant offsets like -8, 4 and 8. By now I'm used to seeing the type of
a data object in the debugger immediately, knowing that if it ends with
an '8' it is a symbol, and if it ends with a '2' it is a short number or
string. If I see in the debugger

   $rbx = 0x2b484d2d6538

(contents of the register 'E'), it is a symbol, because it ends with a
'8'. So I can inspect it by replacing the 8 with a 0 to get the cell
pointer:

   (gdb) x/2g 0x2b484d2d6530
   0x2b484d2d6530: 0x0612  0x00619438

You see that the 'TAIL' (the CAR of that cell) ends with a '2', so this
is a name. The hex code 61 is the ASCII character 'a'. What we have
here is the symbol 'a'.

The value '0x00619438' of that is a symbol again (turns out to be NIL):

   (gdb) x/2g 0x00619430
   0x619430 <data_start+560>:  0x04c494e2  0x00619438

(4e is 'N', 49 is 'I' and 4c is 'L')
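
The two name words in the dumps above can be decoded mechanically. The following C sketch assumes the layout that those two values suggest: the tag sits in the lowest four bits, followed by one character per byte, first character lowest. It only handles plain ASCII names, and 'short_name' is a hypothetical helper, not something from the PicoLisp sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a short name word such as 0x612 or 0x4c494e2 into a C string.
 * 'out' must have room for 8 bytes (up to 7 characters plus NUL).
 * Returns the number of characters written. */
size_t short_name(uint64_t w, char out[8])
{
    size_t n = 0;

    assert((w & 2) != 0);                 /* must carry the 'cnt' tag */
    for (w >>= 4; w != 0 && n < 7; w >>= 8)
        out[n++] = (char)(w & 0xff);      /* lowest byte is the first char */
    out[n] = '\0';
    return n;
}
```

Applied to 0x612 this gives "a", and applied to 0x4c494e2 it gives "NIL", matching what was read off by hand.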


Well, as you see, I stayed with 'TAIL' and 'CDR', use the test macros
'cnt', 'big', 'sym' etc., and also implemented flow macros like 'if' and
'while'. I always try to keep their dual nature in mind. But there are
limits on how far I want to go.

> E.g. push/pop: there are lots of places where I can see patterns like

This would be too far IMHO.

> ...
> (asmFn 'apply 2 (X Y Z)
>    ... )

This would tempt the programmer to write it for every function, and he
would cease to optimize the flow locally. For example, many functions
will do a 'push X' at the beginning, and push other registers only when
a certain condition arises.

To be sure, such optimizations have no measurable impact on the
performance of the code. But still they are important for me, as they
bring it closer to the (to be defined) optimum. Let's say, they make
up the fun ;-)


> Maybe the question should be whether there are ways of building the
> assembly code programmatically rather than manually?

For core functions, this would defeat the described purpose. But for
application level libraries (like your ffi.l) it might be a valuable
option.

Cheers,
- Alex


Re: First 64-bit release

2009-07-23 Thread randall . dow
Hi Alex,

Thanks for the explanation.  I haven't seriously programmed in
assembly since 1986. However, the mental transitions that you mention
are very familiar to me.  I remember starting to write macros for
everything, and very soon it didn't matter that there were often
unnecessary instructions hidden in the macros.  Performance was
not important then, code readability was.  The goal soon became to
write everything in C and use assembler as little as possible. Your
decisions here, to write picolisp64 in assembler and to define an
optimal virtual assembler, should lead to a very well optimized system.

Cheers,
 - Rand


On Thu, Jul 23, 2009 at 8:21 AM, Alexander Burger <a...@software-lab.de> wrote:
> Hi Tomas,
>
> > Also, it would allow for macros/shortcuts to automate common patterns.
>
> I think I have to try to explain some of the reasons that drive me. When
> I program in assembly, I switch to a completely different mindset.


Re: First 64-bit release

2009-07-22 Thread Tomas Hlavaty
Hi Alex,

>> Why did you abandon lisp syntax for the assembler?
>
> Good question. I considered it initially, but found no advantage
> using it (not even for the parser). I think that in such a case the
> parentheses would be rather unwieldy. Why would you prefer lisp
> syntax here?

Lisp syntax gives the code a more explicit structure and usually makes
editing much easier, but I appreciate that this is assembly, where each
instruction sits on a separate line, so it is not such an issue.
Also, it would allow for macros/shortcuts to automate common patterns.
E.g. push/pop: there are lots of places where I can see patterns like

(code 'doApply 2)
push X
push Y
push Z
..
pop Z
pop Y
pop X
ret

which could take advantage of more structured code:

(asmFn 'apply 2 (X Y Z)
   ... )

Maybe the question should be whether there are ways of building the
assembly code programmatically rather than manually?

Thanks & well done on your 64 bit release!

Cheers,

Tomas


Re: The 'native' function (Was: First 64-bit release)

2009-07-22 Thread Tomas Hlavaty
Hi Alex,

thanks for the description of the FFI.  When I have time, I should
port my ffi.l to the 64 bit version.

Cheers,

Tomas


The 'native' function (Was: First 64-bit release)

2009-07-07 Thread Alexander Burger
Hi Tomas,

now the 'native' C function call is working. It is in the testing
release, and I think it looks quite nice! :-)

The syntax is

   (native lib fun ret val ..) -> any

(the lib and fun arguments were swapped since my last mail).

With this function it is possible to call many C functions right out of
the box, without needing a glue function. In cases where a glue is still
needed, it should be easier to write one, as it does not need to know
about Lisp data types.


lib is the path name of the dynamically loadable library. When it is
empty ("" or NIL), the main program is used.

fun must be the name of a symbol in that library.

ret specifies the return value, and an arbitrary number of vals
specify the arguments.


For val it is legal to pass:

   - a number (for a byte, int, long or pointer argument)
   - a symbol (for a string argument)
   - a list with a single number (for a dynamic buffer size)


The return value specification ret is more complicated. It uses
the following token symbols for primitive types:

   NIL   void
   T     bool     # NIL if zero / T if non-zero byte value
   B     byte     # Unsigned byte
   C     char     # UTF-8 character, 1-3 bytes
   I     int      # Signed integer. Default if just a number is given
   N     long     # Unsigned long or pointer
   S     String   # UTF-8. Default if just a symbol is given

That is, a function may directly return such a scalar value.

But a function may also return a pointer to an array or a structure (it
could optionally have been passed in with the single number list
argument above). In that case, the token symbols can be arbitrarily
nested, in combination with count values. Examples:

   The return value is an array of 4 longs:
   (N . 4)  -  long[4];

   The return value is a structure consisting of a char pointer, an
   integer array, and a character buffer:
   (S (I . 4) (C . 8))  -  struct {char *s; int i[4]; char nm[8];}


Let's see it in action:

   : (native NIL "getenv" 'S "TERM")
   -> "xterm"

So 'getenv' returns a string 'S', and accepts a string argument.


We might use 'printf':

   : (native NIL "printf" 'I "abc%d%s^J" (+ 3 4) (pack "X" "Y" "Z"))
   abc7XYZ
   -> 8


More arguments than the six that the x86-64 calling convention passes in
registers also work:

   : (native NIL "printf" 'I "%d %d %d %d %d %d %d %d %d^J" 1 2 3 4 5 6 7 8 9)
   1 2 3 4 5 6 7 8 9
   -> 18


Then, to test structured returns, I wrote a simple C program, in a file
named dll.c:

   #include <stdio.h>
   #include <string.h>

   struct {
      char *s;
      int i[4];
      char nm[8];
   } Data;

   void *foo(int i, char *s) {
      printf("%d -- %s\n", i, s);
      Data.s = "Hello";
      Data.i[0] = 1;
      Data.i[1] = 2;
      Data.i[2] = 3;
      Data.i[3] = 4;
      strcpy(Data.nm, "world");
      return &Data;
   }

I compiled it in the current directory with

   gcc -o dll.so \
      -fPIC -shared -export-dynamic \
      -O -falign-functions -fomit-frame-pointer \
      -W -Wimplicit -Wreturn-type -Wunused -Wformat \
      -Wuninitialized -Wstrict-prototypes \
      -pipe -D_GNU_SOURCE dll.c

Then I can call it as

   : (native "./dll.so" "foo" '(S (I . 4) (C . 8)) 12345 "a number")
   12345 -- a number
   -> ("Hello" (1 2 3 4) ("w" "o" "r" "l" "d" NIL NIL NIL))

or

   : (native "./dll.so" "foo" '((S I (I . 2) (B . 4)) (C . 8)) 12345 "a number")
   12345 -- a number
   -> (("Hello" 1 (2 3) (4 0 0 0)) ("w" "o" "r" "l" "d" NIL NIL NIL))

The ret structure lets you quite nicely control the result.
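
To see why the second call shows the last integer as (4 0 0 0), it may help to look at the same memory from the C side. The sketch below assumes a little-endian machine with a 4-byte int, as on x86-64. 'struct data' mirrors the layout of 'Data' in dll.c, and 'last_int_bytes' is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same member layout as 'Data' in dll.c. */
struct data {
    char *s;
    int i[4];
    char nm[8];
};

/* Copy the four bytes of i[3] out one by one, which is how the
 * (B . 4) part of the second 'ret' specification looks at them. */
void last_int_bytes(const struct data *d, unsigned char out[4])
{
    size_t off = offsetof(struct data, i) + 3 * sizeof(int);

    assert(sizeof(int) == 4);             /* assumption stated above */
    memcpy(out, (const unsigned char *)d + off, 4);
}
```

With i[3] set to 4, the bytes come out as 4, 0, 0, 0 on a little-endian machine. That is the list in the result above, and the character buffer follows directly after those bytes.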


Hey, Randall, isn't this quite near to what you always urged me to do
while at BMSC?

One drawback of 'native' is that it does not directly support floating
point numbers. Here, a glue function is still necessary.
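
A glue function for the floating point case might look roughly like the following C sketch. It assumes that the Lisp side passes fractions as integers scaled by a known factor, which is an assumption of this example and not something 'native' prescribes. Both 'to_fahrenheit' and 'to_fahrenheit_scaled' are hypothetical names.

```c
#include <assert.h>

/* Hypothetical library function taking and returning a double. */
double to_fahrenheit(double celsius)
{
    return celsius * 9.0 / 5.0 + 32.0;
}

/* Glue: accept and return integers scaled by 'scale' (e.g. 1000 for
 * three decimal places), so that only integers cross the boundary. */
long to_fahrenheit_scaled(long celsius, long scale)
{
    double r;

    assert(scale > 0);
    r = to_fahrenheit((double)celsius / (double)scale) * (double)scale;
    return (long)(r < 0 ? r - 0.5 : r + 0.5);   /* round to nearest */
}
```

Called with 37500 and a scale of 1000, the glue returns 99500, i.e. 37.5 degrees Celsius is 99.5 degrees Fahrenheit.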

Also functions that expect a structure pointer as an argument will need
a simple glue function. The elements of the structure must then be
passed as arguments, and the glue function can build the structure.
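
Such a glue function can be very small. The C sketch below uses a made-up library function 'rect_area' that expects a structure pointer, and a glue 'rect_area_glue' that receives the elements as plain integers. All names are hypothetical and serve only to show the shape of the glue.

```c
#include <assert.h>

/* Hypothetical library API that wants a structure pointer. */
struct rect {
    int w, h;
};

int rect_area(const struct rect *r)
{
    return r->w * r->h;
}

/* Glue: take the elements as plain arguments, build the structure
 * on the stack, and call the real function. */
int rect_area_glue(int w, int h)
{
    struct rect r;

    assert(w >= 0 && h >= 0);
    r.w = w;
    r.h = h;
    return rect_area(&r);
}
```

The glue needs no knowledge of Lisp data types. It sees only C integers, so it can be written and tested as ordinary C.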

If a function expects a structure so that it can fill it with values,
and also returns that structure to the caller, it can be called directly
if the size of the structure (say 1 kB) is passed to 'native' as (1024).
'native' will then allocate the buffer, parse the returned contents, and
free the buffer memory afterwards.
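
Seen from C, the sequence described here is roughly: allocate, call, read the result, free. The following sketch imitates that sequence for a made-up function 'get_info' that fills the caller's buffer and returns the same pointer. 'struct info', 'get_info' and 'call_with_buffer' are hypothetical, and the sketch does not claim to show how 'native' is implemented internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure filled in by the library. */
struct info {
    int id;
    char name[8];
};

/* Hypothetical library function: fills the caller's buffer and
 * returns the same pointer. */
void *get_info(void *buf)
{
    struct info *p = buf;

    p->id = 42;
    strcpy(p->name, "pico");
    return p;
}

/* What a caller does around such a call: allocate, call, read the
 * result, free. Returns the 'id' field, or -1 if allocation fails. */
int call_with_buffer(size_t size)
{
    int id;
    void *buf;

    assert(size >= sizeof(struct info));
    buf = calloc(1, size);
    if (buf == NULL)
        return -1;
    id = ((struct info *)get_info(buf))->id;   /* read the result */
    free(buf);
    return id;
}
```

Passing 1024 as the size corresponds to the (1024) argument mentioned above. The library function itself never allocates or frees anything.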

If such a function does not return the structure pointer, such a dynamic
buffer is still useful when writing a glue function, so that the glue
function does not have to worry about memory allocation and disposal,
as this must be handled by the caller (i.e. 'native' in our case).

Hope I did not forget too much.  Any comments?

Cheers,
- Alex


Re: The 'native' function (Was: First 64-bit release)

2009-07-07 Thread Randall Dow
Hi Alex,

This looks very cool!  Yes, that is what I wished for at bmsc. Thanks!

Rand

On Tue, Jul 7, 2009 at 7:48 PM, Alexander Burger <a...@software-lab.de> wrote:
> now the 'native' C function call is working. It is in the testing
> release, and I think it looks quite nice! :-)


Re: First 64-bit release

2009-07-02 Thread Tomas Hlavaty
Hi Alex,

> today I released the very first version of 64-bit PicoLisp!

Congratulations!

> Two major issues are still missing: Networking and database support.

Is there a way to link this version with C libraries and do some kind
of FFI easily?

Why did you abandon lisp syntax for the assembler?

Thank you for the great work!

Tomas


Re: First 64-bit release

2009-07-01 Thread dexen deVries
On Wednesday 01 of July 2009 18:35:26 Alexander Burger wrote:
> today I released the very first version of 64-bit PicoLisp!

Congratulations :D

--
dexen