Re: First 64-bit release
Hi Alex,

very interesting!

> I think I have to try to explain some of the reasons that drive me.
> When I program in assembly, I switch to a completely different
> mindset.
> ...

This reminds me a bit of a project I worked on, where I learnt to quickly understand what was going on inside a railway interlocking system by reading hex dumps of communication packets. That is not easy for outsiders trying to understand something at such a low level. But it can become second nature after some time, and it has the advantage that you really understand what goes on under the hood.

Thank you,

Tomas
-- 
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
Re: First 64-bit release
Hi Tomas,

> Also, it would allow for macros/shortcuts to automate common patterns.

I think I have to try to explain some of the reasons that drive me.

When I program in assembly, I switch to a completely different mindset. While writing a core function in assembly, I want it to be as close to the optimum as possible. It does not matter how often I re-arrange the code, or how long I need for that, because this function will be written just once, but called very often later. Therein also lies the fun of it.

This is completely different from using Lisp to write applications. There I want to be as productive as possible, to have an easier life, and to be able to abstract as much as possible.

On the assembly level, I want to know each individual bit personally. I try not to hide anything. I want to keep my mind within the data model, as described e.g. in doc64/structures.

For example, I had a long fight with myself whether I should introduce and use constants like 'BIG', 'CDR', or 'TAIL'. They are defined in src64/defs.l:

   (equ BIG 4)    # Rest of a bignum + bignum tag
   (equ CDR 8)    # CDR part of a list cell
   (equ TAIL -8)  # Tail of a symbol

A completely straightforward, good and normal thing in daily programming. It is used like this:

   ld E (E CDR)  # Take CDR

What is my problem with that? It hides the true nature of the underlying data structures. It could instead be written as

   ld E (E 8)  # Take CDR

which results in the x86-64 code

   mov 8(%rbx), %rbx

Now when I use an opaque constant 'CDR' instead of '8', I easily forget what goes on at the low level, and have a higher concept of a CDR in mind. This makes it more difficult in some situations to stay in control of the lower levels, and to recognize common patterns.

If I use 'TAIL' instead of '-8', I easily forget how I am accessing the pointers of a cell, how they are related to the pointers of neighboring cells etc. Then I have to keep both concepts in mind at the same time, and constantly switch between them.
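To make the offsets concrete, here is a minimal sketch in C of what 'ld E (E CDR)' and a TAIL access amount to. It is an illustration only, not code from the PicoLisp sources: the helper names (load_word, take_cdr, take_tail) are made up, a cell is modelled as two 64-bit words, and the symbol pointer is assumed to be the cell address plus 8, as the debugger session later in this message suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets as quoted from src64/defs.l in the message. */
enum { BIG = 4, CDR = 8, TAIL = -8 };

/* Hypothetical helper: read the 64-bit word at (base + off), which is
 * what "ld E (E off)" / "mov off(%rbx), %rbx" does. */
static uint64_t load_word(uintptr_t base, intptr_t off)
{
    uint64_t w;
    /* Unsigned wrap-around makes a negative offset step backwards. */
    memcpy(&w, (const void *)(base + (uintptr_t)off), sizeof w);
    return w;
}

/* Take the CDR of a list cell: the second word, at offset 8. */
static uint64_t take_cdr(const uint64_t cell[2])
{
    assert(cell != NULL);
    return load_word((uintptr_t)cell, CDR);
}

/* Take the tail of a symbol. Assumption: the symbol pointer is the
 * cell address plus 8, so offset -8 lands on the cell's first word. */
static uint64_t take_tail(const uint64_t cell[2])
{
    assert(cell != NULL);
    uintptr_t sym = (uintptr_t)cell + 8;   /* tagged symbol pointer */
    return load_word(sym, TAIL);
}
```

With a cell holding the two words 0x0612 and 0x00619438, take_cdr yields the second word and take_tail the first. Written this way, the named constants and the raw numbers 8 and -8 are visibly the same thing.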
Awareness of the nature of constants like 4 and -8 is necessary to connect them to the pointer tags in the lowest four bits

   cnt   ... S010
   big   ... S100
   sym   ... 1000
   cell  ...

and I need to constantly juggle with knowing that cnt is 2, big is 4 and so on.

For the same reasons I was reluctant to introduce macros like

   cnt A  # A short?

which could equally be written as

   test A 2  # A short?

or, in this case

   test B 2  # A short?

Could I explain what I mean? Though each higher-level abstraction makes the code easier, more readable and (the important point) better searchable, it takes me further and further away from the real model.

This becomes very obvious when I debug the code with 'gdb'. Then you see only pointer structures, identified by the tag bits, and numeric constant offsets like -8, 4 and 8. By now I'm used to immediately seeing the type of a data object in the debugger, knowing that if it ends with an '8' it is a symbol, and if it ends with a '2' it is a short number or string.

If I see in the debugger

   $rbx = 0x2b484d2d6538

(contents of the register 'E'), it is a symbol, because it ends with an '8'. So I can inspect it by replacing the 8 with a 0 to get the cell pointer:

   (gdb) x/2g 0x2b484d2d6530
   0x2b484d2d6530: 0x0612  0x00619438

You see that the 'TAIL' (the CAR of that cell) ends with a '2', so this is a name. The hex code 61 is the ASCII char 'a'. What we have here is the symbol 'a'.

The value '0x00619438' of that is a symbol again (turns out to be NIL):

   (gdb) x/2g 0x00619430
   0x619430 <data_start+560>: 0x04c494e2  0x00619438

(4e is 'N', 49 is 'I' and 4c is 'L')

Well, as you see, I stayed with 'TAIL' and 'CDR', use the test macros 'cnt', 'big', 'sym' etc., and also implemented flow macros like 'if' and 'while'. I always try to keep their double nature in mind.

But there are limits on how far I want to go.

> E.g. push/pop: there are lots of places where I can see patterns like

This would be too far IMHO.

> ...
>    (asmFn 'apply 2 (X Y Z)
>       ...
>    )

This would tempt the programmer to write it for every function, and he would cease to optimize the flow locally. For example, many functions will do a

   push X

in the beginning, and push other registers only when a certain condition arises.

To be sure, such optimizations have no measurable impact on the performance of the code. But they are still important to me, as they bring it closer to the (to be defined) optimum. Let's say, they make up the fun ;-)

> Maybe the question should be whether there are ways of building the
> assembly code programmatically rather than manually?

For core functions, this would defeat the described purpose. But for application level libraries (like your ffi.l) it might be a valuable option.

Cheers,
- Alex
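The debugger walk-through in the message above can be mimicked with a few lines of C. This is only a sketch under stated assumptions, not PicoLisp code. The function names are invented. A pointer with none of the bits 2, 4 or 8 set is assumed to be a plain cell, which is inferred from "replacing the 8 with a 0". The short-name decoding (one character per byte above the four tag bits, lowest byte first) is inferred from the two dumps shown and may not cover every case of the real format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { T_CNT, T_BIG, T_SYM, T_CELL } tag_t;

/* Classify a pointer by its low bits, testing 2, then 4, then 8.
 * The order matters: for numbers, bit 3 is the sign bit S, so testing
 * 8 first would mistake a negative number for a symbol. */
static tag_t classify(uint64_t p)
{
    if (p & 2) return T_CNT;   /* ... S010 */
    if (p & 4) return T_BIG;   /* ... S100 */
    if (p & 8) return T_SYM;   /* ... 1000 */
    return T_CELL;             /* assumption: no tag bits set */
}

/* "Replace the 8 with a 0": clear the tag nibble of a symbol pointer. */
static uint64_t sym_to_cell(uint64_t sym)
{
    return sym & ~(uint64_t)0xF;
}

/* Decode a short name from a tail word into buf (NUL-terminated).
 * Assumption from the dumps: characters sit above the 4 tag bits,
 * 8 bits each, lowest byte first. Returns the number of characters. */
static size_t decode_short_name(uint64_t tail, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    size_t n = 0;
    for (uint64_t bits = tail >> 4; bits != 0 && n + 1 < cap; bits >>= 8)
        buf[n++] = (char)(bits & 0xFF);
    buf[n] = '\0';
    return n;
}
```

Fed with the values from the gdb session, classify(0x2b484d2d6538) reports a symbol, sym_to_cell gives 0x2b484d2d6530, and the two tail words 0x0612 and 0x04c494e2 decode to "a" and "NIL".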
Re: First 64-bit release
Hi Alex,

Thanks for the explanation. I haven't seriously programmed in assembly since 1986. However, the mental transitions that you mention are very familiar to me. I remember starting to write macros for everything, and very soon it didn't matter that there were often unnecessary instructions hidden in the macros. Performance was not important then, code readability was. The goal soon became to write everything in C and use assembler as little as possible.

Your decision here to write picolisp64 in assembler and define an optimal virtual assembler should lead to a highly optimized system.

Cheers,
- Rand

On Thu, Jul 23, 2009 at 8:21 AM, Alexander Burger <a...@software-lab.de> wrote:
> I think I have to try to explain some of the reasons that drive me.
> [...]
Re: First 64-bit release
Hi Alex,

>> Why did you abandon lisp syntax for the assembler?
>
> Good question. I considered it initially, but found no advantage using
> it (not even for the parser). I think that in such a case the
> parentheses would be rather unwieldy. Why would you prefer lisp syntax
> here?

Lisp syntax gives the code a more explicit structure and usually makes code editing much easier, but I appreciate that this is assembly, where each command is on a separate line, so it is not such an issue.

Also, it would allow for macros/shortcuts to automate common patterns. E.g. push/pop: there are lots of places where I can see patterns like

   (code 'doApply 2)
      push X
      push Y
      push Z
      ..
      pop Z
      pop Y
      pop X
      ret

which could take advantage of more structured code:

   (asmFn 'apply 2 (X Y Z)
      ...
   )

Maybe the question should be whether there are ways of building the assembly code programmatically rather than manually?

Thanks, and well done on your 64-bit release!

Cheers,

Tomas
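As a rough illustration of "building the assembly code programmatically", the push/pop pattern above could be produced by a small generator. The sketch below is in C rather than in PicoLisp, and emit_frame and append are invented names; it only shows the idea behind the proposed 'asmFn' (registers pushed in order, popped in reverse, then 'ret'), not any existing tool.

```c
#include <assert.h>
#include <string.h>

/* Append a + b + newline to out. Returns 0 on success, -1 if the
 * text (plus the terminating NUL) would not fit into cap bytes. */
static int append(char *out, size_t cap, size_t *len,
                  const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    if (*len + la + lb + 2 > cap)
        return -1;
    memcpy(out + *len, a, la);
    memcpy(out + *len + la, b, lb);
    *len += la + lb;
    out[(*len)++] = '\n';
    out[*len] = '\0';
    return 0;
}

/* Emit: push each register, a "..." placeholder for the body,
 * pop the registers in reverse order, then ret. */
int emit_frame(char *out, size_t cap,
               const char *const *regs, size_t nregs)
{
    size_t len = 0;
    assert(out != NULL && regs != NULL && cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < nregs; i++)
        if (append(out, cap, &len, "push ", regs[i]) != 0)
            return -1;
    if (append(out, cap, &len, "...", "") != 0)
        return -1;
    for (size_t i = nregs; i-- > 0; )
        if (append(out, cap, &len, "pop ", regs[i]) != 0)
            return -1;
    return append(out, cap, &len, "ret", "");
}
```

Called with the registers X, Y and Z, this writes the same push/pop sequence as the hand-written 'doApply' skeleton. Such a generator always saves every listed register, so it cannot express the conditional pushes that hand-written code allows.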
Re: The 'native' function (Was: First 64-bit release)
Hi Alex,

thanks for the description of the FFI. When I have time, I should port my ffi.l to the 64-bit version.

Cheers,

Tomas
The 'native' function (Was: First 64-bit release)
Hi Tomas,

now the 'native' C function call is working. It is in the testing release, and I think it looks quite nice! :-)

The syntax is

   (native lib fun ret val ..) -> any

(the lib and fun arguments were swapped since my last mail).

With this function it is possible to call many C functions right out of the box, without needing a glue function. In cases where a glue is still needed, it should be easier to write one, as it does not need to know about Lisp data types.

lib is the path name of the dynamically loadable library. When it is empty ("" or NIL), the main program is used.
fun must be the name of a symbol in that library.
ret specifies the return value, and an arbitrary number of vals specify the arguments.

For val it is legal to pass:

   - a number (for a byte, int, long or pointer argument)
   - a symbol (for a string argument)
   - a list with a single number (for a dynamic buffer size)

The return value specification ret is more complicated. It uses the following token symbols for primitive types:

   NIL   void
   T     bool     # NIL if zero / T if non-zero byte value
   B     byte     # Unsigned byte
   C     char     # UTF-8 character, 1-3 bytes
   I     int      # Signed integer. Default if just a number is given
   N     long     # Unsigned long or pointer
   S     String   # UTF-8. Default if just a symbol is given

That is, a function may directly return such a scalar value.

But a function may also return a pointer to an array or a structure (it could optionally have been passed in with the single number list argument above). In that case, the token symbols can be arbitrarily nested, in combination with count values.

Examples:

The return value is an array of 4 longs:

   (N . 4)  -  long[4];

The return value is a structure consisting of a char pointer, an integer array, and a character buffer:

   (S (I . 4) (C . 8))  -  struct {char *s; int i[4]; char nm[8];}

Let's see it in action:

   : (native NIL "getenv" 'S "TERM")
   -> "xterm"

So 'getenv' returns a string 'S', and accepts a string argument.
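Going back to the structure example, the sketch below shows how a flat list of fields like (S (I . 4) (C . 8)) maps onto byte offsets in the C structure. It is a guess at the kind of bookkeeping involved, not a description of how 'native' is implemented: struct field, layout and example_layout are hypothetical names, and the padding rule is the usual natural-alignment rule of common C ABIs.

```c
#include <assert.h>
#include <stddef.h>

/* The structure from the example spec (S (I . 4) (C . 8)). */
struct data {
    char *s;       /* S       */
    int   i[4];    /* (I . 4) */
    char  nm[8];   /* (C . 8) */
};

/* Hypothetical field descriptor for one entry of a spec. */
struct field {
    size_t size;   /* size of one element in bytes */
    size_t align;  /* required alignment of one element */
    size_t count;  /* number of elements (1 for a scalar) */
};

/* Compute the byte offset of each field, rounding up to the field's
 * alignment first. Returns the offset just past the last field. */
static size_t layout(const struct field *f, size_t n, size_t *offs)
{
    assert(f != NULL && offs != NULL);
    size_t pos = 0;
    for (size_t k = 0; k < n; k++) {
        assert(f[k].align != 0);
        pos = (pos + f[k].align - 1) / f[k].align * f[k].align;
        offs[k] = pos;
        pos += f[k].size * f[k].count;
    }
    return pos;
}

/* Offsets for the example spec: a pointer, four ints, eight chars. */
static size_t example_layout(size_t offs[3])
{
    const struct field spec[3] = {
        { sizeof(char *), _Alignof(char *), 1 },
        { sizeof(int),    _Alignof(int),    4 },
        { sizeof(char),   _Alignof(char),   8 },
    };
    return layout(spec, 3, offs);
}
```

The computed offsets can be compared against offsetof(struct data, ...) to check that the spec and the C declaration agree on the platform at hand.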
We might use 'printf':

   : (native NIL "printf" 'I "abc%d%s^J" (+ 3 4) (pack "X" "Y" "Z"))
   abc7XYZ
   -> 8

More than the six register arguments of the x86-64 calling convention also work:

   : (native NIL "printf" 'I "%d %d %d %d %d %d %d %d %d^J" 1 2 3 4 5 6 7 8 9)
   1 2 3 4 5 6 7 8 9
   -> 18

Then, to test structured returns, I wrote a simple C program, in a file named "dll.c":

   #include <stdio.h>
   #include <string.h>

   /* Static result structure handed back to the caller */
   struct {
      char *s;
      int i[4];
      char nm[8];
   } Data;

   void *foo(int i, char *s) {
      printf("%d -- %s\n", i, s);
      Data.s = "Hello";
      Data.i[0] = 1;
      Data.i[1] = 2;
      Data.i[2] = 3;
      Data.i[3] = 4;
      strcpy(Data.nm, "world");
      return &Data;   /* Pointer to the structure */
   }

I compiled it in the current directory with

   gcc -o dll.so \
      -fPIC -shared -export-dynamic \
      -O -falign-functions -fomit-frame-pointer \
      -W -Wimplicit -Wreturn-type -Wunused -Wformat \
      -Wuninitialized -Wstrict-prototypes \
      -pipe -D_GNU_SOURCE dll.c

Then I called it as

   : (native "./dll.so" "foo" '(S (I . 4) (C . 8)) 12345 "a number")
   12345 -- a number
   -> ("Hello" (1 2 3 4) ("w" "o" "r" "l" "d" NIL NIL NIL))

or

   : (native "./dll.so" "foo" '((S I (I . 2) (B . 4)) (C . 8)) 12345 "a number")
   12345 -- a number
   -> (("Hello" 1 (2 3) (4 0 0 0)) ("w" "o" "r" "l" "d" NIL NIL NIL))

The ret structure lets you control the result quite nicely. Hey, Randall, isn't this quite close to what you always urged me to do while at BMSC?

One drawback of 'native' is that it does not directly support floating point numbers. Here, a glue function is still necessary.

Functions that expect a structure pointer as an argument will also need a simple glue function. The elements of the structure must then be passed as arguments, and the glue function can build the structure.

If a function expects a structure so that it can fill it with values, and also returns that structure to the caller, it can be called directly if the size of the structure (say 1 kB) is passed to 'native' as (1024). 'native' will then allocate the buffer, parse the returned contents, and free the buffer memory afterwards.
If such a function does not return the structure pointer, such a dynamic buffer is still useful when writing a glue function: the glue function will not have to worry about allocating and disposing of memory, as this is handled by the caller (i.e. 'native' in our case).

Hope I did not forget too much. Any comments?

Cheers,
- Alex
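A sketch of that last case, again with invented names: get_info is a stand-in for a library function that fills a structure but returns nothing, and get_info_glue is the wrapper. The buffer is assumed to come from the caller (here that would be 'native', via a single-number list argument), so the glue neither allocates nor frees anything and simply hands the pointer back for parsing.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structure filled in by the library. */
struct info {
    int  id;
    char name[8];
};

/* Hypothetical library function: fills *out, returns nothing. */
static void get_info(int id, struct info *out)
{
    out->id = id;
    memset(out->name, 0, sizeof out->name);
    memcpy(out->name, "demo", 4);
}

/* Glue: buf is the caller-allocated buffer. The glue function does no
 * memory management; it returns buf so the caller can parse it. */
void *get_info_glue(int id, void *buf)
{
    assert(buf != NULL);
    get_info(id, buf);
    return buf;
}
```

The return spec for such a call would describe struct info, for example an int followed by eight chars; the exact spec depends on the structure's padding on the platform in use.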
Re: The 'native' function (Was: First 64-bit release)
Hi Alex,

This looks very cool! Yes, that is what I wished for at BMSC.

Thanks!
Rand

On Tue, Jul 7, 2009 at 7:48 PM, Alexander Burger <a...@software-lab.de> wrote:
> now the 'native' C function call is working. It is in the testing
> release, and I think it looks quite nice! :-)
Re: First 64-bit release
Hi Alex,

> today I released the very first version of 64-bit PicoLisp!

Congratulations!

> Two major issues are still missing: Networking and database support.

Is there a way to link this version with C libraries and do some kind of FFI easily?

Why did you abandon Lisp syntax for the assembler?

Thank you for the great work!

Tomas
Re: First 64-bit release
On Wednesday 01 of July 2009 18:35:26 Alexander Burger wrote:
> today I released the very first version of 64-bit PicoLisp!

Congratulations :D

-- 
dexen