Good day - '(struct ...)' can get native C structure alignment wrong (seen with pil built for x86_64 from a recent, few-weeks-old pil21.tgz):

$ pil -version -bye
21.11.10
: (def 'Ptr (%@ "calloc" 'P 1 32))
-> 33897232
: (hex Ptr)
-> "2053B10"
: (eval (append (list 'struct 'Ptr ''(B . 32)) (C~carsz '(( B . 32 )) (need 32 0))))
-> (0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
# C~carsz is a utility I wrote to make each member of its second
# argument a cons-cell with the Size of the applicable Structure
# element of its first (in car) argument in the cdr - so the list of zeros
# becomes a list of (0 . 1) pairs.
: (struct Ptr '( B . 32 ))
-> (0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
: (struct Ptr '( P W I ) '(1 . 8) '(2 . 2) '(3 . 4))
-> (1 2 3)
: (struct Ptr '( B . 32 ))
-> (1 0 0 0 0 0 0 0 2 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)

So on my x86_64, the little-endian 8-byte '1' value is correctly followed by the 2-byte short '2' value at offset 8, since offset 8 is 2-byte aligned; but then the 4-byte '3' value incorrectly begins at offset 10, which is NOT 4-byte aligned.

An ia64 Itanium machine, or an ARM64 one with strict alignment checking enabled, could get a SIGBUS trying to access an integer at offset 10 from an 8-byte aligned structure address. On other machines, C code that reads the member at its proper offset may see garbled / wrongly byte-swapped bits, or read all such third 'i' values < 65536 as 0.

The natural, native, C-standards-conforming layout of the C structure:

  struct s { unsigned long ul; unsigned short s; int i; } a_s;

is, as found by compiling only that code into object file t.o:

$ gcc -o t.o -c t.c && objdump -Wi t.o
t.o:     file format elf64-x86-64
Contents of the .debug_info section:
...
 <1><22>: Abbrev Number: 4 (DW_TAG_structure_type)
    <23>   DW_AT_name        : s
    <25>   DW_AT_byte_size   : 16
    <26>   DW_AT_decl_file   : 1
    <27>   DW_AT_decl_line   : 1
    <28>   DW_AT_decl_column : 8
    <29>   DW_AT_sibling     : <0x4d>
 <2><2d>: Abbrev Number: 1 (DW_TAG_member)
    <2e>   DW_AT_name        : ul
    <31>   DW_AT_decl_file   : 1
    <31>   DW_AT_decl_line   : 2
    <32>   DW_AT_decl_column : 17
    <33>   DW_AT_type        : <0x4d>
    <37>   DW_AT_data_member_location: 0
 <2><38>: Abbrev Number: 1 (DW_TAG_member)
    <39>   DW_AT_name        : s
    <3b>   DW_AT_decl_file   : 1
    <3b>   DW_AT_decl_line   : 3
    <3c>   DW_AT_decl_column : 18
    <3d>   DW_AT_type        : <0x53>
    <41>   DW_AT_data_member_location: 8
 <2><42>: Abbrev Number: 1 (DW_TAG_member)
    <43>   DW_AT_name        : i
    <45>   DW_AT_decl_file   : 1
    <45>   DW_AT_decl_line   : 4
    <46>   DW_AT_decl_column : 7
    <47>   DW_AT_type        : <0x59>
    <4b>   DW_AT_data_member_location: 12
 <2><4c>: Abbrev Number: 0
...

So, you see the 'i' member is at offset 12, NOT offset 10, because the native alignment of 'int' is 4, NOT 2. What PicoLisp has done in the above example would be right if the C structure had the 'packed' attribute, but it doesn't, or if the alignment of the third member was specified as 2, 1 or 0, but it isn't.

I also ran into this issue when trying to access the last two members of the modern 'struct tm' localtime/gmtime(3) structure, 'tm_gmtoff' and 'tm_zone'. At first I tried using the structure descriptor:

    (def 'TMS '(I I I I I I I I I P S))

for the 9 integers of the old POSIX spec, followed by the long __tm_gmtoff and the zone abbreviation string pointer __tm_zone fields. But a new 8-byte long MUST begin at an 8-byte boundary: the nine 4-byte ints end at offset 36, and the next 8-byte boundary is 40. So an extra, unused, always-0 pad integer field is necessary to get this right:

    (def 'TMS '(I I I I I I I I I I P S))
    #                             ^- necessary for padding!

So, please amend 'struct' et al to get this alignment right.
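
A quick way to cross-check member offsets without reading DWARF is to ask the compiler directly with offsetof. The following is only a minimal sketch, assuming the same 'struct s' as in t.c above; the helper name offsets_are_aligned and the DEMO_MAIN guard are made up for this illustration and are not part of PicoLisp or any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same members, in the same order, as the test structure in t.c. */
struct s {
    unsigned long ul;
    unsigned short s;
    int i;
};

/* Illustrative helper (hypothetical name): returns 1 if every member
 * starts at a multiple of its own natural alignment. */
int offsets_are_aligned(void)
{
    return offsetof(struct s, ul) % _Alignof(unsigned long) == 0
        && offsetof(struct s, s) % _Alignof(unsigned short) == 0
        && offsetof(struct s, i) % _Alignof(int) == 0;
}

#ifdef DEMO_MAIN /* build with -DDEMO_MAIN to run standalone */
int main(void)
{
    /* Print what the compiler actually chose for this target. */
    printf("ul at %zu, s at %zu, i at %zu, sizeof %zu\n",
           offsetof(struct s, ul), offsetof(struct s, s),
           offsetof(struct s, i), sizeof(struct s));
    assert(offsets_are_aligned());
    return 0;
}
#endif
```

On an x86_64 Linux build this should agree with the DWARF dump above (members at 0, 8 and 12, total size 16); other ABIs may differ, which is exactly why the offsets are better queried than guessed.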

And why does 'struct', when given the full structure description and a list of arguments, insist on those arguments being cons pairs of (val . size), when it already knows what the size MUST be from the structure description? That is why I wrote 'carsz' - I attach the code, which suffers from the same alignment issue.

Also, the lack of support for UNSIGNED integers of size < P (pointer) is a pain! Please support 'U (unsigned int) and 'H (unsigned short) atom types!

Otherwise I will have to begin doing so as a project - perhaps combined with an ELF DWARF-{2,3,4} parser that can auto-generate PicoLisp '(struct ...)' descriptions from DWARF-3 '.debug_info' sections, if present in the ELF file, or in a SEPARATE ELF file containing ONLY the debug sections, kept in a well-defined location, e.g. as produced by:

    $ objcopy --only-keep-debug $ELF_SRC $ELF_DBG_DEST
    $ strip --keep-file-symbols $ELF_SRC

which is what I do with the executables that I build & install (a glibc abort() will give better info with 'strip --keep-file-symbols' as opposed to just 'strip' - gdb can combine these with the full .debug_info section when it loads it).

.debug_info sections are easy to generate and parse, easy to store in separate (perhaps compressed) files & ship, and will ALWAYS contain 100% accurate offset, alignment, bit-field info, etc., for ALL structures + function prototypes (also details of the registers used for calling function prototypes with 'register' parameters) of the code they were compiled from.

This IMHO is a MUCH better & simpler approach than complex C header parsing & intermediate description languages as used by Perl, which has implemented a C parser in Perl: C::Scan. Following that route in PicoLisp would basically mean writing a PicoLisp C compiler / parser (an equivalent of C::Scan), plus equivalents of 'h2ph' / 'h2xs' and of XS - a huge task.

I envision an augmented '(struct ...)' function that will accept a new type of struct descriptor:

    (struct ptr "an_elf_debug_info_file_path:struct s" ...)

or, when just a string symbol like "struct s" is given (no '*:' prefix), the .debug_info sections of all currently loaded modules are searched for the 's' structure debug info; or, in the context of a '(native )' call, the load module being loaded is searched first, then the currently loaded modules.

Also '(native)' should be augmented to handle function parameters which are registers, and also 'struct' valued parameters which are NOT pointers and which fit in a native long (passed in registers) - e.g. a bit-field struct of < 64 bits. And it should support a proper new 'V argument type meaning "C Variable Argument List" (va_list).

This could be nicer to use and just as accurate as SBCL's sb-alien package - my favorite C FFI generator used so far, but one has to give it VERY detailed LISP descriptions. I'd like to do a DWARF generator for those as well - but I'd like PicoLisp to have one first.

Also, what about 128-bit integers ( __{,u}int128 ) and long doubles? GCC and Clang support them on modern 64-bit targets - can't PicoLisp, e.g. with a new 'R (long double) and 'G 'g (__int128 / unsigned __int128, "giant" int) C types?

Since PicoLisp seems to be a mainly Linux-centric implementation, why not support this feature only for ELF files on Linux / Android, at least? True, it might not be so easy for PE / XCOFF files - but there is .NET / CLR for PE, which does all of the above.

Then 'struct' needs to be augmented to:

- handle C alignment declarations like '#pragma pack' / '__attribute__((packed))' - maybe the size cdr member could itself be a cons of size and alignment?
- handle struct alignment declarations and member alignment declarations;
- fully handle bit-fields, getting the bit order right (lowest bit number first on LSB, highest on MSB machines - this implies that on LSB, low bit number members must precede high bit number members in bit-field initializer lists, and vice-versa on MSB machines).

Maybe PicoLisp could also provide new '(enum ...)' and '(bitf ...)' functions for defining / importing C compatible enum and bit-field definitions from debug_info.

Then really some better bit operations are needed, which at least include 'lognot' (the C/C++ "~" single-operand prefix bitwise negation operator), which appears to be still missing from PicoLisp. An implementation of Common LISP's 'BITS, 'BYTE, 'LDB, and 'DPB functions would be most useful. Maybe use the UTF8 symbol '¬' for 'lognot', since '~' is already used. And what about supporting the full set of bit operations ( ASH, BITS, BYTE, DPB, LDB, LOGAND, LOGANDC1, LOGANDC2, LOGEQV, LOGIOR, LOGNAND, LOGNOR, LOGNOT, LOGORC1, LOGORC2, LOGXOR )? PicoLisp is a bit lacking in bit-ops support. Please, at least also provide '(<< ..)' as well as '(>> ..)' and '(¬ ..)' (lognot) as native built-ins!

I'd like to have a go at this 'struct' and 'native' improvement and new bit-ops project if there are no plans for doing the above soon by the PicoLisp development team - I will share my results if they work well.

PicoLisp is so far in advance of most alternatives that I'd really like to continue using it in projects. But unless I can get this structure member alignment issue sorted easily (SBCL's 'sb-alien' and CL-FFI packages do not suffer from it), along with good native bit-ops, UNSIGNED small integer support, and support for passing bit-field, enum and variable argument list parameter values, I may have to bite the bullet and go back to using SBCL as my main LISP - which I agree is too large and complex for many tasks (like Java). I'd prefer to use PicoLisp with a nice new augmented '(struct ...)' descriptor that handles alignment properly; then I can work on augmenting it further to support struct descriptor auto-generation from DWARF info as described above.

I think the main.l '_struct' work is mainly done by 'ofs : 'llvm~ofs / 'getelementptr ?
Surely this could be updated to keep a running count of the bytes of structure laid out so far, O, so that the current offset O is known. Then we have the size S and the needed alignment A of the next field. A is a power of two, A = (1 << N), so the mask M for A is ((1 << N) - 1), i.e. A - 1. Instead of adding just the size S of the next field, we also need to add any pad bytes P that bring the offset up to the required alignment (in C-style arithmetic):

    P = (O & M) ? (A - (O & M)) : 0    /* pad bytes before the field */
    O = O + P + S                      /* offset just past the field */

The field itself starts at O + P; the second line gives the correct next value of offset O, given the alignment details of the next field and the current offset. It should be fairly straightforward to implement a similar algorithm in 'struct / 'ofs / 'getelementptr.

I will start off doing it in my Ssz and carsz functions - they both need changing to add the missing pad bytes. This is simple to do for the basic integer atomic types, but what about the attributes 'packed' and 'aligned', which allow users to specify their own alignments? And for 'native', C allows register parameters, bit-field struct arguments which are not pointers, and functions which return values in registers, not on the stack. There is nowhere in PicoLisp's current 'struct or 'native implementations that allows these alignments or parameter types to be specified.

So carsz and Ssz, for instance, must now be amended to support an optional extra parameter that specifies (structure member number, alignment) pairs, or a single 'alignment atom for the whole structure, which can be 0 (meaning the structure has the 'packed' attribute). I guess that would be a temporary fix for me - but then I will need to fix 'struct to handle alignment properly as well, if it is not done soon - please let me know if this is going to be the case.

Thanks for PicoLisp, which is otherwise excellent, but can still be improved!

All the best,
Jason
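
P.S. The offset rule above can be sketched in C. This is only a rough illustration under stated assumptions: every alignment is a power of two, and the names align_up, struct field and layout are hypothetical, invented for this sketch (they are not PicoLisp or llvm~ofs internals). It also rounds the total size up to the largest member alignment, which is what makes sizeof(struct s) come out as 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Round offset o up to the next multiple of a.
 * Assumption: a is a power of two, so the mask is a - 1. */
size_t align_up(size_t o, size_t a)
{
    size_t m = a - 1;
    return (o + m) & ~m;
}

/* Hypothetical field descriptor: size and natural alignment in bytes. */
struct field {
    size_t size;
    size_t align;
};

/* Fill offs[0..n-1] with each field's offset and return the total
 * structure size, including tail padding. */
size_t layout(const struct field *f, size_t n, size_t *offs)
{
    size_t o = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        o = align_up(o, f[i].align);   /* insert pad bytes if needed */
        offs[i] = o;                   /* the field starts here */
        o += f[i].size;                /* running offset past the field */
        if (f[i].align > max_align)
            max_align = f[i].align;
    }
    return align_up(o, max_align);     /* tail padding */
}

#ifdef DEMO_MAIN /* build with -DDEMO_MAIN to run standalone */
int main(void)
{
    /* '(P W I) on x86_64: 8-byte, 2-byte and 4-byte members. */
    struct field pwi[] = { {8, 8}, {2, 2}, {4, 4} };
    size_t offs[3];
    size_t total = layout(pwi, 3, offs);
    printf("offsets %zu %zu %zu, size %zu\n",
           offs[0], offs[1], offs[2], total);
    return 0;
}
#endif
```

For the '(P W I) example this places the three members at offsets 0, 8 and 12 with a total size of 16, matching the DWARF dump; for nine 4-byte ints followed by two 8-byte members it places the first 8-byte member at offset 40, which is the 'struct tm' padding case. Packed or user-aligned structures would need the per-field align value overridden, which this sketch does not attempt.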
C.l
Description: implementation of 'Ssz and 'carsz