Good day -

  '(struct ...)' can get native C structure alignment wrong.
  This is with pil built for x86_64 from a recent (a few weeks old) pil21.tgz:

  $ pil -version -bye
  21.11.10

  : (def 'Ptr (%@ "calloc" 'P 1 32))
  -> 33897232
  : (hex Ptr)
  -> "2053B10"
  : (eval (append (list 'struct 'Ptr  ''(B . 32)) (C~carsz '(( B . 32 )) (need 32 0))))
  -> (0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
  # C~carsz is a utility I wrote: it turns each member of its second
  # argument into a cons cell whose cdr is the size of the corresponding
  # element of the structure descriptor given as its first argument,
  # so the list of zeros becomes a list of (0 . 1) pairs.
  : (struct Ptr '( B . 32 ))
  -> (0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
  : (struct Ptr '( P W I ) '(1 . 8) '(2 . 2) '(3 . 4))
  -> (1 2 3)
  : (struct Ptr '( B . 32 ))
  -> (1 0 0 0 0 0 0 0 2 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)

  So on my x86_64, the little-endian 8-byte '1' value is correctly
  followed by the 2-byte short '2'  value at offset 8, since offset
  8 is 2-byte aligned; but then the 4-byte '3' value incorrectly
  begins at offset 10, which is NOT 4-byte aligned.

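  A C function reading this buffer through the structure type looks
  for 'i' at offset 12, not 10. A strict-alignment machine (e.g. an
  ia64 Itanium, or an ARM64 with alignment checking enabled) can get
  a SIGBUS trying to access an integer at offset 10 from an 8-byte
  aligned structure address; others may read garbled / wrongly
  byte-swapped bits, or, on little-endian machines, read any value
  < 65536 of the third member 'i' as 0.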
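
  The last effect can be reproduced in plain C. The following is only an
  illustrative sketch: the helper name read_back_unpadded_write is
  hypothetical, and it assumes a little-endian LP64 machine such as
  x86_64 (8-byte 'unsigned long', 4-byte 'int'). It stores the three
  values at offsets 0, 8 and 10, as in the session above, and then
  reads the same bytes back through the C structure type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct s {
    unsigned long  ul;
    unsigned short s;
    int            i;
};

/* Hypothetical helper: store ul, sh and i back to back (sizes 8, 2, 4,
   no padding), then view the same bytes as a 'struct s'. */
struct s read_back_unpadded_write(uint64_t ul, uint16_t sh, int32_t i)
{
    unsigned char buf[32] = {0};
    struct s out;

    assert(sizeof out <= sizeof buf);
    memcpy(buf + 0,  &ul, sizeof ul);   /* offset 0  */
    memcpy(buf + 8,  &sh, sizeof sh);   /* offset 8  */
    memcpy(buf + 10, &i,  sizeof i);    /* offset 10: not where C expects 'i' */

    memcpy(&out, buf, sizeof out);      /* C reads 'i' at its natural offset */
    return out;
}
```

  Under those assumptions, read_back_unpadded_write(1, 2, 3) yields
  ul = 1 and s = 2 but i = 0, because the C side reads 'i' from
  offset 12, where only the two zero high-order bytes of the stored 3 lie.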

  The natural native C-standards conforming alignment of the
  C structure:
  " struct s
    { unsigned long ul;
      unsigned short s;
      int i;
    } a_s;
  "
  is, as found by compiling only that code into object file t.o:
  $ gcc -g -o t.o -c t.c && objdump -Wi t.o
  t.o:     file format elf64-x86-64

  Contents of the .debug_info section:
  ...
 <1><22>: Abbrev Number: 4 (DW_TAG_structure_type)
    <23>   DW_AT_name        : s
    <25>   DW_AT_byte_size   : 16
    <26>   DW_AT_decl_file   : 1
    <27>   DW_AT_decl_line   : 1
    <28>   DW_AT_decl_column : 8
    <29>   DW_AT_sibling     : <0x4d>
 <2><2d>: Abbrev Number: 1 (DW_TAG_member)
    <2e>   DW_AT_name        : ul
    <31>   DW_AT_decl_file   : 1
    <31>   DW_AT_decl_line   : 2
    <32>   DW_AT_decl_column : 17
    <33>   DW_AT_type        : <0x4d>
    <37>   DW_AT_data_member_location: 0
 <2><38>: Abbrev Number: 1 (DW_TAG_member)
    <39>   DW_AT_name        : s
    <3b>   DW_AT_decl_file   : 1
    <3b>   DW_AT_decl_line   : 3
    <3c>   DW_AT_decl_column : 18
    <3d>   DW_AT_type        : <0x53>
    <41>   DW_AT_data_member_location: 8
 <2><42>: Abbrev Number: 1 (DW_TAG_member)
    <43>   DW_AT_name        : i
    <45>   DW_AT_decl_file   : 1
    <45>   DW_AT_decl_line   : 4
    <46>   DW_AT_decl_column : 7
    <47>   DW_AT_type        : <0x59>
    <4b>   DW_AT_data_member_location: 12
 <2><4c>: Abbrev Number: 0
  ...

  So the 'i' member is at offset 12, NOT offset 10,
  because the native alignment of 'int' is 4, NOT 2.

  What PicoLisp has done in the above example would be correct
  if the C structure had the 'packed' attribute, or if the
  alignment of the third member were explicitly specified
  as 2 or less; but neither is the case.
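
  The difference between the two layouts can be checked directly with
  offsetof. This is a small sketch, assuming GCC or Clang (the
  '__attribute__((packed))' syntax is a compiler extension); the type
  names s_natural and s_packed and the function print_layouts are
  made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct s_natural {
    unsigned long  ul;
    unsigned short s;
    int            i;
};

/* GCC/Clang extension: no padding between or after members. */
struct __attribute__((packed)) s_packed {
    unsigned long  ul;
    unsigned short s;
    int            i;
};

/* Print member offsets and total size of both variants. */
void print_layouts(void)
{
    /* In the natural layout 'i' always sits on an 'int' boundary. */
    assert(offsetof(struct s_natural, i) % _Alignof(int) == 0);

    printf("natural: ul=%zu s=%zu i=%zu size=%zu\n",
           offsetof(struct s_natural, ul), offsetof(struct s_natural, s),
           offsetof(struct s_natural, i), sizeof(struct s_natural));
    printf("packed : ul=%zu s=%zu i=%zu size=%zu\n",
           offsetof(struct s_packed, ul), offsetof(struct s_packed, s),
           offsetof(struct s_packed, i), sizeof(struct s_packed));
}
```

  On x86_64 Linux this should report 'i' at offset 12 (size 16) for
  the natural layout and at offset 10 (size 14) for the packed one,
  the latter matching what the '(struct ...)' call above produced.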

  I also hit this issue when trying to access the last two,
  modern POSIX, members of the 'struct tm' structure used by
  localtime(3) / gmtime(3): 'tm_gmtoff' and 'tm_zone'.
  At first I tried using the structure descriptor:
  (def 'TMS  '(I I I I  I I I I  I  P S))
  for the 9 integers of the old POSIX spec,
  followed by the long __tm_gmtoff and zone
  abbreviation string pointer __tm_zone fields.
  But the 9 4-byte integers end at offset 36, and since
  an 8-byte long MUST begin at an 8-byte boundary,
  an extra, unused, always 0 pad integer field is
  necessary to get this right:
  (def 'TMS  '(I I I I  I I I I  I  I  P S))
  #                                 ^- necessary for padding!
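
  The size of that gap can be confirmed from C. This is a sketch under
  stated assumptions: a glibc (or musl) system where 'struct tm' exposes
  tm_gmtoff when _DEFAULT_SOURCE is defined, and where tm_isdst is the
  last of the nine leading ints. The function names are hypothetical.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc or musl; exposes tm_gmtoff */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Offset of tm_gmtoff within struct tm on this platform. */
size_t tm_gmtoff_offset(void)
{
    return offsetof(struct tm, tm_gmtoff);
}

/* Bytes of padding between the nine leading ints and tm_gmtoff. */
size_t tm_pad_bytes(void)
{
    size_t end_of_ints = offsetof(struct tm, tm_isdst) + sizeof(int);

    assert(tm_gmtoff_offset() >= end_of_ints);
    return tm_gmtoff_offset() - end_of_ints;
}
```

  On x86_64 Linux, tm_pad_bytes() should return 4, which is exactly the
  one extra 'I' in the second descriptor.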

  So, please amend 'struct' et al to get this
  alignment right.

  And why does 'struct', when given the full structure
  description and a list of arguments, insist on those
  arguments being cons pairs of (val . size), when it
  already knows what the size MUST be from the
  structure description?
  That is why I wrote 'carsz'. I attach the code,
  which suffers from the same alignment issue.

  Also, lack of support for UNSIGNED integers
  of size < P (pointer) is a pain!

  Please support 'U (unsigned int) and 'H
  (unsigned short) atom types !

  Otherwise I will have to begin doing so as a project, perhaps
  combined with an ELF DWARF-{2,3,4} parser that can auto-generate
  PicoLisp '(struct ...)' descriptions from '.debug_info' sections,
  read either from the ELF file itself, if present there, or from a
  SEPARATE debug-only ELF file in a well defined location,
  e.g. as produced by:
   $ objcopy --only-keep-debug $ELF_SRC $ELF_DBG_DEST
   $ strip --keep-file-symbols $ELF_SRC
  which is what I do with the executables that I build & install
  ( a glibc abort() will give better info
    with 'strip --keep-file-symbols' as opposed to just 'strip',
    and gdb can combine these symbols with the full .debug_info
    section when it loads it
  ).

  .debug_info sections are easy to generate and parse, easy
  to store in separate (perhaps compressed) files & ship,
  and contain accurate offset, alignment, bit-field info, etc.,
  for ALL structures + function prototypes
  (also details of registers used for calling functions
  with 'register' parameters) in the code they were compiled from.
  This IMHO is a MUCH better & simpler approach than complex
  C header parsing & intermediate description languages as used
  by Perl, which has a C parser implemented in Perl: C::Scan.
  Following that route would basically mean writing a PicoLisp
  C parser equivalent to C::Scan, plus equivalents of 'h2ph' /
  'h2xs' and an XS equivalent: a huge task.

  I envision an augmented '(struct ...)' function that will accept
  a new type of struct descriptor:
    (struct ptr "an_elf_debug_info_file_path:struct s"  ...)
  When just a string symbol like "struct s" is given (no '*:' prefix),
  the .debug_info sections of all currently loaded modules are searched
  for the 's' structure debug info;
  in the context of a '(native ...)' call, the load module being
  loaded is searched first, then the other currently loaded modules.

  Also '(native ...)' should be augmented to handle
  function parameters which are passed in registers,
  including 'struct' valued parameters which are NOT pointers
  and which fit in a native long,
  e.g. a bit-field struct of < 64 bits,
  and to support a proper new 'V argument type
  meaning "C Variable Argument List" (va_list).

  This could be nicer to use and just as accurate as SBCL's
  sb-alien package - my favorite C FFI generator used so far, 
  but one has to give it VERY detailed LISP descriptions - 
  I'd like to do a DWARF generator for those as well -
  but I'd like PicoLisp to have one first.

  Also what about 128-bit integers ( __{,u}int128 ) ?
  and long doubles?
  All modern 64-bit machines support them - can't PicoLisp, eg. with a
  new 'R (long double) and 'G 'g (__int128 / unsigned __int128 )
  ("giant" int)  C types?

  Since PicoLisp seems to be a mainly Linux-centric implementation,
  why not only support this feature for ELF files on Linux / Android at
  least ? 
  True, it might not be so easy for PE / XCOFF files - 
  but there is .NET / CLR for PE, which does all of the above.

  Then 'struct' needs to be augmented to handle C alignment declarations
  like '#pragma pack' / '__attribute__((packed))' (maybe the size cdr
  member could itself be a cons of size and alignment?), as well as
  struct alignment declarations and member alignment declarations.
  It should also fully handle bit-fields, getting the bit order right
  (lowest bit number first on LSB (little-endian), highest on MSB
   (big-endian) machines; this implies that on LSB machines low bit
   number members must precede high bit number members in bit-field
   initializer lists, and vice versa on MSB machines
  ).
  Maybe PicoLisp could also provide new '(enum ...)' and '(bitf ...)'
  functions for defining / importing C compatible enum and bit-field
  definitions from .debug_info.
  Then some better bit operations are really needed. They should at
  least include 'lognot', the C/C++ "~" single operand prefix bitwise
  negation operator, which appears to be still missing from PicoLisp;
  an implementation of Common LISP's 'BITS, 'BYTE, 'LDB, and 'DPB
  functions would also be most useful. Maybe use the UTF-8 symbol '¬'
  for 'lognot', since '~' is already used. And what about supporting
  the full set of
  ( ASH, BITS, BYTE, DPB, LDB, LOGAND, LOGANDC1,
    LOGANDC2, LOGEQV, LOGIOR, LOGNAND,
    LOGNOR, LOGNOT, LOGORC1, LOGORC2, LOGXOR
  ) bit operations? PicoLisp is a bit lacking in bit-ops support.
  Please, at least also provide '(<<  ..)' as well as '(>> ..)' and
  '(¬ ..)' (lognot), as native built-ins!
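
  For reference, the semantics being asked for are small. Here is a
  hedged sketch in C of what lognot, LDB and DPB compute, restricted to
  64-bit integers (Common LISP works on arbitrary-precision integers,
  so this is only an approximation); the names lognot64, low_mask,
  ldb64 and dpb64 are hypothetical, not an existing API.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise complement, like CL's LOGNOT restricted to 64-bit integers. */
int64_t lognot64(int64_t x)
{
    return ~x;
}

/* Mask with the low 'size' bits set (size may be 0..64). */
uint64_t low_mask(unsigned size)
{
    return size >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << size) - 1;
}

/* Like (ldb (byte size pos) n): extract 'size' bits starting at bit 'pos'. */
uint64_t ldb64(unsigned size, unsigned pos, uint64_t n)
{
    assert(pos < 64);
    return (n >> pos) & low_mask(size);
}

/* Like (dpb newbyte (byte size pos) n): replace that field with 'newbyte'. */
uint64_t dpb64(uint64_t newbyte, unsigned size, unsigned pos, uint64_t n)
{
    assert(pos < 64);
    uint64_t field = low_mask(size) << pos;
    return (n & ~field) | ((newbyte << pos) & field);
}
```

  For example, ldb64(4, 4, 0xAB) extracts the high nibble 0xA, and
  dpb64(0xF, 4, 0, 0xA0) deposits 0xF into the low nibble, giving 0xAF.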

  I'd like to have a go at this 'struct and 'native improvement 
  and new bit-ops project if there are no plans for doing the above soon
  by the PicoLisp development team - I will share my results if they work well.

  PicoLisp is so far in advance of most alternatives that I'd really like
  to continue using it in projects. But unless I can get this
  structure member alignment issue sorted easily (SBCL's 'sb-alien'
  and CL-FFI packages do not suffer from it), along with good native
  bit-ops, UNSIGNED small integer support, and support for passing
  bit-field, enum and variable argument list parameter values, I may have
  to bite the bullet and go back to using SBCL as my main LISP, which I
  agree is too large and complex for many tasks (like Java).
  I'd prefer to use PicoLisp with a nice new augmented '(struct ...)'
  descriptor that handles alignment properly; then I can work on
  augmenting it further to support struct descriptor auto-generation
  from DWARF info as described above.

  I think the main.l '_struct work is mainly done by 'ofs : 'llvm~ofs /
  'getelementptr ? Surely this could be updated to keep a
  running count O of the bytes of structure laid out so far, so that
  the current offset O is known. Given the size S and the needed
  alignment A of the next field, where A is a power of two so that
  the mask M for A is (A - 1), we must not just add S: we first need
  to add any remainder that brings the offset up to the required
  alignment:

   O = (O + ( (O & M) ? (A - (O & M)) : 0) + S)

  (in C-style arithmetic) to get the correct next value of offset O
  given the alignment details of the next field and the current offset.
  The field itself starts at the padded offset, i.e. the new O minus S.
  In the example above, O = 10, A = 4 and S = 4 give M = 3, 2 pad bytes,
  a field offset of 12 and a new O of 16.
  It should be fairly straightforward to implement a similar
  algorithm in 'struct / 'ofs / 'getelementptr .
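
  The running-offset idea can be sketched in C as follows. This is a
  minimal illustration under the assumption that every alignment is a
  power of two; align_up and layout_struct are hypothetical names, and
  this is not PicoLisp's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Round 'offset' up to the next multiple of 'align' (a power of two). */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t mask = align - 1;
    return (offset + mask) & ~mask;
}

/* Compute the offset of each of n members from their sizes and
   alignments. Returns the total structure size, including the tail
   padding needed to keep arrays of the structure aligned. */
size_t layout_struct(const size_t *sizes, const size_t *aligns,
                     size_t n, size_t *offsets)
{
    size_t off = 0, max_align = 1;

    for (size_t k = 0; k < n; k++) {
        off = align_up(off, aligns[k]);   /* insert pad bytes if needed */
        offsets[k] = off;                 /* member starts here */
        off += sizes[k];                  /* running count after member */
        if (aligns[k] > max_align)
            max_align = aligns[k];
    }
    return align_up(off, max_align);
}
```

  With sizes {8, 2, 4} and alignments {8, 2, 4}, this gives offsets
  0, 8 and 12 and a total size of 16, matching the DWARF output above.
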
  I will start off doing it in my Ssz and carsz functions; they
  both need changing to add the missing pad bytes.
  This is simple to do for the basic integer atomic types, but what
  about the 'packed' and 'aligned' attributes that allow users to
  specify their own alignments?
  And for 'native, users can specify register parameters and bit-field
  struct arguments which are not pointers, and functions which return
  values in registers, not on the stack.
  There is nowhere in PicoLisp's current 'struct or 'native implementations
  that allows these alignments or parameter types to be specified.
  So carsz and Ssz, for instance, must now be amended to support an
  optional extra parameter that specifies either
   (structure member number, alignment)
  pairs or a single alignment atom for the whole structure, which
  can be 0 (meaning the structure has the 'packed' attribute).
  I guess that would be a temporary fix for me, but then I will need to fix
  'struct to handle alignment properly also if it is not done
  soon; please let me know if this is going to be the case.
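
  As a rough illustration of the whole-structure override, here is a C
  sketch in which an extra 'pack' parameter of 0 means "packed" and any
  other value caps each member's alignment, similar in spirit to
  '#pragma pack(n)'. The names round_up_to and layout_with_pack are
  hypothetical, alignments are assumed to be powers of two, and the
  per-member (member number, alignment) pairs are not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Round 'offset' up to a multiple of 'align' (a power of two). */
size_t round_up_to(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Lay out n members. 'pack' is the whole-structure alignment override:
   0 means packed (every member byte-aligned), otherwise each member's
   natural alignment is capped at 'pack'.
   Returns the total size including tail padding. */
size_t layout_with_pack(const size_t *sizes, const size_t *aligns,
                        size_t n, size_t pack, size_t *offsets)
{
    size_t off = 0, struct_align = 1;

    for (size_t k = 0; k < n; k++) {
        size_t a = aligns[k];
        if (pack == 0)
            a = 1;                 /* packed: no padding at all */
        else if (a > pack)
            a = pack;              /* cap the natural alignment */
        off = round_up_to(off, a);
        offsets[k] = off;
        off += sizes[k];
        if (a > struct_align)
            struct_align = a;
    }
    return round_up_to(off, struct_align);
}
```

  With sizes {8, 2, 4} and alignments {8, 2, 4}, a 'pack' of 0 gives
  offsets 0, 8, 10 and size 14 (what 'struct' does today), while a
  'pack' of 8 gives the natural offsets 0, 8, 12 and size 16.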

  Thanks for PicoLisp, which is otherwise excellent,
  but can still be improved!

All the best,
Jason

Attachment: C.l
Description: implementation of 'Ssz and 'carsz
