On Sat, 18 Jan 2020, JJ Merelo wrote:
> The example works perfectly, and it does because it's a string literal
> which is already 0 terminated. Let's use this code instead of the one that
> I used in my other mail about this (which you probably didn't read anyway):
> 
> 8< 8< 8<
> 
> What does this mean? It means that NativeCall does the right call
> (badum-tssss) and converts a Raku string literal into a C string literal,
> inserting the null termination even if we didn't. I actually don't care if
> it was the NativeCall API or the encode method. It just works. It gets
> allocated the right amount of memory, it gets passed correctly into the C
> realm. Just works. Since @array.elems has 3 elements, well, it might be
> rather the C part the one that does that. But I really don't care, and it
> does not really matter, and thus the example is correct, no need to add
> anything else to the documentation. Except maybe "get your C right"
> 

What you say seems to be correct: if you have a string literal in your
Raku code, this works for me, too. (Sometimes, see below.)

BUT the terminating NUL character is not inserted by NativeCall and it
isn't inserted by &encode. If you run this program which uses a much
longer string that is not a literal on the face of it:

    use NativeCall;
    sub c-printf(CArray[uint8]) is native is symbol<printf> { * };

    my $string = "X" x 1994;
    my $array = CArray[uint8].new($string.encode.list);
    c-printf $array;

through valgrind, it will warn you about an invalid read of one byte:
printf() accesses the byte immediately past the end of a properly
allocated block of size 1994:

    $ perl6-valgrind-m -MNativeCall -e 'sub c-printf(CArray[uint8]) is native is symbol<printf> { * };
        my $string = "X" x 1994;
        my $array = CArray[uint8].new($string.encode.list);
        c-printf $array' >/dev/null

    ==325957== Invalid read of size 1
    ==325957==    at 0x48401FC: strchrnul (vg_replace_strmem.c:1395)
    ==325957==    by 0x50CD334: __vfprintf_internal (in /usr/lib/libc-2.30.so)
    ==325957==    by 0x50BA26E: printf (in /usr/lib/libc-2.30.so)
    ==325957==    by 0x4B58048: ??? (in $rakudo/install/lib/libmoar.so)
    ==325957==    by 0x1FFEFFFB5F: ???
    ==325957==    by 0x4B57F81: dc_callvm_call_x64 (in $rakudo/install/lib/libmoar.so)
    ==325957==    by 0x50BA1BF: ??? (in /usr/lib/libc-2.30.so)
    ==325957==    by 0xA275E3F: ???
    ==325957==    by 0x990153F: ???
    ==325957==    by 0xA275E3F: ???
    ==325957==    by 0x1FFEFFFB7F: ???
    ==325957==    by 0x4B578D1: dcCallVoid (in $rakudo/install/lib/libmoar.so)
    ==325957==  Address 0xb5ebf1a is 0 bytes after a block of size 1,994 alloc'd
    ==325957==    at 0x483AD7B: realloc (vg_replace_malloc.c:836)
    ==325957==    by 0x4A9DFDF: expand.isra.3 (in $rakudo/install/lib/libmoar.so)
    ==325957==    by 0x4A9E6F4: bind_pos (in $rakudo/install/lib/libmoar.so)
    ==325957==    by 0x4A2C9AF: MVM_interp_run (in $rakudo/install/lib/libmoar.so)
    ==325957==    by 0x4B2CC24: MVM_vm_run_file (in $rakudo/install/lib/libmoar.so)
    ==325957==    by 0x109500: main (in $rakudo/install/bin/perl6-m)

This is the NUL byte that happens to be there and terminates our string
correctly, but nothing in the MoarVM process has allocated it; tracking
what is allocated and what isn't is exactly valgrind's job. And if it's
not allocated, then either MoarVM routinely writes NULs to memory it
doesn't own, or it simply does not insert a NUL after the storage of
every CArray[uint8]. The latter seems far more plausible. And why would
it? I for one would not expect CArray[uint8] to have special precautions
built in for when it's used to hold a C string.
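
To make the missing terminator visible without valgrind, one can simply
count elements. This is a minimal sketch, assuming a Rakudo with
NativeCall available; it only inspects lengths and never calls into C:

```raku
use NativeCall;

# Encoding yields exactly the bytes of the string, nothing more.
my $bytes = "FOO".encode;
say $bytes.elems;    # 3
say $bytes.list;     # (70 79 79)

# The CArray built from those bytes has the same three elements,
# so no terminating 0 is part of the storage we asked for.
my $array = CArray[uint8].new($bytes.list);
say $array.elems;    # 3
```

Whatever byte follows those three elements in memory is not something
this code put there.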

Why does it work with a string literal in the Raku code? I don't know,
but consider the following variations of the code, with my oldish Rakudo:

  - with $string = "X" x 1994:     valgrind sad
  - with $string = "X" x 4:        valgrind sad
  - with $string = "X" x 3:        valgrind happy
  - with $string = "X" x 2:        valgrind happy
  - with $string a short literal
    like "FOO":                    valgrind happy
  - with $string a literal of
    sufficient length like "FOOO": valgrind sad
  - with $string = "FOO" x 2:      valgrind happy
  - with $string = "FOO" x 200:    valgrind sad

My guess is that if it's sufficiently small and easy, then it is computed
at compile-time and stored somewhere in the bytecode, for which some
serialization routine ensures a trailing NUL byte inside an allocated
region of memory for the process.

That is only a barely informed guess, but independently of what causes
it to work on short string literals, I strongly believe that this is an
implementation detail and hence I would call the example in the docs
misleading. Appending the ", 0" when constructing the CArray[uint8]
seems like a really neat fix.
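
For concreteness, here is the program from above with that fix applied.
It is a sketch rather than a tested patch for the docs; the only change
is the extra 0 element, so that the terminator lives inside the block
allocated for the CArray:

```raku
use NativeCall;

# Same binding as before: C's printf, taking a raw byte array.
sub c-printf(CArray[uint8]) is native is symbol<printf> { * }

my $string = "X" x 1994;

# Append an explicit 0 so the buffer owns its own NUL terminator:
# 1994 payload bytes plus one terminating byte.
my $array = CArray[uint8].new($string.encode.list, 0);

c-printf $array;
```

With this, printf() should stop at a byte that belongs to the array
instead of relying on whatever happens to follow it in memory.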

Anyway, the most important message of this mail should actually be:
_Normally_, the way you pass a Raku string to a native function is by
using the `is encoded` trait. It's such a nice way offered by the
*language* to tell NativeCall to "do the right thing":

  sub c-printf(Str is encoded<ascii>) is native is symbol<printf> { * }

It encodes and NUL terminates. Far preferable to cooking up your own
CArray[uint8]. As the documentation mentions, you only need to serialize
Raku strings to CArray[uint8] if you have to manage their lifetime beyond
the callee. I highly doubt that the Windows API ordinarily requires you
to do that.
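
As a complete program, the `is encoded` variant might look like the
sketch below. It uses the `is encoded('utf8')` spelling from the
NativeCall documentation; the sub name and the test string are just the
ones used earlier in this mail:

```raku
use NativeCall;

# NativeCall encodes the Str and NUL terminates the temporary C string
# for the duration of the call; no CArray bookkeeping is needed.
sub c-printf(Str is encoded('utf8')) is native is symbol<printf> { * }

# The argument is used as a printf format string, so keep it free of '%'.
c-printf "X" x 1994;
c-printf "\n";
```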

Regards,
Tobias

-- 
"There's an old saying: Don't change anything... ever!" -- Mr. Monk
