Re: [Libguestfs] [PATCH v1] ldmtool: fix NULL pointer dereference

2023-06-19 Thread Laszlo Ersek
On 6/19/23 16:36, Vincent Mailhol wrote:
> If /sys/block can not be opened, get_devices() returns NULL.
> 
> cmdline() does not check this result and below code snippet:
> 
>   scanned = get_devices();
>   devices = (gchar **) scanned->data;
> 
> results in a segmentation fault.
> 
> Add a check on scanned.
> 
> Relevant logs:
> 
>   Unable to open /sys/block: No such file or directory
>   [0.777352] ldmtool[164]: segfault at 0 ip 563a225cd6a5 sp 
> 7ffe54965a60 error 4 in ldmtool[563a225cb000+3000]
>   [0.778278] Code: 18 64 48 33 1c 25 28 00 00 00 75 5e 48 83 c4 28 5b 5d 
> 41 5c 41 5d 41 5e 41 5f c3 66 2e 0f 1f 84 00 00 00 00 00 e8 db fd ff ff <4c> 
> 8b 20 48 89 44 24 08 4c 89 e7 e8 0b e1 ff ff 45 31 c0 4c 89 e1
> 
> Fixes: 25d9635e4ee5 ("Add ldmtool")
> Signed-off-by: Vincent Mailhol 
> ---
> This thread did not yet show-up in
>   https://listman.redhat.com/archives/libguestfs/2023-June/subject.html
> not sure why.
> 
> For this reason, I couln't add a link reference.
> ---
>  src/ldmtool.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/ldmtool.c b/src/ldmtool.c
> index 6957c1a..87aaccc 100644
> --- a/src/ldmtool.c
> +++ b/src/ldmtool.c
> @@ -746,6 +746,8 @@ cmdline(LDM * const ldm, gchar **devices,
>  GArray * scanned = NULL;
>  if (!devices) {
>  scanned = get_devices();
> +if (!scanned)
> +goto error;
>  devices = (gchar **) scanned->data;
>  }
>  

(I'm reviewing this against commit 5014da5b9071, in repository
.)

This fix is almost right, but not entirely. Consider what we have under
the "error" label:

error:
if (scanned) g_array_unref(scanned);
if (jb) g_object_unref(jb);
return FALSE;

When we jump to "error" on the new branch, "scanned" is is NULL, so that
is safe. However, the

JsonBuilder *jb = NULL;

line has not been reached yet, and that's a problem, because
g_object_unref() will receive an indeterminate value (undefined
behavior). (In fact, just *evaluating* "jb" as the controlling
expression of the "if" that is supposed to protect g_object_unref() is
*already* undefined behavior.)

According to C99 6.8 "Statements and blocks" paragraph 3,

A /block/ allows a set of declarations and statements to be grouped
into one syntactic unit. The initializers of objects that have
automatic storage duration [...] are evaluated and the values are
stored in the objects (including storing an indeterminate value in
objects without an initializer) each time the declaration is reached
in the order of execution, as if it were a statement, and within
each declaration in the order that declarators appear.

Furthermore, C99 6.8.4.2 "The switch statement" paragraph 7 provides the
following example:

In the artificial program fragment

switch (expr)
{
int i = 4;
f(i);
case 0:
i = 17;
/* falls through into default code */
default:
printf("%d\n", i);
}

the object whose identifier is *i* exists with automatic storage
duration (within the block) but is never initialized, and thus if
the controlling expression has a nonzero value, the call to the
*printf* function will access an indeterminate value. Similarly, the
call to the function *f* cannot be reached.

Thus, you can do one of two things (at least):

- *In addition* to the current contents of the patch, move the
declaration of "jb", together with its NULL initializer, to the top of
the function (just below that of "scanned"). This makes sure that the
new jump to "error" will still "pass" a well-defined "jb" value (NULL),
which the logic at the "error" label handles well.

- Alternatively: replace "goto error" with just "return FALSE" in your
patch. At that point, there is nothing to release yet.

Laszlo
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libldm crashes in a linux-sandbox context

2023-06-19 Thread Laszlo Ersek
On 6/19/23 16:32, Vincent Mailhol wrote:
> On Mon 19 June 2023 at 21:16, Laszlo Ersek  wrote:
>> On 6/19/23 13:18, Vincent MAILHOL wrote:
>>> On Fri. 16 juin 2023 at 16:34, Richard W.M. Jones  wrote:
>>> (...)
> Last thing, the segfault on ldmtool [1] still seems a valid issue.
> Even if I now do have a workaround for my problem, that segfault might
> be worth a bit more investigation.

 Yes that does look like a real problem.  Does it crash if you just run
 ldmtool as a normal command, nothing to do with libguestfs?  Might be
 a good idea to try to get a stack trace of the crash.
>>>
>>> The fact is that it only crashes with the UUID 65534 in the qemu VM. I
>>> am not sure what command line is passed to ldmtool for this crash to
>>> occur.
>>>
>>> I can help to gather information, but my biggest issue is that I do
>>> not know how to interact with the VM under /tmp/.guestfs-1001/
>>>
>>>   [0.777352] ldmtool[164]: segfault at 0 ip 563a225cd6a5 sp 
>>> 7ffe54965a60 error 4 in ldmtool[563a225cb000+3000]
>>>  ^^^
>>> This smells like a NULL pointer dereference.
>>
>> ... Hey this is actually my line from an email I started writing earlier
>> today :) , but I then decided not to send it.
>>
>> It certainly looks like a null pointer dereference, and if you
>> disassemble the instruction byte stream dump (the "Code:" line from the
>> kernel log) with (e.g.) ndisasm, that confirms it. You get something like
>>
>> 0025  E8DBFDcall 0xfe05
>> 002A  4C8B20mov r12,[rax]  < crash
>> 002D  4889442408mov [rsp+0x8],rax
>> 0032  4C89E7mov rdi,r12
>> 0035  E80BE1call 0xe145
>>
>> with the "mov r12,[rax]" instruction faulting (with the previously
>> called function presumably having returned 0 in rax). See the "<4c> 8b
>> 20" substring in the "Code:" line -- the angle brackets point at the
>> first byte of the crashing instruction.
>>
>> I didn't send the email ultimately because your email included a link
>> [1] pointing at a particular line number:
>>
>> https://github.com/mdbooth/libldm/blob/master/src/ldmtool.c#L164
>>
>> and so I assumed you actually traced the crash to that line.
>>
>> Is that the case?
>>
>> Or did you perhaps mistake *PID* 164 (from the kernel log) for the line
>> number?
> 
> Yes, two messages back, I misinterpreted the PID (164) as a line
> number. Because that particular line manipulate the result of a
> g_array_index(), it looked coherent with the potential NULL pointer
> dereference. Realizing my mistake, I then started to do a deeper
> addr2line investigation in the previous message. Sorry.
> 
>>> The instruction pointer
>>> being 563a225cd6a5, I installed libguestfs-tools-dbgsym and tried a:
>>>
>>>   addr2line -e /usr/bin/ldmtool 564a892506a5
> 
> 
> Reading my previous message, I do not know where this 564a892506a5
> comes from. I meant 563a225cd6a5 here (and below in gdb).
> 
>>> Results:
>>>
>>>   ??:0
>>>
>>> Without conviction, I also tried in GDB:
>>>
>>>   $ gdb /usr/bin/ldmtool
>>>   (...)
>>>   Reading symbols from /usr/bin/ldmtool...
>>>   Reading symbols from
>>> /usr/lib/debug/.build-id/21/37b4a64903ebe427c242be08b8d496ba570583.debug...
>>>   (gdb) info line *0x564a892506a5
>>>   No line number information available for address 0x564a892506a5
>>>
>>> Debug symbols are correctly installed but impossible to convert that
>>> instruction pointer into a line number. It is as if the ldmtool on my
>>> host and the ldmtool in the qemu VM were from a different build. I
>>> tried to mount /tmp/.guestfs-1001/appliance.d/root but that disk image
>>> did not contain ldmtool.
>>>
>>> I am not sure how to generate a stack trace or a core dump within that
>>> qemu VM. If you can tell me how to get an interactive prompt (or any
>>> other guidance) I can try to collect more information.
>>
>> The IP where the crash occurs is 563a225cd6a5. The ldmtool binary
>> (as opposed to a shared object / library) is mapped into the process's
>> address space at 563a225cb000, for a length of 0x3000 bytes. So the
>> offending instruction is supposed to be 563a225cd6a5 - 563a225cb000
>> = 26A5.
> 
> Thanks for the explanation.
> 
>> With the debug symbols installed, can you attach the output of
>>
>>   objdump --headers --wide -S /usr/bin/ldmtool
>>
>> ?
> 
> Results attached at the bottom of the e-mail.
> 
>> Can you try
>>
>>   addr2line -p -i -f -e /usr/bin/ldmtool 26A5
>>
>> ?
> 
> Unfortunately:
> 
>   $ addr2line -p -i -f -e /usr/bin/ldmtool 26a5
>   ?? ??:0
> 
>> (This still may not be good enough; we might have to offset the
>> difference 0x26A5 with some address related to the .text section... The
>> objdump output should help us experiment.)
> 
> For what it is worth, I loaded the program in GDB:
> 
>   (gdb) break main
>   Breakpoint 

[Libguestfs] [v2v PATCH] test-data/phony-guests: fix prerequisite list of "fedora-luks-on-lvm.img"

2023-06-19 Thread Laszlo Ersek
In the virt-v2v repo, commit 1e75569aa074 ("test-data/phony-guests: Allow
virt-v2v to work against phony Fedora") is an ancestor of commit
e4efe4b7d240 ("tests: add LUKS-on-LVM test"). The latter created a state
where "fedora-static-bin" and LUKS on LVM testing would coexist (i.e.,
where "fedora-static-bin" would be uploaded to the LUKS-on-LVM disk image
as well), but the commit didn't spell out the dependency in
"test-data/phony-guests/Makefile.am".

Do that now.

The problem can be triggered with:

> autoreconf -i
> ./configure
> make
> make -C test-data/phony-guests fedora-luks-on-lvm.img

where the last command fails with

> make: Entering directory '.../test-data/phony-guests'
> SRCDIR=. LAYOUT=luks-on-lvm ../../run --test ./make-fedora-img.pl
> open: fedora-static-bin: No such file or directory at
> .../test-data/phony-guests/make-fedora-img.pl line 373.

(In the guestfs-tools repo, the relative order (the descendancy) between
both commits is the opposite. There, commit 27da4b0c4991 ("inspector: add
LUKS-on-LVM test") came first, and commit eb0ff1859eb6
("test-data/phony-guests: Allow virt-v2v to work against phony Fedora"),
came second. The latter commit, in fact being a port of virt-v2v commit
1e75569aa074, brought together "fedora-static-bin" with "LUKS on LVM"
testing,  and it correctly added "fedora-static-bin" as a pre-requisite
for building "fedora-luks-on-lvm.img".)

Fixes: e4efe4b7d240b66b1d53fbe5a127f4f5966f6903
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168506
Signed-off-by: Laszlo Ersek 
---
 test-data/phony-guests/Makefile.am | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/test-data/phony-guests/Makefile.am 
b/test-data/phony-guests/Makefile.am
index 29dbd4d0f9f2..10c0241b7289 100644
--- a/test-data/phony-guests/Makefile.am
+++ b/test-data/phony-guests/Makefile.am
@@ -103,7 +103,8 @@ fedora-btrfs.img: make-fedora-img.pl \
 # Make a (dummy) Fedora image with LUKS-on-LVM.
 fedora-luks-on-lvm.img: make-fedora-img.pl \
fedora-journal.tar.xz \
-   fedora.db
+   fedora.db \
+   fedora-static-bin
SRCDIR=$(srcdir) LAYOUT=luks-on-lvm $(top_builddir)/run --test ./$<
 
 # Make a (dummy) Fedora image with LVM-on-LUKS.
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



[Libguestfs] [PATCH v1] ldmtool: fix NULL pointer dereference

2023-06-19 Thread Vincent Mailhol
If /sys/block can not be opened, get_devices() returns NULL.

cmdline() does not check this result and below code snippet:

  scanned = get_devices();
  devices = (gchar **) scanned->data;

results in a segmentation fault.

Add a check on scanned.

Relevant logs:

  Unable to open /sys/block: No such file or directory
  [0.777352] ldmtool[164]: segfault at 0 ip 563a225cd6a5 sp 
7ffe54965a60 error 4 in ldmtool[563a225cb000+3000]
  [0.778278] Code: 18 64 48 33 1c 25 28 00 00 00 75 5e 48 83 c4 28 5b 5d 41 
5c 41 5d 41 5e 41 5f c3 66 2e 0f 1f 84 00 00 00 00 00 e8 db fd ff ff <4c> 8b 20 
48 89 44 24 08 4c 89 e7 e8 0b e1 ff ff 45 31 c0 4c 89 e1

Fixes: 25d9635e4ee5 ("Add ldmtool")
Signed-off-by: Vincent Mailhol 
---
This thread did not yet show-up in
  https://listman.redhat.com/archives/libguestfs/2023-June/subject.html
not sure why.

For this reason, I couln't add a link reference.
---
 src/ldmtool.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/ldmtool.c b/src/ldmtool.c
index 6957c1a..87aaccc 100644
--- a/src/ldmtool.c
+++ b/src/ldmtool.c
@@ -746,6 +746,8 @@ cmdline(LDM * const ldm, gchar **devices,
 GArray * scanned = NULL;
 if (!devices) {
 scanned = get_devices();
+if (!scanned)
+goto error;
 devices = (gchar **) scanned->data;
 }
 
-- 
2.25.1

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libldm crashes in a linux-sandbox context

2023-06-19 Thread Laszlo Ersek
On 6/19/23 13:18, Vincent MAILHOL wrote:
> On Fri. 16 juin 2023 at 16:34, Richard W.M. Jones  wrote:
> (...)
>>> Last thing, the segfault on ldmtool [1] still seems a valid issue.
>>> Even if I now do have a workaround for my problem, that segfault might
>>> be worth a bit more investigation.
>>
>> Yes that does look like a real problem.  Does it crash if you just run
>> ldmtool as a normal command, nothing to do with libguestfs?  Might be
>> a good idea to try to get a stack trace of the crash.
> 
> The fact is that it only crashes with the UUID 65534 in the qemu VM. I
> am not sure what command line is passed to ldmtool for this crash to
> occur.
> 
> I can help to gather information, but my biggest issue is that I do
> not know how to interact with the VM under /tmp/.guestfs-1001/
> 
>   [0.777352] ldmtool[164]: segfault at 0 ip 563a225cd6a5 sp
> 7ffe54965a60 error 4 in ldmtool[563a225cb000+3000]
>  ^^^
> This smells like a NULL pointer dereference.

... Hey this is actually my line from an email I started writing earlier
today :) , but I then decided not to send it.

It certainly looks like a null pointer dereference, and if you
disassemble the instruction byte stream dump (the "Code:" line from the
kernel log) with (e.g.) ndisasm, that confirms it. You get something like

0025  E8DBFDcall 0xfe05
002A  4C8B20mov r12,[rax]  < crash
002D  4889442408mov [rsp+0x8],rax
0032  4C89E7mov rdi,r12
0035  E80BE1call 0xe145

with the "mov r12,[rax]" instruction faulting (with the previously
called function presumably having returned 0 in rax). See the "<4c> 8b
20" substring in the "Code:" line -- the angle brackets point at the
first byte of the crashing instruction.

I didn't send the email ultimately because your email included a link
[1] pointing at a particular line number:

https://github.com/mdbooth/libldm/blob/master/src/ldmtool.c#L164

and so I assumed you actually traced the crash to that line.

Is that the case?

Or did you perhaps mistake *PID* 164 (from the kernel log) for the line
number?

> The instruction pointer
> being 563a225cd6a5, I installed libguestfs-tools-dbgsym and tried a:
> 
>   addr2line -e /usr/bin/ldmtool 564a892506a5
> 
> Results:
> 
>   ??:0
> 
> Without conviction, I also tried in GDB:
> 
>   $ gdb /usr/bin/ldmtool
>   (...)
>   Reading symbols from /usr/bin/ldmtool...
>   Reading symbols from
> /usr/lib/debug/.build-id/21/37b4a64903ebe427c242be08b8d496ba570583.debug...
>   (gdb) info line *0x564a892506a5
>   No line number information available for address 0x564a892506a5
> 
> Debug symbols are correctly installed but impossible to convert that
> instruction pointer into a line number. It is as if the ldmtool on my
> host and the ldmtool in the qemu VM were from a different build. I
> tried to mount /tmp/.guestfs-1001/appliance.d/root but that disk image
> did not contain ldmtool.
> 
> I am not sure how to generate a stack trace or a core dump within that
> qemu VM. If you can tell me how to get an interactive prompt (or any
> other guidance) I can try to collect more information.

The IP where the crash occurs is 563a225cd6a5. The ldmtool binary
(as opposed to a shared object / library) is mapped into the process's
address space at 563a225cb000, for a length of 0x3000 bytes. So the
offending instruction is supposed to be 563a225cd6a5 - 563a225cb000
= 26A5.

With the debug symbols installed, can you attach the output of

  objdump --headers --wide -S /usr/bin/ldmtool

?

Can you try

  addr2line -p -i -f -e /usr/bin/ldmtool 26A5

?

(This still may not be good enough; we might have to offset the
difference 0x26A5 with some address related to the .text section... The
objdump output should help us experiment.)

Laszlo
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] [libnbd PATCH v4 1/4] states: Document our reliance on type overlaps

2023-06-19 Thread Laszlo Ersek
On 6/12/23 20:10, Richard W.M. Jones wrote:
> On Fri, Jun 09, 2023 at 03:39:19PM -0500, Eric Blake wrote:
>> [Bah - I typed up a longer response, but lost it when accidentally
>> trying to send through the wrong SMTP server, so now I have to
>> remember what I had...]
>>
>> On Fri, Jun 09, 2023 at 02:45:56PM +0200, Laszlo Ersek wrote:
>>> On 6/9/23 04:17, Eric Blake wrote:
 When I added structured replies to the NBD spec, I intentionally chose
 a wire layout where the magic number and cookie overlap, even while
 the middle member changes from uint32_t error to the pair uint16_t
 flags and type.  Based only on a strict reading of C rules on
 effective types and compatible type prefixes, it's probably
 questionable on whether my reliance on type aliasing to reuse cookie
 from the same offset of a union, or even the fact that a structured
 reply is built by first reading bytes into sbuf.simple_reply then
 following up with only bytes into the tail of sbuf.sr.structured_reply
 is strictly portable.  But since it works in practice, it's worth at
 least adding some compile- and run-time assertions that our (ab)use of
 aliasing is accessing the bytes we want under the types we expect.
 Upcoming patches will restructure part of the sbuf layout to hopefully
 be a little easier to tie back to strict C standards.

 Suggested-by: Laszlo Ersek 
 Signed-off-by: Eric Blake 
 ---
  generator/states-reply.c| 17 +
  generator/states-reply-structured.c | 13 +
  2 files changed, 22 insertions(+), 8 deletions(-)

 diff --git a/generator/states-reply.c b/generator/states-reply.c
 index 511e5cb1..2c77658b 100644
 --- a/generator/states-reply.c
 +++ b/generator/states-reply.c
 @@ -17,6 +17,7 @@
   */

  #include 
 +#include 

  /* State machine for receiving reply messages from the server.
   *
 @@ -63,9 +64,15 @@  REPLY.START:
ssize_t r;

/* We read all replies initially as if they are simple replies, but
 -   * check the magic in CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
 -   * This works because the structured_reply header is larger.
 +   * check the magic in CHECK_SIMPLE_OR_STRUCTURED_REPLY below.  This
 +   * works because the structured_reply header is larger, and because
 +   * the last member of a simple reply, cookie, is coincident between
 +   * the two structs (an intentional design decision in the NBD spec
 +   * when structured replies were added).
 */
 +  STATIC_ASSERT (offsetof (struct nbd_handle, sbuf.simple_reply.cookie) ==
 + offsetof (struct nbd_handle, 
 sbuf.sr.structured_reply.cookie),
 + cookie_aliasing);
>>>
>>> Can you perhaps append
>>>
>>>  ... &&
>>>  sizeof h->sbuf.simple_reply.cookie ==
>>>  sizeof h->sbuf.sr.structured_reply.cookie
>>>
>>> (if you agree)?
>>
>> Yes, that makes sense, and I did so for what got pushed as 29342fedb53
>>
>>>
>>> Also, the commit message and the comment talk about the magic number as
>>> well, not just the cookie, and the static assertion ignores magic.
>>> However, I can see the magic handling changes in the next patch.
>>
>> I was a bit less concerned about magic (it is easy to see that it is
>> at offset 0 in both types and could satisfy the common prefix rules,
>> while seeing cookie's location and a non-common prefix makes the
>> latter more imporant to assert).  But checking two members instead of
>> one shouldn't hurt, and in fact, once extended types are in (plus
>> patch 4/4 of this series also adds an anonymous sub-struct in 'union
>> reply_header' which is also worth validating), it may make sense to do
>> a followup patch that adds:
>>
>> #define ASSERT_MEMBER_OVERLAP(TypeA, memberA, TypeB, memberB) \
>>   STATIC_ASSERT (offsetof (TypeA, memberA) == offsetof (TypeB, memberB) && \
>>  sizeof ((TypeA *)NULL)->memberA == sizeof ((TypeB 
>> *)NULL)->memberB, \
>>  member_overlap)
>>
>> to be used either as:
>>
>> ASSERT_MEMBER_OVERLAP (struct nbd_simple_reply, cookie,
>>struct nbd_structured_reply, cookie);
>>
>> or as
>>
>> ASSERT_MEMBER_OVERLAP (struct nbd_handle, sbuf.simple_reply.magic,
>>struct nbd_handle, sbuf.sr.structured_reply.magic);
> 
> This is a nice idea!
> 
>> Would it make sense to have the macro take only three arguments (since
>> both of those invocations repeat an argument); if so, is it better to
>> share the common type name, or the common member name?
> 
> We can always start with the 3 arg version and change it if we need to
> later.  At the moment I can't think of a reason to check that fields
> in two unrelated types overlap, since you'd presumably always want to
> use them through an actual union type, but I suppose it could happen.

That's a good point!

> 
>> I also note that our 

Re: [Libguestfs] libldm crashes in a linux-sandbox context

2023-06-19 Thread Vincent MAILHOL
On Fri. 16 juin 2023 at 16:34, Richard W.M. Jones  wrote:
(...)
> > Last thing, the segfault on ldmtool [1] still seems a valid issue.
> > Even if I now do have a workaround for my problem, that segfault might
> > be worth a bit more investigation.
>
> Yes that does look like a real problem.  Does it crash if you just run
> ldmtool as a normal command, nothing to do with libguestfs?  Might be
> a good idea to try to get a stack trace of the crash.

The fact is that it only crashes with the UUID 65534 in the qemu VM. I
am not sure what command line is passed to ldmtool for this crash to
occur.

I can help to gather information, but my biggest issue is that I do
not know how to interact with the VM under /tmp/.guestfs-1001/

  [0.777352] ldmtool[164]: segfault at 0 ip 563a225cd6a5 sp
7ffe54965a60 error 4 in ldmtool[563a225cb000+3000]
 ^^^
This smells like a NULL pointer dereference. The instruction pointer
being 563a225cd6a5, I installed libguestfs-tools-dbgsym and tried a:

  addr2line -e /usr/bin/ldmtool 564a892506a5

Results:

  ??:0

Without conviction, I also tried in GDB:

  $ gdb /usr/bin/ldmtool
  (...)
  Reading symbols from /usr/bin/ldmtool...
  Reading symbols from
/usr/lib/debug/.build-id/21/37b4a64903ebe427c242be08b8d496ba570583.debug...
  (gdb) info line *0x564a892506a5
  No line number information available for address 0x564a892506a5

Debug symbols are correctly installed but impossible to convert that
instruction pointer into a line number. It is as if the ldmtool on my
host and the ldmtool in the qemu VM were from a different build. I
tried to mount /tmp/.guestfs-1001/appliance.d/root but that disk image
did not contain ldmtool.

I am not sure how to generate a stack trace or a core dump within that
qemu VM. If you can tell me how to get an interactive prompt (or any
other guidance) I can try to collect more information.


Yours sincerely,
Vincent Mailhol

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] [libnbd PATCH v4 1/4] states: Document our reliance on type overlaps

2023-06-19 Thread Laszlo Ersek
On 6/9/23 22:39, Eric Blake wrote:
> [Bah - I typed up a longer response, but lost it when accidentally
> trying to send through the wrong SMTP server, so now I have to
> remember what I had...]
> 
> On Fri, Jun 09, 2023 at 02:45:56PM +0200, Laszlo Ersek wrote:
>> On 6/9/23 04:17, Eric Blake wrote:
>>> When I added structured replies to the NBD spec, I intentionally chose
>>> a wire layout where the magic number and cookie overlap, even while
>>> the middle member changes from uint32_t error to the pair uint16_t
>>> flags and type.  Based only on a strict reading of C rules on
>>> effective types and compatible type prefixes, it's probably
>>> questionable on whether my reliance on type aliasing to reuse cookie
>>> from the same offset of a union, or even the fact that a structured
>>> reply is built by first reading bytes into sbuf.simple_reply then
>>> following up with only bytes into the tail of sbuf.sr.structured_reply
>>> is strictly portable.  But since it works in practice, it's worth at
>>> least adding some compile- and run-time assertions that our (ab)use of
>>> aliasing is accessing the bytes we want under the types we expect.
>>> Upcoming patches will restructure part of the sbuf layout to hopefully
>>> be a little easier to tie back to strict C standards.
>>>
>>> Suggested-by: Laszlo Ersek 
>>> Signed-off-by: Eric Blake 
>>> ---
>>>  generator/states-reply.c| 17 +
>>>  generator/states-reply-structured.c | 13 +
>>>  2 files changed, 22 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/generator/states-reply.c b/generator/states-reply.c
>>> index 511e5cb1..2c77658b 100644
>>> --- a/generator/states-reply.c
>>> +++ b/generator/states-reply.c
>>> @@ -17,6 +17,7 @@
>>>   */
>>>
>>>  #include 
>>> +#include 
>>>
>>>  /* State machine for receiving reply messages from the server.
>>>   *
>>> @@ -63,9 +64,15 @@  REPLY.START:
>>>ssize_t r;
>>>
>>>/* We read all replies initially as if they are simple replies, but
>>> -   * check the magic in CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
>>> -   * This works because the structured_reply header is larger.
>>> +   * check the magic in CHECK_SIMPLE_OR_STRUCTURED_REPLY below.  This
>>> +   * works because the structured_reply header is larger, and because
>>> +   * the last member of a simple reply, cookie, is coincident between
>>> +   * the two structs (an intentional design decision in the NBD spec
>>> +   * when structured replies were added).
>>> */
>>> +  STATIC_ASSERT (offsetof (struct nbd_handle, sbuf.simple_reply.cookie) ==
>>> + offsetof (struct nbd_handle, 
>>> sbuf.sr.structured_reply.cookie),
>>> + cookie_aliasing);
>>
>> Can you perhaps append
>>
>>  ... &&
>>  sizeof h->sbuf.simple_reply.cookie ==
>>  sizeof h->sbuf.sr.structured_reply.cookie
>>
>> (if you agree)?
> 
> Yes, that makes sense, and I did so for what got pushed as 29342fedb53
> 
>>
>> Also, the commit message and the comment talk about the magic number as
>> well, not just the cookie, and the static assertion ignores magic.
>> However, I can see the magic handling changes in the next patch.
> 
> I was a bit less concerned about magic (it is easy to see that it is
> at offset 0 in both types and could satisfy the common prefix rules,
> while seeing cookie's location and a non-common prefix makes the
> latter more imporant to assert).  But checking two members instead of
> one shouldn't hurt, and in fact, once extended types are in (plus
> patch 4/4 of this series also adds an anonymous sub-struct in 'union
> reply_header' which is also worth validating), it may make sense to do
> a followup patch that adds:
> 
> #define ASSERT_MEMBER_OVERLAP(TypeA, memberA, TypeB, memberB) \
>   STATIC_ASSERT (offsetof (TypeA, memberA) == offsetof (TypeB, memberB) && \
>  sizeof ((TypeA *)NULL)->memberA == sizeof ((TypeB 
> *)NULL)->memberB, \
>  member_overlap)
> 
> to be used either as:
> 
> ASSERT_MEMBER_OVERLAP (struct nbd_simple_reply, cookie,
>struct nbd_structured_reply, cookie);
> 
> or as
> 
> ASSERT_MEMBER_OVERLAP (struct nbd_handle, sbuf.simple_reply.magic,
>struct nbd_handle, sbuf.sr.structured_reply.magic);
> 
> Would it make sense to have the macro take only three arguments (since
> both of those invocations repeat an argument); if so, is it better to
> share the common type name, or the common member name?

Both 4-arg invocations look fine to me, so I wouldn't push for a 3-arg
variant at this time.

> I also note that our "static-assert.h" file defines STATIC_ASSERT() as
> a do/while statement (that is, it MUST appear inside a function body,
> so we can't use it easily in .h files); contrast that with C11's
> _Static_assert() or qemu's QEMU_BUILD_BUG_ON() that behave more as a
> type declaration (and can therefore appear outside of a function body;
> C23 will take it one step further by adding static_assert(expr)
> alongside