Re: [libvirt] Race condition between qemuDomainCreate and qemuDomainDestroy

2018-04-09 Thread John Ferlan


On 04/09/2018 12:12 PM, Laine Stump wrote:
> On 04/06/2018 12:27 PM, John Ferlan wrote:
>>
>> On 04/03/2018 07:47 AM, Marc Hartmayer wrote:
>>> On Tue, Mar 20, 2018 at 11:25 AM +0100, Marc Hartmayer 
>>>  wrote:
 Hi,

 there is a race condition between 'qemuDomainCreate' and
 'qemuDomainDestroy' causing a NULL pointer segmentation fault when
 accessing priv->monConfig. The race condition can be easily reproduced
 using gdb.

 (gdb) set non-stop on
 # set breakpoint on line 'mon = qemuMonitorOpen(vm, …)'
 (gdb) b qemu_process.c:1799
 # Actually, this second breakpoint is optional but it’s good to see
 where priv->monConfig is set to NULL
 # set breakpoint on line priv->monConfig = NULL;
 (gdb) b qemu_process.c:6589
 (gdb) run
 # continue all threads - just for the case we hit a breakpoint already
 (gdb) c -a

 Now start a domain (that is using QEMU)

 $ virsh start domain

 The first breakpoint will be hit. Now run in a second shell

 $ virsh destroy domain

 The second breakpoint will be hit. Continue the thread where the second
 breakpoint was hit (for this example this is thread 4)

 (gdb) thread apply 4 continue

 Now continue the thread where the first breakpoint was hit.

 => Segmentation fault because of a NULL pointer dereference at
config->value

 Since I'm not very familiar with that part of the code, I wanted to ask
 for your advice.

 Thanks in advance.

 Beste Grüße / Kind regards
Marc Hartmayer

 IBM Deutschland Research & Development GmbH
 Vorsitzende des Aufsichtsrats: Martina Koederitz
 Geschäftsführung: Dirk Wittkopp
 Sitz der Gesellschaft: Böblingen
 Registergericht: Amtsgericht Stuttgart, HRB 243294
>>> Any ideas?
>>>
>> Seeing as no one else has an exact or authoritative answer...
>>
>> qemuDomainCreate{XML|WithFlags} (and a few others) will call
>> qemuProcessBeginJob which calls qemuDomainObjBeginAsyncJob and
>> qemuDomainObjSetAsyncJobMask which IIUC allows QEMU_JOB_DESTROY
>> to be run.
>>
>> The qemuDomainDestroyFlags calls qemuProcessBeginStopJob which calls
>> qemuDomainObjBeginJob (e.g. sync job) using QEMU_JOB_DESTROY, which
>> again IIUC is allowed to happen alongside the Async job because of the
>> mask setting.
>>
>> In the code where you've broken during create, the @vm object lock is
>> dropped allowing destroy to obtain it. So with the perfect timing and
>> storm the window of opportunity does exist that the monConfig could be
>> free'd and thus priv->monConfig set to NULL before or during the create
>> processing uses it in qemuMonitorOpen.  If during the processing, then
>> obviously the config-> would "appear" to be valid, but it may not
>> actually be what was passed.
>>
>> The fix I believe involves using objects for virDomainChrSourceDef
>> rather than VIR_ALLOC'd and VIR_FREE'd memory directly. I've put
>> together a few patches and will post them shortly. Using the patches I
>> don't see a core, but rather the (I believe) expected "error: internal
>> error: qemu unexpectedly closed the monitor"
> 
> Isn't this just a symptom of a wider problem? Your patches fix the
> problem for monConfig, but what about all the other members of the
> domain object that aren't protected with the trappings of virObject?
> Isn't it just as likely that they could fall prey to the same problem?
> 

I don't think it'd be a wider problem as long as the virObjectLock is
held on the object not much can happen. In this case, there was a
perfect storm of sorts where that lock was dropped and we had an allowed
job (DESTROY) processed. That processing messed with one pointer that
was then used in a call when the lock was down.  The @priv pointer used
for monJSON boolean is a @vm pointer and it's only freed if the @vm was
somehow removed - which it couldn't because we have a reference to it
(we only unlock'd and not unref'd).

So could there be more, I hope not, but I never want to say it's
impossible, because someone will open that window of opportunity.

Perhaps this discussion will cause some other poor souls that have had
to walk through that code in the past to chime in!

John

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] Race condition between qemuDomainCreate and qemuDomainDestroy

2018-04-09 Thread Laine Stump
On 04/06/2018 12:27 PM, John Ferlan wrote:
>
> On 04/03/2018 07:47 AM, Marc Hartmayer wrote:
>> On Tue, Mar 20, 2018 at 11:25 AM +0100, Marc Hartmayer 
>>  wrote:
>>> Hi,
>>>
>>> there is a race condition between 'qemuDomainCreate' and
>>> 'qemuDomainDestroy' causing a NULL pointer segmentation fault when
>>> accessing priv->monConfig. The race condition can be easily reproduced
>>> using gdb.
>>>
>>> (gdb) set non-stop on
>>> # set breakpoint on line 'mon = qemuMonitorOpen(vm, …)'
>>> (gdb) b qemu_process.c:1799
>>> # Actually, this second breakpoint is optional but it’s good to see
>>> where priv->monConfig is set to NULL
>>> # set breakpoint on line priv->monConfig = NULL;
>>> (gdb) b qemu_process.c:6589
>>> (gdb) run
>>> # continue all threads - just for the case we hit a breakpoint already
>>> (gdb) c -a
>>>
>>> Now start a domain (that is using QEMU)
>>>
>>> $ virsh start domain
>>>
>>> The first breakpoint will be hit. Now run in a second shell
>>>
>>> $ virsh destroy domain
>>>
>>> The second breakpoint will be hit. Continue the thread where the second
>>> breakpoint was hit (for this example this is thread 4)
>>>
>>> (gdb) thread apply 4 continue
>>>
>>> Now continue the thread where the first breakpoint was hit.
>>>
>>> => Segmentation fault because of a NULL pointer dereference at
>>>config->value
>>>
>>> Since I'm not very familiar with that part of the code, I wanted to ask
>>> for your advice.
>>>
>>> Thanks in advance.
>>>
>>> Beste Grüße / Kind regards
>>>Marc Hartmayer
>>>
>>> IBM Deutschland Research & Development GmbH
>>> Vorsitzende des Aufsichtsrats: Martina Koederitz
>>> Geschäftsführung: Dirk Wittkopp
>>> Sitz der Gesellschaft: Böblingen
>>> Registergericht: Amtsgericht Stuttgart, HRB 243294
>> Any ideas?
>>
> Seeing as no one else has an exact or authoritative answer...
>
> qemuDomainCreate{XML|WithFlags} (and a few others) will call
> qemuProcessBeginJob which calls qemuDomainObjBeginAsyncJob and
> qemuDomainObjSetAsyncJobMask which IIUC allows QEMU_JOB_DESTROY
> to be run.
>
> The qemuDomainDestroyFlags calls qemuProcessBeginStopJob which calls
> qemuDomainObjBeginJob (e.g. sync job) using QEMU_JOB_DESTROY, which
> again IIUC is allowed to happen alongside the Async job because of the
> mask setting.
>
> In the code where you've broken during create, the @vm object lock is
> dropped allowing destroy to obtain it. So with the perfect timing and
> storm the window of opportunity does exist that the monConfig could be
> free'd and thus priv->monConfig set to NULL before or during the create
> processing uses it in qemuMonitorOpen.  If during the processing, then
> obviously the config-> would "appear" to be valid, but it may not
> actually be what was passed.
>
> The fix I believe involves using objects for virDomainChrSourceDef
> rather than VIR_ALLOC'd and VIR_FREE'd memory directly. I've put
> together a few patches and will post them shortly. Using the patches I
> don't see a core, but rather the (I believe) expected "error: internal
> error: qemu unexpectedly closed the monitor"

Isn't this just a symptom of a wider problem? Your patches fix the
problem for monConfig, but what about all the other members of the
domain object that aren't protected with the trappings of virObject?
Isn't it just as likely that they could fall prey to the same problem?

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] Race condition between qemuDomainCreate and qemuDomainDestroy

2018-04-09 Thread Marc Hartmayer
On Fri, Apr 06, 2018 at 06:27 PM +0200, John Ferlan  wrote:
> On 04/03/2018 07:47 AM, Marc Hartmayer wrote:
>> On Tue, Mar 20, 2018 at 11:25 AM +0100, Marc Hartmayer 
>>  wrote:
>>> Hi,
>>>
>>> there is a race condition between 'qemuDomainCreate' and
>>> 'qemuDomainDestroy' causing a NULL pointer segmentation fault when
>>> accessing priv->monConfig. The race condition can be easily reproduced
>>> using gdb.
>>>
>>> (gdb) set non-stop on
>>> # set breakpoint on line 'mon = qemuMonitorOpen(vm, …)'
>>> (gdb) b qemu_process.c:1799
>>> # Actually, this second breakpoint is optional but it’s good to see
>>> where priv->monConfig is set to NULL
>>> # set breakpoint on line priv->monConfig = NULL;
>>> (gdb) b qemu_process.c:6589
>>> (gdb) run
>>> # continue all threads - just for the case we hit a breakpoint already
>>> (gdb) c -a
>>>
>>> Now start a domain (that is using QEMU)
>>>
>>> $ virsh start domain
>>>
>>> The first breakpoint will be hit. Now run in a second shell
>>>
>>> $ virsh destroy domain
>>>
>>> The second breakpoint will be hit. Continue the thread where the second
>>> breakpoint was hit (for this example this is thread 4)
>>>
>>> (gdb) thread apply 4 continue
>>>
>>> Now continue the thread where the first breakpoint was hit.
>>>
>>> => Segmentation fault because of a NULL pointer dereference at
>>>config->value
>>>
>>> Since I'm not very familiar with that part of the code, I wanted to ask
>>> for your advice.
>>>
>>> Thanks in advance.
>>>
>>> Beste Grüße / Kind regards
>>>Marc Hartmayer
>>>
>>> IBM Deutschland Research & Development GmbH
>>> Vorsitzende des Aufsichtsrats: Martina Koederitz
>>> Geschäftsführung: Dirk Wittkopp
>>> Sitz der Gesellschaft: Böblingen
>>> Registergericht: Amtsgericht Stuttgart, HRB 243294
>> 
>> Any ideas?
>> 
>
> Seeing as no one else has an exact or authoritative answer...
>
> qemuDomainCreate{XML|WithFlags} (and a few others) will call
> qemuProcessBeginJob which calls qemuDomainObjBeginAsyncJob and
> qemuDomainObjSetAsyncJobMask which IIUC allows QEMU_JOB_DESTROY
> to be run.
>
> The qemuDomainDestroyFlags calls qemuProcessBeginStopJob which calls
> qemuDomainObjBeginJob (e.g. sync job) using QEMU_JOB_DESTROY, which
> again IIUC is allowed to happen alongside the Async job because of the
> mask setting.
>
> In the code where you've broken during create, the @vm object lock is
> dropped allowing destroy to obtain it. So with the perfect timing and
> storm the window of opportunity does exist that the monConfig could be
> free'd and thus priv->monConfig set to NULL before or during the create
> processing uses it in qemuMonitorOpen.  If during the processing, then
> obviously the config-> would "appear" to be valid, but it may not
> actually be what was passed.
>
> The fix I believe involves using objects for virDomainChrSourceDef
> rather than VIR_ALLOC'd and VIR_FREE'd memory directly. I've put
> together a few patches and will post them shortly. Using the patches I
> don't see a core, but rather the (I believe) expected "error: internal
> error: qemu unexpectedly closed the monitor"
>
> John
>
>
>

Thanks for the fix!

-- 
Beste Grüße / Kind regards
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] Race condition between qemuDomainCreate and qemuDomainDestroy

2018-04-06 Thread John Ferlan


On 04/03/2018 07:47 AM, Marc Hartmayer wrote:
> On Tue, Mar 20, 2018 at 11:25 AM +0100, Marc Hartmayer 
>  wrote:
>> Hi,
>>
>> there is a race condition between 'qemuDomainCreate' and
>> 'qemuDomainDestroy' causing a NULL pointer segmentation fault when
>> accessing priv->monConfig. The race condition can be easily reproduced
>> using gdb.
>>
>> (gdb) set non-stop on
>> # set breakpoint on line 'mon = qemuMonitorOpen(vm, …)'
>> (gdb) b qemu_process.c:1799
>> # Actually, this second breakpoint is optional but it’s good to see
>> where priv->monConfig is set to NULL
>> # set breakpoint on line priv->monConfig = NULL;
>> (gdb) b qemu_process.c:6589
>> (gdb) run
>> # continue all threads - just for the case we hit a breakpoint already
>> (gdb) c -a
>>
>> Now start a domain (that is using QEMU)
>>
>> $ virsh start domain
>>
>> The first breakpoint will be hit. Now run in a second shell
>>
>> $ virsh destroy domain
>>
>> The second breakpoint will be hit. Continue the thread where the second
>> breakpoint was hit (for this example this is thread 4)
>>
>> (gdb) thread apply 4 continue
>>
>> Now continue the thread where the first breakpoint was hit.
>>
>> => Segmentation fault because of a NULL pointer dereference at
>>config->value
>>
>> Since I'm not very familiar with that part of the code, I wanted to ask
>> for your advice.
>>
>> Thanks in advance.
>>
>> Beste Grüße / Kind regards
>>Marc Hartmayer
>>
>> IBM Deutschland Research & Development GmbH
>> Vorsitzende des Aufsichtsrats: Martina Koederitz
>> Geschäftsführung: Dirk Wittkopp
>> Sitz der Gesellschaft: Böblingen
>> Registergericht: Amtsgericht Stuttgart, HRB 243294
> 
> Any ideas?
> 

Seeing as no one else has an exact or authoritative answer...

qemuDomainCreate{XML|WithFlags} (and a few others) will call
qemuProcessBeginJob which calls qemuDomainObjBeginAsyncJob and
qemuDomainObjSetAsyncJobMask which IIUC allows QEMU_JOB_DESTROY
to be run.

The qemuDomainDestroyFlags calls qemuProcessBeginStopJob which calls
qemuDomainObjBeginJob (e.g. sync job) using QEMU_JOB_DESTROY, which
again IIUC is allowed to happen alongside the Async job because of the
mask setting.

In the code where you've broken during create, the @vm object lock is
dropped allowing destroy to obtain it. So with the perfect timing and
storm the window of opportunity does exist that the monConfig could be
free'd and thus priv->monConfig set to NULL before or during the create
processing uses it in qemuMonitorOpen.  If during the processing, then
obviously the config-> would "appear" to be valid, but it may not
actually be what was passed.

The fix I believe involves using objects for virDomainChrSourceDef
rather than VIR_ALLOC'd and VIR_FREE'd memory directly. I've put
together a few patches and will post them shortly. Using the patches I
don't see a core, but rather the (I believe) expected "error: internal
error: qemu unexpectedly closed the monitor"

John



--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] Race condition between qemuDomainCreate and qemuDomainDestroy

2018-04-03 Thread Marc Hartmayer
On Tue, Mar 20, 2018 at 11:25 AM +0100, Marc Hartmayer 
 wrote:
> Hi,
>
> there is a race condition between 'qemuDomainCreate' and
> 'qemuDomainDestroy' causing a NULL pointer segmentation fault when
> accessing priv->monConfig. The race condition can be easily reproduced
> using gdb.
>
> (gdb) set non-stop on
> # set breakpoint on line 'mon = qemuMonitorOpen(vm, …)'
> (gdb) b qemu_process.c:1799
> # Actually, this second breakpoint is optional but it’s good to see
> where priv->monConfig is set to NULL
> # set breakpoint on line priv->monConfig = NULL;
> (gdb) b qemu_process.c:6589
> (gdb) run
> # continue all threads - just for the case we hit a breakpoint already
> (gdb) c -a
>
> Now start a domain (that is using QEMU)
>
> $ virsh start domain
>
> The first breakpoint will be hit. Now run in a second shell
>
> $ virsh destroy domain
>
> The second breakpoint will be hit. Continue the thread where the second
> breakpoint was hit (for this example this is thread 4)
>
> (gdb) thread apply 4 continue
>
> Now continue the thread where the first breakpoint was hit.
>
> => Segmentation fault because of a NULL pointer dereference at
>config->value
>
> Since I'm not very familiar with that part of the code, I wanted to ask
> for your advice.
>
> Thanks in advance.
>
> Beste Grüße / Kind regards
>Marc Hartmayer
>
> IBM Deutschland Research & Development GmbH
> Vorsitzende des Aufsichtsrats: Martina Koederitz
> Geschäftsführung: Dirk Wittkopp
> Sitz der Gesellschaft: Böblingen
> Registergericht: Amtsgericht Stuttgart, HRB 243294

Any ideas?

---
Beste Grüße / Kind regards
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list