Re: RFC: tracking valid backing chain issue

2020-10-22 Thread Nikolay Shirokovskiy



On 21.10.2020 13:56, Kevin Wolf wrote:
> On 20.10.2020 at 12:29, Nikolay Shirokovskiy wrote:
>>
>>
>> On 20.10.2020 13:23, Nikolay Shirokovskiy wrote:
>>>
>>>
>>> On 20.10.2020 11:50, Kevin Wolf wrote:
>>>> On 20.10.2020 at 10:21, Nikolay Shirokovskiy wrote:
>>>>> Hi, all.
>>>>>
>>>>> I recently found a corner case when it is impossible AFAIK to find out valid
>>>>> backing chain after block commit operation. Imagine committing top image. After
>>>>> commit ready state pivot is sent and then mgmt crashed. So far so good. Upon
>>>>> next start mgmt can either check block job status for non-autodissmised job or
>>>>> inspect backing chain to infer was pivot was successful or not in case of older
>>>>> qemu.
>>>>>
>>>>> But imagine after mgmt crash qemu process was destroyed too. In this case there
>>>>> is no option to know now what is valid backing chain. Yeah libvirt starts qemu
>>>>> process with -no-shutdown flags so process is not destroyed in case of shutdown
>>>>> but still process can crash.
>>>>
>>>> I don't think this is a problem.
>>>>
>>>> Between completion of the job and finalising it, both the base node and
>>>> the top node are equivalent. You can access either and you'll always get
>>>> the same data.
>>>>
>>>> So if libvirt didn't save that the job was already completed, it will
>>>> use the old image file, and it's fine. And if libvirt already sent the
>>>> job-finalize command, it will first have saved that the job was
>>>> completed and therefore use the new image, and it's fine, too.
>>>
>>> So finalizing can't fail? Otherwise libvirt can save that job is completed and
>>> graph is changed while is was really wasn't
>>
>> Hmm, it is even not the matter of qemu. Libvirt can save that job is completed
>> and then crash before sending command to finalize to qemu. So after qemu crash
>> and libvirt start libvirt would think that valid backing chain is without
>> top image which is not true.
> 
> Why not? During this time the top and base image are equally valid to be
> used as the active image.
> 
> If QEMU hadn't switched from top to base yet when it crashed, it's still
> no problem if libvirt does the switch when restarting QEMU.
> 

Now it is clear. Thanks for the explanation.

Nikolay



Re: RFC: tracking valid backing chain issue

2020-10-21 Thread Kevin Wolf
On 20.10.2020 at 12:29, Nikolay Shirokovskiy wrote:
> 
> 
> On 20.10.2020 13:23, Nikolay Shirokovskiy wrote:
> > 
> > 
> > On 20.10.2020 11:50, Kevin Wolf wrote:
> >> On 20.10.2020 at 10:21, Nikolay Shirokovskiy wrote:
> >>> Hi, all.
> >>>
> >>> I recently found a corner case when it is impossible AFAIK to find out valid
> >>> backing chain after block commit operation. Imagine committing top image. After
> >>> commit ready state pivot is sent and then mgmt crashed. So far so good. Upon
> >>> next start mgmt can either check block job status for non-autodissmised job or
> >>> inspect backing chain to infer was pivot was successful or not in case of older
> >>> qemu.
> >>>
> >>> But imagine after mgmt crash qemu process was destroyed too. In this case there
> >>> is no option to know now what is valid backing chain. Yeah libvirt starts qemu
> >>> process with -no-shutdown flags so process is not destroyed in case of shutdown
> >>> but still process can crash.
> >>
> >> I don't think this is a problem.
> >>
> >> Between completion of the job and finalising it, both the base node and
> >> the top node are equivalent. You can access either and you'll always get
> >> the same data.
> >>
> >> So if libvirt didn't save that the job was already completed, it will
> >> use the old image file, and it's fine. And if libvirt already sent the
> >> job-finalize command, it will first have saved that the job was
> >> completed and therefore use the new image, and it's fine, too.
> > 
> > So finalizing can't fail? Otherwise libvirt can save that job is completed and
> > graph is changed while is was really wasn't
> 
> Hmm, it is even not the matter of qemu. Libvirt can save that job is completed
> and then crash before sending command to finalize to qemu. So after qemu crash
> and libvirt start libvirt would think that valid backing chain is without
> top image which is not true.

Why not? During this time the top and base image are equally valid to be
used as the active image.

If QEMU hadn't switched from top to base yet when it crashed, it's still
no problem if libvirt does the switch when restarting QEMU.
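
The argument above can be checked by listing every crash point after the
commit job has completed. The following is only a small illustrative model in
Python, not code from libvirt or QEMU; the function names and the "top"/"base"
labels are made up for the example, and it assumes libvirt always records
completion before it sends job-finalize.

```python
def start_image(saved_completed):
    """Image libvirt would pick when it restarts QEMU (hypothetical helper)."""
    return "base" if saved_completed else "top"


def crash_cases():
    """Enumerate the reachable states once the commit job has completed.

    Each case is (libvirt saved completion, QEMU pivoted, image chosen).
    """
    cases = []
    for saved_completed in (False, True):
        for qemu_pivoted in (False, True):
            # Assumption: completion is saved before job-finalize is sent,
            # so QEMU cannot have pivoted without the record being on disk.
            if qemu_pivoted and not saved_completed:
                continue
            cases.append(
                (saved_completed, qemu_pivoted, start_image(saved_completed))
            )
    return cases


# After completion top and base hold the same data, so either one is valid.
VALID_AFTER_COMPLETION = {"top", "base"}

for saved, pivoted, image in crash_cases():
    assert image in VALID_AFTER_COMPLETION
    print(f"saved={saved} pivoted={pivoted} -> start with {image}")
```

The case raised in the thread is (saved, not pivoted): libvirt starts QEMU on
the base image, which is acceptable because both images are equivalent at that
point.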

Kevin




Re: RFC: tracking valid backing chain issue

2020-10-20 Thread Nikolay Shirokovskiy



On 20.10.2020 13:23, Nikolay Shirokovskiy wrote:
> 
> 
> On 20.10.2020 11:50, Kevin Wolf wrote:
>> On 20.10.2020 at 10:21, Nikolay Shirokovskiy wrote:
>>> Hi, all.
>>>
>>> I recently found a corner case when it is impossible AFAIK to find out valid
>>> backing chain after block commit operation. Imagine committing top image. After
>>> commit ready state pivot is sent and then mgmt crashed. So far so good. Upon
>>> next start mgmt can either check block job status for non-autodissmised job or
>>> inspect backing chain to infer was pivot was successful or not in case of older
>>> qemu.
>>>
>>> But imagine after mgmt crash qemu process was destroyed too. In this case there
>>> is no option to know now what is valid backing chain. Yeah libvirt starts qemu
>>> process with -no-shutdown flags so process is not destroyed in case of shutdown
>>> but still process can crash.
>>
>> I don't think this is a problem.
>>
>> Between completion of the job and finalising it, both the base node and
>> the top node are equivalent. You can access either and you'll always get
>> the same data.
>>
>> So if libvirt didn't save that the job was already completed, it will
>> use the old image file, and it's fine. And if libvirt already sent the
>> job-finalize command, it will first have saved that the job was
>> completed and therefore use the new image, and it's fine, too.
> 
> So finalizing can't fail? Otherwise libvirt can save that job is completed and
> graph is changed while is was really wasn't
> 

Hmm, it is not even a matter of qemu. Libvirt can save that the job is
completed and then crash before sending the finalize command to qemu. So after
a qemu crash and a libvirt restart, libvirt would think that the valid backing
chain is the one without the top image, which is not true.

>>
>> Kevin
>>
>>> So corner case is very rare. Mgmt crash in a specific short moment and then
>>> qemu crash before mgmt is up again.
>>>
>>> I guess some 'invalidated' flag for image would help. And also qemu itself
>>> could check that mgmt is not trying to run on invalid backing chain based
>>> on this flag.
>>>
>>> Nikolay
>>>
>>



Re: RFC: tracking valid backing chain issue

2020-10-20 Thread Nikolay Shirokovskiy



On 20.10.2020 11:50, Kevin Wolf wrote:
> On 20.10.2020 at 10:21, Nikolay Shirokovskiy wrote:
>> Hi, all.
>>
>> I recently found a corner case when it is impossible AFAIK to find out valid
>> backing chain after block commit operation. Imagine committing top image. After
>> commit ready state pivot is sent and then mgmt crashed. So far so good. Upon
>> next start mgmt can either check block job status for non-autodissmised job or
>> inspect backing chain to infer was pivot was successful or not in case of older
>> qemu.
>>
>> But imagine after mgmt crash qemu process was destroyed too. In this case there
>> is no option to know now what is valid backing chain. Yeah libvirt starts qemu
>> process with -no-shutdown flags so process is not destroyed in case of shutdown
>> but still process can crash.
> 
> I don't think this is a problem.
> 
> Between completion of the job and finalising it, both the base node and
> the top node are equivalent. You can access either and you'll always get
> the same data.
> 
> So if libvirt didn't save that the job was already completed, it will
> use the old image file, and it's fine. And if libvirt already sent the
> job-finalize command, it will first have saved that the job was
> completed and therefore use the new image, and it's fine, too.

So finalizing can't fail? Otherwise libvirt can save that the job is completed
and the graph is changed while it really wasn't.

Nikolay

> 
> Kevin
> 
>> So corner case is very rare. Mgmt crash in a specific short moment and then
>> qemu crash before mgmt is up again.
>>
>> I guess some 'invalidated' flag for image would help. And also qemu itself
>> could check that mgmt is not trying to run on invalid backing chain based
>> on this flag.
>>
>> Nikolay
>>
> 



Re: RFC: tracking valid backing chain issue

2020-10-20 Thread Kevin Wolf
On 20.10.2020 at 10:21, Nikolay Shirokovskiy wrote:
> Hi, all.
> 
> I recently found a corner case when it is impossible AFAIK to find out valid
> backing chain after block commit operation. Imagine committing top image. After
> commit ready state pivot is sent and then mgmt crashed. So far so good. Upon
> next start mgmt can either check block job status for non-autodissmised job or
> inspect backing chain to infer was pivot was successful or not in case of older
> qemu.
> 
> But imagine after mgmt crash qemu process was destroyed too. In this case there
> is no option to know now what is valid backing chain. Yeah libvirt starts qemu
> process with -no-shutdown flags so process is not destroyed in case of shutdown
> but still process can crash.

I don't think this is a problem.

Between completion of the job and finalising it, both the base node and
the top node are equivalent. You can access either and you'll always get
the same data.

So if libvirt didn't save that the job was already completed, it will
use the old image file, and it's fine. And if libvirt already sent the
job-finalize command, it will first have saved that the job was
completed and therefore use the new image, and it's fine, too.
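
The ordering described here (record completion first, send job-finalize second)
could look roughly like the Python sketch below. It is not libvirt's actual
implementation: the status file, the function names, the job id "commit0" and
the `send_qmp` callback are all hypothetical. Only the QMP command name
`job-finalize` and its `id` argument come from QEMU.

```python
import json
import os
import tempfile


def save_status(path, status):
    """Durably record the commit job status before acting on it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"commit_job": status}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic replacement of the old record


def load_status(path):
    """Return the recorded status, or "running" if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)["commit_job"]
    except FileNotFoundError:
        return "running"


def finalize(path, send_qmp, job_id="commit0"):
    """Persist "completed" first, and only then ask QEMU to finalize."""
    save_status(path, "completed")
    send_qmp({"execute": "job-finalize", "arguments": {"id": job_id}})


def image_for_next_start(path, top, base):
    """Pick the active image for a fresh QEMU from the saved record alone."""
    return base if load_status(path) == "completed" else top


if __name__ == "__main__":
    status_file = os.path.join(tempfile.mkdtemp(), "status.json")
    finalize(status_file, send_qmp=print)  # print stands in for a QMP socket
    print(image_for_next_start(status_file, "top.qcow2", "base.qcow2"))
```

If the management process dies between the two steps in `finalize`, the record
already says "completed", so the next start uses the base image. If it dies
before the first step, the next start uses the top image. Both are fine as long
as the job had completed.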

Kevin

> So corner case is very rare. Mgmt crash in a specific short moment and then
> qemu crash before mgmt is up again.
> 
> I guess some 'invalidated' flag for image would help. And also qemu itself
> could check that mgmt is not trying to run on invalid backing chain based
> on this flag.
> 
> Nikolay
> 




RFC: tracking valid backing chain issue

2020-10-20 Thread Nikolay Shirokovskiy
Hi, all.

I recently found a corner case where it is impossible, AFAIK, to find out the
valid backing chain after a block commit operation. Imagine committing the top
image. After the commit reaches the ready state, the pivot is sent and then mgmt
crashes. So far so good. Upon the next start mgmt can either check the block job
status for a non-autodismissed job or, in case of an older qemu, inspect the
backing chain to infer whether the pivot was successful.
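
As an illustration of the first option (qemu is still running and the job was
not auto-dismissed), the check could be sketched in Python as below. The
function name and the image names are hypothetical, and the mapping from job
status to active image is an assumption made for this sketch. `jobs` stands for
the list returned by the QMP `query-jobs` command.

```python
def active_layer(jobs, job_id, top, base):
    """Guess which image a still-running qemu uses as its active layer.

    `jobs` is the list of job descriptions returned by QMP `query-jobs`.
    Returns None when the job is not listed, in which case the caller has
    to fall back to inspecting the backing chain.
    """
    for job in jobs:
        if job["id"] != job_id:
            continue
        # "concluded" without an "error" member: the job succeeded, so the
        # pivot happened and the chain now ends at `base` (assumption).
        if job["status"] == "concluded" and "error" not in job:
            return base
        # Any other state (ready, pending, failed, ...): `top` is still used.
        return top
    return None


if __name__ == "__main__":
    reply = [{"id": "commit0", "type": "commit", "status": "concluded"}]
    print(active_layer(reply, "commit0", "top.qcow2", "base.qcow2"))
```

This only works while the qemu process is alive to answer the query, which is
exactly what the case below takes away.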

But imagine that after the mgmt crash the qemu process was destroyed too. In
this case there is no way to know what the valid backing chain is. Yes, libvirt
starts the qemu process with the -no-shutdown flag so the process is not
destroyed in case of shutdown, but the process can still crash.

So the corner case is very rare: mgmt crashes in a specific short moment and
then qemu crashes before mgmt is up again.

I guess some 'invalidated' flag for the image would help. Also, qemu itself
could check, based on this flag, that mgmt is not trying to run on an invalid
backing chain.

Nikolay