[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-12-02 Thread Raz Tamir
On Sun, Dec 2, 2018 at 8:51 PM Nir Soffer  wrote:

> On Sun, Dec 2, 2018 at 8:33 PM Gal Ben Haim  wrote:
>
>>
>> In order to not block other patches on CQ, I've sent [1] which will double
>> the amount of space on the ISCSI SD (with the patch it will have 40GB).
>>
> And in addition we need to prioritize the fix for this bug + backport to
> 4.2.
I'd suggest that after the bug is fixed, we revert this change [1], and if
everything is back on track for a few executions, apply it again and
keep it.

>
>> As a side note, we use the same configuration on the master suite, which
>> may explain
>> why we don't see the issue there.
>>
>
> Why did we use different configurations?
>
> Can we extract the configuration to an external file that will be shared by
> both the master and 4.x suites?
>
>
>>
>> [1] https://gerrit.ovirt.org/#/c/95922/
>>
>> On Sun, Dec 2, 2018 at 5:41 PM Gal Ben Haim  wrote:
>>
>>> Below you can find 2 jobs, one that succeeded and the other failed on
>>> the iscsi issue.
>>> Both were triggered by unrelated patches.
>>>
>>> Success -
>>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3546/
>>> Failure -
>>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3544/
>>>
>>>
>>> On Sun, Dec 2, 2018 at 2:37 PM Gal Ben Haim  wrote:
>>>
 Raz, thanks for the investigation.
 I'll send a patch for increasing the LUN sizes.

 On Sun, Dec 2, 2018 at 1:27 PM Nir Soffer  wrote:

> On Sun, Dec 2, 2018, 10:44 Raz Tamir 
>> After some analysis, I think the bug we are seeing here is
>> https://bugzilla.redhat.com/show_bug.cgi?id=1588061
>> This applies for suspend/resume and also for a snapshot with memory.
>> Following the steps and considering that the iscsi storage domain is
>> only 20GB, this should be the reason for reaching ~4GB free space
>>
>
>
> OST configuration should change so it will not fail because of such
> bugs.
>

 I disagree. The purpose of OST is to catch bugs, not to cover them.

>
> iSCSI storage can be created using sparse files, not consuming any
> resources until you write to the LVs, so having a 100G storage domain costs
> nothing.
>

 OST uses sparse files.

>
> Nir
>
>
>> On Fri, Nov 30, 2018 at 10:01 PM Raz Tamir 
>> wrote:
>>
>>>
>>>
>>> On Fri, Nov 30, 2018, 21:57 Ryan Barry >>


 On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir 
 wrote:

>
>
> On Fri, Nov 30, 2018, 19:33 Dafna Ron 
>> Hi,
>>
>> This mail is to provide the current status of CQ and allow people
>> to review status before and after the weekend.
>> Please refer to below colour map for further information on the
>> meaning of the colours.
>>
>> *CQ-4.2*: RED (#1)
>>
>> I checked the last date on which ovirt-engine and vdsm passed and moved
>> packages to tested, as they are the bigger projects, and it was on
>> 27-11-2018.
>>
>> We have been having sporadic failures for most of the projects on
>> test check_snapshot_with_memory.
>> We have deduced that this is caused by a code regression in
>> storage, based on the following:
>> 1. Evgheni and Gal helped debug this issue to rule out lago and
>> infra issues as the cause of failure, and both determined the issue is a
>> code regression - most likely in storage.
>> 2. The failure only happens on the 4.2 branch.
>> 3. The failure itself is that a VM cannot run due to low disk space in
>> the storage domain, and we cannot see any failures which would leave any
>> leftovers in the storage domain.
>>
> Can you please share the link to the execution?
>

 Here's an example of one run:
 https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/

 The iSCSI storage domain starts emitting warnings about low storage
 space immediately after removing the VmPool, but it's possible that the
 storage domain is filling up before that, from some other call that is
 still running, possibly the VM import.

>>> Thanks Ryan, I'll try to help with debugging this issue
>>>


>
>> Dan and Ryan are actively involved in trying to find the
>> regression but the consensus is that this is a storage related
>> regression and* we are having a problem getting the storage team
>> to join us in debugging the issue. *
>>
>> I prepared a patch to skip the test in case we cannot get
>> cooperation from storage team and resolve this regression in the 
>> next few
>> days:
>> https://gerrit.ovirt.org/#/c/95889/
>>
>> *CQ-Master:* YELLOW (#1)

[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-12-02 Thread Nir Soffer
On Sun, Dec 2, 2018 at 8:33 PM Gal Ben Haim  wrote:

>
> In order to not block other patches on CQ, I've sent [1] which will double
> the amount of space on the ISCSI SD (with the patch it will have 40GB).
>
> As a side note, we use the same configuration on the master suite, which
> may explain
> why we don't see the issue there.
>

Why did we use different configurations?

Can we extract the configuration to an external file that will be shared by
both the master and 4.x suites?
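
For illustration only, such a shared file could be a small YAML document read
by both suites; the file name, keys and loader below are assumptions made for
this sketch, not OST's actual layout:

# storage_config.py - illustrative sketch only; path and keys are assumed.
import yaml  # PyYAML

# A single file, e.g. common/storage.yaml, read by both the master and 4.x
# suites instead of hard-coding storage sizes per suite:
#
#   iscsi:
#     lun_count: 5
#     lun_size_gb: 40
#   nfs:
#     export_size_gb: 80

def load_storage_config(path="common/storage.yaml"):
    """Return the shared storage settings as a dict."""
    with open(path) as f:
        return yaml.safe_load(f)

if __name__ == "__main__":
    cfg = load_storage_config()
    print("iSCSI LUNs: %(lun_count)d x %(lun_size_gb)d GB" % cfg["iscsi"])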


>
> [1] https://gerrit.ovirt.org/#/c/95922/
>
> On Sun, Dec 2, 2018 at 5:41 PM Gal Ben Haim  wrote:
>
>> Below you can find 2 jobs, one that succeeded and the other failed on the
>> iscsi issue.
>> Both were triggered by unrelated patches.
>>
>> Success -
>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3546/
>> Failure -
>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3544/
>>
>>
>> On Sun, Dec 2, 2018 at 2:37 PM Gal Ben Haim  wrote:
>>
>>> Raz, thanks for the investigation.
>>> I'll send a patch for increasing the LUN sizes.
>>>
>>> On Sun, Dec 2, 2018 at 1:27 PM Nir Soffer  wrote:
>>>
 On Sun, Dec 2, 2018, 10:44 Raz Tamir >>>
> After some analysis, I think the bug we are seeing here is
> https://bugzilla.redhat.com/show_bug.cgi?id=1588061
> This applies for suspend/resume and also for a snapshot with memory.
> Following the steps and considering that the iscsi storage domain is
> only 20GB, this should be the reason for reaching ~4GB free space
>


 OST configuration should change so it will not fail because of such
 bugs.

>>>
>>> I disagree. The purpose of OST is to catch bugs, not to cover them.
>>>

 iSCSI storage can be created using sparse files, not consuming any
 resources until you write to the LVs, so having a 100G storage domain costs
 nothing.

>>>
>>> OST uses sparse files.
>>>

 Nir


> On Fri, Nov 30, 2018 at 10:01 PM Raz Tamir  wrote:
>
>>
>>
>> On Fri, Nov 30, 2018, 21:57 Ryan Barry >
>>>
>>>
>>> On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir 
>>> wrote:
>>>


 On Fri, Nov 30, 2018, 19:33 Dafna Ron >>>
> Hi,
>
> This mail is to provide the current status of CQ and allow people
> to review status before and after the weekend.
> Please refer to below colour map for further information on the
> meaning of the colours.
>
> *CQ-4.2*: RED (#1)
>
> I checked the last date on which ovirt-engine and vdsm passed and moved
> packages to tested, as they are the bigger projects, and it was on
> 27-11-2018.
>
> We have been having sporadic failures for most of the projects on
> test check_snapshot_with_memory.
> We have deduced that this is caused by a code regression in
> storage, based on the following:
> 1. Evgheni and Gal helped debug this issue to rule out lago and
> infra issues as the cause of failure, and both determined the issue is a
> code regression - most likely in storage.
> 2. The failure only happens on the 4.2 branch.
> 3. The failure itself is that a VM cannot run due to low disk space in
> the storage domain, and we cannot see any failures which would leave any
> leftovers in the storage domain.
>
 Can you please share the link to the execution?

>>>
>>> Here's an example of one run:
>>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/
>>>
>>> The iSCSI storage domain starts emitting warnings about low storage
>>> space immediately after removing the VmPool, but it's possible that the
>>> storage domain is filling up before that, from some other call that is
>>> still running, possibly the VM import.
>>>
>> Thanks Ryan, I'll try to help with debugging this issue
>>
>>>
>>>

> Dan and Ryan are actively involved in trying to find the
> regression but the consensus is that this is a storage related
> regression and* we are having a problem getting the storage team
> to join us in debugging the issue. *
>
> I prepared a patch to skip the test in case we cannot get
> cooperation from storage team and resolve this regression in the next 
> few
> days:
> https://gerrit.ovirt.org/#/c/95889/
>
> *CQ-Master:* YELLOW (#1)
>
> We have failures which CQ is still bisecting, and until it's done we
> cannot point to any specific failing projects.
>
>
> Happy week!
> Dafna
>
>
>
> ---
> COLOUR MAP
>
> Green = job has been passing successfully

[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-12-02 Thread Gal Ben Haim
In order to not block other patches on CQ, I've sent [1] which will double
the amount of space on the ISCSI SD (with the patch it will have 40GB).

As a side note, we use the same configuration on the master suite, which
may explain
why we don't see the issue there.

[1] https://gerrit.ovirt.org/#/c/95922/

On Sun, Dec 2, 2018 at 5:41 PM Gal Ben Haim  wrote:

> Below you can find 2 jobs, one that succeeded and the other failed on the
> iscsi issue.
> Both were triggered by unrelated patches.
>
> Success -
> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3546/
> Failure -
> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3544/
>
>
> On Sun, Dec 2, 2018 at 2:37 PM Gal Ben Haim  wrote:
>
>> Raz, thanks for the investigation.
>> I'll send a patch for increasing the LUN sizes.
>>
>> On Sun, Dec 2, 2018 at 1:27 PM Nir Soffer  wrote:
>>
>>> On Sun, Dec 2, 2018, 10:44 Raz Tamir >>
 After some analysis, I think the bug we are seeing here is
 https://bugzilla.redhat.com/show_bug.cgi?id=1588061
 This applies for suspend/resume and also for a snapshot with memory.
 Following the steps and considering that the iscsi storage domain is
 only 20GB, this should be the reason for reaching ~4GB free space

>>>
>>>
>>> OST configuration should change so it will not fail because of such
>>> bugs.
>>>
>>
>> I disagree. The purpose of OST is to catch bugs, not to cover them.
>>
>>>
>>> iSCSI storage can be created using sparse files, not consuming any
>>> resources until you write to the LVs, so having a 100G storage domain costs
>>> nothing.
>>>
>>
>> OST uses sparse files.
>>
>>>
>>> Nir
>>>
>>>
 On Fri, Nov 30, 2018 at 10:01 PM Raz Tamir  wrote:

>
>
> On Fri, Nov 30, 2018, 21:57 Ryan Barry 
>>
>>
>> On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir  wrote:
>>
>>>
>>>
>>> On Fri, Nov 30, 2018, 19:33 Dafna Ron >>
 Hi,

 This mail is to provide the current status of CQ and allow people
 to review status before and after the weekend.
 Please refer to below colour map for further information on the
 meaning of the colours.

 *CQ-4.2*: RED (#1)

 I checked the last date on which ovirt-engine and vdsm passed and moved
 packages to tested, as they are the bigger projects, and it was on 27-11-2018.

 We have been having sporadic failures for most of the projects on
 test check_snapshot_with_memory.
 We have deduced that this is caused by a code regression in
 storage, based on the following:
 1. Evgheni and Gal helped debug this issue to rule out lago and
 infra issues as the cause of failure, and both determined the issue is a
 code regression - most likely in storage.
 2. The failure only happens on the 4.2 branch.
 3. The failure itself is that a VM cannot run due to low disk space in
 the storage domain, and we cannot see any failures which would leave any
 leftovers in the storage domain.

>>> Can you please share the link to the execution?
>>>
>>
>> Here's an example of one run:
>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/
>>
>> The iSCSI storage domain starts emitting warnings about low storage
>> space immediately after removing the VmPool, but it's possible that the
>> storage domain is filling up before that, from some other call that is
>> still running, possibly the VM import.
>>
> Thanks Ryan, I'll try to help with debugging this issue
>
>>
>>
>>>
 Dan and Ryan are actively involved in trying to find the regression
 but the consensus is that this is a storage related regression and*
 we are having a problem getting the storage team to join us in 
 debugging
 the issue. *

 I prepared a patch to skip the test in case we cannot get
 cooperation from storage team and resolve this regression in the next 
 few
 days:
 https://gerrit.ovirt.org/#/c/95889/

 *CQ-Master:* YELLOW (#1)

 We have failures which CQ is still bisecting, and until it's done we
 cannot point to any specific failing projects.


 Happy week!
 Dafna



 ---
 COLOUR MAP

 Green = job has been passing successfully

 ** green for more than 3 days may suggest we need a review of our
 test coverage


1.

1-3 days   GREEN (#1)
2.

4-7 days   GREEN (#2)
3.

Over 7 days GREEN (#3)


 Yellow = intermittent 

[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-12-02 Thread Gal Ben Haim
Below you can find 2 jobs, one that succeeded and the other failed on the
iscsi issue.
Both were triggered by unrelated patches.

Success - https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3546/
Failure - https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3544/


On Sun, Dec 2, 2018 at 2:37 PM Gal Ben Haim  wrote:

> Raz, thanks for the investigation.
> I'll send a patch for increasing the LUN sizes.
>
> On Sun, Dec 2, 2018 at 1:27 PM Nir Soffer  wrote:
>
>> On Sun, Dec 2, 2018, 10:44 Raz Tamir >
>>> After some analysis, I think the bug we are seeing here is
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1588061
>>> This applies for suspend/resume and also for a snapshot with memory.
>>> Following the steps and considering that the iscsi storage domain is
>>> only 20GB, this should be the reason for reaching ~4GB free space
>>>
>>
>>
>> OST configuration should change so it will not fail because of such
>> bugs.
>>
>
> I disagree. The purpose of OST is to catch bugs, not to cover them.
>
>>
>> iSCSI storage can be created using sparse files, not consuming any
>> resources until you write to the LVs, so having a 100G storage domain costs
>> nothing.
>>
>
> OST uses sparse files.
>
>>
>> Nir
>>
>>
>>> On Fri, Nov 30, 2018 at 10:01 PM Raz Tamir  wrote:
>>>


 On Fri, Nov 30, 2018, 21:57 Ryan Barry >>>
>
>
> On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir  wrote:
>
>>
>>
>> On Fri, Nov 30, 2018, 19:33 Dafna Ron >
>>> Hi,
>>>
>>> This mail is to provide the current status of CQ and allow people to
>>> review status before and after the weekend.
>>> Please refer to below colour map for further information on the
>>> meaning of the colours.
>>>
>>> *CQ-4.2*: RED (#1)
>>>
>>> I checked the last date on which ovirt-engine and vdsm passed and moved
>>> packages to tested, as they are the bigger projects, and it was on
>>> 27-11-2018.
>>>
>>> We have been having sporadic failures for most of the projects on
>>> test check_snapshot_with_memory.
>>> We have deduced that this is caused by a code regression in storage,
>>> based on the following:
>>> 1. Evgheni and Gal helped debug this issue to rule out lago and infra
>>> issues as the cause of failure, and both determined the issue is a code
>>> regression - most likely in storage.
>>> 2. The failure only happens on the 4.2 branch.
>>> 3. The failure itself is that a VM cannot run due to low disk space in
>>> the storage domain, and we cannot see any failures which would leave any
>>> leftovers in the storage domain.
>>>
>> Can you please share the link to the execution?
>>
>
> Here's an example of one run:
> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/
>
> The iSCSI storage domain starts emitting warnings about low storage
> space immediately after removing the VmPool, but it's possible that the
> storage domain is filling up before that, from some other call that is
> still running, possibly the VM import.
>
 Thanks Ryan, I'll try to help with debugging this issue

>
>
>>
>>> Dan and Ryan are actively involved in trying to find the regression
>>> but the consensus is that this is a storage related regression and*
>>> we are having a problem getting the storage team to join us in debugging
>>> the issue. *
>>>
>>> I prepared a patch to skip the test in case we cannot get
>>> cooperation from storage team and resolve this regression in the next 
>>> few
>>> days:
>>> https://gerrit.ovirt.org/#/c/95889/
>>>
>>> *CQ-Master:* YELLOW (#1)
>>>
>>> We have failures which CQ is still bisecting, and until it's done we
>>> cannot point to any specific failing projects.
>>>
>>>
>>> Happy week!
>>> Dafna
>>>
>>>
>>>
>>> ---
>>> COLOUR MAP
>>>
>>> Green = job has been passing successfully
>>>
>>> ** green for more than 3 days may suggest we need a review of our
>>> test coverage
>>>
>>>
>>>1.
>>>
>>>1-3 days   GREEN (#1)
>>>2.
>>>
>>>4-7 days   GREEN (#2)
>>>3.
>>>
>>>Over 7 days GREEN (#3)
>>>
>>>
>>> Yellow = intermittent failures for different projects but no lasting
>>> or current regressions
>>>
>>> ** intermittent would be a healthy project as we expect a number of
>>> failures during the week
>>>
>>> ** I will not report any of the solved failures or regressions.
>>>
>>>
>>>1.
>>>
>>>Solved job failuresYELLOW (#1)
>>>2.
>>>
>>>Solved regressions  YELLOW (#2)
>>>
>>>
>>> Red = job has been failing
>>>
>>> ** Active Failures. The colour will 

[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-12-02 Thread Gal Ben Haim
Raz, thanks for the investigation.
I'll send a patch for increasing the LUN sizes.

On Sun, Dec 2, 2018 at 1:27 PM Nir Soffer  wrote:

> On Sun, Dec 2, 2018, 10:44 Raz Tamir 
>> After some analysis, I think the bug we are seeing here is
>> https://bugzilla.redhat.com/show_bug.cgi?id=1588061
>> This applies for suspend/resume and also for a snapshot with memory.
>> Following the steps and considering that the iscsi storage domain is only
>> 20GB, this should be the reason for reaching ~4GB free space
>>
>
>
> OST configuration should change so it will not fail because of such
> bugs.
>

I disagree. The purpose of OST is to catch bugs, not to cover them.

>
> iSCSI storage can be created using sparse files, not consuming any
> resources until you write to the LVs, so having a 100G storage domain costs
> nothing.
>

OST uses sparse files.

>
> Nir
>
>
>> On Fri, Nov 30, 2018 at 10:01 PM Raz Tamir  wrote:
>>
>>>
>>>
>>> On Fri, Nov 30, 2018, 21:57 Ryan Barry >>


 On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir  wrote:

>
>
> On Fri, Nov 30, 2018, 19:33 Dafna Ron 
>> Hi,
>>
>> This mail is to provide the current status of CQ and allow people to
>> review status before and after the weekend.
>> Please refer to below colour map for further information on the
>> meaning of the colours.
>>
>> *CQ-4.2*: RED (#1)
>>
>> I checked the last date on which ovirt-engine and vdsm passed and moved
>> packages to tested, as they are the bigger projects, and it was on
>> 27-11-2018.
>>
>> We have been having sporadic failures for most of the projects on
>> test check_snapshot_with_memory.
>> We have deduced that this is caused by a code regression in storage,
>> based on the following:
>> 1. Evgheni and Gal helped debug this issue to rule out lago and infra
>> issues as the cause of failure, and both determined the issue is a code
>> regression - most likely in storage.
>> 2. The failure only happens on the 4.2 branch.
>> 3. The failure itself is that a VM cannot run due to low disk space in
>> the storage domain, and we cannot see any failures which would leave any
>> leftovers in the storage domain.
>>
> Can you please share the link to the execution?
>

 Here's an example of one run:
 https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/

 The iSCSI storage domain starts emitting warnings about low storage
 space immediately after removing the VmPool, but it's possible that the
 storage domain is filling up before that, from some other call that is
 still running, possibly the VM import.

>>> Thanks Ryan, I'll try to help with debugging this issue
>>>


>
>> Dan and Ryan are actively involved in trying to find the regression
>> but the consensus is that this is a storage related regression and*
>> we are having a problem getting the storage team to join us in debugging
>> the issue. *
>>
>> I prepared a patch to skip the test in case we cannot get cooperation
>> from storage team and resolve this regression in the next few days:
>> https://gerrit.ovirt.org/#/c/95889/
>>
>> *CQ-Master:* YELLOW (#1)
>>
>> We have failures which CQ is still bisecting, and until it's done we
>> cannot point to any specific failing projects.
>>
>>
>> Happy week!
>> Dafna
>>
>>
>>
>> ---
>> COLOUR MAP
>>
>> Green = job has been passing successfully
>>
>> ** green for more than 3 days may suggest we need a review of our
>> test coverage
>>
>>
>>1.
>>
>>1-3 days   GREEN (#1)
>>2.
>>
>>4-7 days   GREEN (#2)
>>3.
>>
>>Over 7 days GREEN (#3)
>>
>>
>> Yellow = intermittent failures for different projects but no lasting
>> or current regressions
>>
>> ** intermittent would be a healthy project as we expect a number of
>> failures during the week
>>
>> ** I will not report any of the solved failures or regressions.
>>
>>
>>1.
>>
>>Solved job failuresYELLOW (#1)
>>2.
>>
>>Solved regressions  YELLOW (#2)
>>
>>
>> Red = job has been failing
>>
>> ** Active Failures. The colour will change based on the amount of
>> time the project/s has been broken. Only active regressions would be
>> reported.
>>
>>
>>1.
>>
>>1-3 days  RED (#1)
>>2.
>>
>>4-7 days  RED (#2)
>>3.
>>
>>Over 7 days RED (#3)
>>
>>
>>

 --

 Ryan Barry

 Associate Manager - RHV Virt/SLA

 rba...@redhat.com  M: +16518159306  IM: rbarry
 

>>>
>>
>> --
>>
>>
>> 

[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-12-02 Thread Eyal Edri
On Fri, Nov 30, 2018 at 9:25 PM Ryan Barry  wrote:

>
>
> On Fri, Nov 30, 2018 at 2:18 PM Dan Kenigsberg  wrote:
>
>>
>>
>> On Fri, 30 Nov 2018, 19:33 Dafna Ron >
>>> Hi,
>>>
>>> This mail is to provide the current status of CQ and allow people to
>>> review status before and after the weekend.
>>> Please refer to below colour map for further information on the meaning
>>> of the colours.
>>>
>>> *CQ-4.2*: RED (#1)
>>>
>>> I checked the last date on which ovirt-engine and vdsm passed and moved
>>> packages to tested, as they are the bigger projects, and it was on
>>> 27-11-2018.
>>>
>>> We have been having sporadic failures for most of the projects on test
>>> check_snapshot_with_memory.
>>> We have deduced that this is caused by a code regression in storage,
>>> based on the following:
>>> 1. Evgheni and Gal helped debug this issue to rule out lago and infra
>>> issues as the cause of failure, and both determined the issue is a code
>>> regression - most likely in storage.
>>> 2. The failure only happens on the 4.2 branch.
>>> 3. The failure itself is that a VM cannot run due to low disk space in
>>> the storage domain, and we cannot see any failures which would leave any
>>> leftovers in the storage domain.
>>>
>>> Dan and Ryan are actively
>>>
>>
>> Actually, my involvement was a misguided attempt to solve another 4.2
>> failure that I thought I had seen.
>>
>> involved
>>
>>> in trying to find the regression but the consensus is that this is a
>>> storage related regression and* we are having a problem getting the
>>> storage team to join us in debugging the issue. *
>>>
>>> I prepared a patch to skip the test in case we cannot get cooperation
>>> from storage team and resolve this regression in the next few days:
>>> https://gerrit.ovirt.org/#/c/95889/
>>>
>>
>> Why do you consider this? Are we considering a release of 4.2 without
>> live snapshot?
>>
>
> No, we aren't.
>
>
>> Please do not merge it without an ack from Tal and Ryan.
>>
>
> Until we can bisect it, have you considered simply making a larger iSCSI
> volume so OST stops failing there? I know it's an additional burden on
> Infra's resources, and it's hopefully something we can revert later, but
> it's likely to make OST pass for now so we can identify if/where other
> failures are before we discover that even disabling this test (which I'm
> against) doesn't make OST pass and we've lost a good bisection point.
>

I think this was tried already, but it probably won't solve the issue; see
a suggested patch by Dan: https://gerrit.ovirt.org/#/c/95712/


>
>>
>>
>>> *CQ-Master:* YELLOW (#1)
>>>
>>> We have failures which CQ is still bisecting, and until it's done we
>>> cannot point to any specific failing projects.
>>>
>>>
>>> Happy week!
>>> Dafna
>>>
>>>
>>>
>>> ---
>>> COLOUR MAP
>>>
>>> Green = job has been passing successfully
>>>
>>> ** green for more than 3 days may suggest we need a review of our test
>>> coverage
>>>
>>>
>>>1.
>>>
>>>1-3 days   GREEN (#1)
>>>2.
>>>
>>>4-7 days   GREEN (#2)
>>>3.
>>>
>>>Over 7 days GREEN (#3)
>>>
>>>
>>> Yellow = intermittent failures for different projects but no lasting or
>>> current regressions
>>>
>>> ** intermittent would be a healthy project as we expect a number of
>>> failures during the week
>>>
>>> ** I will not report any of the solved failures or regressions.
>>>
>>>
>>>1.
>>>
>>>Solved job failuresYELLOW (#1)
>>>2.
>>>
>>>Solved regressions  YELLOW (#2)
>>>
>>>
>>> Red = job has been failing
>>>
>>> ** Active Failures. The colour will change based on the amount of time
>>> the project/s has been broken. Only active regressions would be reported.
>>>
>>>
>>>1.
>>>
>>>1-3 days  RED (#1)
>>>2.
>>>
>>>4-7 days  RED (#2)
>>>3.
>>>
>>>Over 7 days RED (#3)
>>>
>>>
>>>
>
> --
>
> Ryan Barry
>
> Associate Manager - RHV Virt/SLA
>
> rba...@redhat.com  M: +16518159306  IM: rbarry
> 
>


-- 

Eyal edri


MANAGER

RHV/CNV DevOps

EMEA VIRTUALIZATION R


Red Hat EMEA 
 TRIED. TESTED. TRUSTED. 
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)
___
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/Y3VVB6M4IANZCL54W2GAVB373YYII5V7/


[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-12-02 Thread Nir Soffer
On Sun, Dec 2, 2018, 10:44 Raz Tamir
> After some analysis, I think the bug we are seeing here is
> https://bugzilla.redhat.com/show_bug.cgi?id=1588061
> This applies for suspend/resume and also for a snapshot with memory.
> Following the steps and considering that the iscsi storage domain is only
> 20GB, this should be the reason for reaching ~4GB free space
>


OST configuration should change so it will not fail because of such bugs.

iSCSI storage can be created using sparse files, not consuming any
resources until you write to the LVs, so having a 100G storage domain costs
nothing.
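
To make that concrete, here is a minimal sketch (the path and size are
arbitrary examples, not OST's actual setup): a file created this way reports
a 100G size but allocates blocks only as data is written.

# Illustrative sketch: create a 100G sparse file that could back an iSCSI LUN.
# The path and size are examples only.
import os

PATH = "/var/tmp/iscsi_lun0.img"   # hypothetical backing file
SIZE = 100 * 1024**3               # 100 GiB apparent size

with open(PATH, "wb") as f:
    f.truncate(SIZE)               # grows the file without writing any data

st = os.stat(PATH)
print("apparent size: %d GiB" % (st.st_size // 1024**3))
print("actually allocated: %d MiB" % (st.st_blocks * 512 // 1024**2))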

Nir


> On Fri, Nov 30, 2018 at 10:01 PM Raz Tamir  wrote:
>
>>
>>
>> On Fri, Nov 30, 2018, 21:57 Ryan Barry >
>>>
>>>
>>> On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir  wrote:
>>>


 On Fri, Nov 30, 2018, 19:33 Dafna Ron >>>
> Hi,
>
> This mail is to provide the current status of CQ and allow people to
> review status before and after the weekend.
> Please refer to below colour map for further information on the
> meaning of the colours.
>
> *CQ-4.2*: RED (#1)
>
> I checked the last date on which ovirt-engine and vdsm passed and moved
> packages to tested, as they are the bigger projects, and it was on
> 27-11-2018.
>
> We have been having sporadic failures for most of the projects on test
> check_snapshot_with_memory.
> We have deduced that this is caused by a code regression in storage,
> based on the following:
> 1. Evgheni and Gal helped debug this issue to rule out lago and infra
> issues as the cause of failure, and both determined the issue is a code
> regression - most likely in storage.
> 2. The failure only happens on the 4.2 branch.
> 3. The failure itself is that a VM cannot run due to low disk space in
> the storage domain, and we cannot see any failures which would leave any
> leftovers in the storage domain.
>
 Can you please share the link to the execution?

>>>
>>> Here's an example of one run:
>>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/
>>>
>>> The iSCSI storage domain starts emitting warnings about low storage
>>> space immediately after removing the VmPool, but it's possible that the
>>> storage domain is filling up before that, from some other call that is
>>> still running, possibly the VM import.
>>>
>> Thanks Ryan, I'll try to help with debugging this issue
>>
>>>
>>>

> Dan and Ryan are actively involved in trying to find the regression
> but the consensus is that this is a storage related regression and*
> we are having a problem getting the storage team to join us in debugging
> the issue. *
>
> I prepared a patch to skip the test in case we cannot get cooperation
> from storage team and resolve this regression in the next few days:
> https://gerrit.ovirt.org/#/c/95889/
>
> *CQ-Master:* YELLOW (#1)
>
> We have failures which CQ is still bisecting, and until it's done we
> cannot point to any specific failing projects.
>
>
> Happy week!
> Dafna
>
>
>
> ---
> COLOUR MAP
>
> Green = job has been passing successfully
>
> ** green for more than 3 days may suggest we need a review of our test
> coverage
>
>
>1.
>
>1-3 days   GREEN (#1)
>2.
>
>4-7 days   GREEN (#2)
>3.
>
>Over 7 days GREEN (#3)
>
>
> Yellow = intermittent failures for different projects but no lasting
> or current regressions
>
> ** intermittent would be a healthy project as we expect a number of
> failures during the week
>
> ** I will not report any of the solved failures or regressions.
>
>
>1.
>
>Solved job failuresYELLOW (#1)
>2.
>
>Solved regressions  YELLOW (#2)
>
>
> Red = job has been failing
>
> ** Active Failures. The colour will change based on the amount of time
> the project/s has been broken. Only active regressions would be reported.
>
>
>1.
>
>1-3 days  RED (#1)
>2.
>
>4-7 days  RED (#2)
>3.
>
>Over 7 days RED (#3)
>
>
>
>>>
>>> --
>>>
>>> Ryan Barry
>>>
>>> Associate Manager - RHV Virt/SLA
>>>
>>> rba...@redhat.com  M: +16518159306  IM: rbarry
>>> 
>>>
>>
>
> --
>
>
> Raz Tamir
> Manager, RHV QE
> ___
> Devel mailing list -- devel@ovirt.org
> To unsubscribe send an email to devel-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> 

[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-12-02 Thread Raz Tamir
On Fri, Nov 30, 2018, 21:57 Ryan Barry 
>
> On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir  wrote:
>
>>
>>
>> On Fri, Nov 30, 2018, 19:33 Dafna Ron >
>>> Hi,
>>>
>>> This mail is to provide the current status of CQ and allow people to
>>> review status before and after the weekend.
>>> Please refer to below colour map for further information on the meaning
>>> of the colours.
>>>
>>> *CQ-4.2*: RED (#1)
>>>
>>> I checked the last date on which ovirt-engine and vdsm passed and moved
>>> packages to tested, as they are the bigger projects, and it was on
>>> 27-11-2018.
>>>
>>> We have been having sporadic failures for most of the projects on test
>>> check_snapshot_with_memory.
>>> We have deduced that this is caused by a code regression in storage,
>>> based on the following:
>>> 1. Evgheni and Gal helped debug this issue to rule out lago and infra
>>> issues as the cause of failure, and both determined the issue is a code
>>> regression - most likely in storage.
>>> 2. The failure only happens on the 4.2 branch.
>>> 3. The failure itself is that a VM cannot run due to low disk space in
>>> the storage domain, and we cannot see any failures which would leave any
>>> leftovers in the storage domain.
>>>
>> Can you please share the link to the execution?
>>
>
> Here's an example of one run:
> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/
>
> The iSCSI storage domain starts emitting warnings about low storage space
> immediately after removing the VmPool, but it's possible that the storage
> domain is filling up before that, from some other call that is still
> running, possibly the VM import.
>
Thanks Ryan, I'll try to help with debugging this issue

>
>
>>
>>> Dan and Ryan are actively involved in trying to find the regression but
>>> the consensus is that this is a storage related regression and* we are
>>> having a problem getting the storage team to join us in debugging the
>>> issue. *
>>>
>>> I prepared a patch to skip the test in case we cannot get cooperation
>>> from storage team and resolve this regression in the next few days:
>>> https://gerrit.ovirt.org/#/c/95889/
>>>
>>> *CQ-Master:* YELLOW (#1)
>>>
>>> We have failures which CQ is still bisecting, and until it's done we
>>> cannot point to any specific failing projects.
>>>
>>>
>>> Happy week!
>>> Dafna
>>>
>>>
>>>
>>> ---
>>> COLOUR MAP
>>>
>>> Green = job has been passing successfully
>>>
>>> ** green for more than 3 days may suggest we need a review of our test
>>> coverage
>>>
>>>
>>>1.
>>>
>>>1-3 days   GREEN (#1)
>>>2.
>>>
>>>4-7 days   GREEN (#2)
>>>3.
>>>
>>>Over 7 days GREEN (#3)
>>>
>>>
>>> Yellow = intermittent failures for different projects but no lasting or
>>> current regressions
>>>
>>> ** intermittent would be a healthy project as we expect a number of
>>> failures during the week
>>>
>>> ** I will not report any of the solved failures or regressions.
>>>
>>>
>>>1.
>>>
>>>Solved job failuresYELLOW (#1)
>>>2.
>>>
>>>Solved regressions  YELLOW (#2)
>>>
>>>
>>> Red = job has been failing
>>>
>>> ** Active Failures. The colour will change based on the amount of time
>>> the project/s has been broken. Only active regressions would be reported.
>>>
>>>
>>>1.
>>>
>>>1-3 days  RED (#1)
>>>2.
>>>
>>>4-7 days  RED (#2)
>>>3.
>>>
>>>Over 7 days RED (#3)
>>>
>>>
>>>
>
> --
>
> Ryan Barry
>
> Associate Manager - RHV Virt/SLA
>
> rba...@redhat.com  M: +16518159306  IM: rbarry
> 
>
___
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/TN43WNRWOOZZA4G4S5MV77ANBQT7PQNK/


[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-12-02 Thread Raz Tamir
After some analysis, I think the bug we are seeing here is
https://bugzilla.redhat.com/show_bug.cgi?id=1588061
This applies to suspend/resume and also to a snapshot with memory.
Following those steps, and considering that the iSCSI storage domain is only
20GB, this should be the reason for reaching ~4GB of free space.
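
As a rough back-of-the-envelope illustration of that reasoning (all sizes
below are assumed for the sake of the example, not measured from the failing
runs):

# Hypothetical numbers only - not taken from the OST jobs.
DOMAIN_GB = 20.0   # size of the OST iSCSI storage domain
DISKS_GB = 8.0     # space already taken by VM disks (assumed)
VM_RAM_GB = 2.0    # RAM of the suspended/snapshotted VM (assumed)
LEAKED_OPS = 4     # memory volumes left behind by suspend/snapshot (assumed)

# Each suspend or snapshot-with-memory writes a memory volume roughly the
# size of the VM's RAM; if those volumes accumulate, free space drops fast.
free_gb = DOMAIN_GB - DISKS_GB - LEAKED_OPS * VM_RAM_GB
print("free space left: ~%.0f GB" % free_gb)   # ~4 GB with these numbers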

On Fri, Nov 30, 2018 at 10:01 PM Raz Tamir  wrote:

>
>
> On Fri, Nov 30, 2018, 21:57 Ryan Barry 
>>
>>
>> On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir  wrote:
>>
>>>
>>>
>>> On Fri, Nov 30, 2018, 19:33 Dafna Ron >>
 Hi,

 This mail is to provide the current status of CQ and allow people to
 review status before and after the weekend.
 Please refer to below colour map for further information on the meaning
 of the colours.

 *CQ-4.2*: RED (#1)

 I checked the last date on which ovirt-engine and vdsm passed and moved
 packages to tested, as they are the bigger projects, and it was on 27-11-2018.

 We have been having sporadic failures for most of the projects on test
 check_snapshot_with_memory.
 We have deduced that this is caused by a code regression in storage,
 based on the following:
 1. Evgheni and Gal helped debug this issue to rule out lago and infra
 issues as the cause of failure, and both determined the issue is a code
 regression - most likely in storage.
 2. The failure only happens on the 4.2 branch.
 3. The failure itself is that a VM cannot run due to low disk space in
 the storage domain, and we cannot see any failures which would leave any
 leftovers in the storage domain.

>>> Can you please share the link to the execution?
>>>
>>
>> Here's an example of one run:
>> https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/
>>
>> The iSCSI storage domain starts emitting warnings about low storage space
>> immediately after removing the VmPool, but it's possible that the storage
>> domain is filling up before that, from some other call that is still
>> running, possibly the VM import.
>>
> Thanks Ryan, I'll try to help with debugging this issue
>
>>
>>
>>>
 Dan and Ryan are actively involved in trying to find the regression but
 the consensus is that this is a storage related regression and* we are
 having a problem getting the storage team to join us in debugging the
 issue. *

 I prepared a patch to skip the test in case we cannot get cooperation
 from storage team and resolve this regression in the next few days:
 https://gerrit.ovirt.org/#/c/95889/

 *CQ-Master:* YELLOW (#1)

 We have failures which CQ is still bisecting, and until it's done we
 cannot point to any specific failing projects.


 Happy week!
 Dafna



 ---
 COLOUR MAP

 Green = job has been passing successfully

 ** green for more than 3 days may suggest we need a review of our test
 coverage


1.

1-3 days   GREEN (#1)
2.

4-7 days   GREEN (#2)
3.

Over 7 days GREEN (#3)


 Yellow = intermittent failures for different projects but no lasting or
 current regressions

 ** intermittent would be a healthy project as we expect a number of
 failures during the week

 ** I will not report any of the solved failures or regressions.


1.

Solved job failuresYELLOW (#1)
2.

Solved regressions  YELLOW (#2)


 Red = job has been failing

 ** Active Failures. The colour will change based on the amount of time
 the project/s has been broken. Only active regressions would be reported.


1.

1-3 days  RED (#1)
2.

4-7 days  RED (#2)
3.

Over 7 days RED (#3)



>>
>> --
>>
>> Ryan Barry
>>
>> Associate Manager - RHV Virt/SLA
>>
>> rba...@redhat.com  M: +16518159306  IM: rbarry
>> 
>>
>

-- 


Raz Tamir
Manager, RHV QE
___
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/6EFAA4LR743GLDGGNVCK2PEOHL7USLB7/


[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-12-02 Thread Raz Tamir
On Fri, Nov 30, 2018, 19:33 Dafna Ron
> Hi,
>
> This mail is to provide the current status of CQ and allow people to
> review status before and after the weekend.
> Please refer to below colour map for further information on the meaning of
> the colours.
>
> *CQ-4.2*: RED (#1)
>
> I checked the last date on which ovirt-engine and vdsm passed and moved
> packages to tested, as they are the bigger projects, and it was on
> 27-11-2018.
>
> We have been having sporadic failures for most of the projects on test
> check_snapshot_with_memory.
> We have deduced that this is caused by a code regression in storage, based
> on the following:
> 1. Evgheni and Gal helped debug this issue to rule out lago and infra issues
> as the cause of failure, and both determined the issue is a code regression
> - most likely in storage.
> 2. The failure only happens on the 4.2 branch.
> 3. The failure itself is that a VM cannot run due to low disk space in the
> storage domain, and we cannot see any failures which would leave any
> leftovers in the storage domain.
>
Can you please share the link to the execution?

>
> Dan and Ryan are actively involved in trying to find the regression but
> the consensus is that this is a storage related regression and* we are
> having a problem getting the storage team to join us in debugging the
> issue. *
>
> I prepared a patch to skip the test in case we cannot get cooperation from
> storage team and resolve this regression in the next few days:
> https://gerrit.ovirt.org/#/c/95889/
>
> *CQ-Master:* YELLOW (#1)
>
> We have failures which CQ is still bisecting, and until it's done we cannot
> point to any specific failing projects.
>
>
> Happy week!
> Dafna
>
>
>
> ---
> COLOUR MAP
>
> Green = job has been passing successfully
>
> ** green for more than 3 days may suggest we need a review of our test
> coverage
>
>
>1.
>
>1-3 days   GREEN (#1)
>2.
>
>4-7 days   GREEN (#2)
>3.
>
>Over 7 days GREEN (#3)
>
>
> Yellow = intermittent failures for different projects but no lasting or
> current regressions
>
> ** intermittent would be a healthy project as we expect a number of
> failures during the week
>
> ** I will not report any of the solved failures or regressions.
>
>
>1.
>
>Solved job failuresYELLOW (#1)
>2.
>
>Solved regressions  YELLOW (#2)
>
>
> Red = job has been failing
>
> ** Active Failures. The colour will change based on the amount of time the
> project/s has been broken. Only active regressions would be reported.
>
>
>1.
>
>1-3 days  RED (#1)
>2.
>
>4-7 days  RED (#2)
>3.
>
>Over 7 days RED (#3)
>
>
>
___
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/3UNYDMCVOUOCL4DGSSIQROOGRBENHZYF/


[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-11-30 Thread Ryan Barry
On Fri, Nov 30, 2018 at 2:31 PM Raz Tamir  wrote:

>
>
> On Fri, Nov 30, 2018, 19:33 Dafna Ron 
>> Hi,
>>
>> This mail is to provide the current status of CQ and allow people to
>> review status before and after the weekend.
>> Please refer to below colour map for further information on the meaning
>> of the colours.
>>
>> *CQ-4.2*: RED (#1)
>>
>> I checked the last date on which ovirt-engine and vdsm passed and moved
>> packages to tested, as they are the bigger projects, and it was on
>> 27-11-2018.
>>
>> We have been having sporadic failures for most of the projects on test
>> check_snapshot_with_memory.
>> We have deduced that this is caused by a code regression in storage,
>> based on the following:
>> 1. Evgheni and Gal helped debug this issue to rule out lago and infra
>> issues as the cause of failure, and both determined the issue is a code
>> regression - most likely in storage.
>> 2. The failure only happens on the 4.2 branch.
>> 3. The failure itself is that a VM cannot run due to low disk space in the
>> storage domain, and we cannot see any failures which would leave any
>> leftovers in the storage domain.
>>
> Can you please share the link to the execution?
>

Here's an example of one run:
https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3550/

The iSCSI storage domain starts emitting warnings about low storage space
immediately after removing the VmPool, but it's possible that the storage
domain is filling up before that, from some other call that is still
running, possibly the VM import.
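
One way to narrow down which call is filling the domain would be to poll the
domain's free space while the suite runs; here is a minimal sketch with the
oVirt Python SDK, where the engine URL, credentials and the storage domain
name are placeholders:

# Sketch only: poll free space on the iSCSI storage domain to see which
# step drops it. Connection details and the SD name are placeholders.
import time
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",  # placeholder
    username="admin@internal",
    password="secret",                                  # placeholder
    insecure=True,
)
sds_service = connection.system_service().storage_domains_service()

try:
    for _ in range(60):                                 # ~10 minutes
        sd = sds_service.list(search="name=iscsi")[0]   # assumed SD name
        print("%s  available: %.1f GiB  used: %.1f GiB" % (
            time.strftime("%H:%M:%S"),
            sd.available / 1024.0**3,
            sd.used / 1024.0**3,
        ))
        time.sleep(10)
finally:
    connection.close()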


>
>> Dan and Ryan are actively involved in trying to find the regression but
>> the consensus is that this is a storage related regression and* we are
>> having a problem getting the storage team to join us in debugging the
>> issue. *
>>
>> I prepared a patch to skip the test in case we cannot get cooperation
>> from storage team and resolve this regression in the next few days:
>> https://gerrit.ovirt.org/#/c/95889/
>>
>> *CQ-Master:* YELLOW (#1)
>>
>> We have failures which CQ is still bisecting, and until it's done we cannot
>> point to any specific failing projects.
>>
>>
>> Happy week!
>> Dafna
>>
>>
>>
>> ---
>> COLOUR MAP
>>
>> Green = job has been passing successfully
>>
>> ** green for more than 3 days may suggest we need a review of our test
>> coverage
>>
>>
>>1.
>>
>>1-3 days   GREEN (#1)
>>2.
>>
>>4-7 days   GREEN (#2)
>>3.
>>
>>Over 7 days GREEN (#3)
>>
>>
>> Yellow = intermittent failures for different projects but no lasting or
>> current regressions
>>
>> ** intermittent would be a healthy project as we expect a number of
>> failures during the week
>>
>> ** I will not report any of the solved failures or regressions.
>>
>>
>>1.
>>
>>Solved job failuresYELLOW (#1)
>>2.
>>
>>Solved regressions  YELLOW (#2)
>>
>>
>> Red = job has been failing
>>
>> ** Active Failures. The colour will change based on the amount of time
>> the project/s has been broken. Only active regressions would be reported.
>>
>>
>>1.
>>
>>1-3 days  RED (#1)
>>2.
>>
>>4-7 days  RED (#2)
>>3.
>>
>>Over 7 days RED (#3)
>>
>>
>>

-- 

Ryan Barry

Associate Manager - RHV Virt/SLA

rba...@redhat.com  M: +16518159306  IM: rbarry

___
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/JDGY7GLNHBNYGDYQCI7KXHAXOUXM2AQM/


[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-11-30 Thread Ryan Barry
On Fri, Nov 30, 2018 at 2:18 PM Dan Kenigsberg  wrote:

>
>
> On Fri, 30 Nov 2018, 19:33 Dafna Ron 
>> Hi,
>>
>> This mail is to provide the current status of CQ and allow people to
>> review status before and after the weekend.
>> Please refer to below colour map for further information on the meaning
>> of the colours.
>>
>> *CQ-4.2*: RED (#1)
>>
>> I checked the last date on which ovirt-engine and vdsm passed and moved
>> packages to tested, as they are the bigger projects, and it was on
>> 27-11-2018.
>>
>> We have been having sporadic failures for most of the projects on test
>> check_snapshot_with_memory.
>> We have deduced that this is caused by a code regression in storage,
>> based on the following:
>> 1. Evgheni and Gal helped debug this issue to rule out lago and infra
>> issues as the cause of failure, and both determined the issue is a code
>> regression - most likely in storage.
>> 2. The failure only happens on the 4.2 branch.
>> 3. The failure itself is that a VM cannot run due to low disk space in the
>> storage domain, and we cannot see any failures which would leave any
>> leftovers in the storage domain.
>>
>> Dan and Ryan are actively
>>
>
> Actually, my involvement was a misguided attempt to solve another 4.2
> failure that I thought I had seen.
>
> involved
>
>> in trying to find the regression but the consensus is that this is a
>> storage related regression and* we are having a problem getting the
>> storage team to join us in debugging the issue. *
>>
>> I prepared a patch to skip the test in case we cannot get cooperation
>> from storage team and resolve this regression in the next few days:
>> https://gerrit.ovirt.org/#/c/95889/
>>
>
> Why do you consider this? Are we considering a release of 4.2 without live
> snapshot?
>

No, we aren't.


> Please do not merge it without an ack from Tal and Ryan.
>

Until we can bisect it, have you considered simply making a larger iSCSI
volume so OST stops failing there? I know it's an additional burden on
Infra's resources, and it's hopefully something we can revert later, but
it's likely to make OST pass for now so we can identify if/where other
failures are before we discover that even disabling this test (which I'm
against) doesn't make OST pass and we've lost a good bisection point.

>
>
>
>> *CQ-Master:* YELLOW (#1)
>>
>> We have failures which CQ is still bisecting, and until it's done we cannot
>> point to any specific failing projects.
>>
>>
>> Happy week!
>> Dafna
>>
>>
>>
>> ---
>> COLOUR MAP
>>
>> Green = job has been passing successfully
>>
>> ** green for more than 3 days may suggest we need a review of our test
>> coverage
>>
>>
>>1.
>>
>>1-3 days   GREEN (#1)
>>2.
>>
>>4-7 days   GREEN (#2)
>>3.
>>
>>Over 7 days GREEN (#3)
>>
>>
>> Yellow = intermittent failures for different projects but no lasting or
>> current regressions
>>
>> ** intermittent would be a healthy project as we expect a number of
>> failures during the week
>>
>> ** I will not report any of the solved failures or regressions.
>>
>>
>>1.
>>
>>Solved job failuresYELLOW (#1)
>>2.
>>
>>Solved regressions  YELLOW (#2)
>>
>>
>> Red = job has been failing
>>
>> ** Active Failures. The colour will change based on the amount of time
>> the project/s has been broken. Only active regressions would be reported.
>>
>>
>>1.
>>
>>1-3 days  RED (#1)
>>2.
>>
>>4-7 days  RED (#2)
>>3.
>>
>>Over 7 days RED (#3)
>>
>>
>>

-- 

Ryan Barry

Associate Manager - RHV Virt/SLA

rba...@redhat.com  M: +16518159306  IM: rbarry

___
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/6WILBA447CJE6O6PLHNNU6AFOZKBGO6G/


[ovirt-devel] Re: [Ovirt] [CQ weekly status] [30-11-2018]

2018-11-30 Thread Dan Kenigsberg
On Fri, 30 Nov 2018, 19:33 Dafna Ron
> Hi,
>
> This mail is to provide the current status of CQ and allow people to
> review status before and after the weekend.
> Please refer to below colour map for further information on the meaning of
> the colours.
>
> *CQ-4.2*: RED (#1)
>
> I checked the last date on which ovirt-engine and vdsm passed and moved
> packages to tested, as they are the bigger projects, and it was on
> 27-11-2018.
>
> We have been having sporadic failures for most of the projects on test
> check_snapshot_with_memory.
> We have deduced that this is caused by a code regression in storage, based
> on the following:
> 1. Evgheni and Gal helped debug this issue to rule out lago and infra issues
> as the cause of failure, and both determined the issue is a code regression
> - most likely in storage.
> 2. The failure only happens on the 4.2 branch.
> 3. The failure itself is that a VM cannot run due to low disk space in the
> storage domain, and we cannot see any failures which would leave any
> leftovers in the storage domain.
>
> Dan and Ryan are actively
>

Actually, my involvement was a misguided attempt to solve another 4.2
failure that I thought I had seen.

> involved

> in trying to find the regression but the consensus is that this is a
> storage related regression and* we are having a problem getting the
> storage team to join us in debugging the issue. *
>
> I prepared a patch to skip the test in case we cannot get cooperation from
> storage team and resolve this regression in the next few days:
> https://gerrit.ovirt.org/#/c/95889/
>

Why do you consider this? Are we considering a release of 4.2 without live
snapshot?

Please do not merge it without an ack from Tal and Ryan.



> *CQ-Master:* YELLOW (#1)
>
> We have failures which CQ is still bisecting, and until it's done we cannot
> point to any specific failing projects.
>
>
> Happy week!
> Dafna
>
>
>
> ---
> COLOUR MAP
>
> Green = job has been passing successfully
>
> ** green for more than 3 days may suggest we need a review of our test
> coverage
>
>
>1.
>
>1-3 days   GREEN (#1)
>2.
>
>4-7 days   GREEN (#2)
>3.
>
>Over 7 days GREEN (#3)
>
>
> Yellow = intermittent failures for different projects but no lasting or
> current regressions
>
> ** intermittent would be a healthy project as we expect a number of
> failures during the week
>
> ** I will not report any of the solved failures or regressions.
>
>
>1.
>
>Solved job failuresYELLOW (#1)
>2.
>
>Solved regressions  YELLOW (#2)
>
>
> Red = job has been failing
>
> ** Active Failures. The colour will change based on the amount of time the
> project/s has been broken. Only active regressions would be reported.
>
>
>1.
>
>1-3 days  RED (#1)
>2.
>
>4-7 days  RED (#2)
>3.
>
>Over 7 days RED (#3)
>
>
>
___
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/4L7HS7MKYSJZ3YNKJCT735XNLTQGRRM3/