Re: [ovirt-devel] Check-merged is broken since we added network test to it

2016-12-26 Thread Barak Korren
>
> https://gerrit.ovirt.org/#/c/68078/ broke the check-merged job with
> really tough exception:
>
> 15:23:34 sh: [17766: 1 (255)] tcsetattr: Inappropriate ioctl for device
> 15:23:34 Took 2586 seconds
> 15:23:34 Slave went offline during the build

This is what I was talking about: something is disconnecting the slave.
(But the disconnection is very brief; a few seconds later the slave
reconnects. See logs at [1].)

> 15:23:34 ERROR: Connection was broken: java.io.IOException: Unexpected
>
> Maybe the problem is a few lines above this. Could Jenkins simply time out?
>
>
> Ran 44 tests in 1752.270s
>
> OK
> + return 0
> sh: [9520: 1 (255)] tcsetattr: Inappropriate ioctl for device
>

This is not what a Jenkins timeout looks like.
The whole job took 53m and 20s. The timeout is set to 360m [2]...
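
To put numbers on that comparison, here is a minimal Python sketch that uses only the figures quoted in this thread. The variable and function names are made up for illustration, and treating the "Took 2586 seconds" log line as a duration of its own is an assumption.

```python
# Durations quoted in this thread (names are illustrative only).
TESTS_SECONDS = 1752.270      # "Ran 44 tests in 1752.270s"
LOGGED_SECONDS = 2586         # "Took 2586 seconds" (assumed to be a step duration)
JOB_SECONDS = 53 * 60 + 20    # the whole job: 53m and 20s
TIMEOUT_MINUTES = 360         # timeout value referenced at [2]


def fraction_of_timeout(elapsed_seconds, timeout_minutes=TIMEOUT_MINUTES):
    """Return the elapsed time as a fraction of the configured timeout."""
    return elapsed_seconds / (timeout_minutes * 60)


for label, seconds in [
    ("tests", TESTS_SECONDS),
    ("logged step", LOGGED_SECONDS),
    ("whole job", JOB_SECONDS),
]:
    share = fraction_of_timeout(seconds)
    print(f"{label}: {seconds / 60:.1f} min, {share:.1%} of the timeout")
```

Every one of these durations is a small fraction of the 360 minute limit, which is why a timeout is an unlikely explanation.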


[1]: https://ovirt-jira.atlassian.net/browse/OVIRT-938
[2]: https://gerrit.ovirt.org/gitweb?p=jenkins.git;a=blob;f=jobs/confs/projects/defaults.yaml;h=c0946d70b19684e134ca89d311cb5b125968ce71;hb=refs/heads/master#l28

-- 
Barak Korren
bkor...@redhat.com
RHCE, RHCi, RHV-DevOps Team
https://ifireball.wordpress.com/
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] Check-merged is broken since we added network test to it

2016-12-25 Thread Dan Kenigsberg
On Thu, Dec 22, 2016 at 5:59 PM, Leon Goldberg  wrote:
> So it's not about the added network tests (these were added in the one I've
> mentioned, the one who had successful runs after being merged), and I don't
> see how the patch you've mentioned breaks anything (as it merely moves
> pieces around).
>
> I could of course be wrong entirely and there's something I'm missing about
> the patch you're claiming is the guilty one, but it won't be about the
> network tests either way.
>
> If it's then not about the network tests, then my immediate suspicion is
> that the problem lies somewhere else and isn't in either of the patches. I
> am far from able to prove this either way, though, so if reverting the patch
> you've mentioned somehow fixes check-merged, then it is of course fine by
> me.
>
> On Thu, Dec 22, 2016 at 5:45 PM, Yaniv Bronheim  wrote:
>>
>>
>>
>> On Thu, Dec 22, 2016 at 5:39 PM, Leon Goldberg 
>> wrote:
>>>
>>> Doesn't seem related; the patch does nothing but move pieces around.
>>>
>>> Judging by the title I guess you're referring to
>>> https://gerrit.ovirt.org/#/c/67787/ ?
>>
>>
>>
>> no.. you can see that after this patch it used to work (check the jobs
>> after the merge)
>> so  something in  https://gerrit.ovirt.org/#/c/68078/ broke it
>>
>>>
>>>
>>> On Thu, Dec 22, 2016 at 5:14 PM, Yaniv Bronheim 
>>> wrote:

>>>> Hi guys and Leon,
>>>>
>>>> https://gerrit.ovirt.org/#/c/68078/ broke the check-merged job with
>>>> really tough exception:
>>>>
>>>> 15:23:34 sh: [17766: 1 (255)] tcsetattr: Inappropriate ioctl for device
>>>> 15:23:34 Took 2586 seconds
>>>> 15:23:34 Slave went offline during the build
>>>> 15:23:34 ERROR: Connection was broken: java.io.IOException: Unexpected

Maybe the problem is a few lines above this. Could Jenkins simply time out?


Ran 44 tests in 1752.270s

OK
+ return 0
sh: [9520: 1 (255)] tcsetattr: Inappropriate ioctl for device

Anyway, a revert is available https://gerrit.ovirt.org/#/c/69078/ and
I'm trying to see if it helps.


Re: [ovirt-devel] Check-merged is broken since we added network test to it

2016-12-23 Thread Barak Korren
>
> http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/778/console
>
> 11:55:54 RuntimeError: Failed to run reposync 3 times for repoid:
> ovirt-master-snapshot-static-el7, aborting.
>
> It is wrong to pick up on the first patch that happened to see the
> "Groovy thread" exception.

It is not wrong when we know all the reasons for the other failures
and can eliminate them like here:

http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/777/

And while we do see one of the VDSM tests fail there I hardly think it
is the reason behind the later slave disconnection.

The reposync error you are quoting is the result of a package file
being updated without updating the version or revision number. This
effectively poisoned all our local YUM caches and prevented all jobs
running Lago from doing anything interesting on Thursday until I
managed to clean all the failed caches. Then again, jobs that fail
there do not cause the slave disconnection, so I saw no point in
citing them here.
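
To illustrate why an in-place package update is a problem for caches, here is a minimal Python sketch that compares a cached snapshot against the upstream one and flags files whose name is unchanged but whose contents differ. The package file names and contents are hypothetical, and this is not how reposync or the CI caches actually verify packages; it only shows the idea.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a package file's contents."""
    return hashlib.sha256(data).hexdigest()


def changed_without_bump(cached: dict, upstream: dict) -> list:
    """List file names present on both sides whose contents differ.

    The version and release are part of the file name, so an identical
    name with a different digest means the package was rebuilt in place.
    """
    return sorted(
        name
        for name, digest in upstream.items()
        if name in cached and cached[name] != digest
    )


# Hypothetical package files, not taken from the real repository.
cached = {
    "example-pkg-1.0-1.el7.noarch.rpm": sha256_of(b"first build"),
    "other-pkg-2.3-4.el7.noarch.rpm": sha256_of(b"unchanged"),
}
upstream = {
    "example-pkg-1.0-1.el7.noarch.rpm": sha256_of(b"second build"),
    "other-pkg-2.3-4.el7.noarch.rpm": sha256_of(b"unchanged"),
    "new-pkg-0.1-1.el7.noarch.rpm": sha256_of(b"brand new"),
}

print(changed_without_bump(cached, upstream))
```

A brand new package is not flagged, because a cache has no stale copy of it; only the file that kept its name while its contents changed is reported.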

We (the CI team) do our best to eliminate false negatives in the
system, so please do not take the easy path of pointing to those
false negatives every time we reach out to you. Give us the benefit of
the doubt and trust that we were diligent enough to eliminate such
issues before approaching you.

-- 
Barak Korren
bkor...@redhat.com
RHCE, RHCi, RHV-DevOps Team
https://ifireball.wordpress.com/


Re: [ovirt-devel] Check-merged is broken since we added network test to it

2016-12-23 Thread Barak Korren
On Dec 23, 2016 12:46 PM, "Dan Kenigsberg"  wrote:

> On Thu, Dec 22, 2016 at 5:45 PM, Yaniv Bronheim  wrote:
>> On Thu, Dec 22, 2016 at 5:39 PM, Leon Goldberg  wrote:
>>
>>> Doesn't seem related; the patch does nothing but move pieces around.
>>>
>>> Judging by the title I guess you're referring to
>>> https://gerrit.ovirt.org/#/c/67787/ ?
>>>
>>
>>
>> no.. you can see that after this patch it used to work (check the jobs
>> after the merge)
>> so  something in  https://gerrit.ovirt.org/#/c/68078/ broke it
>
> "post hoc ergo propter hoc" is a fallacy.
>
> We had a horrible week CI-wise; check-merged kept failing due to
> "Groovy thread" exception which is an internal Jenkins thingy.



Please give the CI team credit for being able to eliminate the Jenkins
and environment related issues. We have runs that pre-date this week's
issues and still show that behaviour. For example:

http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/692/console

The fact that the job crashes during the groovy code is not an indicator
that the issue is in that code. That exact same code runs for every single
job in the system without failing, so it means that something in this job
probably creates conditions that prevent it from running.

Actually we know more than that. Looking carefully at the job logs you can
see that the groovy code fails to run because the slave disconnects just
before Jenkins tries to run it (by now you should be familiar with OVIRT-938).
Again, all the CI code that runs before, after and during this job is also
used in other jobs that do not fail in the same manner. With this in mind I
went to Yaniv to ask him to help figure out whether the check_merged.sh code
may be doing anything that could cause this. Our conversation led to this email.
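
For anyone who wants to check a console log for the same pattern, here is a minimal Python sketch that finds the first disconnect marker and shows the lines just before it. The marker strings and sample lines are copied from the console output quoted in this thread; the function name and the two-line context window are illustrative assumptions.

```python
# Markers copied from the console output quoted in this thread.
DISCONNECT_MARKERS = (
    "Slave went offline during the build",
    "ERROR: Connection was broken",
)


def find_disconnect(lines, context=2):
    """Return (index, preceding_lines) for the first disconnect marker, or None."""
    for index, line in enumerate(lines):
        if any(marker in line for marker in DISCONNECT_MARKERS):
            return index, lines[max(0, index - context):index]
    return None


sample = [
    "15:23:34 sh: [17766: 1 (255)] tcsetattr: Inappropriate ioctl for device",
    "15:23:34 Took 2586 seconds",
    "15:23:34 Slave went offline during the build",
    "15:23:34 ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel",
]

result = find_disconnect(sample)
if result is not None:
    index, before = result
    print(f"disconnect reported at line {index}")
    for line in before:
        print("  preceded by:", line)
```

Running the same scan over logs from jobs that do not fail would be one way to compare what happens right before the disconnect.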

Trying to wave this away as a "Groovy failure" at this point is not
helping. The CI team is not going to come up with a magic solution to this
one without you guys' help.

> Now that's over, but a recent test failed on
>
> http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/778/console
>
> 11:55:54 RuntimeError: Failed to run reposync 3 times for repoid:
> ovirt-master-snapshot-static-el7, aborting.
>
> It is wrong to pick up on the first patch that happened to see the
> "Groovy thread" exception.

Re: [ovirt-devel] Check-merged is broken since we added network test to it

2016-12-23 Thread Dan Kenigsberg
Yaniv,

Maybe you can look at
http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/791/consoleText
Unlike the others, this one was actually run by CI, and it failed to start
vdsm on the lago host.

"A dependency job for vdsmd.service failed. See 'journalctl -xe' for details."

The cause might be the beautifully-numbered
  Bug 143 - imageio fails during system startup
but that's just a guess.

Yaniv, can you extract the journal from the VM to take a look?


Re: [ovirt-devel] Check-merged is broken since we added network test to it

2016-12-23 Thread Dan Kenigsberg
On Thu, Dec 22, 2016 at 5:45 PM, Yaniv Bronheim  wrote:
> On Thu, Dec 22, 2016 at 5:39 PM, Leon Goldberg  wrote:
>
>> Doesn't seem related; the patch does nothing but move pieces around.
>>
>> Judging by the title I guess you're referring to
>> https://gerrit.ovirt.org/#/c/67787/ ?
>>
>
>
> no.. you can see that after this patch it used to work (check the jobs
> after the merge)
> so  something in  https://gerrit.ovirt.org/#/c/68078/ broke it

"post hoc ergo propter hoc" is a fallacy.

We had a horrible week CI-wise; check-merged kept failing due to a
"Groovy thread" exception, which is an internal Jenkins thingy. Now
that's over, but a recent test failed on

http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/778/console

11:55:54 RuntimeError: Failed to run reposync 3 times for repoid:
ovirt-master-snapshot-static-el7, aborting.

It is wrong to pick up on the first patch that happened to see the
"Groovy thread" exception.

Regards,
Dan.


Re: [ovirt-devel] Check-merged is broken since we added network test to it

2016-12-22 Thread Leon Goldberg
Doesn't seem related; the patch does nothing but move pieces around.

Judging by the title I guess you're referring to
https://gerrit.ovirt.org/#/c/67787/ ?

On Thu, Dec 22, 2016 at 5:14 PM, Yaniv Bronheim  wrote:

> Hi guys and Leon,
>
> https://gerrit.ovirt.org/#/c/68078/ broke the check-merged job with
> really tough exception:
>
> 15:23:34 sh: [17766: 1 (255)] tcsetattr: Inappropriate ioctl for device
> 15:23:34 Took 2586 seconds
> 15:23:34 Slave went offline during the build
> 15:23:34 ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
> 15:23:34     at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
> 15:23:34 Caused by: java.io.EOFException
> 15:23:34     at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
> 15:23:34     at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
> 15:23:34     at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
> 15:23:34     at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
> 15:23:34     at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
> 15:23:34     at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
> 15:23:34     at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
> 15:23:34
> 15:23:34 Build step 'Execute shell' marked build as failure
> 15:23:34 Performing Post build task...
>
>
> I have no clue what causes it.. we need to investigate in the tests code.
>
>
> you can see it in 
> http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/772/consoleFull
>
>
> and this for a job run just before it got in, which worked well - 
> http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/688/console
>
>
> I suggest to revert this patch (and backport the revert to ovirt-4.1 branch 
> as well) until figuring what causes it
>
>
> --
> *Yaniv Bronhaim.*
>

Re: [ovirt-devel] Check-merged is broken since we added network test to it

2016-12-22 Thread Leon Goldberg
So it's not about the added network tests (these were added in the one I've
mentioned, the one that had successful runs after being merged), and I don't
see how the patch you've mentioned breaks anything (as it merely moves
pieces around).

I could of course be wrong entirely and there's something I'm missing about
the patch you're claiming is the guilty one, but it won't be about the
network tests either way.

If it's then not about the network tests, then my immediate suspicion is
that the problem lies somewhere else and isn't in either of the patches. I
am far from able to prove this either way, though, so if reverting the
patch you've mentioned somehow fixes check-merged, then it is of course
fine by me.

On Thu, Dec 22, 2016 at 5:45 PM, Yaniv Bronheim  wrote:

>
>
> On Thu, Dec 22, 2016 at 5:39 PM, Leon Goldberg 
> wrote:
>
>> Doesn't seem related; the patch does nothing but move pieces around.
>>
>> Judging by the title I guess you're referring to
>> https://gerrit.ovirt.org/#/c/67787/ ?
>>
>
>
> no.. you can see that after this patch it used to work (check the jobs
> after the merge)
> so  something in  https://gerrit.ovirt.org/#/c/68078/ broke it
>
>
>>
>> On Thu, Dec 22, 2016 at 5:14 PM, Yaniv Bronheim 
>> wrote:
>>
>>> Hi guys and Leon,
>>>
>>> https://gerrit.ovirt.org/#/c/68078/ broke the check-merged job with
>>> really tough exception:
>>>
>>> 15:23:34 sh: [17766: 1 (255)] tcsetattr: Inappropriate ioctl for device
>>> 15:23:34 Took 2586 seconds
>>> 15:23:34 Slave went offline during the build
>>> 15:23:34 ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
>>> 15:23:34     at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
>>> 15:23:34 Caused by: java.io.EOFException
>>> 15:23:34     at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
>>> 15:23:34     at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
>>> 15:23:34     at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
>>> 15:23:34     at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
>>> 15:23:34     at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
>>> 15:23:34     at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
>>> 15:23:34     at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
>>> 15:23:34
>>> 15:23:34 Build step 'Execute shell' marked build as failure
>>> 15:23:34 Performing Post build task...
>>>
>>>
>>> I have no clue what causes it.. we need to investigate in the tests code.
>>>
>>>
>>> you can see it in 
>>> http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/772/consoleFull
>>>
>>>
>>> and this for a job run just before it got in, which worked well - 
>>> http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/688/console
>>>
>>>
>>> I suggest to revert this patch (and backport the revert to ovirt-4.1 branch 
>>> as well) until figuring what causes it
>>>
>>>
>>> --
>>> *Yaniv Bronhaim.*
>>>
>>
>>
>
>
> --
> *Yaniv Bronhaim.*
>

Re: [ovirt-devel] Check-merged is broken since we added network test to it

2016-12-22 Thread Yaniv Bronheim
On Thu, Dec 22, 2016 at 5:39 PM, Leon Goldberg  wrote:

> Doesn't seem related; the patch does nothing but move pieces around.
>
> Judging by the title I guess you're referring to
> https://gerrit.ovirt.org/#/c/67787/ ?
>


No, you can see that it still worked after this patch (check the jobs
after the merge),
so something in https://gerrit.ovirt.org/#/c/68078/ broke it.


>
> On Thu, Dec 22, 2016 at 5:14 PM, Yaniv Bronheim 
> wrote:
>
>> Hi guys and Leon,
>>
>> https://gerrit.ovirt.org/#/c/68078/ broke the check-merged job with
>> really tough exception:
>>
>> 15:23:34 sh: [17766: 1 (255)] tcsetattr: Inappropriate ioctl for device
>> 15:23:34 Took 2586 seconds
>> 15:23:34 Slave went offline during the build
>> 15:23:34 ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
>> 15:23:34     at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
>> 15:23:34 Caused by: java.io.EOFException
>> 15:23:34     at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
>> 15:23:34     at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
>> 15:23:34     at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
>> 15:23:34     at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
>> 15:23:34     at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
>> 15:23:34     at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
>> 15:23:34     at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
>> 15:23:34
>> 15:23:34 Build step 'Execute shell' marked build as failure
>> 15:23:34 Performing Post build task...
>>
>>
>> I have no clue what causes it.. we need to investigate in the tests code.
>>
>>
>> you can see it in 
>> http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/772/consoleFull
>>
>>
>> and this for a job run just before it got in, which worked well - 
>> http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/688/console
>>
>>
>> I suggest to revert this patch (and backport the revert to ovirt-4.1 branch 
>> as well) until figuring what causes it
>>
>>
>> --
>> *Yaniv Bronhaim.*
>>
>
>


-- 
*Yaniv Bronhaim.*

[ovirt-devel] Check-merged is broken since we added network test to it

2016-12-22 Thread Yaniv Bronheim
Hi guys and Leon,

https://gerrit.ovirt.org/#/c/68078/ broke the check-merged job with a really
tough exception:

15:23:34 sh: [17766: 1 (255)] tcsetattr: Inappropriate ioctl for device
15:23:34 Took 2586 seconds
15:23:34 Slave went offline during the build
15:23:34 ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
15:23:34     at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
15:23:34 Caused by: java.io.EOFException
15:23:34     at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
15:23:34     at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
15:23:34     at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
15:23:34     at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
15:23:34     at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
15:23:34     at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
15:23:34     at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
15:23:34
15:23:34 Build step 'Execute shell' marked build as failure
15:23:34 Performing Post build task...


I have no clue what causes it; we need to investigate the test code.


You can see it in
http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/772/consoleFull


And this is a job run from just before it got in, which worked well:
http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/688/console


I suggest reverting this patch (and backporting the revert to the ovirt-4.1
branch as well) until we figure out what causes it.


-- 
*Yaniv Bronhaim.*