[ovirt-devel] Re: [VDSM] all test passed, build failed with "tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it"

2018-12-01 Thread Edward Haas
On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer  wrote:

> On Thu, Nov 29, 2018 at 11:21 AM Edward Haas  wrote:
>
>>
>>
>> On Thu, Nov 29, 2018 at 10:41 AM Edward Haas  wrote:
>>
>>>
>>>
>>> On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer  wrote:
>>>
 We have this failure that pops randomly:

 1. All tests pass

 *00:13:13.284* ___ summary 
 *00:13:13.285*   tests: commands 
 succeeded*00:13:13.286*   storage-py27: commands succeeded*00:13:13.286*   
 storage-py36: commands succeeded*00:13:13.286*   lib-py27: commands 
 succeeded*00:13:13.287*   lib-py36: commands succeeded*00:13:13.288*   
 network-py27: commands succeeded*00:13:13.290*   network-py36: commands 
 succeeded*00:13:13.291*   virt-py27: commands succeeded*00:13:13.292*   
 virt-py36: commands succeeded*00:13:13.293*   congratulations :)


 2. But we fail to collect logs at the end

 *00:14:35.992* 
 ##*00:14:35.995* 
 ## Wed Nov 28 17:39:50 UTC 2018 Finished env: 
 fc28:fedora-28-x86_64*00:14:35.996* ##  took 764 seconds*00:14:35.997* 
 ##  rc = 1*00:14:35.997* 
 ##*00:14:36.009* 
 ##! ERROR v*00:14:36.010* ##! Last 
 20 log entries: 
 /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* 
 ##!*00:14:36.012* 
 journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: 
 journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as 
 we read it*00:14:36.014* 
 journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* 
 lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* 
 libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* 
 libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* 
 libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* 
 ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* 
 samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* 
 tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* 
 ===*00:14:36.019* ##!*00:14:36.019* ##! 
 ERROR ^^*00:14:36.019* 
 ##!


 This looks like an issue with vdsm check-patch.sh:

 function collect_logs {
 res=$?
 [ "$res" -ne 0 ] && echo "*** err: $res"
 cd /var/log
 tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" *
 cd /var/host_log
 tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" *
 }

 trap collect_logs EXIT

 Seems that tar fail to collect log if the log is modified while copied,
 which makes sense.

 We can ignore errors in tar, since log collection should not fail the
 build, but I think
 a better solution is to avoid collecting any logs since vdsm writes its
 own logs during
 tests - all the info must be in vdsm log.

 Here is the list of collected logs:

 *00:13:47.280* + tar -cvzf 
 /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz
  btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log 
 journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp 
 yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* 
 dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* 
 glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* 
 lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* 
 openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* 
 openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* 
 tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* 
 yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf 
 /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz
  anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log 
 cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log 
 journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog 
 wtmp*00:13:47.356* anaconda/*00:13:47.356* 
 anaconda/ifcfg.log*00:13:47.357* 
 anaconda/ks-script-l5qnynnj.log*00:13:47.358* 
 anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* 
 anaconda/ks-script-b5_08tmo.log*00:13:47.396* 
 anaconda/ks-script-6uks8bp3.log*00:13:47.397* 
 anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* 
 anaconda/journal.log*00:13:47.449* 

[ovirt-devel] Re: [VDSM] test_echo(1024, False) (stomp_test.StompTests) fails

2018-12-01 Thread Nir Soffer
On Wed, Nov 7, 2018 at 2:20 PM Piotr Kliczewski  wrote:

>
>
> On Wed, Nov 7, 2018 at 1:16 PM Nir Soffer  wrote:
>
>> On Wed, Nov 7, 2018 at 10:13 AM Piotr Kliczewski 
>> wrote:
>>
>>>
>>>
>>> On Wed, Nov 7, 2018 at 5:21 AM Germano Veit Michel 
>>> wrote:
>>>

 On Sat, Jun 30, 2018 at 12:43 AM Piotr Kliczewski <
 piotr.kliczew...@gmail.com> wrote:

> Here is the patch [1] which should fix it.
>
> [1] https://gerrit.ovirt.org/#/c/92690/
> On Thu, Jun 28, 2018 at 2:44 PM Piotr Kliczewski 
> wrote:
> >
> > Nir,
> >
> > It looks like we have a race condition where request arrive sooner
> than subscribe frame:
> >
> > 16:19:02 2018-06-27 14:17:54,653 INFO  (jsonrpc/0)
> [jsonrpc.JsonRpcServer] RPC call echo succeeded in 0.01 seconds
> (__init__:311)
> > 16:19:02 2018-06-27 14:17:54,661 INFO  (Detector thread)
> [Broker.StompAdapter] Subscribe command received (stompserver:123)
> >
> > Thanks for reporting I will push a patch to fix it.
> >
> > Thanks,
> > Piotr
> >
> >
> > On Wed, Jun 27, 2018 at 5:40 PM, Piotr Kliczewski <
> pklic...@redhat.com> wrote:
> >>
> >> Ok, I'll take a look.
> >>
> >> śr., 27 cze 2018, 17:36 użytkownik Nir Soffer 
> napisał:
> >>>
> >>> On Wed, Jun 27, 2018 at 6:13 PM Piotr Kliczewski <
> pklic...@redhat.com> wrote:
> 
>  On Wed, Jun 27, 2018 at 5:01 PM, Nir Soffer 
> wrote:
> >
> > This tests used to fail in the past, but since we fixed it or
> the code
> >
> > it never failed.
> >
> >
> > Maybe the slave was overloaded?
> 
> 
>  Very possible. Can you paste a link to the job which failed?
> >>>
> >>>
> >>> Here:
> https://jenkins.ovirt.org/job/vdsm_master_check-patch-el7-x86_64/24130/
> >>>
> >>> The next build passed.
> >>>
> >>> Maybe we need to solve the flakiness of some tests by running a
> flaky test
> >>> again, and let the build fail only if a test failed twice. I
> wonder if there is some
> >>> pytest plugin doing this.
> >>>
> 
> 
> >
> >
> > 14:19:02
> ==
> > 14:19:02 ERROR:
> >
> > 14:19:02
> --
> > 14:19:02 Traceback (most recent call last):
> > 14:19:02   File
> "/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/testlib.py",
> line 143, in wrapper
> > 14:19:02 return f(self, *args)
> > 14:19:02   File
> "/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/stomp_test.py",
> line 95, in test_echo
> > 14:19:02 str(uuid4())),
> > 14:19:02   File
> "/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/lib/yajsonrpc/jsonrpcclient.py",
> line 77, in callMethod
> > 14:19:02 raise
> exception.JsonRpcNoResponseError(method=methodName)
> > 14:19:02 JsonRpcNoResponseError: No response for JSON-RPC
> request: {'method': 'echo'}
> > 14:19:02  >> begin captured logging <<
> 
> > 14:19:02 2018-06-27 14:17:54,524 DEBUG (MainThread)
> [vds.MultiProtocolAcceptor] Creating socket (host='::1', port=0, 
> family=10,
> socketype=1, proto=6) (protocoldetector:225)
> > 14:19:02 2018-06-27 14:17:54,526 INFO  (MainThread)
> [vds.MultiProtocolAcceptor] Listening at ::1:36713 (protocoldetector:183)
> > 14:19:02 2018-06-27 14:17:54,535 DEBUG (MainThread) [Scheduler]
> Starting scheduler test.Scheduler (schedule:98)
> > 14:19:02 2018-06-27 14:17:54,537 DEBUG (test.Scheduler)
> [Scheduler] START thread  140562082535168)> (func= >, args=(), kwargs={})
> (concurrent:193)
> > 14:19:02 2018-06-27 14:17:54,538 DEBUG (test.Scheduler)
> [Scheduler] started (schedule:140)
> > 14:19:02 2018-06-27 14:17:54,546 DEBUG (JsonRpc (StompReactor))
> [root] START thread  140562629256960)> (func= >, args=(),
> kwargs={}) (concurrent:193)
> > 14:19:02 2018-06-27 14:17:54,547 DEBUG (MainThread) [Executor]
> Starting executor (executor:128)
> > 14:19:02 2018-06-27 14:17:54,549 DEBUG (MainThread) [Executor]
> Starting worker jsonrpc/0 (executor:286)
> > 14:19:02 2018-06-27 14:17:54,553 DEBUG (jsonrpc/0) [Executor]
> START thread 
> (func= at 0x7fd74ca00650>>, args=(), kwargs={}) (concurrent:193)
> > 14:19:02 2018-06-27 14:17:54,554 DEBUG (jsonrpc/0) [Executor]
> Worker started (executor:298)
> > 14:19:02 2018-06-27 14:17:54,557 DEBUG (MainThread) [Executor]
> Starting worker jsonrpc/1 (executor:286)
> > 14:19:02 2018-06-27 14:17:54,558 

[ovirt-devel] Re: [VDSM] all test passed, build failed with "tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we read it"

2018-12-01 Thread Nir Soffer
On Thu, Nov 29, 2018 at 11:21 AM Edward Haas  wrote:

>
>
> On Thu, Nov 29, 2018 at 10:41 AM Edward Haas  wrote:
>
>>
>>
>> On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer  wrote:
>>
>>> We have this failure that pops randomly:
>>>
>>> 1. All tests pass
>>>
>>> *00:13:13.284* ___ summary 
>>> *00:13:13.285*   tests: commands 
>>> succeeded*00:13:13.286*   storage-py27: commands succeeded*00:13:13.286*   
>>> storage-py36: commands succeeded*00:13:13.286*   lib-py27: commands 
>>> succeeded*00:13:13.287*   lib-py36: commands succeeded*00:13:13.288*   
>>> network-py27: commands succeeded*00:13:13.290*   network-py36: commands 
>>> succeeded*00:13:13.291*   virt-py27: commands succeeded*00:13:13.292*   
>>> virt-py36: commands succeeded*00:13:13.293*   congratulations :)
>>>
>>>
>>> 2. But we fail to collect logs at the end
>>>
>>> *00:14:35.992* 
>>> ##*00:14:35.995* ## 
>>> Wed Nov 28 17:39:50 UTC 2018 Finished env: 
>>> fc28:fedora-28-x86_64*00:14:35.996* ##  took 764 seconds*00:14:35.997* 
>>> ##  rc = 1*00:14:35.997* 
>>> ##*00:14:36.009* 
>>> ##! ERROR v*00:14:36.010* ##! Last 
>>> 20 log entries: 
>>> /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* 
>>> ##!*00:14:36.012* 
>>> journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* tar: 
>>> journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we 
>>> read it*00:14:36.014* 
>>> journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* 
>>> lastlog*00:14:36.015* libvirt/*00:14:36.015* libvirt/lxc/*00:14:36.015* 
>>> libvirt/libxl/*00:14:36.016* libvirt/qemu/*00:14:36.016* 
>>> libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* 
>>> libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* 
>>> ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* README*00:14:36.018* 
>>> samba/*00:14:36.018* samba/old/*00:14:36.018* sssd/*00:14:36.018* 
>>> tallylog*00:14:36.018* wtmp*00:14:36.018* Took 678 seconds*00:14:36.018* 
>>> ===*00:14:36.019* ##!*00:14:36.019* ##! 
>>> ERROR ^^*00:14:36.019* 
>>> ##!
>>>
>>>
>>> This looks like an issue with vdsm check-patch.sh:
>>>
>>> function collect_logs {
>>> res=$?
>>> [ "$res" -ne 0 ] && echo "*** err: $res"
>>> cd /var/log
>>> tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" *
>>> cd /var/host_log
>>> tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" *
>>> }
>>>
>>> trap collect_logs EXIT
>>>
>>> Seems that tar fail to collect log if the log is modified while copied,
>>> which makes sense.
>>>
>>> We can ignore errors in tar, since log collection should not fail the
>>> build, but I think
>>> a better solution is to avoid collecting any logs since vdsm writes its
>>> own logs during
>>> tests - all the info must be in vdsm log.
>>>
>>> Here is the list of collected logs:
>>>
>>> *00:13:47.280* + tar -cvzf 
>>> /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz
>>>  btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log 
>>> journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp 
>>> yum.log*00:13:47.285* btmp*00:13:47.285* dnf.librepo.log*00:13:47.299* 
>>> dnf.log*00:13:47.309* dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* 
>>> glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* journal/*00:13:47.313* 
>>> lastlog*00:13:47.315* libvirt/*00:13:47.315* libvirt/qemu/*00:13:47.316* 
>>> openvswitch/*00:13:47.317* openvswitch/ovs-vswitchd.log*00:13:47.318* 
>>> openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* 
>>> tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* wtmp*00:13:47.343* 
>>> yum.log*00:13:47.349* + cd /var/host_log*00:13:47.350* + tar -cvzf 
>>> /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz
>>>  anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log 
>>> cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log 
>>> journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog 
>>> wtmp*00:13:47.356* anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* 
>>> anaconda/ks-script-l5qnynnj.log*00:13:47.358* 
>>> anaconda/storage.log*00:13:47.359* anaconda/program.log*00:13:47.395* 
>>> anaconda/ks-script-b5_08tmo.log*00:13:47.396* 
>>> anaconda/ks-script-6uks8bp3.log*00:13:47.397* 
>>> anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* 
>>> anaconda/journal.log*00:13:47.449* anaconda/dnf.librepo.log*00:13:47.458* 
>>> anaconda/packaging.log*00:13:47.465* anaconda/dbus.log*00:13:47.466* 
>>> anaconda/anaconda.log*00:13:47.467* 
>>>