Re: [ovirt-devel] OST Regression in add cluster (IBRS related)

2018-01-15 Thread Nadav Goldin
Maybe related to [1]?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1533125


On Mon, Jan 15, 2018 at 5:40 PM, Gal Ben Haim <gbenh...@redhat.com> wrote:
> I've tried to run 'basic-suite-master' with [1], but I'm getting the
> following error:
>
> _ID: VDS_CPU_LOWER_THAN_CLUSTER(515), Host lago-basic-suite-master-host-1
> moved to Non-Operational state as host does not meet the cluster's minimum
> CPU level. Missing CPU features : model_Haswell-noTSX-IBRS
>
> When running virsh on the host I see the following CPU:
>
> Haswell-noTSX-IBRS
>
> The CPU definition in the dom xml of the host:
>
> 
> 
>   
>
>
> When running virsh on the VM (ovirt host) I see the following CPU:
>
> Haswell-noTSX
>
> Which doesn't match the CPU of the host.
>
> thoughts?
>
>
> [1] https://github.com/lago-project/lago-ost-plugin/pull/31
>
> On Sun, Jan 14, 2018 at 11:46 PM, Nadav Goldin <ngol...@virtual-gate.net>
> wrote:
>>
>> Trying to put together what I remember:
>> 1. We had a QEMU bug where it was stated clearly that
>> nested-virtualization is only supported when using 'host-passthrough'
>> (don't know if that had changed since).
>> 2. As consequence of (1) - Lago uses by default host-passthrough.
>> 3. When running O-S-T, we needed a deterministic way to decide which
>> cluster level to use, taking into account that VDSM's CPU can,
>> theoretically, be anything.
>> 4. That is why you see 'Skylake' and 'IvyBridge' there  -  to match
>> possible users of OST.
>> 5. Lago already uses 'virsh capabilities' to report the L1 VM's CPU,
>> lago-ost-plugin uses that report as the input key to the mapping file.
>>
>> As far as I remember, we settled for this method after several
>> on-going reports of users unable to run OST on their laptops due to
>> CPU issues.
>>
>>
>>
>> On Fri, Jan 12, 2018 at 6:49 PM, Michal Skrivanek
>> <michal.skriva...@redhat.com> wrote:
>> >
>> >
>> > On 12 Jan 2018, at 17:32, Yaniv Kaul <yk...@redhat.com> wrote:
>> >
>> >
>> >
>> > On Fri, Jan 12, 2018 at 1:05 PM, Michal Skrivanek
>> > <michal.skriva...@redhat.com> wrote:
>> >>
>> >>
>> >>
>> >> On 12 Jan 2018, at 08:32, Tomas Jelinek <tjeli...@redhat.com> wrote:
>> >>
>> >>
>> >>
>> >> On Fri, Jan 12, 2018 at 8:18 AM, Yaniv Kaul <yk...@redhat.com> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Jan 12, 2018 at 9:06 AM, Yaniv Kaul <yk...@redhat.com> wrote:
>> >>>>
>> >>>> See[1] - do we need to update Lago / Lago OST plugin?
>> >>>
>> >>>
>> >>> Something like https://github.com/lago-project/lago-ost-plugin/pull/31
>> >>> perhaps (not tested, don't have the HW).
>> >>
>> >>
>> >> yes, seems like that should do the trick.
>> >>
>> >>
>> >> sure, though, that list is also difficult to maintain
>> >> e.g. IvyBridge is not an oVirt supported model, there’s no “Skylake”
>> >> model
>> >>
>> >> Nadav, what’s the exact purpose of that list, and can it be eliminated
>> >> somehow?
>> >
>> >
>> > It's to match, as possible, between the host CPU (which is passed to L1)
>> > so
>> > it'll match oVirt’s.
>> >
>> >
>> > getting it from "virsh capabilities" on the host would match it a bit
>> > better. It would be enough to just make the L1 host report (via fake
>> > caps
>> > hook if needed) the same model_X in getVdsCapabilities as the L0
>> >
>> > It's not that difficult to maintain. We add new CPUs once-twice a year…?
>> >
>> >
>> > yes, not often
>> >
>> > Y.
>> >
>> >>
>> >>
>> >> Thanks,
>> >> michal
>> >>
>> >>
>> >>
>> >>>
>> >>> Y.
>> >>>
>> >>>>
>> >>>> Error Message
>> >>>>
>> >>>> Unsupported CPU model: Haswell-noTSX-IBRS. Supported models:
>> >>>>
>> >>>> IvyBridge,Westmere,Skylake,Penryn,Haswell,Broadwell,Nehalem,Skylake-Client,Broadwell-noTSX,Conroe,SandyBridge,Haswell-noTSX
>> >>>>
>> >>>> Stacktrace
>> >>>>
>> >>>> Traceback (most recent cal

Re: [ovirt-devel] OST Regression in add cluster (IBRS related)

2018-01-14 Thread Nadav Goldin
Trying to put together what I remember:
1. We had a QEMU bug where it was stated clearly that
nested-virtualization is only supported when using 'host-passthrough'
(don't know if that had changed since).
2. As consequence of (1) - Lago uses by default host-passthrough.
3. When running O-S-T, we needed a deterministic way to decide which
cluster level to use, taking into account that VDSM's CPU can,
theoretically, be anything.
4. That is why you see 'Skylake' and 'IvyBridge' there  -  to match
possible users of OST.
5. Lago already uses 'virsh capabilities' to report the L1 VM's CPU;
lago-ost-plugin uses that report as the input key to the mapping file.

As far as I remember, we settled for this method after several
on-going reports of users unable to run OST on their laptops due to
CPU issues.
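
For illustration, the lookup described in item (5) boils down to something
like the sketch below. The real code is get_ovirt_cpu_family() in
ovirtlago/virt.py (see the traceback quoted further down in this thread);
the map entries and the simplified signature here are assumptions, not the
shipped mapping file:

```
# Sketch only: a model -> cluster-CPU-family lookup keyed by the CPU model
# reported via 'virsh capabilities'. The entries below are illustrative
# assumptions, not the real lago-ost-plugin mapping file.
CPU_MAP = {
    'Intel': {
        'Haswell-noTSX': 'Intel Haswell-noTSX Family',
        # The IBRS variant reported by the failing run would need its own entry:
        'Haswell-noTSX-IBRS': 'Intel Haswell-noTSX Family',
    },
}


def get_ovirt_cpu_family(vendor, model):
    """Map the L1 VM's reported CPU model to an oVirt cluster CPU family."""
    try:
        return CPU_MAP[vendor][model]
    except KeyError:
        raise RuntimeError(
            'Unsupported CPU model: %s. Supported models: %s'
            % (model, ','.join(CPU_MAP[vendor]))
        )
```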



On Fri, Jan 12, 2018 at 6:49 PM, Michal Skrivanek
 wrote:
>
>
> On 12 Jan 2018, at 17:32, Yaniv Kaul  wrote:
>
>
>
> On Fri, Jan 12, 2018 at 1:05 PM, Michal Skrivanek
>  wrote:
>>
>>
>>
>> On 12 Jan 2018, at 08:32, Tomas Jelinek  wrote:
>>
>>
>>
>> On Fri, Jan 12, 2018 at 8:18 AM, Yaniv Kaul  wrote:
>>>
>>>
>>>
>>> On Fri, Jan 12, 2018 at 9:06 AM, Yaniv Kaul  wrote:

 See[1] - do we need to update Lago / Lago OST plugin?
>>>
>>>
>>> Something like https://github.com/lago-project/lago-ost-plugin/pull/31
>>> perhaps (not tested, don't have the HW).
>>
>>
>> yes, seems like that should do the trick.
>>
>>
>> sure, though, that list is also difficult to maintain
>> e.g. IvyBridge is not an oVirt supported model, there’s no “Skylake” model
>>
>> Nadav, what’s the exact purpose of that list, and can it be eliminated
>> somehow?
>
>
> It's to match, as possible, between the host CPU (which is passed to L1) so
> it'll match oVirt’s.
>
>
> getting it from "virsh capabilities" on the host would match it a bit
> better. It would be enough to just make the L1 host report (via fake caps
> hook if needed) the same model_X in getVdsCapabilities as the L0
>
> It's not that difficult to maintain. We add new CPUs once-twice a year…?
>
>
> yes, not often
>
> Y.
>
>>
>>
>> Thanks,
>> michal
>>
>>
>>
>>>
>>> Y.
>>>

 Error Message

 Unsupported CPU model: Haswell-noTSX-IBRS. Supported models:
 IvyBridge,Westmere,Skylake,Penryn,Haswell,Broadwell,Nehalem,Skylake-Client,Broadwell-noTSX,Conroe,SandyBridge,Haswell-noTSX

 Stacktrace

 Traceback (most recent call last):
   File "/usr/lib64/python2.7/unittest/case.py", line 369, in run
 testMethod()
   File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in
 runTest
 self.test(*self.arg)
   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
 129, in wrapped_test
 test()
   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 59,
 in wrapper
 return func(get_test_prefix(), *args, **kwargs)
   File
 "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
 line 277, in add_cluster
 add_cluster_4(prefix)
   File
 "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
 line 305, in add_cluster_4
 cpu_family = prefix.virt_env.get_ovirt_cpu_family()
   File "/usr/lib/python2.7/site-packages/ovirtlago/virt.py", line 151,
 in get_ovirt_cpu_family
 ','.join(cpu_map[host.cpu_vendor].iterkeys())
 RuntimeError: Unsupported CPU model: Haswell-noTSX-IBRS. Supported
 models:
 IvyBridge,Westmere,Skylake,Penryn,Haswell,Broadwell,Nehalem,Skylake-Client,Broadwell-noTSX,Conroe,SandyBridge,Haswell-noTSX



 Y.

 [1]
 http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/3498/testReport/junit/(root)/002_bootstrap/add_cluster/
>>>
>>>
>>>
>>> ___
>>> Devel mailing list
>>> Devel@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/devel
>>
>>
>> ___
>> Devel mailing list
>> Devel@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/devel
>
>
>
> ___
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

Re: [ovirt-devel] ovirt-system-tests run specific scenario/debugging

2017-09-16 Thread Nadav Goldin
Hi,
Rerunning a specific Python test file is possible, though it takes a few
manual steps. Take a look at [1].

With regard to the debugger, it is possible to run the tests without the
'lago ovirt runtest' commands at all, directly with Lago as a Python
library, though this isn't fully used in OST. Basically, you would have
to export the same environment variables as described in [1], and then
use the same imports and decorators as found in OST (mainly the
testlib.with_ovirt_prefix decorator); with that in hand you can call
the Python file however you'd like.

Of course this is all good for debugging, but less so for OST (as you need
the suite logic: log collection, order of tests, etc.).
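
As a rough sketch of that approach - assuming the environment variables
from [1] are already exported in the shell, and noting that the helper
calls on 'prefix' below are assumptions to be checked against ovirtlago's
code:

```
# Sketch: re-using the OST decorator to poke at a running deployment from a
# plain Python file (e.g. under pdb). Only with_ovirt_prefix is confirmed by
# this thread; engine_vm()/ip() are assumed helper names.
import pdb

from ovirtlago import testlib


@testlib.with_ovirt_prefix
def poke_around(prefix):
    # 'prefix' is the running Lago/OST environment; drop into a debugger
    # and inspect it interactively.
    pdb.set_trace()
    print(prefix.virt_env.engine_vm().ip())


if __name__ == '__main__':
    poke_around()
```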



[1] http://lists.ovirt.org/pipermail/lago-devel/20170402/000650.html

On Thu, Sep 14, 2017 at 8:59 PM, Marc Young <3vilpeng...@gmail.com> wrote:
> Is it possible to run a specific scenario without having to run back through
> spin up/tear down?
>
> I want to rapidly debug a `test-scenarios/00#_something.py` and the
> bootstrap scripts (001,002) take a really long time.
>
> Also is it possible to attach to a debugger within the test-scenario with
> pdb? I didnt have luck and it looks like its abstracted away and not
> executed as a regular python file in a way that i can get to an interactive
> debugger
>
> ___
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 01/08/2017 ] [add_secondary_storage_domains]

2017-08-15 Thread Nadav Goldin
Hi Marc,

Some of the logs failed to extract, so I'm not sure what went wrong -
specifically, the host-deploy directory and the vdsm logs from the hosts.
(They should all be under test_logs; I would just tar.gz the entire
directory and send it next time.) As the "add hosts" step failed on a
timeout (15 minutes), I suspect it is due to a missing package or a
network issue (some packages might be downloaded by the VDSM hosts during
that stage). So:
1. Could you retry? Is this consistent?
2. If so, can you please upload the entire log directory?

If this indeed turns out to be a network issue, you could just increase
the timeout.

Thanks,

Nadav.


On Tue, Aug 15, 2017 at 5:28 AM, Marc Young <3vilpeng...@gmail.com> wrote:
> After updating the python sdk:
>
> myoung at dev-vm in ~/repos/github/ovirt-system-tests on (no branch)▲
> $ rpm -q python-ovirt-engine-sdk4
> python-ovirt-engine-sdk4-4.1.6-2.20170712git1b99f36.el7.centos.x86_64
>
>
> I get more but different errors[1].
>
> Then the lago log[2]
> All engine Logs are even further[3]
>
> [1]
>
>   # add_cluster: Success (in 0:00:03)
>   # add_hosts:
> * Collect artifacts:
>   - [Thread-5] lago-basic-suite-4-1-host1: ERROR (in 0:00:19)
>   - [Thread-4] lago-basic-suite-4-1-engine: ERROR (in 0:00:20)
> Error while running thread
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 58, in
> _ret_via_queue
> queue.put({'return': func()})
>   File "/usr/lib/python2.7/site-packages/lago/prefix.py", line 1478, in
> _collect_artifacts
> vm.collect_artifacts(path, ignore_nopath)
>   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 624, in
> collect_artifacts
> ignore_nopath=ignore_nopath
>   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 381, in
> extract_paths
> return self.provider.extract_paths(paths, *args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/lago/providers/libvirt/vm.py", line
> 342, in extract_paths
> ignore_nopath=ignore_nopath,
>   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 247, in
> extract_paths
> self._extract_paths_scp(paths=paths, ignore_nopath=ignore_nopath)
>   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 266, in
> _extract_paths_scp
> propagate_fail=False
>   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 425, in
> copy_from
> local_path=local_path,
>   File "/usr/lib/python2.7/site-packages/scp.py", line 125, in get
> self._recv_all()
>   File "/usr/lib/python2.7/site-packages/scp.py", line 250, in _recv_all
> msg = self.channel.recv(1024)
>   File "/usr/lib/python2.7/site-packages/paramiko/channel.py", line 615, in
> recv
> raise socket.timeout()
> timeout
> Error while running thread
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 58, in
> _ret_via_queue
> queue.put({'return': func()})
>   File "/usr/lib/python2.7/site-packages/lago/prefix.py", line 1478, in
> _collect_artifacts
> vm.collect_artifacts(path, ignore_nopath)
>   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 624, in
> collect_artifacts
> ignore_nopath=ignore_nopath
>   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 381, in
> extract_paths
> return self.provider.extract_paths(paths, *args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/lago/providers/libvirt/vm.py", line
> 342, in extract_paths
> ignore_nopath=ignore_nopath,
>   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 247, in
> extract_paths
> self._extract_paths_scp(paths=paths, ignore_nopath=ignore_nopath)
>   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 266, in
> _extract_paths_scp
> propagate_fail=False
>   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 425, in
> copy_from
> local_path=local_path,
>   File "/usr/lib/python2.7/site-packages/scp.py", line 125, in get
> self._recv_all()
>   File "/usr/lib/python2.7/site-packages/scp.py", line 250, in _recv_all
> msg = self.channel.recv(1024)
>   File "/usr/lib/python2.7/site-packages/paramiko/channel.py", line 615, in
> recv
> raise socket.timeout()
> timeout
>
> Error while running thread
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 58, in
> _ret_via_queue
> queue.put({'return': func()})
>   File "/usr/lib/python2.7/site-packages/lago/prefix.py", line 1478, in
> _collect_artifacts
> vm.collect_artifacts(path, ignore_nopath)
>   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 624, in
> collect_artifacts
> ignore_nopath=ignore_nopath
>   File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line 381, in
> extract_paths
> return self.provider.extract_paths(paths, *args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/lago/providers/libvirt/vm.py", line
> 342, in extract_paths
> 

Re: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 03-07-2017 ] [ 006_migrations.migrate_vm ]

2017-07-04 Thread Nadav Goldin
On Tue, Jul 4, 2017 at 1:30 PM, Nadav Goldin <ngol...@redhat.com> wrote:
> 1. I couldn't replicate it locally - which means it is most likely a
> recent change.

To clarify, I meant that with the current tested repository I couldn't
replicate it; I didn't try mimicking the 'under_testing' repository,
which includes the recent patches (and on which the failure happens).
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 03-07-2017 ] [ 006_migrations.migrate_vm ]

2017-07-04 Thread Nadav Goldin
Hi, sorry for posting late, I had a brief look at this yesterday:
1. I couldn't replicate it locally - which means it is most likely a
recent change.
2. I looked at the libvirt XMLs Lago generated for the hosts, as a new
version is used this week (0.40) - and they seem OK, specifically
memory and vcpus (which were my initial suspects).
3. I saw two Engine patches, a bit prior to the time it started to
fail, which *might*, to my common sense, be related, but it is out of my
scope to tell (CC'ed patch owners):

core: Make VmAnalyzer to treat a migrated Paused VM as success -
https://gerrit.ovirt.org/78305

fix custom fencing default config setting
https://gerrit.ovirt.org/78720

A shot in the dark - could it be that the 'CPUOverloaded' filter was not
active before for some reason?

Also, there are some exceptions in host0's vdsm log [1], failing to get
VM stats, though I can't tell if they are specific to this failure.

Of course this is not a complete analysis, I hope it helps.


[1] 
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log


Nadav.





On Tue, Jul 4, 2017 at 12:46 PM, Eyal Edri  wrote:
>
>
> On Tue, Jul 4, 2017 at 12:18 PM, Michal Skrivanek
>  wrote:
>>
>>
>> On 3 Jul 2017, at 15:35, Shlomo Ben David  wrote:
>>
>> Hi,
>>
>> Test failed: [ 006_migrations.migrate_vm ]
>> Link to suspected patches: N/A
>> Link to Job:
>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/
>> Link to all logs:
>> Error snippet from the log:
>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/7431/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/
>>
>> 
>>
>>  "Fault reason is "Operation Failed". Fault detail is "[Cannot migrate VM.
>> There is no host that satisfies current scheduling constraints. See below
>> for details:, The host lago-basic-suite-master-host0 did not satisfy
>> internal filter CPUOverloaded because its CPU is too loaded.]"
>>
>> 
>>
>> 
>>
>> 2017-07-02 16:43:22,829-04 INFO
>> [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27)
>> [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Lock Acquired to object
>> 'EngineLock:{exclusiveLocks='[2b34910d-cef2-44d6-a274-30e8473eb5d9=VM]',
>> sharedLocks=''}'
>> 2017-07-02 16:43:22,833-04 DEBUG
>> [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
>> (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored
>> procedure. Call string is [{call getdiskvmelementspluggedtovm(?)}]
>> 2017-07-02 16:43:22,833-04 DEBUG
>> [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
>> (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for
>> procedure [GetDiskVmElementsPluggedToVm] compiled
>> 2017-07-02 16:43:22,843-04 DEBUG
>> [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
>> (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Compiled stored
>> procedure. Call string is [{call getattacheddisksnapshotstovm(?, ?)}]
>> 2017-07-02 16:43:22,843-04 DEBUG
>> [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall]
>> (default task-27) [87508047-fdc5-4a2f-9692-c83f7b55bbc2] SqlCall for
>> procedure [GetAttachedDiskSnapshotsToVm] compiled
>> 2017-07-02 16:43:22,919-04 INFO
>> [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-27)
>> [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Candidate host
>> 'lago-basic-suite-master-host0' ('46bdc63d-98f5-4eee-81aa-2fb88b8f7cbe') was
>> filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'CPUOverloaded'
>> (correlation id: null)
>> 2017-07-02 16:43:22,920-04 WARN
>> [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-27)
>> [87508047-fdc5-4a2f-9692-c83f7b55bbc2] Validation of action
>> 'MigrateVmToServer' failed for user admin@internal-authz. Reasons:
>> VAR__ACTION__MIGRATE,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName
>> lago-basic-suite-master-host0,$filterName
>> CPUOverloaded,VAR__DETAIL__CPU_OVERLOADED,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
>>
>>
>>
>> This has nothing to do with migration.
>> CPUOverloaded is a scheduling policy filter; unless there was any change in
>> that area, the obvious explanation would be that the host has a CPU overload
>> condition.
>> I briefly looked at the logs and see "cpuUser": "83.40", "cpuSys": "16.59",
>> "cpuIdle": "0.08", which indeed suggests an overload; from the same sample I
>> can see it's vdsm ("cpuUserVdsmd": "77.38", "cpuSysVdsmd": "18.44").
>>
>> Since similar values are consistently being reported for some time, and
>> there is a setupNetworks and storage rescan prior to the failure, and
>> there is no other indication of anything wrong, I'd just say 

[ovirt-devel] Fwd: Lago v0.39 is out!

2017-06-04 Thread Nadav Goldin
-- Forwarded message --
From: Nadav Goldin <ngol...@redhat.com>
Date: Sun, Jun 4, 2017 at 2:24 PM
Subject: Lago v0.39 is out!
To: lago-de...@ovirt.org


On behalf of the Lago team, I'm pleased to announce the new release of
Lago and Lago-ost-plugin:
Lago - v0.39
Lago-ost-plugin v0.41

This is the first release where we've separated Lago and
lago-ost-plugin (aka ovirtlago) into different repositories; installation
procedures remain the same. However, from now on
lago-ost-plugin will follow a different release cycle. Its repository
can be found at [1], and its docs at [2]. Note that 'lago-ost-plugin'
requires Lago >= 0.39.

What's new
=

Lago
---
1. Improved Ansible inventory support. For more details see [3].
2. Lago SDK - allows running most CLI operations directly from Python
(a rough sketch follows this list). See [4] for the docs, and [5] for an
example. This is mostly standardization of the already provided SDK.
3. Debian network support in bootstrap stage.
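
A rough sketch of what driving Lago from Python looks like, modelled on the
notebook linked in [5]; the exact function names and arguments below are
assumptions, so please check them against the SDK docs in [4]:

```
# Sketch only - signatures are assumed from the linked example notebook.
from lago import sdk

env = sdk.init(config='LagoInitFile', workdir='/tmp/lago-sdk-demo')
env.start()
for name, vm in env.get_vms().items():
    print('%s: %s' % (name, vm.ip()))
env.destroy()
```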

Lago Images
-
3 New images were added, please help in verifying them:

1. fc25-base
2. debian8-base
3. ubuntu16.04-base

There is a known issue with host name resolution after boot in debian,
but it does not affect connectivity.

Tests/CI

1. Moved to tox to set up the virtualenv during the tests:
   * `tox -e docs` - builds the docs.
   * `tox -e py27` - runs unittests and linters.
   * `tox -c tox-sdk.ini -- --stage check_patch/check_merged` - runs the
     functional SDK tests for each stage (after you have installed Lago,
     either in a nested virtualenv or from RPMs).
2. Added SDK functional tests:
   * An easy-to-run sanity check while developing; under tests/functional-sdk run:
     `pytest -s -vvv --setup-show --stage check_patch test_sdk_sanity.py`
3. Added multi-distro tests, which means prior to merging every Lago
patch, we'll ensure the core images in templates.ovirt.org are
functional with the new patch.
4. Added ansible functional tests on check-merged.

For the full changelog see [6].

Upgrading

To upgrade using yum or dnf, simply run:
```
yum/dnf update lago
```

Resources

Lago Docs: http://lago.readthedocs.io/en/latest/
GitHub: https://github.com/lago-project/lago/
YUM Repository: http://resources.ovirt.org/repos/lago/stable/0.0/rpm/
OST Docs: http://ovirt-system-tests.readthedocs.io/en/latest/

As always, if you find any problems, please open an issue in the GitHub page.

Enjoy!

Nadav.

[1] https://github.com/lago-project/lago-ost-plugin
[2] http://lago-ost-plugin.readthedocs.io/en/latest/
[3] https://github.com/lago-project/lago/pull/544
[4] http://lago.readthedocs.io/en/stable/SDK.html
[5] 
https://github.com/lago-project/lago/blob/master/docs/examples/lago_sdk_one_vm_one_net.ipynb
[6] https://github.com/lago-project/lago/compare/0.38...0.39
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] [ovirt-system-tests] ssh to an oVirt VM in Lago

2017-05-31 Thread Nadav Goldin
Hi,


On Wed, May 31, 2017 at 12:53 AM, Valentina Makarova
 wrote:
>
> Is it possible to get an ssh connection to a non-host VM in Lago?

Not at the moment; Lago is not really aware of the nested VMs, just
the first layer (engine + hosts).

> It is easy for a host VM and the engine; there is an 'ssh' method in
> ovirt-engine-api-model.

This is actually a Lago method, not an 'ovirt-engine-sdk' one.

> Please give me advice - can I get a connection to vm0 in a similar way?

I would first make sure the VM is booting properly using SPICE from the
GUI. When the tests end, you should be able to log into the Engine GUI
(run 'lago ovirt status' inside your deployment directory to get the
link; the directory should be something like
ovirt-system-tests/deployment-SUITE-NAME).
Then in the GUI, start the VM and click on 'console'; the
username/password should be root/123456. The image used is CirrOS.

If it is booting properly, the first thing to check is whether it even
gets an IP (my guess is not - I'm not even sure a DHCP server is running
in that layer; maybe we should set one up...).
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] ovirt-imageio leaves .guestfs-0 folder under /var/tmp during check-patch

2017-05-23 Thread Nadav Goldin
IIRC, this is the default location where libguestfs caches files.
It can be changed with the LIBGUESTFS_TMPDIR environment variable, but
whether the default behaviour should be changed is a different question,
I guess.
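
For example, a test wrapper could point libguestfs at a job-local scratch
directory so the cache disappears together with the workspace; the paths
and the entry point below are assumptions, not the project's actual layout:

```
# Sketch: redirect the libguestfs cache before anything spawns libguestfs.
import os
import subprocess

scratch = os.path.join(os.getcwd(), 'guestfs-tmp')  # assumed job-local dir
if not os.path.isdir(scratch):
    os.makedirs(scratch)
os.environ['LIBGUESTFS_TMPDIR'] = scratch
# Children inherit the variable, so their caches land in the workspace.
subprocess.check_call(['./run-tests.sh'])  # hypothetical entry point
```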



On Tue, May 23, 2017 at 4:46 PM, Gil Shinar  wrote:
> Hi,
>
> Should it be like that? Is there a way to clean this leftover in the
> check-patch script?
>
> Thanks
> Gil
>
> ___
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 27-04-2017 ] [add_hosts]

2017-04-30 Thread Nadav Goldin
OK - that is easier, as it involves only the master suite. It should be
fixed by [1]; the test run is in [2].


[1] https://gerrit.ovirt.org/#/c/76251/
[2] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/342/console


On Sun, Apr 30, 2017 at 10:39 PM, Piotr Kliczewski <pklic...@redhat.com> wrote:
> I think it depends on which name we use when we add a host to the engine.
> We need to be consistent and use the same host name when adding a host and a
> fqdn for the host ip.
>
> On Sun, Apr 30, 2017 at 9:34 PM, Piotr Kliczewski
> <piotr.kliczew...@gmail.com> wrote:
>>
>> Nadav,
>>
>> Thank you for working on this but we have one more issue with name
>> resolution.
>>
>> I checked the last job you triggered and I noticed that VM migration
>> failed due to a similar issue between the hosts.
>> Here is a piece of the custom logging that you added:
>>
>> 2017-04-30 14:23:35,675-0400 INFO  (Reactor thread)
>> [ProtocolDetector.SSLHandshakeDispatcher] subject:
>> ((('organizationName', u'Test'),), (('commonName',
>> u'lago-basic-suite-master-host0'),)), key: organizationName, value:
>> Test (sslutils:241)
>> 2017-04-30 14:23:35,675-0400 INFO  (Reactor thread)
>> [ProtocolDetector.SSLHandshakeDispatcher] subject:
>> ((('organizationName', u'Test'),), (('commonName',
>> u'lago-basic-suite-master-host0'),)), key: commonName, value:
>> lago-basic-suite-master-host0 (sslutils:241)
>> 2017-04-30 14:23:35,676-0400 INFO  (Reactor thread)
>> [ProtocolDetector.SSLHandshakeDispatcher] src_addr:
>> :::192.168.201.2, cn_addr: lago-basic-suite-master-host0
>> (sslutils:262)
>> 2017-04-30 14:23:35,676-0400 INFO  (Reactor thread)
>> [ProtocolDetector.SSLHandshakeDispatcher] src_addr_extracted:
>> 192.168.201.2, cn_addr_extracted: lago-basic-suite-master-host0
>> (sslutils:266)
>> 2017-04-30 14:23:35,677-0400 INFO  (Reactor thread)
>> [ProtocolDetector.SSLHandshakeDispatcher]
>> socket.gethostbyadd(src_addr)[0]:
>> lago-basic-suite-master-host0.lago.local (sslutils:268)
>> 2017-04-30 14:23:35,678-0400 INFO  (Reactor thread)
>> [ProtocolDetector.SSLHandshakeDispatcher] compare
>> :::192.168.201.2, lago-basic-suite-master-host0, res: False
>> (sslutils:244)
>> 2017-04-30 14:23:35,678-0400 ERROR (Reactor thread)
>> [ProtocolDetector.SSLHandshakeDispatcher] peer certificate does not
>> match host name (sslutils:226)
>>
>> It looks like the engine issued a certificate for
>> 'lago-basic-suite-master-host0', but we resolve 192.168.201.2 to
>> 'lago-basic-suite-master-host0.lago.local'.
>> Can we fix it as well?
>>
>> Thanks,
>> Piotr
>>
>> On Sun, Apr 30, 2017 at 7:42 PM, Piotr Kliczewski <pklic...@redhat.com>
>> wrote:
>> > Wow, great.
>> >
>> > Thank you!
>> >
>> > 30 kwi 2017 19:40 "Nadav Goldin" <ngol...@redhat.com> napisał(a):
>> >>
>> >> Ok, I think the issue was the unqualified domain name. The certificate
>> >> was generated(as before for 'engine') without the domain name, i.e.
>> >> 'lago-basic-suite-master-engine', on VDSM side it resolved the IP to
>> >> the address 'lago-basic-suite-master-engine.lago.local' and then
>> >> failed comparing it to the unqualified one. I assume this is the
>> >> expected behaviour, though not sure(as you can easily resolve
>> >> 'lago-basic-suite-master-engine' to
>> >> 'lago-basic-suite-master-engine.lago.local' on the hosts). It should
>> >> be fixed in [1], just ran OST manual with the same debugging patch
>> >> applied on top of yours, and at least add_hosts passed.
>> >>
>> >>
>> >> [1] https://gerrit.ovirt.org/#/c/76225/10
>> >> [2] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/338/console
>> >>
>> >> On Sun, Apr 30, 2017 at 7:50 PM, Piotr Kliczewski <pklic...@redhat.com>
>> >> wrote:
>> >> > Sure, will take look later today.
>> >> >
>> >> > 30 kwi 2017 18:47 "Nadav Goldin" <ngol...@redhat.com> napisał(a):
>> >> >>
>> >> >> Thanks for the explanation.
>> >> >>
>> >> >> I added some more debugging messages on top of your patch, could you
>> >> >> please take a look at [1] and tell me what do you expect to resolve
>> >> >> differently for this to work?
>> >> >>
>> >> >>
>> >> >> [1]
>> >> >>
>> >> >>
>> >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_manual/337/artifact/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log
>
>
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

Re: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 27-04-2017 ] [add_hosts]

2017-04-30 Thread Nadav Goldin
Ok, I think the issue was the unqualified domain name. The certificate
was generated (as before, for 'engine') without the domain name, i.e.
'lago-basic-suite-master-engine'; on the VDSM side the IP was resolved to
'lago-basic-suite-master-engine.lago.local', and the comparison against
the unqualified name then failed. I assume this is the expected
behaviour, though I'm not sure (as you can easily resolve
'lago-basic-suite-master-engine' to
'lago-basic-suite-master-engine.lago.local' on the hosts). It should
be fixed in [1]; I just ran OST manual with the same debugging patch
applied on top of yours, and at least add_hosts passed.


[1] https://gerrit.ovirt.org/#/c/76225/10
[2] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/338/console
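
To illustrate the failure mode only (this is not vdsm's actual code, which
lives in sslutils.py and differs in details): the check reverse-resolves
the peer address and compares the result with the certificate's CN, so an
unqualified CN can never match the FQDN that gethostbyaddr returns:

```
# Simplified sketch of the hostname check discussed in this thread.
import socket


def cn_matches_peer(cn, src_addr):
    # The logs show IPv4-mapped addresses such as ':::192.168.201.2'.
    ip = src_addr.rsplit(':', 1)[-1]
    if cn == ip:
        return True
    resolved = socket.gethostbyaddr(ip)[0]  # e.g. 'lago-...-engine.lago.local'
    return resolved == cn  # False when cn is the unqualified 'engine' name
```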

On Sun, Apr 30, 2017 at 7:50 PM, Piotr Kliczewski <pklic...@redhat.com> wrote:
> Sure, will take look later today.
>
> 30 kwi 2017 18:47 "Nadav Goldin" <ngol...@redhat.com> napisał(a):
>>
>> Thanks for the explanation.
>>
>> I added some more debugging messages on top of your patch, could you
>> please take a look at [1] and tell me what do you expect to resolve
>> differently for this to work?
>>
>>
>> [1]
>> http://jenkins.ovirt.org/job/ovirt-system-tests_manual/337/artifact/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

Re: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 27-04-2017 ] [add_hosts]

2017-04-30 Thread Nadav Goldin
Thanks for the explanation.

I added some more debugging messages on top of your patch; could you
please take a look at [1] and tell me what you expect to resolve
differently for this to work?


[1] 
http://jenkins.ovirt.org/job/ovirt-system-tests_manual/337/artifact/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 27-04-2017 ] [add_hosts]

2017-04-30 Thread Nadav Goldin
Looking at the failure, I'm not sure what is wrong here on the setup
side. The FQDN (lago-basic-suite-master-engine) should be resolvable on
the hosts - at least from what I tested locally. In the engine
setup.log I see this was the generated certificate (if we're talking
about the same one here):

2017-04-30 06:30:41,308-0400 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.pki.ca
plugin.executeRaw:813 execute:
('/usr/share/ovirt-engine/bin/pki-enroll-pkcs12.sh', '--name=engine',
'--password=**FILTERED**',
'--subject=/C=US/O=Test/CN=lago-basic-suite-master-engine'),
executable='None', cwd='None', env=None
2017-04-30 06:30:44,542-0400 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.pki.ca
plugin.executeRaw:863 execute-result:
('/usr/share/ovirt-engine/bin/pki-enroll-pkcs12.sh', '--name=engine',
'--password=**FILTERED**',
'--subject=/C=US/O=Test/CN=lago-basic-suite-master-engine'), rc=0
2017-04-30 06:30:44,543-0400 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.pki.ca
plugin.execute:921 execute-output:
('/usr/share/ovirt-engine/bin/pki-enroll-pkcs12.sh', '--name=engine',
'--password=**FILTERED**',
'--subject=/C=US/O=Test/CN=lago-basic-suite-master-engine')


Do we expect the '--name' parameter to be the same as the hostname? My
thought was that it should use the engine FQDN, and that should match
the certificate name.

If that is not the problem, can you make the output in the vdsm logs
more verbose, so we'll know exactly what name it is looking for?


Thanks

Nadav.

On Sun, Apr 30, 2017 at 1:43 PM, Piotr Kliczewski <pklic...@redhat.com> wrote:
> The job failed.
>
> Just to be clear. We need to resolve engine name on a host side or use ip
> address.
>
> Thanks,
> Piotr
>
> On Sun, Apr 30, 2017 at 12:23 PM, Piotr Kliczewski <pklic...@redhat.com>
> wrote:
>>
>> Here is the link
>>
>> http://jenkins.ovirt.org/job/ovirt-system-tests_manual/331/
>>
>> On Sun, Apr 30, 2017 at 12:17 PM, Piotr Kliczewski <pklic...@redhat.com>
>> wrote:
>>>
>>> Sure, will test
>>>
>>> 30 kwi 2017 12:14 "Nadav Goldin" <ngol...@redhat.com> napisał(a):
>>>>
>>>> It is under-work in [1], as it requires cross-changes in all suites it
>>>> takes a while to test it/cover all changes, though basic-suite-master
>>>> already passed.
>>>> Can you test it by running OST manual with your changes and the OST
>>>> patch(i.e. put also in GERRIT_REFSPEC: refs/changes/25/76225/7 )
>>>>
>>>>
>>>>
>>>> [1] https://gerrit.ovirt.org/76225
>>>>
>>>> On Sun, Apr 30, 2017 at 1:09 PM, Yaniv Kaul <yk...@redhat.com> wrote:
>>>> >
>>>> >
>>>> > On Sun, Apr 30, 2017 at 1:03 PM, Piotr Kliczewski
>>>> > <piotr.kliczew...@gmail.com> wrote:
>>>> >>
>>>> >> When we can have it fixed? I checked few minutes ago and the problem
>>>> >> is still there.
>>>> >
>>>> >
>>>> > https://gerrit.ovirt.org/#/c/76225/ should cover this.
>>>> >
>>>> > What I wonder is what caused this in the first place. The SSL change?
>>>> > Y.
>>>> >
>>>> >>
>>>> >>
>>>> >> Thanks,
>>>> >> Piotr
>>>> >>
>>>> >> On Sat, Apr 29, 2017 at 11:18 AM, Piotr Kliczewski
>>>> >> <pklic...@redhat.com>
>>>> >> wrote:
>>>> >> > Nadav,
>>>> >> >
>>>> >> > Yes, vdsm is not able to resolve 'engine' which is used in engine's
>>>> >> > certificate.
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Piotr
>>>> >> >
>>>> >> > 29 kwi 2017 00:37 "Nadav Goldin" <ngol...@redhat.com> napisał(a):
>>>> >> >
>>>> >> > Hi Piotr,
>>>> >> > Can you clarify what you noticed is not resolvable - the 'engine'
>>>> >> > FQDN
>>>> >> > from host0?
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Nadav.
>>>> >> >
>>>> >> >
>>>> >> > On Fri, Apr 28, 2017 at 4:15 PM, Piotr Kliczewski
>>>> >> > <pklic...@redhat.com>
>>>> >> > wrote:
>>>> >> >> I started to investigate the issue [1] and it seems like there is
>>>> >> >> an
>>>> >> >> issue
>>

Re: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 27-04-2017 ] [add_hosts]

2017-04-30 Thread Nadav Goldin
It is a work in progress in [1]; as it requires cross-changes in all
suites, it takes a while to test it and cover all the changes, though
basic-suite-master already passed.
Can you test it by running OST manual with your changes plus the OST
patch (i.e. also set GERRIT_REFSPEC: refs/changes/25/76225/7)?



[1] https://gerrit.ovirt.org/76225

On Sun, Apr 30, 2017 at 1:09 PM, Yaniv Kaul <yk...@redhat.com> wrote:
>
>
> On Sun, Apr 30, 2017 at 1:03 PM, Piotr Kliczewski
> <piotr.kliczew...@gmail.com> wrote:
>>
>> When we can have it fixed? I checked few minutes ago and the problem
>> is still there.
>
>
> https://gerrit.ovirt.org/#/c/76225/ should cover this.
>
> What I wonder is what caused this in the first place. The SSL change?
> Y.
>
>>
>>
>> Thanks,
>> Piotr
>>
>> On Sat, Apr 29, 2017 at 11:18 AM, Piotr Kliczewski <pklic...@redhat.com>
>> wrote:
>> > Nadav,
>> >
>> > Yes, vdsm is not able to resolve 'engine' which is used in engine's
>> > certificate.
>> >
>> > Thanks,
>> > Piotr
>> >
>> > 29 kwi 2017 00:37 "Nadav Goldin" <ngol...@redhat.com> napisał(a):
>> >
>> > Hi Piotr,
>> > Can you clarify what you noticed is not resolvable - the 'engine' FQDN
>> > from host0?
>> >
>> > Thanks,
>> > Nadav.
>> >
>> >
>> > On Fri, Apr 28, 2017 at 4:15 PM, Piotr Kliczewski <pklic...@redhat.com>
>> > wrote:
>> >> I started to investigate the issue [1] and it seems like there is an
>> >> issue
>> >> in Lago setup we use.
>> >>
>> >> During handshake we have a step to verify whether client certificate
>> >> was
>> >> issued for a specific host (no such functionality in m2crytpo code
>> >> base).
>> >> It works fine when using either ip addresses or fqdns but in this
>> >> particular
>> >> setup we use mixed.
>> >>
>> >> When added logging I see that in engine certificate we use 'engine'
>> >> name
>> >> which is not resolvable on the host side and the check fails.
>> >> I posted a patch [2] which fixes IPv4 mapped addresses issue but we
>> >> need
>> >> to
>> >> fix the setup issue.
>> >>
>> >> Thanks,
>> >> Piotr
>> >>
>> >> [1] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/326/
>> >> [2] https://gerrit.ovirt.org/#/c/76197/
>> >>
>> >> On Thu, Apr 27, 2017 at 3:39 PM, Piotr Kliczewski <pklic...@redhat.com>
>> >> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Thu, Apr 27, 2017 at 3:13 PM, Evgheni Dereveanchin
>> >>> <edere...@redhat.com> wrote:
>> >>>>
>> >>>> Test failed: 002_bootstrap/add_hosts
>> >>>>
>> >>>> Link to suspected patches:
>> >>>>  https://gerrit.ovirt.org/76107 - ssl: change default library
>> >>>>
>> >>>> Link to job:
>> >>>>
>> >>>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6491/
>> >>>>
>> >>>> VDSM log:
>> >>>>
>> >>>>
>> >>>>
>> >>>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6491/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log
>> >>>>
>> >>>> Error snippet from VDSM log, this repeats on each connection attempt
>> >>>> from
>> >>>> Engine side:
>> >>>>
>> >>>> 
>> >>>>
>> >>>> 2017-04-27 06:39:27,768-0400 INFO  (Reactor thread)
>> >>>> [ProtocolDetector.AcceptorImpl] Accepted connection from
>> >>>> :::192.168.201.3:49530 (protocoldetector:74)
>> >>>> 2017-04-27 06:39:27,898-0400 ERROR (Reactor thread) [vds.dispatcher]
>> >>>> uncaptured python exception, closing channel
>> >>>> > >>>> (':::192.168.201.3',
>> >>>> 49530, 0, 0) at 0x1cc3b00> (:Address family not
>> >>>> supported by protocol
>> >>>> [/usr/lib64/python2.7/asyncore.py|readwrite|110]
>> >>>> [/usr/lib64/python2.7/asyncore.py|handle_write_event|468]
>> >>>>
>> >&g

Re: [ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 27-04-2017 ] [add_hosts]

2017-04-28 Thread Nadav Goldin
Hi Piotr,
Can you clarify what you noticed is not resolvable - the 'engine' FQDN
from host0?

Thanks,
Nadav.


On Fri, Apr 28, 2017 at 4:15 PM, Piotr Kliczewski  wrote:
> I started to investigate the issue [1] and it seems like there is an issue
> in Lago setup we use.
>
> During handshake we have a step to verify whether client certificate was
> issued for a specific host (no such functionality in m2crytpo code base).
> It works fine when using either ip addresses or fqdns but in this particular
> setup we use mixed.
>
> When added logging I see that in engine certificate we use 'engine' name
> which is not resolvable on the host side and the check fails.
> I posted a patch [2] which fixes IPv4 mapped addresses issue but we need to
> fix the setup issue.
>
> Thanks,
> Piotr
>
> [1] http://jenkins.ovirt.org/job/ovirt-system-tests_manual/326/
> [2] https://gerrit.ovirt.org/#/c/76197/
>
> On Thu, Apr 27, 2017 at 3:39 PM, Piotr Kliczewski 
> wrote:
>>
>>
>>
>> On Thu, Apr 27, 2017 at 3:13 PM, Evgheni Dereveanchin
>>  wrote:
>>>
>>> Test failed: 002_bootstrap/add_hosts
>>>
>>> Link to suspected patches:
>>>  https://gerrit.ovirt.org/76107 - ssl: change default library
>>>
>>> Link to job:
>>>  http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6491/
>>>
>>> VDSM log:
>>>
>>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6491/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host0/_var_log/vdsm/vdsm.log
>>>
>>> Error snippet from VDSM log, this repeats on each connection attempt from
>>> Engine side:
>>>
>>> 
>>>
>>> 2017-04-27 06:39:27,768-0400 INFO  (Reactor thread)
>>> [ProtocolDetector.AcceptorImpl] Accepted connection from
>>> :::192.168.201.3:49530 (protocoldetector:74)
>>> 2017-04-27 06:39:27,898-0400 ERROR (Reactor thread) [vds.dispatcher]
>>> uncaptured python exception, closing channel
>>> >> 49530, 0, 0) at 0x1cc3b00> (:Address family not
>>> supported by protocol [/usr/lib64/python2.7/asyncore.py|readwrite|110]
>>> [/usr/lib64/python2.7/asyncore.py|handle_write_event|468]
>>> [/usr/lib/python2.7/site-packages/yajsonrpc/betterAsyncore.py|handle_write|70]
>>> [/usr/lib/python2.7/site-packages/yajsonrpc/betterAsyncore.py|_delegate_call|149]
>>> [/usr/lib/python2.7/site-packages/vdsm/sslutils.py|handle_write|213]
>>> [/usr/lib/python2.7/site-packages/vdsm/sslutils.py|_handle_io|223]
>>> [/usr/lib/python2.7/site-packages/vdsm/sslutils.py|_verify_host|237]
>>> [/usr/lib/python2.7/site-packages/vdsm/sslutils.py|compare_names|249])
>>> (betterAsyncore:160)
>>>
>>> 
>>
>>
>> This means that what we have in the certificate do not match the source
>> address we get. I suspect that we issue the certificate for 192.168.201.3
>> but when we get :::192.168.201.3.
>> The change was verified in the env when ipv4 is used. I pushed a revert
>> [1] for now so we can work on fixing the issue.
>>
>> [1] https://gerrit.ovirt.org/#/c/76160
>>
>>>
>>> --
>>> Regards,
>>> Evgheni Dereveanchin
>>
>>
>
>
> ___
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


[ovirt-devel] Subject: [ OST Failure Report ] [ oVirt master ] [ 24-04-2017 ] [import_template_from_glance]

2017-04-24 Thread Nadav Goldin
Test failed: add_secondary_storage_domains/import_template_from_glance

Link to suspected patches: https://gerrit.ovirt.org/#/c/74382/

Link to Job: 
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6456/
(started in 6451)

Link to all logs:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6456/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/

Engine log: 
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6456/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log

Error snippet from the test log:



lago.utils: ERROR: Error while running thread
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/lago/utils.py", line 58, in
_ret_via_queue
queue.put({'return': func()})
  File 
"/home/jenkins/workspace/test-repo_ovirt_experimental_master/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
line 803, in import_template_from_glance
generic_import_from_glance(api, image_name=CIRROS_IMAGE_NAME,
image_ext='_glance_template', as_template=True)
  File 
"/home/jenkins/workspace/test-repo_ovirt_experimental_master/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
line 641, in generic_import_from_glance
lambda: api.disks.get(disk_name).status.state == 'ok',
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
264, in assert_true_within_long
assert_equals_within_long(func, True, allowed_exceptions)
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
251, in assert_equals_within_long
func, value, LONG_TIMEOUT, allowed_exceptions=allowed_exceptions
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
230, in assert_equals_within
'%s != %s after %s seconds' % (res, value, timeout)
AssertionError: False != True after 600 seconds





the engine.log has this sequence repeating (apparently at the end of
the task - 199ed356):

2017-04-24 13:34:50,079-04 INFO
[org.ovirt.engine.core.bll.storage.repoimage.ImportRepoImageCommand]
(DefaultQuartzScheduler10) [199ed356-0960-4ef4-9637-09c76a07c932]
Ending command 
'org.ovirt.engine.core.bll.storage.repoimage.ImportRepoImageCommand'
successfully.
2017-04-24 13:34:50,090-04 ERROR
[org.ovirt.engine.core.bll.CommandsFactory] (DefaultQuartzScheduler10)
[] An exception has occurred while trying to create a command object
for command 'AddVmTemplate' with parameters
'AddVmTemplateParameters:{commandId='a6d45092-dfe0-4a65-bdc4-4c23a68fe7d5',
user='admin', commandType='Unknown'}': WELD-49: Unable to invoke
protected final void
org.ovirt.engine.core.bll.CommandBase.postConstruct() on
org.ovirt.engine.core.bll.AddVmTemplateCommand@35c1cbd5
2017-04-24 13:34:50,095-04 INFO
[org.ovirt.engine.core.utils.transaction.TransactionSupport]
(DefaultQuartzScheduler10) [] transaction rolled back
2017-04-24 13:34:50,123-04 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler10) [] EVENT_ID:
USER_IMPORT_IMAGE_AS_TEMPLATE_FINISHED_SUCCESS(3,018), Correlation ID:
199ed356-0960-4ef4-9637-09c76a07c932, Job ID:
0b91fec3-97be-493f-9dfb-af1230e4d3ee, Call Stack: null, Custom Event
ID: -1, Message: User admin@internal-authz successfully imported image
CirrOS_0.3.4_for_x86_64_glance_template as template
CirrOS_0.3.4_for_x86_64_glance_template to domain iscsi.
2017-04-24 13:34:50,123-04 ERROR
[org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller]
(DefaultQuartzScheduler10) [] Failed invoking callback end method
'onSucceeded' for command '25028c51-d877-44e3-b1ef-40b315b469d3' with
exception 'null', the callback is marked for end method retries

Thanks,
Nadav.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


[ovirt-devel] Fwd: Lago v0.37 Release announcement

2017-04-23 Thread Nadav Goldin
-- Forwarded message --
From: Nadav Goldin <ngol...@redhat.com>
Date: Sun, Apr 23, 2017 at 5:55 PM
Subject: Lago v0.37 Release announcement
To: lago-de...@ovirt.org


Hi all,
On behalf of the Lago team, I'm pleased to announce Lago v0.37 is available!

What's new
=
General

1. Allow running arbitrary 'virt-customize' commands on template
disks, prior to boot-up, for example:
disks:
  - build:
      - virt-customize:
          ssh-inject: ""
          mkdir: "/tmp/some_dir"
    template_name: el7.3-base
    type: template
    name: root
    dev: vda
    format: qcow2

This will create the directory /tmp/some_dir in the template disk and
inject Lago's generated ssh keys for the root user. For a full list of
available commands, please consult the virt-customize docs [1], under the
'Customization options' section.

2. Allow skipping the 'bootstrap' stage (virt-sysprep) per VM, by
defining 'bootstrap: false' under the VM definition.

Libvirt
--
1. Use 'host-passthrough' as the default CPU mode.
2. Allow customizing CPU definitions, with the 'cpu_custom' and
'cpu_model' parameters; see [2] for more details.
3. Automatically add DHCP leases for non-management networks.
4. Auto-select the management network if not defined.
5. Allow defining custom DNS records in management networks.
6. Restrict DNS configurations to management networks only.
7. Enforce one management network per VM.

ovirtlago
-
1. Export junit XML reports generated by nose with *.junit.xml suffix.
2. Add '--with-vms' option to 'lago ovirt start'.
3. Allow defining a custom CPU <-> Cluster level mapping file.


For the full commit-log, which also includes several bug-fixes, see [3].

Upgrading

To upgrade using yum or dnf, simply run:

yum/dnf update lago


Resources

Lago Docs:  http://lago.readthedocs.io/en/latest/
GitHub: https://github.com/lago-project/lago/
YUM Repository: http://resources.ovirt.org/repos/lago/stable/0.0/rpm/
OST Docs: http://ovirt-system-tests.readthedocs.io/en/latest/

As always, if you find any problems, please open an issue in the GitHub page[4].

Enjoy!

Nadav.


[1] http://libguestfs.org/virt-customize.1.html
[2] http://lago.readthedocs.io/en/latest/LagoInitFile.html#domains-section
[3] https://github.com/lago-project/lago/compare/0.36...0.37
[4] https://github.com/lago-project/lago/issues/
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] GUI error on master: Uncaught exception: com.google.gwt.core.client.JavaScriptException: (TypeError)

2017-04-03 Thread Nadav Goldin
Sent a patch [1]; installed ovirt-engine-webadmin-portal-debuginfo
manually, and the same error appears in ui.log:


2017-04-03 12:51:49,285-04 ERROR
[org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
(default task-7) [] Permutation name: C3D24A23286F7C8A99A2E725E808C153
2017-04-03 12:51:49,286-04 ERROR
[org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
(default task-7) [] Uncaught exception:
com.google.gwt.core.client.JavaScriptException: (TypeError) : a.i is
undefined
at 
org.ovirt.engine.ui.uicommonweb.models.SystemTreeModel.$executed(SystemTreeModel.java:415)
at 
org.ovirt.engine.ui.uicommonweb.models.SystemTreeModel.executed(SystemTreeModel.java:415)
at org.ovirt.engine.ui.frontend.Frontend$3.$onSuccess(Frontend.java:328)
[frontend.jar:]
at org.ovirt.engine.ui.frontend.Frontend$3.onSuccess(Frontend.java:328)
[frontend.jar:]
at 
org.ovirt.engine.ui.frontend.communication.OperationProcessor$3.$onSuccess(OperationProcessor.java:176)
[frontend.jar:]
at 
org.ovirt.engine.ui.frontend.communication.OperationProcessor$3.onSuccess(OperationProcessor.java:176)
[frontend.jar:]
at 
org.ovirt.engine.ui.frontend.communication.GWTRPCCommunicationProvider$5$1.$onSuccess(GWTRPCCommunicationProvider.java:269)
[frontend.jar:]
at 
org.ovirt.engine.ui.frontend.communication.GWTRPCCommunicationProvider$5$1.onSuccess(GWTRPCCommunicationProvider.java:269)
[frontend.jar:]
at 
com.google.gwt.user.client.rpc.impl.RequestCallbackAdapter.onResponseReceived(RequestCallbackAdapter.java:198)
[gwt-servlet.jar:]
at 
com.google.gwt.http.client.Request.$fireOnResponseReceived(Request.java:233)
[gwt-servlet.jar:]
at 
com.google.gwt.http.client.RequestBuilder$1.onReadyStateChange(RequestBuilder.java:409)
[gwt-servlet.jar:]
at 
Unknown.onreadystatechange<(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US)
at com.google.gwt.core.client.impl.Impl.apply(Impl.java:236)
[gwt-servlet.jar:]
at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:275)
[gwt-servlet.jar:]
at Unknown.Du/<(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US)
at Unknown.anonymous(Unknown)


[1] https://gerrit.ovirt.org/#/c/75062/1

On Mon, Apr 3, 2017 at 4:12 PM, Roy Golan <rgo...@redhat.com> wrote:
>
>
> On Mon, Apr 3, 2017 at 4:09 PM Nadav Goldin <ngol...@redhat.com> wrote:
>>
>> Right. Missed that exclude, thanks.
>> I'll send a patch to include it from tested - all *debuginfo* packages
>> look like just a few MBs.
>>
>
> +Guy Chen
>
>>
>>
>> On Mon, Apr 3, 2017 at 3:55 PM, Greg Sheremeta <gsher...@redhat.com>
>> wrote:
>> > Hmm, I think *-debuginfo is explicitly excluded now.
>> >
>> > https://gerrit.ovirt.org/#/c/73497/8/common/yum-repos/ovirt-master.repo
>> >
>> > [ovirt-master-tested-el7]
>> > name=oVirt Master Latest Tested
>> > baseurl=http://resources.ovirt.org/repos/ovirt/tested/master/rpm/el7/
>> > enabled=1
>> > gpgcheck=0
>> > max_connections=10
>> > exclude =  *-debuginfo
>> >
>> > [ovirt-master-snapshot-static-el7]
>> > name=oVirt Master Nightly Statics
>> >
>> > baseurl=http://resources.ovirt.org/pub/ovirt-master-snapshot-static/rpm/el7/
>> > exclude= *-debuginfo
>> >
>> >
>> > Could that be the problem?
>> >
>> > On Sun, Apr 2, 2017 at 1:38 PM, Nadav Goldin <ngol...@redhat.com> wrote:
>> >>
>> >> > Any chance you can install them and retry?
>> >> > sudo yum install ovirt-engine-webadmin-portal-debuginfo
>> >>
>> >> Surprisingly OST didn't pull it. I see it is listed in the reposync
>> >> config under the 'ovirt-master-snapshot-static', but I think its not
>> >> there any more. Is it built like all other packages consumed in the
>> >> experimental flow?
>> >>
>> >>
>> >>
>> >>
>> >> On Sun, Apr 2, 2017 at 4:00 PM, Yaniv Kaul <yk...@redhat.com> wrote:
>> >> >
>> >> >
>> >> > On Sun, Apr 2, 2017 at 3:18 PM, Greg Sheremeta <gsher...@redhat.com>
>> >> > wrote:
>> >> >>
>> >> >> """
>> >> >> GWT symbolmaps are not installed, please install
>> >> >> them to de-obfuscate the UI stack traces
>> >> >> 2017-04-02 06:12:25,980-04 ERROR
>> >> >> """
>> >> >>
>> >> >> Any chance you can install them and retry?
>> >> >> sudo yum install ovirt-engine-webadmin-portal-debuginfo
>> >> >
>> >> >
>> >&

Re: [ovirt-devel] GUI error on master: Uncaught exception: com.google.gwt.core.client.JavaScriptException: (TypeError)

2017-04-02 Thread Nadav Goldin
> Any chance you can install them and retry?
> sudo yum install ovirt-engine-webadmin-portal-debuginfo

Surprisingly, OST didn't pull it. I see it is listed in the reposync
config under 'ovirt-master-snapshot-static', but I think it's not
there any more. Is it built like all the other packages consumed in the
experimental flow?




On Sun, Apr 2, 2017 at 4:00 PM, Yaniv Kaul <yk...@redhat.com> wrote:
>
>
> On Sun, Apr 2, 2017 at 3:18 PM, Greg Sheremeta <gsher...@redhat.com> wrote:
>>
>> """
>> GWT symbolmaps are not installed, please install
>> them to de-obfuscate the UI stack traces
>> 2017-04-02 06:12:25,980-04 ERROR
>> """
>>
>> Any chance you can install them and retry?
>> sudo yum install ovirt-engine-webadmin-portal-debuginfo
>
>
> && sudo systemctl restart ovirt-engine
>
> Y.
>
>>
>>
>>
>> On Sun, Apr 2, 2017 at 6:25 AM, Nadav Goldin <ngol...@redhat.com> wrote:
>>>
>>> Hi,
>>> Running OST basic-suite-master and logging to the GUI, appears a
>>> warning pop-up window:
>>> Uncaught exception occurred. Please try reloading the page. Details:
>>> (TypeError) : a.i is undefined
>>> Please have your administrator check the UI logs
>>>
>>> ui.log:
>>> 2017-04-02 06:12:25,975-04 INFO
>>> [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
>>> (default task-3) [] GWT symbolmaps are not installed, please install
>>> them to de-obfuscate the UI stack traces
>>> 2017-04-02 06:12:25,980-04 ERROR
>>> [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
>>> (default task-3) [] Permutation name: E0637AA26393B2D56C4B42EFB5EA0C00
>>> 2017-04-02 06:12:25,980-04 ERROR
>>> [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
>>> (default task-3) [] Uncaught exception:
>>> com.google.gwt.core.client.JavaScriptException: (TypeError) : a.i is
>>> undefined
>>> at
>>> Unknown.itp(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.Btp(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.pWo(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.sWo(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.JYo(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.MYo(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.jYo(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.mYo(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.PPe(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.E_(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.T_(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.onreadystatechange<(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.Bu(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.Eu(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at
>>> Unknown.Du/<(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
>>> at Unknown.anonymous(Unknown)
>>>
>>>
>>> engine version: 4.2.0-0.0.master.20170402071029.git17ebf70.el7.centos
>>> firefox: firefox-52.0-7.fc25.x86_64
>>>
>>> Off hand, everything looks functional.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Nadav.
>>> ___
>>> Devel mailing list
>>> Devel@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/devel
>>
>>
>>
>>
>> --
>> Greg Sheremeta, MBA
>> Red Hat, Inc.
>> Sr. Software Engineer
>> gsher...@redhat.com
>>
>> ___
>> Devel mailing list
>> Devel@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/devel
>
>
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


[ovirt-devel] GUI error on master: Uncaught exception: com.google.gwt.core.client.JavaScriptException: (TypeError)

2017-04-02 Thread Nadav Goldin
Hi,
Running OST basic-suite-master and logging in to the GUI, a warning
pop-up window appears:
Uncaught exception occurred. Please try reloading the page. Details:
(TypeError) : a.i is undefined
Please have your administrator check the UI logs

ui.log:
2017-04-02 06:12:25,975-04 INFO
[org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
(default task-3) [] GWT symbolmaps are not installed, please install
them to de-obfuscate the UI stack traces
2017-04-02 06:12:25,980-04 ERROR
[org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
(default task-3) [] Permutation name: E0637AA26393B2D56C4B42EFB5EA0C00
2017-04-02 06:12:25,980-04 ERROR
[org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService]
(default task-3) [] Uncaught exception:
com.google.gwt.core.client.JavaScriptException: (TypeError) : a.i is
undefined
at 
Unknown.itp(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.Btp(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.pWo(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.sWo(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.JYo(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.MYo(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.jYo(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.mYo(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.PPe(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.E_(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.T_(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.onreadystatechange<(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.Bu(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.Eu(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at 
Unknown.Du/<(https://192.168.201.3/ovirt-engine/webadmin/?locale=en_US#dashboard-main)
at Unknown.anonymous(Unknown)


engine version: 4.2.0-0.0.master.20170402071029.git17ebf70.el7.centos
firefox: firefox-52.0-7.fc25.x86_64

Off hand, everything looks functional.



Thanks,

Nadav.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] OST: vm_run fails for me (basic-suite-master)

2017-02-08 Thread Nadav Goldin
I would first try testing it without OST, because in OST the CPU is
picked via the cluster family (which is controlled in virt.py). You
can try specifying 'cpu_model' in the init file, skipping the 'cpu
family' logic, something like:

> cat LagoInitFile
domains:
  vm-el73:
    memory: 2048
    service_provider: systemd
    cpu_model: Broadwell
    nics:
      - net: lago
    disks:
      - template_name: el7.3-base
        type: template
        name: root
        dev: vda
        format: qcow2
nets:
  lago:
    type: nat
    dhcp:
      start: 100
      end: 254
    management: true
    dns_domain_name: lago.local

> lago init && lago start

Then install lago again in the VM, copy the same init file, and check
whether different combinations of cpu_model work for you - that would
give us a hint how to solve this. The 'cpu_model' basically translates
to a libvirt CPU definition along these lines:
  <cpu>
    <model>Broadwell</model>
    ...
  </cpu>

I also tried manually editing it to host-passthrough, but it still failed
on the same error. The thing is that the 'kvm_put_msrs: Assertion `ret
== n' failed.' error doesn't give any indication of where it failed (or
whether the CPU is missing a flag); maybe there is a way to debug this
at the qemu/kvm level? I'm not sure.
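
One thing that might help is comparing what libvirt actually hands to qemu
in each combination. A rough sketch (the domain name is illustrative - take
the real one from 'virsh list --all'):

# Inspect the <cpu> element libvirt generated for the VM
virsh dumpxml vm-el73 | grep -A 5 '<cpu'

# Translate the domain XML into the raw qemu command line and look at the -cpu flags
virsh dumpxml vm-el73 > /tmp/vm.xml
virsh domxml-to-native qemu-argv /tmp/vm.xml | tr ' ' '\n' | grep -A 1 -- '-cpu'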






On Wed, Feb 8, 2017 at 1:18 PM, Ondrej Svoboda <osvob...@redhat.com> wrote:
> It is a Skylake-H, and I can see it is not mentioned in lago/virt.py.
>
> I guess I'll step through the code (as well as other places discovered by
> 'git grep cpu') and see if I could solve this by adding the Skylake family
> to _CPU_FAMILIES.
>
> Do you have other pointers?
>
> Thanks,
> Ondra
>
> On Tue, Feb 7, 2017 at 10:40 PM, Nadav Goldin <ngol...@redhat.com> wrote:
>>
>> What is the host CPU you are using?
>> I came across the same error a few days ago, but without running OST; I
>> tried running with Lago:
>> fc24 host -> el7 vm -> el7 vm.
>>
>> I have a slight suspicion that it is related to the CPU model we
>> configure in libvirt. I tried a mixture of a few
>> combinations (host-passthrough, pinning down the CPU model), but it
>> always failed on the same error:
>> kvm_put_msrs: Assertion `ret == n' failed.
>>
>> My CPU is Broadwell btw.
>>
>>
>> Milan, any ideas? Do you think it might be related?
>>
>> Nadav.
>>
>>
>>
>> On Tue, Feb 7, 2017 at 11:14 PM, Ondrej Svoboda <osvob...@redhat.com>
>> wrote:
>> > Yes, I stated that in my message.
>> >
>> > root@osvoboda-t460p /home/src/ovirt-system-tests (git)-[master] # cat
>> > /sys/module/kvm_intel/parameters/nested
>> > :(
>> > Y
>> >
>> > On Tue, Feb 7, 2017 at 1:39 PM, Eyal Edri <ee...@redhat.com> wrote:
>> >>
>> >> Did you follow the instructions on [1] ?
>> >>
>> >> Specifically, verifying  ' cat /sys/module/kvm_intel/parameters/nested
>> >> '
>> >> gives you 'Y'.
>> >>
>> >> [1]
>> >>
>> >> http://ovirt-system-tests.readthedocs.io/en/latest/docs/general/installation.html
>> >>
>> >> On Tue, Feb 7, 2017 at 2:29 PM, Ondrej Svoboda <osvob...@redhat.com>
>> >> wrote:
>> >>>
>> >>> Hi everyone,
>> >>>
>> >>> Even though I have nested virtualization enabled in my Arch Linux
>> >>> system
>> >>> which I use to run OST, vm_run is the first test to fail in
>> >>> 004_basic_sanity
>> >>> (followed by snapshots_merge and suspend_resume_vm).
>> >>>
>> >>> Can you point me to what I might be missing? I believe I get the same
>> >>> failure even on Fedora.
>> >>>
>> >>> This is what host0's CPU capabilities look like (vmx is there):
>> >>> [root@lago-basic-suite-master-host0 ~]# cat /proc/cpuinfo
>> >>> processor: 0
>> >>> vendor_id: GenuineIntel
>> >>> cpu family: 6
>> >>> model: 44
>> >>> model name: Westmere E56xx/L56xx/X56xx (Nehalem-C)
>> >>> stepping: 1
>> >>> microcode: 0x1
>> >>> cpu MHz: 2711.988
>> >>> cache size: 16384 KB
>> >>> physical id: 0
>> >>> siblings: 1
>> >>> core id: 0
>> >>> cpu cores: 1
>> >>> apicid: 0
>> >>> initial apicid: 0
>> >>> fpu: yes
>> >>> fpu_exception: yes
>> >>> cpuid level: 11
>> >>> wp: yes

Re: [ovirt-devel] OST: vm_run fails for me (basic-suite-master)

2017-02-07 Thread Nadav Goldin
What is the host CPU you are using?
I came across the same error a few days ago, but without running OST; I
tried running with Lago:
fc24 host -> el7 vm -> el7 vm.

I have a slight suspicion that it is related to the CPU model we
configure in libvirt. I tried a mixture of a few
combinations (host-passthrough, pinning down the CPU model), but it
always failed on the same error:
kvm_put_msrs: Assertion `ret == n' failed.

My CPU is Broadwell btw.


Milan, any ideas? Do you think it might be related?

Nadav.



On Tue, Feb 7, 2017 at 11:14 PM, Ondrej Svoboda  wrote:
> Yes, I stated that in my message.
>
> root@osvoboda-t460p /home/src/ovirt-system-tests (git)-[master] # cat
> /sys/module/kvm_intel/parameters/nested
> :(
> Y
>
> On Tue, Feb 7, 2017 at 1:39 PM, Eyal Edri  wrote:
>>
>> Did you follow the instructions on [1] ?
>>
>> Specifically, verifying  ' cat /sys/module/kvm_intel/parameters/nested '
>> gives you 'Y'.
>>
>> [1]
>> http://ovirt-system-tests.readthedocs.io/en/latest/docs/general/installation.html
>>
>> On Tue, Feb 7, 2017 at 2:29 PM, Ondrej Svoboda 
>> wrote:
>>>
>>> Hi everyone,
>>>
>>> Even though I have nested virtualization enabled in my Arch Linux system
>>> which I use to run OST, vm_run is the first test to fail in 004_basic_sanity
>>> (followed by snapshots_merge and suspend_resume_vm).
>>>
>>> Can you point me to what I might be missing? I believe I get the same
>>> failure even on Fedora.
>>>
>>> This is what host0's CPU capabilities look like (vmx is there):
>>> [root@lago-basic-suite-master-host0 ~]# cat /proc/cpuinfo
>>> processor: 0
>>> vendor_id: GenuineIntel
>>> cpu family: 6
>>> model: 44
>>> model name: Westmere E56xx/L56xx/X56xx (Nehalem-C)
>>> stepping: 1
>>> microcode: 0x1
>>> cpu MHz: 2711.988
>>> cache size: 16384 KB
>>> physical id: 0
>>> siblings: 1
>>> core id: 0
>>> cpu cores: 1
>>> apicid: 0
>>> initial apicid: 0
>>> fpu: yes
>>> fpu_exception: yes
>>> cpuid level: 11
>>> wp: yes
>>> flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>>> cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc rep_good
>>> nopl xtopology pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes
>>> hypervisor lahf_lm arat tpr_shadow vnmi flexpriority ept vpid
>>> bogomips: 5423.97
>>> clflush size: 64
>>> cache_alignment: 64
>>> address sizes: 40 bits physical, 48 bits virtual
>>> power management:
>>>
>>> journalctl -b on host0 shows that libvirt complains about NUMA
>>> configuration:
>>>
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: libvirt
>>> version: 2.0.0, package: 10.el7_3.4 (CentOS BuildSystem
>>> , 2017-01-17-23:37:48, c1bm.rdu2.centos.org)
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port
>>> 2(vnet0) entered disabled state
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: device vnet0 left
>>> promiscuous mode
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port
>>> 2(vnet0) entered disabled state
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: hostname:
>>> lago-basic-suite-master-host0.lago.local
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: Unable to
>>> read from monitor: Connection reset by peer
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: internal
>>> error: qemu unexpectedly closed the monitor: 2017-02-07T11:33:23.058571Z
>>> qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7 8 9
>>> 10 11 12 13 14 15
>>>
>>> 2017-02-07T11:33:23.058826Z qemu-kvm: warning: All CPU(s) up to maxcpus
>>> should be described in NUMA config
>>>qemu-kvm:
>>> /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1736: kvm_put_msrs:
>>> Assertion `ret == n' failed.
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 NetworkManager[657]: 
>>> [1486467203.1025] device (vnet0): state change: disconnected -> unmanaged
>>> (reason 'unmanaged') [30 10 3]
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 kvm[22059]: 0 guests now
>>> active
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 systemd-machined[22044]:
>>> Machine qemu-1-vm0 terminated.
>>>
>>> Thanks,
>>> Ondra
>>>
>>> ___
>>> Devel mailing list
>>> Devel@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/devel
>>
>>
>>
>>
>> --
>> Eyal Edri
>> Associate Manager
>> RHV DevOps
>> EMEA ENG Virtualization R&D
>> Red Hat Israel
>>
>> phone: +972-9-7692018
>> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>
>
>
> ___
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
___
Devel mailing list
Devel@ovirt.org

[ovirt-devel] New failure in OST - master branch: add secondary storage domains fails

2017-01-09 Thread Nadav Goldin
Hi,
There is a new failure on master in the experimental flow; the failing test
is 'add_secondary_storage_domain', and the engine.log has a few exceptions:

2017-01-09 10:07:24,943-05 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand]
(org.ovirt.thread.pool-6-thread-2) [e9e4e3b] Command
'PollVDSCommand(HostName = lago-basic-suite-master-host1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='f6ad90f7-1b37-49f0-a958-7151efa0039c'})' execution failed:
VDSGenericException: VDSNetworkException: Timeout during rpc call
2017-01-09 10:07:24,943-05 DEBUG
[org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand]
(org.ovirt.thread.pool-6-thread-2) [e9e4e3b] Exception:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: Timeout during rpc call
at
org.ovirt.engine.core.vdsbroker.vdsbroker.FutureVDSCommand.get(FutureVDSCommand.java:73)
[vdsbroker.jar:]
...
2017-01-09 10:10:23,323-05 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
(DefaultQuartzScheduler10) [7cad9211] Command
'GetAllVmStatsVDSCommand(HostName = lago-basic-suite-master-host1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='f6ad90f7-1b37-49f0-a958-7151efa0039c'})' execution failed:
VDSGenericException: VDSNetworkException: Heartbeat exceeded
2017-01-09 10:10:23,323-05 DEBUG
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
(DefaultQuartzScheduler10) [7cad9211] Exception:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: Heartbeat exceeded
at
org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188)
[vdsbroker.jar:]
...
2017-01-09 10:10:43,704-05 DEBUG
[org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) []
Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using
backslash to be included in name
 at [Source: [B@6a84a0d0; line: 1, column: 889]:
org.codehaus.jackson.JsonParseException: Illegal unquoted character
((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in
name
 at [Source: [B@6a84a0d0; line: 1, column: 889]
at
org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
[jackson-core-asl-1.9.13.jar:1.9.13]
at
org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
[jackson-core-asl-1.9.13.jar:1.9.13]
at
org.codehaus.jackson.impl.JsonParserMinimalBase._throwUnquotedSpace(JsonParserMinimalBase.java:482)
[jackson-core-asl-1.9.13.jar:1.9.13]
at
org.codehaus.jackson.impl.ReaderBasedParser._parseFieldName2(ReaderBasedParser.java:1042)
[jackson-core-asl-1.9.13.jar:1.9.13]
at
org.codehaus.jackson.impl.ReaderBasedParser._parseFieldName(ReaderBasedParser.java:1008)
[jackson-core-asl-1.9.13.jar:1.9.13]




2017-01-09 10:11:33,336-05 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
(DefaultQuartzScheduler7) [7cad9211] Command
'GetAllVmStatsVDSCommand(HostName = lago-basic-suite-master-host1,
VdsIdVDSCommandParameters
Base:{runAsync='true', hostId='f6ad90f7-1b37-49f0-a958-7151efa0039c'})'
execution failed: VDSGenericException: VDSNetworkException: Unrecognized
message received
2017-01-09 10:11:33,336-05 DEBUG
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
(DefaultQuartzScheduler7) [7cad9211] Exception:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNe
tworkException: Unrecognized message received
at
org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188)
[vdsbroker.jar:]
at
org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand.executeVdsBrokerCommand(GetAllVmStatsVDSCommand.java:23)
[vdsbroker.jar:]



VDSM logs on host1:

2017-01-09 10:11:27,120 ERROR (jsonrpc/4) [storage.StorageDomainCache]
domain 80985016-bdd8-4778-abd9-becc8fedcab4 not found (sdc:157)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 185, in _findUnfetchedDomain
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist:
(u'80985016-bdd8-4778-abd9-becc8fedcab4',)
2017-01-09 10:11:27,452 ERROR (jsonrpc/4) [storage.StorageDomainCache]
looking for unfetched domain 80985016-bdd8-4778-abd9-becc8fedcab4 (sdc:151)
2017-01-09 10:11:27,453 ERROR (jsonrpc/4) [storage.StorageDomainCache]
looking for domain 80985016-bdd8-4778-abd9-becc8fedcab4 (sdc:168)
2017-01-09 10:11:27,552 WARN  (jsonrpc/4) [storage.LVM] lvm vgs failed: 5
[] ['  WARNING: Not using lvmetad because config setting use_lvmetad=0.',
'  WARNING: To avoid corruption, rescan devices to make changes visible
(pvscan --cache).'
, '  Volume group "80985016-bdd8-4778-abd9-becc8fedcab4" not found', '
Cannot process volume 
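
The LVM warning quoted above already hints at the first manual check worth
doing on host1 - a sketch, using the VG name from the log:

# Rescan devices so LVM metadata changes become visible, as the warning suggests
pvscan --cache

# Then check whether the storage domain's VG is visible at all
vgs 80985016-bdd8-4778-abd9-becc8fedcab4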

[ovirt-devel] [Attention] Gerrit 'git://' protocol error

2016-12-25 Thread Nadav Goldin
Hi,
Currently git-daemon is not working on gerrit.ovirt.org, so all
actions with a remote configured with the 'git://' prefix will
fail (including Jenkins jobs). We are working on resolving the issue;
until then you can use 'https://' for anonymous clones, or
SSH (ssh://usern...@gerrit.ovirt.org:29418/repo.git).
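
For an existing checkout the workaround is just to repoint the remote; the
repo name below is a placeholder:

# See which remotes still use the git:// protocol
git remote -v

# Anonymous access over https (repo name is a placeholder)
git remote set-url origin https://gerrit.ovirt.org/ovirt-engine.git

# Or authenticated SSH (<username> is a placeholder)
git remote set-url origin ssh://<username>@gerrit.ovirt.org:29418/ovirt-engine.git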

Thanks,

Nadav.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


[ovirt-devel] OST failure on master in 'snapshot_merge' test

2016-12-13 Thread Nadav Goldin
Hi,
There is a new failure on master in the 'snapshot_merge' test (not live);
vdsm.log shows:

2016-12-13 09:57:30,399 ERROR (tasks/0) [storage.Image] delete() takes
exactly 4 arguments (3 given) (image:1321)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 1309, in merge
sdDom, srcVolParams, volParams, reqSize, chain)
  File "/usr/share/vdsm/storage/image.py", line 1068, in _baseCowVolumeMerge
tmpVol.delete(postZero=False, force=True)
TypeError: delete() takes exactly 4 arguments (3 given)
2016-12-13 09:57:30,401 ERROR (tasks/0) [storage.TaskManager.Task]
(Task='085bee3e-681f-4e3f-9b37-d9abbd6b2903') Unexpected error
(task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 333, in run
return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
line 79, in wrapper
return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1837, in mergeSnapshots
discard)
  File "/usr/share/vdsm/storage/image.py", line 1322, in merge
raise se.SourceImageActionError(imgUUID, sdUUID, str(e))

This failure is from the last few hours. The host and engine logs can be
found here [1].

Could someone take a look?

Thanks,

Nadav.

[1] 
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4160/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity.py/
___
Devel mailing list
Devel@ovirt.org
http://lists.phx.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] test-repo_ovirt_experimental_master job - failed

2016-10-30 Thread Nadav Goldin
On Sun, Oct 30, 2016 at 12:40 PM, Yaniv Kaul  wrote:
> Not exactly.

My bad, I missed that the tests run in parallel. What this means, though,
is that 'ovirt-log-collector' can fail when there are ongoing
tasks (such as adding the storage domains); I assume that is not the
expected behaviour. I'll send a patch separating the test for now.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] test-repo_ovirt_experimental_master job - failed

2016-10-30 Thread Nadav Goldin
Hi all, bumping this thread due to an almost identical failure[1]:

ovirt-log-collector/ovirt-log-collector-20161030053238.log:2016-10-30
05:33:09::ERROR::__main__::791::root:: Failed to collect logs from:
192.168.200.4; /bin/ls:
/rhev/data-center/mnt/blockSD/63c4fdd3-5d0f-4d16-b1e5-5f43caa4cf82/master/tasks/6b3b6aa1-808c-42df-9db7-52349f8533f2/6b3b6aa1-808c-42df-9db7-52349f8533f2.job.0:
No such file or directory
ovirt-log-collector/ovirt-log-collector-20161030053238.log-/bin/ls:
cannot access 
/rhev/data-center/mnt/blockSD/63c4fdd3-5d0f-4d16-b1e5-5f43caa4cf82/master/tasks/6b3b6aa1-808c-42df-9db7-52349f8533f2/6b3b6aa1-808c-42df-9db7-52349f8533f2.recover.1:
No such file or directory
ovirt-log-collector/ovirt-log-collector-20161030053238.log-/bin/ls:
cannot access 
/rhev/data-center/mnt/blockSD/63c4fdd3-5d0f-4d16-b1e5-5f43caa4cf82/master/tasks/6b3b6aa1-808c-42df-9db7-52349f8533f2/6b3b6aa1-808c-42df-9db7-52349f8533f2.task:
No such file or directory
ovirt-log-collector/ovirt-log-collector-20161030053238.log-/bin/ls:
cannot access 
/rhev/data-center/mnt/blockSD/63c4fdd3-5d0f-4d16-b1e5-5f43caa4cf82/master/tasks/6b3b6aa1-808c-42df-9db7-52349f8533f2/6b3b6aa1-808c-42df-9db7-52349f8533f2.recover.0:
No such file or directory

To make sure, I've checked lago/OST and couldn't find any stage where
there is a reference to '/rhev', nor any manipulation of
ovirt-log-collector; the only customization made is an
'ovirt-log-collector.conf' with user/password. The code that pulls the
logs in OST [2] runs the following command on the engine VM (and that is
where it fails):

ovirt-log-collector --conf /root/ovirt-log-collector.conf

The failure comes right after the 'add_secondary_storage_domains' [3] test,
all of whose steps ran successfully.

Can anyone look into this?

Thanks,
Nadav.

[1] 
http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-fc24-x86_64/141/console
[2] 
https://github.com/oVirt/ovirt-system-tests/blob/master/basic_suite_master/test-scenarios/002_bootstrap.py#L490
[3] 
https://github.com/oVirt/ovirt-system-tests/blob/master/basic_suite_master/test-scenarios/002_bootstrap.py#L243


On Tue, Sep 20, 2016 at 9:45 AM, Sandro Bonazzola  wrote:
>
>
>
> On Fri, Sep 9, 2016 at 1:19 PM, Yaniv Kaul  wrote:
>>
>> Indeed, this is the log collector. I wonder if we collect its logs...
>> Y.
>
>
> This can't be log-collector, it can be sos vdsm plugin.
> That said, if we run log-collector within lago we should collect the results 
> as job artifacts.
>
>
>>
>>
>>
>> On Thu, Sep 8, 2016 at 6:54 PM, Eyal Edri  wrote:
>>>
>>> I'm pretty sure lago or ovirt system tests aren't doing it but its the log 
>>> collector which is running during that test, I'm not near a computer so 
>>> can't verify it yet.
>>>
>>>
>>> On Sep 8, 2016 6:05 PM, "Nir Soffer"  wrote:

 On Thu, Sep 8, 2016 at 5:45 PM, Eyal Edri  wrote:
 > Adding devel.
 >
 > On Thu, Sep 8, 2016 at 5:43 PM, Shlomo Ben David 
 > wrote:
 >>
 >> Hi,
 >>
 >> Job [1] is failing with the following error:
 >>
 >> lago.ssh: DEBUG: Command 8de75538 on lago_basic_suite_master_engine
 >> errors:
 >>  ERROR: Failed to collect logs from: 192.168.200.2; /bin/ls:
 >> /rhev/data-center/mnt/blockSD/eb8c9f48-5f23-48dc-ab7d-9451890fd422/master/tasks/1350bed7-443e-4ae6-ae1f-9b24d18c70a8.temp:
 >> No such file or directory
 >> /bin/ls: cannot open directory
 >> /rhev/data-center/mnt/blockSD/eb8c9f48-5f23-48dc-ab7d-9451890fd422/master/tasks/1350bed7-443e-4ae6-ae1f-9b24d18c70a8.temp:
 >> No such file or directory

 This looks like a lago issue - it should never read anything inside /rhev

 This is a private directory for vdsm, no other process should ever depend
 on the content inside this directory, or even on the fact that it exists.

 In particular, /rhev/data-center/mnt/blockSD/*/master/tasks/*.temp
 Is not a log file, and lago should not collect it.

 Nir

 >> lago.utils: ERROR: Error while running thread
 >> Traceback (most recent call last):
 >>   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 53, in
 >> _ret_via_queue
 >> queue.put({'return': func()})
 >>   File
 >> "/home/jenkins/workspace/test-repo_ovirt_experimental_master/ovirt-system-tests/basic_suite_master/test-scenarios/002_bootstrap.py",
 >> line 493, in log_collector
 >> result.code, 0, 'log collector failed. Exit code is %s' % 
 >> result.code
 >>   File "/usr/lib/python2.7/site-packages/nose/tools/trivial.py", line 
 >> 29,
 >> in eq_
 >> raise AssertionError(msg or "%r != %r" % (a, b))
 >> AssertionError: log collector failed. Exit code is 2
 >>
 >>
 >> * The previous issue already fixed (SDK) and now we have a new issue on
 >> the same area.
 >>
 >>
 >> [1] -
 >> 

Re: [ovirt-devel] Failure on start VM in ovirt-system-tests from patches merged to master on the 25/10/2016

2016-10-29 Thread Nadav Goldin
>>>>>>> org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79)
>>>>>>> [weld-core-impl-2.3.5.Final.jar:2.3.5.Final]
>>>>>>> at
>>>>>>> org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68)
>>>>>>> [weld-core-impl-2.3.5.Final.jar:2.3.5.Final]
>>>>>>> at
>>>>>>> org.ovirt.engine.core.bll.scheduling.SchedulingManager$Proxy$_$$_WeldSubclass.schedule(Unknown
>>>>>>> Source) [bll.jar:]
>>>>>>> at
>>>>>>> org.ovirt.engine.core.bll.RunVmCommand.getVdsToRunOn(RunVmCommand.java:818)
>>>>>>> [bll.jar:]
>>>>>>> at
>>>>>>> org.ovirt.engine.core.bll.RunVmCommand.runVm(RunVmCommand.java:231)
>>>>>>> [bll.jar:]
>>>>>>> at
>>>>>>> org.ovirt.engine.core.bll.RunVmCommand.perform(RunVmCommand.java:414)
>>>>>>> [bll.jar:]
>>>>>>> at
>>>>>>> org.ovirt.engine.core.bll.RunVmCommand.executeVmCommand(RunVmCommand.java:339)
>>>>>>> [bll.jar:]
>>>>>>> at
>>>>>>> org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:106)
>>>>>>> [bll.jar:]
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Oct 27, 2016 at 5:12 AM, Allon Mureinik <amure...@redhat.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Seems like we cleared up the engine issues related to the recent
>>>>>>>> injection changes.
>>>>>>>>
>>>>>>>> I am now seeing stop errors, e.g.:
>>>>>>>>
>>>>>>>> {"jsonrpc": "2.0", "id": "ea0c564f-bc17-4fc2-8f1b-67c4d28257c6",
>>>>>>>> "result": {"cpuStatistics": {"1": {"cpuUser": "3.07", "nodeIndex": 0,
>>>>>>>> "cpuSys": "3.00", "cpuIdle": "93.93"}, "0": {"cpuUser": "1.67", 
>>>>>>>> "nodeIndex":
>>>>>>>> 0, "cpuSys": "2.07", "cpuIdle": "96.26"}}, "numaNodeMemFree": {"0":
>>>>>>>> {"memPercent": 83, "memFree": "359"}}, "memShared": 0, "thpState": 
>>>>>>>> "always",
>>>>>>>> "ksmMergeAcrossNodes": true, "vmCount": 0, "memUsed": "20",
>>>>>>>> "storageDomains": {"b2bb3220-1eb3-426a-90c2-5e236aefbe1a": {"code": 0,
>>>>>>>> "actual": true, "version": 0, "acquired": true, "delay": "0.000840117",
>>>>>>>> "lastCheck": "7.1", "valid": true}, 
>>>>>>>> "3130195a-73f9-4490-b554-98a9205cead6":
>>>>>>>> {"code": 0, "actual": true, "version": 4, "acquired": true, "delay":
>>>>>>>> "0.00150771", "lastCheck": "7.5", "valid": true},
>>>>>>>> "1a9e202b-83b7-4bdc-9b0c-e76b83676068": {"code": 0, "actual": true,
>>>>>>>> "version": 4, "acquired": true, "delay": "0.000590956",
>>>>>>>> 2016-10-26 21:51:09,878 DEBUG
>>>>>>>> [org.ovirt.engine.core.utils.timer.FixedDelayJobListener]
>>>>>>>> (DefaultQuartzScheduler7) [6d206bd1] Rescheduling
>>>>>>>> DEFAULT.org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods#-9223372036854775783
>>>>>>>> as there is no unfired trigger.
>>>>>>>> 2016-10-26 21:51:28,705 DEBUG
>>>>>>>> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp 
>>>>>>>> Reactor)
>>>>>>>> [383dd6a0] Heartbeat exceeded. Closing channel
>>>>>>>> 2016-10-26 21:51:28,708 DEBUG
>

[ovirt-devel] Failure on start VM in ovirt-system-tests from patches merged to master on the 25/10/2016

2016-10-26 Thread Nadav Goldin
Hi,
We have a new failure in OST from patches merged to master yesterday.
The failure started after the merge of [1], but as there were quite a
few patches merged in quick succession I can't be sure it is the one
causing it (OST isn't run per patch).
The test that fails is [2], when attempting to start the VM.
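
Since OST isn't run per patch, one way to narrow down the suspects is simply
to list everything that went into master in that window - a sketch, run from
an ovirt-engine checkout:

# List what was merged to master around 25/10/2016
git log --oneline --since=2016-10-25 --until=2016-10-26 origin/master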

The error from the API side:

RequestError:
status: 500
reason: Internal Server Error
detail: javax.ejb.EJBException: java.lang.NullPointerException
at 
org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(CMTTxInterceptor.java:213)
at 
org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:265)
at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374)
at 
org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:243)
at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)


In the engine logs there are a few 'java.lang.NullPointerException' errors:

2016-10-25 11:53:52,845 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase]
(org.ovirt.thread.pool-6-thread-2) [5e6a88be] Failed to get vds
'd60db21f-95f0-487b-9f17-44861e2610a7', error: null
2016-10-25 11:53:52,864 DEBUG
[org.ovirt.engine.core.utils.timer.FixedDelayJobListener]
(DefaultQuartzScheduler5) [] Rescheduling
DEFAULT.org.ovirt.engine.core.bll.tasks.AsyncTaskManager.timerElapsed#-9223372036854775787
as there is no unfired trigger.
...
2016-10-25 11:53:52,845 DEBUG
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase]
(org.ovirt.thread.pool-6-thread-2) [5e6a88be] Exception:
java.lang.NullPointerException
at 
org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase.getVdsStatic(AuditLogableBase.java:633)
[dal.jar:]
at 
org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase.getVdsName(AuditLogableBase.java:504)
[dal.jar:]
...
2016-10-25 11:53:52,837 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase]
(org.ovirt.thread.pool-6-thread-2) [5e6a88be] Failed to get vds
'd60db21f-95f0-487b-9f17-44861e2610a7', error: null
2016-10-25 11:53:52,837 DEBUG
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase]
(org.ovirt.thread.pool-6-thread-2) [5e6a88be] Exception:
java.lang.NullPointerException
at 
org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase.getVdsStatic(AuditLogableBase.java:633)
[dal.jar:]
at 
org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase.getVdsName(AuditLogableBase.java:504)
[dal.jar:]
...

The full engine logs can be found here[3] and the entire test suite
logs here[4].

Can anyone have a look?

Thanks,
Nadav.


[1] https://gerrit.ovirt.org/#/c/65198/
[2] 
https://github.com/oVirt/ovirt-system-tests/blob/master/basic_suite_master/test-scenarios/004_basic_sanity.py#L322
[3] 
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/2759/artifact/exported-artifacts/basic_suite_master.sh-fc24/exported-artifacts/test_logs/basic_suite_master/post-004_basic_sanity.py/*zip*/post-004_basic_sanity.py.zip
[4] 
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/2759/artifact/exported-artifacts/basic_suite_master.sh-fc24/exported-artifacts/
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] repos down

2016-08-29 Thread Nadav Goldin
Looks like there was a temporary network outage in our PHX datacenter
during the night (29/08, 22:00-23:00 UTC).
From what I checked, at least resources.ovirt.org is working normally
now, so you can try again.
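
If yum still times out on your side, a quick check that bypasses any yum
caching (using the mirrorlist URL from the original report) is:

# Verify the mirrorlist is reachable again
curl -sS -I http://resources.ovirt.org/pub/yum-repo/mirrorlist-ovirt-4.0-el7

# Clear cached repo metadata before retrying the install
yum clean metadata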





On Tue, Aug 30, 2016 at 7:14 AM, Sandro Bonazzola  wrote:
>
> On 30/Aug/2016 01:43, "Gary Pedretty"  wrote:
>>
>> I am unable to do any installs, yum cannot access the mirrors list
>>
>> “Could not retrieve mirrorlist
>> http://resources.ovirt.org/pub/yum-repo/mirrorlist-ovirt-4.0-el7” error was
>> 12: Timeout on
>> http://resources.ovirt.org/pub/yum-repo/mirrorlist-ovirt-4.0-el7: (28,
>> ‘Connection timed out after 30001 milliseconds’)"
>>
>> Tried from various locations on different ISPs.
>>
>> Gary
>>
>> 
>> Gary Pedrettyg...@ravnalaska.net
>> Systems Manager  www.flyravn.com
>> Ravn Alaska   /\907-450-7251
>> 5245 Airport Industrial Road /  \/\ 907-450-7238 fax
>> Fairbanks, Alaska  99709/\  /\ \ Second greatest commandment
>> Serving All of Alaska  /  \/  /\  \ \/\   “Love your neighbor as
>> Really loving the record green up date! Summmer!!   yourself” Matt 22:39
>> 
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ___
>> Devel mailing list
>> Devel@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/devel
>
>
> ___
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

Re: [ovirt-devel] [FYI] Jenkins maintenance today 18:30 TLV

2016-07-06 Thread Nadav Goldin
Jenkins is back up. If you see any issues please email infra-supp...@ovirt.org

Thanks,
Nadav.


On Wed, Jul 6, 2016 at 5:48 PM, Nadav Goldin <ngol...@redhat.com> wrote:
> Hi,
> http://jenkins.ovirt.org will be restarted for plugin updates today at
> 18:30 TLV time, expected downtime is 30 minutes. Patches sent 30
> minutes before might not get checked, patches sent during the downtime
> will get checked when Jenkins is back.
>
> If patches you sent did not trigger CI, you can login after the
> downtime and re-trigger them manually.
>
> Thanks,
> Nadav.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


[ovirt-devel] [FYI] Jenkins maintenance today 18:30 TLV

2016-07-06 Thread Nadav Goldin
Hi,
http://jenkins.ovirt.org will be restarted for plugin updates today at
18:30 TLV time, expected downtime is 30 minutes. Patches sent 30
minutes before might not get checked, patches sent during the downtime
will get checked when Jenkins is back.

If patches you sent did not trigger CI, you can login after the
downtime and re-trigger them manually.

Thanks,
Nadav.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] [Attention] Jenkins maintenance today(24/06/2016 01:00 AM TLV)

2016-06-23 Thread Nadav Goldin
Jenkins is back up to normal.



On Fri, Jun 24, 2016 at 12:07 AM, Nadav Goldin <ngol...@redhat.com> wrote:
> Hi,
> As part of an infrastructure upgrade, in approximately one hour at
> 01:00 AM TLV, http://jenkins.ovirt.org will be shut down for
> maintenance, expected downtime is 15 minutes.
> Patches sent during the downtime will be checked afterwards, patches
> sent around 40 minutes prior to the downtime might not get checked.
>
> If patches you sent did not trigger CI, you can login after the
> downtime and re-trigger them manually.
>
> Thanks,
>
> Nadav.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


[ovirt-devel] [Attention] Jenkins maintenance today(24/06/2016 01:00 AM TLV)

2016-06-23 Thread Nadav Goldin
Hi,
As part of an infrastructure upgrade, in approximately one hour at
01:00 AM TLV, http://jenkins.ovirt.org will be shut down for
maintenance, expected downtime is 15 minutes.
Patches sent during the downtime will be checked afterwards, patches
sent around 40 minutes prior to the downtime might not get checked.

If patches you sent did not trigger CI, you can login after the
downtime and re-trigger them manually.

Thanks,

Nadav.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


[ovirt-devel] [Attention needed] GlusterFS repository down - affects CI / Installations

2016-04-27 Thread Nadav Goldin
Hi,
The GlusterFS repository became unavailable this morning; as a result, all
Jenkins jobs that use the repository will fail. The common error would be:

>
> http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-7/noarch/repodata/repomd.xml:
> [Errno 14] HTTP Error 403 - Forbidden
>

Also, installations of oVirt will fail.

We are working on a solution and will update asap.
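
Until then, the quickest way to see whether the repo is back is to hit the
same repomd.xml from the error above directly:

# Returns HTTP 200 once download.gluster.org is serving the repo again
curl -sS -I http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-7/noarch/repodata/repomd.xml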

Nadav.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

[ovirt-devel] User authentication on http://jenkins.ovirt.org

2016-04-25 Thread Nadav Goldin
Hi all,
I note again that personal user accounts were not migrated to the new
Jenkins instance,
so if you need more than read-only access please enrol again (press 'sign up'
on the welcome page).

If you require permissions to trigger jobs manually:
email infra-supp...@ovirt.org 'please put my Jenkins $username in the dev role
group'

For other permissions (create job, etc.) - if it is a 'testing' job feel free
to use http://jenkins-old.ovirt.org, as we are trying to minimize the
number of non-yamlized jobs in the new instance. If that is not suitable
please email what you need.


Thanks,

Nadav.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

Re: [ovirt-devel] Jenkins.ovirt.org Upgrade | 24/04/2016(Sunday) - 18:00 TLV

2016-04-24 Thread Nadav Goldin
The migration was completed; if you encounter any problems please email
infra-supp...@ovirt.org so we can track the issue.
The old instance, with the non-yamlized jobs, can be found at
http://jenkins-old.ovirt.org;
if you need any of the non-yamlized jobs feel free to enable them in
http://jenkins-old.ovirt.org (only yamlized jobs were migrated).





On Thu, Apr 21, 2016 at 10:59 AM, Nadav Goldin <ngol...@redhat.com> wrote:

> Hey all,
> On Sunday, 24.04.2016 - 18:00 TLV (17:00 CET) we plan to migrate Jenkins(
> http://jenkins.ovirt.org) to a new VM in the PHX datacenter, this will
> increase the instance storage and allow better connectivity with the slaves.
>
> 1. The expected downtime is 2 hours; during that time no patches will be
> checked and you will not be able to log in to Jenkins. Patches sent to gerrit
> during the downtime might get checked after the downtime.
>
> 2. What will be migrated:
> All yamlized jobs, global configuration and most of the existing slaves.
>
> 3. The old Jenkins instance will still be available under
> jenkins-old.ovirt.org, with a minimum number of slaves. It will be kept
> at least in the following months for backup and for the non-yamlized
> jobs (but with no gerrit triggers)
>
> 4. User authentication: all users will have to enrol again, this can be
> done already this week via http://jenkins.phx.ovirt.org
>
> Another reminder will be sent ~ 2 hours before the migration.
>
> Thanks,
>
> Nadav.
>
>
>
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

[ovirt-devel] [ATTENTION NEEDED] Jenkins upgrade | Today(24/04/2016) - 18:00 TLV

2016-04-24 Thread Nadav Goldin
Hey all,

Reminder: http://jenkins.ovirt.org will be down today between 18:00-20:00
TLV(17:00-19:00 CET),
details below.

Thanks,
Nadav.

-- Forwarded message --
From: Nadav Goldin <ngol...@redhat.com>
Date: Thu, Apr 21, 2016 at 10:59 AM
Subject: [ovirt-devel] Jenkins.ovirt.org Upgrade | 24/04/2016(Sunday) -
18:00 TLV
To: devel <devel@ovirt.org>, infra <in...@ovirt.org>


Hey all,
On Sunday, 24.04.2016 - 18:00 TLV (17:00 CET) we plan to migrate Jenkins(
http://jenkins.ovirt.org) to a new VM in the PHX datacenter, this will
increase the instance storage and allow better connectivity with the slaves.

1. The expected downtime is 2 hours; during that time no patches will be
checked and you will not be able to log in to Jenkins. Patches sent to gerrit
during the downtime might get checked after the downtime.

2. What will be migrated:
All yamlized jobs, global configuration and most of the existing slaves.

3. The old Jenkins instance will still be available under
jenkins-old.ovirt.org, with a minimum number of slaves. It will be kept at
least in the following months for backup and for the non-yamlized jobs (but
with no gerrit triggers)

4. User authentication: all users will have to enrol again, this can be
done already this week via http://jenkins.phx.ovirt.org

Another reminder will be sent ~ 2 hours before the migration.

Thanks,

Nadav.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

[ovirt-devel] Jenkins.ovirt.org Upgrade | 24/04/2016(Sunday) - 18:00 TLV

2016-04-21 Thread Nadav Goldin
Hey all,
On Sunday, 24.04.2016 - 18:00 TLV (17:00 CET) we plan to migrate Jenkins(
http://jenkins.ovirt.org) to a new VM in the PHX datacenter, this will
increase the instance storage and allow better connectivity with the slaves.

1. The expected downtime is 2 hours; during that time no patches will be
checked and you will not be able to log in to Jenkins. Patches sent to gerrit
during the downtime might get checked after the downtime.

2. What will be migrated:
All yamlized jobs, global configuration and most of the existing slaves.

3. The old Jenkins instance will still be available under
jenkins-old.ovirt.org, with a minimum number of slaves. It will be kept at
least in the following months for backup and for the non-yamlized jobs (but
with no gerrit triggers)

4. User authentication: all users will have to enrol again, this can be
done already this week via http://jenkins.phx.ovirt.org

Another reminder will be sent ~ 2 hours before the migration.

Thanks,

Nadav.
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

Re: [ovirt-devel] python-mock

2016-04-10 Thread Nadav Goldin
adding infra,
as it seems the package doesn't need to be supported natively, and it was
failing the puppet run on el7 slaves (the package name was changed from
python-mock to python2-mock in EPEL),
I've sent a patch to drop it; I'll wait 2 more days to see if anyone else
objects.
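
For anyone checking an el7 slave by hand, the rename is easy to confirm
(assuming EPEL is configured on the slave):

# shows both the installed and the available package under either name
yum list python-mock python2-mock

# and what is currently installed on the slave
rpm -q python-mock python2-mock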





On Thu, Apr 7, 2016 at 11:01 AM, Edward Haas  wrote:

>
>
> On Wed, Apr 6, 2016 at 3:22 PM, Eyal Edri  wrote:
>
>> My only guess would be old VDSM functional tests or unit test using mock.
>> Yaniv/Danken?
>>
>> e.
>>
>> On Wed, Apr 6, 2016 at 2:40 PM, Shlomi Ben David 
>> wrote:
>>
>>> Hi,
>>>
>>> I wanted to ask you if you know about a Jenkins job that is using
>>> 'python-mock' pkg? if you do please let me know.
>>>
>>> Thanks in advance,
>>>
>>> --
>>> Shlomi Ben-David | Software Engineer | Red Hat ISRAEL
>>>
>>> OPEN SOURCE - 1 4 011 && 011 4 1
>>>
>>> ___
>>> Devel mailing list
>>> Devel@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>
>>>
>>>
>>
>>
>> --
>> Eyal Edri
>> Associate Manager
>> RHEV DevOps
>> EMEA ENG Virtualization R&D
>> Red Hat Israel
>>
>> phone: +972-9-7692018
>> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>>
>> ___
>> Devel mailing list
>> Devel@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/devel
>>
>
>
> VDSM is currently not using it, but a proposal to use it has been raised:
> https://gerrit.ovirt.org/#/c/55342/
> It is a powerful mocking library that has been added to python3.
>
> It will be nice to hear how others use it and why.
>
> Thanks,
> Edy.
>
>
> ___
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
>
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel