[ovirt-users] Re: error: "cannot set lock, no free lockspace" (localized)

2019-02-28 Thread Mike Lykov
01.03.2019 9:51, Sahina Bose wrote:
> Any errors in vdsm.log or gluster mount log for this volume?
> 

I could not find any.
Here are the full logs from one node for that period:

https://yadi.sk/d/BzLBb8VGNEwidw
(file name: ovirtnode1-logs-260219.tar.gz)

The archive contains the gluster and vdsm logs for all volumes.

Current sanlock client status (could it contain anything useful for the 
"cannot set lock" error?):
Node without any VMs:

[root@ovirtnode5 ~]# sanlock client status
daemon 165297fa-c9e7-47ec-8949-80f39f52304c.ovirtnode5
p -1 helper
p -1 listener
p -1 status
s 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:2:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/ids:0
s 64f18bf1-4eb6-4b3e-a216-9681091a3bc7:2:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_data/64f18bf1-4eb6-4b3e-a216-9681091a3bc7/dom_md/ids:0
s hosted-engine:2:/var/run/vdsm/storage/0571ac7b-a28e-4e20-9cd8-4803e40ec602/1c7d4c4d-4ae4-4743-a61c-1437459dcc14/699eec1d-c713-4e66-8587-27792d9a2b32:0
s 0571ac7b-a28e-4e20-9cd8-4803e40ec602:2:/rhev/data-center/mnt/glusterSD/ovirtstor1.miac\:_engine/0571ac7b-a28e-4e20-9cd8-4803e40ec602/dom_md/ids:0

Node with VMs:

[root@ovirtnode1 /]# sanlock client status
daemon 71784659-0fac-4802-8c0d-0efe3ab977d9.ovirtnode1
p -1 helper
p -1 listener
p 36456 miac_serv2
p 48024 miac_gitlab_runner
p 10151 
p 50624 e-l-k.miac
p 455336 openfire.miac
p 456445 miac_serv3
p 458384 debian9_2
p -1 status
s 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:1:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/ids:0
s 64f18bf1-4eb6-4b3e-a216-9681091a3bc7:1:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_data/64f18bf1-4eb6-4b3e-a216-9681091a3bc7/dom_md/ids:0
s hosted-engine:1:/var/run/vdsm/storage/0571ac7b-a28e-4e20-9cd8-4803e40ec602/1c7d4c4d-4ae4-4743-a61c-1437459dcc14/699eec1d-c713-4e66-8587-27792d9a2b32:0
s 0571ac7b-a28e-4e20-9cd8-4803e40ec602:1:/rhev/data-center/mnt/glusterSD/ovirtstor1.miac\:_engine/0571ac7b-a28e-4e20-9cd8-4803e40ec602/dom_md/ids:0
r 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:b19996be-1548-41ad-afe3-1726ee38d368:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:13631488:7 p 458384
r 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:4507a184-e158-484e-932a-2f1266b80223:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:7340032:7 p 456445
r 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:d546add1-126a-4490-bc83-469bab659854:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:19922944:6 p 455336
r 0571ac7b-a28e-4e20-9cd8-4803e40ec602:SDM:/rhev/data-center/mnt/glusterSD/ovirtstor1.miac\:_engine/0571ac7b-a28e-4e20-9cd8-4803e40ec602/dom_md/leases:1048576:10 p 10151
r 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:7a3af2e7-8296-4fe0-ac55-c52a4b1de93f:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:17825792:5 p 50624
r 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:4c2aaf48-a3f1-45a1-9c2b-912763643268:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:10485760:4 p 48024
r 01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:6c380073-9650-4832-8416-3001c5a172ab:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:6291456:6 p 36456
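
To see which process holds which lease, the status output above can be parsed mechanically. The following is a minimal Python sketch, not part of sanlock or vdsm: the function name parse_sanlock_status and the sample text are illustrative, and it assumes the layout seen above ("s" lines for lockspaces, "r" lines for resources, "p" lines for processes, colons inside paths escaped as "\:"). It accepts both the mail-wrapped form and one-line records.

```python
import re

# Colons inside paths are escaped as "\:", so split only on unescaped ones.
UNESCAPED_COLON = re.compile(r"(?<!\\):")


def parse_sanlock_status(text):
    """Parse `sanlock client status` text into (pids, lockspaces, leases)."""
    pids = {}        # pid -> process name ("" when no name is shown)
    lockspaces = []  # (lockspace name, host id)
    leases = []      # {"lockspace", "resource", "pid"}
    pending = None   # bare "s"/"r" tag whose spec wrapped onto the next line
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if pending is None and fields in (["s"], ["r"]):
            pending = fields[0]
            continue
        if pending is not None:
            fields = [pending] + fields
            pending = None
        tag = fields[0]
        if tag == "s":
            parts = UNESCAPED_COLON.split(fields[1])
            lockspaces.append((parts[0], int(parts[1])))
        elif tag == "r":
            parts = UNESCAPED_COLON.split(fields[1])
            lease = {"lockspace": parts[0], "resource": parts[1], "pid": None}
            if len(fields) >= 4 and fields[2] == "p":
                lease["pid"] = int(fields[3])
            leases.append(lease)
        elif tag == "p":
            pid = int(fields[1])
            if raw[0].isspace() and leases and leases[-1]["pid"] is None:
                leases[-1]["pid"] = pid  # indented "p <pid>": owner of the lease above
            elif pid > 0:
                pids[pid] = fields[2] if len(fields) > 2 else ""
    return pids, lockspaces, leases


# Sample copied from the ovirtnode1 output above (shortened to one VM).
SAMPLE = r"""p -1 helper
p 455336 openfire.miac
s
01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:1:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/ids:0
r
01f6fd06-9ad1-4957-bcda-df24dc4cc4f5:d546add1-126a-4490-bc83-469bab659854:/rhev/data-center/mnt/glusterSD/ovirtnode1.miac\:_vmstore/01f6fd06-9ad1-4957-bcda-df24dc4cc4f5/dom_md/xleases:19922944:6
 p 455336
"""

pids, lockspaces, leases = parse_sanlock_status(SAMPLE)
for lease in leases:
    print(lease["resource"], "held by", pids.get(lease["pid"], "unknown"))
# prints: d546add1-126a-4490-bc83-469bab659854 held by openfire.miac
```

The resource name d546add1-... is the same UUID that engine.log reports for openfire.miac, which suggests (but does not prove) that each xleases resource is named after its VM ID.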
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6CT5DT77QZTZ5Q54QT5BLPYVDTSJEVCU/


[ovirt-users] reinstall centos node (ovirt 4.2.7) fails due missing dependency on librbd1

2019-02-27 Thread Mike Lykov

Hi All!


I'm using ovirt-release42-4.2.7.1-1.el7.noarch on
centos-release-7-5.1804.5.el7.centos.x86_64,
in an HCI GlusterFS deployment (no Ceph).

Yesterday I wanted to add a hosted engine deployment on one of the nodes. 
That requires putting the node into maintenance mode and reinstalling it with 
the "hosted engine = deploy" option (in the web UI).

I did that, but the reinstall failed. In the logs I see (partial list):
...
Feb 27 16:35:39 Updated: libvirt-daemon-4.5.0-10.el7_6.4.x86_64
Feb 27 16:35:39 Updated: 
libvirt-daemon-driver-storage-core-4.5.0-10.el7_6.4.x86_64
Feb 27 16:36:04 Updated: 
libvirt-daemon-driver-storage-rbd-4.5.0-10.el7_6.4.x86_64
Feb 27 16:36:04 Updated: 
libvirt-daemon-driver-storage-disk-4.5.0-10.el7_6.4.x86_64
Feb 27 16:36:04 Updated: libvirt-daemon-driver-storage-4.5.0-10.el7_6.4.x86_64

After that:
# rpm -q librbd1
librbd1-0.94.5-2.el7.x86_64

Feb 27 16:36:51 ovirtnode5.miac systemd[1]: Starting Virtualization daemon...
Feb 27 16:36:51 ovirtnode5.miac libvirtd[537]: 2019-02-27 12:36:51.338+: 537: info : libvirt version: 4.5.0, package: 10.el7_6.4 (CentOS BuildSystem , 2019-01-29-17:31:22, x86-01.bsys.centos.org)
Feb 27 16:36:51 ovirtnode5.miac libvirtd[537]: 2019-02-27 12:36:51.338+: 537: info : hostname: ovirtnode5.miac
Feb 27 16:36:51 ovirtnode5.miac libvirtd[537]: 2019-02-27 12:36:51.338+: 537: error : virModuleLoadFile:53 : internal error: Failed to load module '/usr/lib64/libvirt/storage-backend/libvirt_storage_backend_rbd.so': /usr/lib64/libvirt/storage-backend/libvirt_storage_backend_rbd.so: undefined symbol: rbd_diff_iterate2
Feb 27 16:36:51 ovirtnode5.miac systemd[1]: libvirtd.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Feb 27 16:36:51 ovirtnode5.miac systemd[1]: Failed to start Virtualization daemon.

There are old bugs such as
https://bugzilla.redhat.com/show_bug.cgi?id=1316911
and newer posts such as 
http://dreamcloud.artark.ca/libvirtd-failure-after-latest-upgrade-in-rhel-centos7/

Obviously, the libvirt* RPMs were updated but librbd* was not. But why? Why don't 
the libvirt packages strictly depend on the required package/library versions?

I think libvirt-daemon-driver-storage-rbd should require exactly the librbd 
version it can work with.
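
One way to confirm this kind of mismatch before restarting libvirtd is to ask the dynamic loader whether the installed library exports the symbol named in the error. This is a minimal Python sketch using the standard ctypes module; the helper name exports_symbol is made up for illustration, and "librbd.so.1" is assumed to be the soname shipped by the librbd1 package.

```python
import ctypes


def exports_symbol(library, symbol):
    """Return True/False if `library` loads, or None if it cannot be loaded."""
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        return None
    # Attribute lookup on a CDLL performs a symbol lookup; a missing symbol
    # raises AttributeError, which hasattr() turns into False.
    return hasattr(handle, symbol)


if __name__ == "__main__":
    # Assumption: librbd1 installs its library under the soname "librbd.so.1".
    print(exports_symbol("librbd.so.1", "rbd_diff_iterate2"))
```

A result of False would match the "undefined symbol: rbd_diff_iterate2" error above: the library loads, but it is too old to provide what the updated storage backend needs.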

--
Mike
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/A4X6PPFT7RKCETV4OTR7YARDDRGFBZ2Y/


[ovirt-users] error: "cannot set lock, no free lockspace" (localized)

2019-02-26 Thread Mike Lykov

Hi all. I have an HCI setup: GlusterFS 3.12, oVirt 4.2.7, 4 nodes.

Yesterday I saw 3 VMs (all marked as HA VMs, all located on the ovirtnode1 
server) detected by the engine as "not responding".
Two of them were restarted by the engine on other nodes successfully, but one 
was not. All of them got a LOCALIZED message like 'cannot set lock: no free 
space on device'. What is this? Why does the engine get these errors, and why 
can some VMs restart automatically while others cannot (yet can be restarted 
successfully by a user via the web UI after some pause)?
Does anyone know? What do you think? Full engine logs can be uploaded.

From engine.log, the start event:
---
2019-02-26 17:04:05,308+04 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM 
'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) moved from 'Up' --> 
'NotResponding'
2019-02-26 17:04:05,865+04 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: 
VM_NOT_RESPONDING(126), VM openfire.miac is not responding.
2019-02-26 17:04:05,865+04 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM 
'7a3af2e7-8296-4fe0-ac55-c52a4b1de93f'(e-l-k.miac) moved from 'Up' --> 
'NotResponding'
2019-02-26 17:04:05,894+04 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: 
VM_NOT_RESPONDING(126), VM e-l-k.miac is not responding.
2019-02-26 17:04:05,895+04 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM 
'de76aa6c-a211-41de-8d85-7d2821c3980d'(tsgr-mon) moved from 'Up' --> 
'NotResponding'
2019-02-26 17:04:05,926+04 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID: 
VM_NOT_RESPONDING(126), VM tsgr-mon is not responding.
---
2019-02-26 17:04:22,237+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-9) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is 
down with error. Exit message: VM has been terminated on the host.

2019-02-26 17:04:22,374+04 INFO  [org.ovirt.engine.core.bll.VdsEventListener] 
(ForkJoinPool-1-worker-9) [] Highly Available VM went down. Attempting to 
restart. VM Name 'openfire.miac', VM Id 'd546add1-126a-4490-bc83-469bab659854'
...
2019-02-26 17:04:27,737+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-5) [] EVENT_ID: 
VM_DOWN_ERROR(119), VM openfire.miac is down with error. Exit message: resource 
busy: Failed to acquire lock: Lease is held by another host.
...
2019-02-26 17:04:28,350+04 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-2886073) [] EVENT_ID: 
VDS_INITIATED_RUN_VM(506), Trying to restart VM openfire.miac on Host 
ovirtnode6.miac
2019-02-26 17:04:31,841+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-2) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is 
down with error. Exit message: resource busy: Failed to acquire lock: Lease is 
held by another host.
2019-02-26 17:04:31,877+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-2886082) [] EVENT_ID: 
VDS_INITIATED_RUN_VM_FAILED(507), Failed to restart VM openfire.miac on Host 
ovirtnode6.miac
...
2019-02-26 17:04:31,994+04 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-2886082) [] EVENT_ID: 
VDS_INITIATED_RUN_VM(506), Trying to restart VM openfire.miac on Host 
ovirtnode1.miac
.
2019-02-26 17:04:36,054+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-9) [] EVENT_ID: VM_DOWN_ERROR(119), VM openfire.miac is 
down with error. Exit message: Не удалось установить блокировку: На устройстве 
не осталось свободного места.
[Russian; roughly "Failed to set lock: No space left on device"]
2019-02-26 17:04:36,054+04 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(ForkJoinPool-1-worker-9) [] add VM 
'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) to rerun treatment
2019-02-26 17:04:36,091+04 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-2886083) [] EVENT_ID: 
VDS_INITIATED_RUN_VM_FAILED(507), Failed to restart VM openfire.miac on Host 
ovirtnode1.miac
---
No more attempts were made for this VM (its state is now 'Down').
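
To follow one VM through a long engine.log, it helps to rejoin the mail-wrapped lines and keep only the audit events that mention it. The sketch below is an illustration in Python, not an oVirt tool; the function names unwrap and vm_events are hypothetical, and it assumes every record starts with a timestamp such as "2019-02-26 17:04:05,865+04" and that audit events carry the "EVENT_ID: NAME(number), text" form seen above.

```python
import re

# A record starts with a timestamp such as "2019-02-26 17:04:05,865+04".
STAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
EVENT = re.compile(r"EVENT_ID: (\w+)\((\d+)\), (.*)")


def unwrap(text):
    """Join wrapped continuation lines onto the record they belong to."""
    records = []
    for line in text.splitlines():
        piece = line.strip()
        if STAMP.match(piece):
            records.append(piece)
        elif records and piece and set(piece) - set(".-"):
            # Skip separator lines made only of dots or dashes.
            records[-1] += " " + piece
    return records


def vm_events(text, vm_name):
    """Return (timestamp, event name) for audit events mentioning vm_name."""
    found = []
    for record in unwrap(text):
        match = EVENT.search(record)
        if match and vm_name in match.group(3):
            found.append((record[:23], match.group(1)))
    return found


# Two records copied from the log above, still wrapped as in the mail.
SAMPLE_LOG = """\
2019-02-26 17:04:05,865+04 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID:
VM_NOT_RESPONDING(126), VM openfire.miac is not responding.
...
2019-02-26 17:04:31,877+04 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-2886082) [] EVENT_ID:
VDS_INITIATED_RUN_VM_FAILED(507), Failed to restart VM openfire.miac on Host
ovirtnode6.miac
"""

for stamp, name in vm_events(SAMPLE_LOG, "openfire.miac"):
    print(stamp, name)
# prints:
# 2019-02-26 17:04:05,865 VM_NOT_RESPONDING
# 2019-02-26 17:04:31,877 VDS_INITIATED_RUN_VM_FAILED
```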

The engine tried to restart this VM on some other nodes and got 'Lease is held by 
another host' (is that normal, because the lock timeout had not expired?) and then 
got (LOCALIZED MESSAGE?? Why is it localized while all others are 

[ovirt-users] Re: stucked snapshot, locked disk

2019-02-18 Thread Mike Lykov

14.02.2019 19:45, Jiří Sléžka wrote:

Hello,

we are using ovirt 4.2.8.2-1.el7.

One our user probably tried to preview taken snapshot but the task is
stucked and never finished. Also disk is locked.


I have this problem too.
I tried the openbaccus project for backing up VMs, and at first all was good 
(a manually started backup, for example, did its job: take a snapshot, copy 
the VM, etc.).


But when I configured a nightly task (scheduled VM backup), it was some 
sort of disaster :)


It tried to snapshot/copy the VM in an infinite loop and created dozens of 
images; all of those tasks ended with a "failed" result.
I was forced to delete them in the oVirt engine by hand, but the last created 
task is stuck. I switched off baccus because it kept sending snapshot requests 
via the API indefinitely. In the engine log:

---
2019-02-12 03:17:00,659+04 INFO 
[org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default 
task-2941) [] User admin@internal successfully logged in with scopes: 
ovirt-app-api ovirt-ext=token-info:authz-search 
ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate 
ovirt-ext=token:password-access
2019-02-12 03:17:00,697+04 INFO 
[org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default 
task-2941) [7feb4bfd] Running command: CreateUserSessionCommand 
internal: false.
2019-02-12 03:17:00,704+04 INFO 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(default task-2941) [7feb4bfd] EVENT_ID: USER_VDC_LOGIN(30), User 
admin@internal-authz connecting from '172.16.10.41' using session 
'KtVU5wRCDZn2ZrPIE4rengdlpt+GdIfjTD6KpPIW45oY4XUfpcUCJH9ry4gRbsO98lQawu8LdMdRZ0zxqUcJKA==' 
logged in.
2019-02-12 03:17:00,804+04 INFO 
[org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default 
task-2941) [] User admin@internal successfully logged in with scopes: 
ovirt-app-api ovirt-ext=token-info:authz-search 
ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate 
ovirt-ext=token:password-access
2019-02-12 03:17:00,863+04 INFO 
[org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default 
task-2941) [34e0ac0] Running command: CreateUserSessionCommand internal: 
false.
2019-02-12 03:17:00,984+04 INFO 
[org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] 
(default task-2941) [2f595966-44e8-4a67-8f55-a3d09836fc4d] Lock Acquired 
to object 
'EngineLock:{exclusiveLocks='[f1029df3-36f3-4746-8c58-ebecf860776f=VM]', sharedLocks=''}'
2019-02-12 03:17:00,985+04 WARN 
[org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] 
(default task-2941) [2f595966-44e8-4a67-8f55-a3d09836fc4d] Validation of 
action 'CreateSnapshotForVm' failed for user admin@internal-authz.
Reasons: 
VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_VM_IS_DURING_SNAPSHOT
2019-02-12 03:17:00,985+04 INFO 
[org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] 
(default task-2941) [2f595966-44e8-4a67-8f55-a3d09836fc4d] Lock freed to 
object 'EngineLock:{exclusiveLocks='[f1029df3-36f3-4746-8c58-ebecf860776f=VM]', sharedLocks=''}'
2019-02-12 03:17:00,990+04 ERROR 
[org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default 
task-2941) [] Operation Failed: [Cannot create Snapshot. The VM is 
performing an operation on a Snapshot. Please wait for the operation 
to finish, and try again.]
2019-02-12 03:17:01,033+04 INFO 
[org.ovirt.engine.core.sso.servlets.OAuthRevokeServlet] (default 
task-2958) [] User admin@internal successfully logged out
2019-02-12 03:17:01,040+04 INFO 
[org.ovirt.engine.core.bll.aaa.TerminateSessionsForTokenCommand] 
(default task-2955) [6310d15d] Running command: 
TerminateSessionsForTokenCommand internal: true.
2019-02-12 03:17:01,231+04 INFO 
[org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default 
task-2941) [] User admin@internal successfully logged in with scopes: 
ovirt-app-api ovirt-ext=token-info:authz-search 
ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate 
ovirt-ext=token:password-access
2019-02-12 03:17:01,255+04 INFO 
[org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default 
task-2941) [584a33cc] Running command: CreateUserSessionCommand 
internal: false.
2019-02-12 03:17:01,390+04 INFO 
[org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default 
task-2941) [] User admin@internal successfully logged in with scopes: 
ovirt-app-api ovirt-ext=token-info:authz-search 
ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate 
ovirt-ext=token:password-access
2019-02-12 03:17:01,414+04 INFO 
[org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default 
task-2941) [4adc6455] Running command: CreateUserSessionCommand 
internal: false.
2019-02-12 03:17:01,587+04 INFO 
[org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] 
(default task-2941) [8b27b7bc-a7d1-4d9a-94f0-fbcad58691d0] Lock Acquired 
to object 
'EngineLock:{exclusiveLocks='[f1029df3-36f3-4746-8c58-ebecf860776f=VM]', sharedLocks=''}'
2019-02-12 03:17:01,587+04 WARN 

[ovirt-users] Re: fixed: ovirt small network outage causes HE root xfs crash due to race condition

2018-12-24 Thread Mike Lykov

25.12.2018 10:14, Mike Lykov wrote:

1. Why (when it cannot boot due to corruption) does it NOT show anything at 
all on the console?
I can get to the GRUB menu (if I move fast enough), but if I continue booting 
I see only a blinking cursor for many minutes and nothing more. The GRUB 
options do not contain any splash/quiet parameters.
(The only exception is the EDD message, which is meaningless: if I use 
edd=off, I get only a black console.)


Where is the kernel boot log/console output? Does it at least try to load 
the initrd?


2. How can I set timeouts so that ha-agent does NOT try to restart the HE 
after 1-2 unsuccessful pings and a 10-second outage?
For the HE VM, stability (no crashed/broken filesystem) is more important 
than availability (I can live with it being unavailable for 10-15 seconds, 
but not with a broken VM).


3. I stopped ha-agent, the broker and the HE VM on all (two) nodes and fixed 
the partition in the VM. Then I started ha-agent on the nodes, and it BROKE 
the VM filesystem AGAIN (while trying to decide which VM to start)!


I fixed the VM filesystem again, put the cluster into maintenance mode, 
started the VM on one node by hand, checked that its status/health was OK, 
and only then put ha-agent back into working (none) mode. So there is an easy 
way to break the cluster: crash the HE VM filesystem by not putting the 
cluster into global maintenance mode first.







[ovirt-users] Re: fixed: ovirt small network outage causes HE root xfs crash due to race condition

2018-12-24 Thread Mike Lykov

24.12.2018 11:30, Mike Lykov wrote:

The host nodes (CentOS 7.5) are named ovirtnode1, 5 and 6. Timeouts (in the 
HA agent) are default. Sanlock is configured (I think).

The HE is running on ovirtnode6, and a spare HE is deployed on ovirtnode1.


Fixed (it seems) with the guestfish/xfs_repair method. It requires zeroing 
the XFS metadata log, and this relies heavily on luck.


1. Why (when it cannot boot due to corruption) does it NOT show anything at 
all on the console?
I can get to the GRUB menu (if I move fast enough), but if I continue booting 
I see only a blinking cursor for many minutes and nothing more. The GRUB 
options do not contain any splash/quiet parameters.
(The only exception is the EDD message, which is meaningless: if I use 
edd=off, I get only a black console.)


Where is the kernel boot log/console output? Does it at least try to load 
the initrd?


2. How can I set timeouts so that ha-agent does NOT try to restart the HE 
after 1-2 unsuccessful pings and a 10-second outage?
For the HE VM, stability (no crashed/broken filesystem) is more important 
than availability (I can live with it being unavailable for 10-15 seconds, 
but not with a broken VM).







[ovirt-users] Re: ovirt small network outage causes HE root xfs crash due to race condition

2018-12-23 Thread Mike Lykov

21.12.2018 14:24, Mike Lykov wrote:


I have a hyperconverged 4.2.7 setup with two deployed Engine VM images, and 
I had a 20-30 second network outage. After some back-and-forth attempts to 
start the engine on host 1, then 2, then 1 again, the Engine image got stuck at

"Probing EDD (edd=off to disable)... _"
as here: https://bugzilla.redhat.com/show_bug.cgi?id=1569827


Now I am looking at the logs.
Full /var/log archives are here:
https://yadi.sk/d/XZ5jJfQLN6QMlA (HE engine logs) - 36 Mb
https://yadi.sk/d/bZ0TYGxFoHGgIQ (ovirtnode6 logs) - 144  Mb

I have CC'd some personal addresses on this email; if it's not relevant to 
you, please ignore it.


The host nodes (CentOS 7.5) are named ovirtnode1, 5 and 6. Timeouts (in the 
HA agent) are default. Sanlock is configured (I think).

The HE is running on ovirtnode6, and a spare HE is deployed on ovirtnode1.

There are two network links: ovirtmgmt over "ovirtmgmt: port 
1(enp59s0f0)" and the GlusterFS storage network over the ib0 interface 
(a different subnet).


The messages log on ovirtnode6, showing the outage:

---
Dec 21 12:32:56 ovirtnode6 kernel: bnx2x :3b:00.0 enp59s0f0: NIC 
Link is Down
Dec 21 12:32:56 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0) entered 
disabled state
Dec 21 12:33:13 ovirtnode6 kernel: bnx2x :3b:00.0 enp59s0f0: NIC 
Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
Dec 21 12:33:13 ovirtnode6 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): 
enp59s0f0: link becomes ready
Dec 21 12:33:13 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0) entered 
forwarding state
Dec 21 12:33:13 ovirtnode6 NetworkManager[1715]:  
[1545381193.2204] device (enp59s0f0): carrier: link connected

---

That is 17 seconds; at 12:33:13 the link was back. BUT all the events that 
lead to the crash follow later:
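
The 17-second figure can be checked directly from the kernel lines. This is a small Python sketch for illustration only; the function name outage_seconds is made up, the sample lines are the two "NIC Link is ..." messages above with the mail wrapping removed, and the year is supplied separately because syslog timestamps do not carry one.

```python
from datetime import datetime


def outage_seconds(log_lines, year=2018):
    """Return the length in seconds of each Down -> Up interval."""
    went_down = None
    outages = []
    for line in log_lines:
        # Syslog lines start with a 15-character stamp such as "Dec 21 12:32:56".
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if "Link is Down" in line:
            went_down = stamp
        elif "Link is Up" in line and went_down is not None:
            outages.append((stamp - went_down).total_seconds())
            went_down = None
    return outages


SAMPLE_LINES = [
    "Dec 21 12:32:56 ovirtnode6 kernel: bnx2x :3b:00.0 enp59s0f0: NIC Link is Down",
    "Dec 21 12:33:13 ovirtnode6 kernel: bnx2x :3b:00.0 enp59s0f0: NIC Link is Up, "
    "1 Mbps full duplex, Flow control: ON - receive & transmit",
]

print(outage_seconds(SAMPLE_LINES))  # prints: [17.0]
```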


HA agent log:
--
MainThread::INFO::2018-12-21 
12:32:59,540::states::444::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) 
Engine vm running on localhost
MainThread::INFO::2018-12-21 
12:32:59,662::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) 
Current state EngineUp (score: 3400)
MainThread::INFO::2018-12-21 
12:33:09,797::states::136::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) 
Penalizing score by 1280 due to gateway status
MainThread::INFO::2018-12-21 
12:33:09,798::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) 
Current state EngineUp (score: 2120)
MainThread::ERROR::2018-12-21 
12:33:19,815::states::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) 
Host ovirtnode1.miac (id 1) score is significantly better than local 
score, shutting down VM on this host

--
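
The score arithmetic in this log can be traced line by line: 3400 minus the 1280 gateway penalty gives the 2120 reported next. Below is a minimal Python sketch of that check, written for illustration and not taken from ovirt-hosted-engine-ha; the name score_trace and the shortened sample lines are assumptions, and the threshold the agent uses for "significantly better" is not shown in the log and is not modelled here.

```python
import re

SCORE = re.compile(r"Current state (\w+) \(score: (\d+)\)")
PENALTY = re.compile(r"Penalizing score by (\d+) due to (.+)")


def score_trace(lines):
    """Return (state, score, penalties applied since the previous score)."""
    trace = []
    pending = 0
    for line in lines:
        penalty = PENALTY.search(line)
        if penalty:
            pending += int(penalty.group(1))
            continue
        score = SCORE.search(line)
        if score:
            trace.append((score.group(1), int(score.group(2)), pending))
            pending = 0
    return trace


# Message parts of the HA agent records above, with the prefixes dropped.
AGENT_LINES = [
    "Current state EngineUp (score: 3400)",
    "Penalizing score by 1280 due to gateway status",
    "Current state EngineUp (score: 2120)",
]

trace = score_trace(AGENT_LINES)
for (_, before, _), (state, after, penalty) in zip(trace, trace[1:]):
    print(state, before, "-", penalty, "=", after, before - penalty == after)
# prints: EngineUp 3400 - 1280 = 2120 True
```

Note the timing: the penalty is logged at 12:33:09, the link is back at 12:33:13, and the shutdown decision at 12:33:19 comes six seconds after the link was restored.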


syslog messages:

Dec 21 12:33:19 ovirtnode6 journal: ovirt-ha-agent 
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Host 
ovirtnode1.miac (id 1) score is significantly better than local score, 
shutting down VM on this host
Dec 21 12:33:29 ovirtnode6 journal: ovirt-ha-agent 
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM 
stopped on localhost
Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
disabled state

Dec 21 12:33:37 ovirtnode6 kernel: device vnet1 left promiscuous mode
Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
disabled state
Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]:  
[1545381217.1796] device (vnet1): state change: disconnected -> 
unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]:  
[1545381217.1798] device (vnet1): released from master device ovirtmgmt
Dec 21 12:33:37 ovirtnode6 libvirtd: 2018-12-21 08:33:37.192+: 2783: 
error : qemuMonitorIO:719 : internal error: End of file 
from qemu monitor   <-- WHAT IS THIS?

Dec 21 12:33:37 ovirtnode6 kvm: 2 guests now active
Dec 21 12:33:37 ovirtnode6 systemd-machined: Machine qemu-2-HostedEngine 
terminated.
Dec 21 12:33:37 ovirtnode6 firewalld[1693]: WARNING: COMMAND_FAILED: 
'/usr/sbin/iptables -w2 -w -D libvirt-out -m physdev 
--physdev-is-bridged --physdev-out vnet1 -g FP-vnet1' failed: iptables 
v1.4.21: goto 'FP-vnet1' is not a chain#012#012Try `iptables -h' or 
'iptables --help' for more information.

Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
blocking state
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
disabled state

Dec 21 12:33:55 ovirtnode6 kernel: device vnet1 entered promiscuous mode
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
blocking state
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
forwarding state
Dec 21 12:33:55 ovirtnode6 lldpad: recvfrom(Event interface): No buffer 
space available
Dec 21 12:33:55 ovirtnode6 NetworkManager[1715]:  
[1545381235.8086] manager: (vnet1): new Tun device 
(/org/freedesktop/NetworkManager/Devices/37)

[ovirt-users] ovirt engine VM (xfs on sda3) broken, how to fix image?

2018-12-21 Thread Mike Lykov



I have a hyperconverged 4.2.7 setup with two deployed Engine VM images, and 
I had a 20-30 second network outage. After some back-and-forth attempts to 
start the engine on host 1, then 2, then 1 again, the Engine image got stuck at

"Probing EDD (edd=off to disable)... _"

as here: https://bugzilla.redhat.com/show_bug.cgi?id=1569827

I stopped ha-agent and ha-broker on both hosts (so that they would not try to 
start the Engine VM) and stopped the VM via

vdsm-client VM destroy vmID="4f169ca9-1854-4e3f-ad57-24445ec08c79"
on both hosts, but I still had a lock (lease file).

Oh, the lease disappeared while I was writing this message. Now the 
xfs_repair output:


ERROR: The filesystem has valuable metadata changes in a log which needs 
to be replayed.


guestmount :
commandrvf: udevadm --debug settle -E /dev/sda3
calling: settle
..
command: mount '-o' 'ro' '/dev/sda3' '/sysroot//'
[1.478858] SGI XFS with ACLs, security attributes, no debug enabled
[1.481701] XFS (sda3): Mounting V5 Filesystem
[1.514183] XFS (sda3): Starting recovery (logdev: internal)
[1.537299] XFS (sda3): Internal error XFS_WANT_CORRUPTED_GOTO at 
line 1664 of file fs/xfs/libxfs/xfs_alloc.c.  Caller 
xfs_free_extent+0xaa/0x140 [xfs]


and "Structure needs cleaning"  .

before it:

[root@ovirtnode6 aa6f3e9b-2eba-4fab-a8ee-a4a1aceddf5e]# ls -l
total 7480047
-rw-rw. 1 vdsm kvm 83751862272 Dec 21 13:05 38ef3aac-6ecc-4940-9d2c-ffe4e2557482
-rw-rw. 1 vdsm kvm 1048576 Dec 21 13:27 38ef3aac-6ecc-4940-9d2c-ffe4e2557482.lease
-rw-r--r--. 1 vdsm kvm 338 Nov  2 14:01 38ef3aac-6ecc-4940-9d2c-ffe4e2557482.meta


If I try to use guestfs:
LIBGUESTFS_BACKEND=direct guestfish --rw -a 
38ef3aac-6ecc-4940-9d2c-ffe4e2557482

and then 'run', the result is:
 run
.
qemu-kvm: -device scsi-hd,drive=hd0: Failed to get "write" lock
Is another process using the image?

In vdsm-client Host getVMList I do not see the Engine VM (I got its ID 
from vdsm-client Host getAllVmStats). Is that because it is stopped?


I want to remove the lease with vdsm-client, but it needs a JSON file 
with UUIDs, like this:

usage: vdsm-client Lease info [-h] [arg=value [arg=value ...]]
positional arguments:
  arg=value   lease: The lease to query
  JSON representation:
  {
  "lease": {
  "sd_id": "UUID",
  "lease_id": "UUID"
  }
  }

I cannot find any explanation of sd_id and lease_id in the docs. Where 
can I get them?

see, for example:
https://www.ovirt.org/develop/developer-guide/vdsm/vdsm-client.html


without it I get:
[root@ovirtnode6 aa6f3e9b-2eba-4fab-a8ee-a4a1aceddf5e]# vdsm-client 
Lease info lease=38ef3aac-6ecc-4940-9d2c-ffe4e2557482
vdsm-client: Command Lease.info with args {'lease': 
'38ef3aac-6ecc-4940-9d2c-ffe4e2557482'} failed:

(code=100, message='unicode' object has no attribute 'get')

[root@ovirtnode6 ~]# vdsm-client Lease status 
lease=38ef3aac-6ecc-4940-9d2c-ffe4e2557482
vdsm-client: Command Lease.status with args {'lease': 
'38ef3aac-6ecc-4940-9d2c-ffe4e2557482'} failed:

(code=100, message=)
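
The usage text above shows `lease` as a JSON object rather than a bare UUID string. That would explain the "'unicode' object has no attribute 'get'" error, since `.get()` is a dictionary method that a string does not have. Below is a minimal Python sketch of building such a parameter structure. The UUIDs are placeholders, not values from this setup, and the helper names are hypothetical. Passing the resulting file to vdsm-client (for example with its `-f` option) is an assumption that should be checked against the vdsm-client man page.

```python
import json


def build_lease_params(sd_id, lease_id):
    """Return request parameters in the shape shown by the usage text."""
    return {"lease": {"sd_id": sd_id, "lease_id": lease_id}}


def bare_string_fails(value):
    """Return True if calling .get() on the value raises AttributeError.

    A bare UUID string has no .get() method, which matches the error
    message reported for lease=<UUID>.
    """
    try:
        value.get("sd_id")
    except AttributeError:
        return True
    return False


def write_params(params, path):
    """Write the parameters as JSON to a file (path chosen by the caller)."""
    with open(path, "w") as f:
        json.dump(params, f, indent=2)
        f.write("\n")


# Placeholder UUIDs (hypothetical); replace with the real values.
params = build_lease_params(
    "00000000-0000-0000-0000-000000000000",
    "11111111-1111-1111-1111-111111111111",
)
lease_json = json.dumps(params, indent=2)

if __name__ == "__main__":
    print(lease_json)
```

The sketch does not answer where sd_id and lease_id come from; it only shows the structure the command appears to expect.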

Please, help me to fix that Engine VM image.


--
Mike
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YFXBYAL6SM45GGX7FWGYMS3APK6QXIDB/


[ovirt-users] Re: engine mails about FSM states

2018-12-17 Thread Mike Lykov

14.12.2018 15:52, Martin Sivak wrote:

Hi,

 > Host id is not set

This is an internal error report that should normally not happen. It 
means the ovirt-ha-agent asked for a storage operation before it 
registered itself with the broker. If this happens seldom then it looks 
like a race condition.
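
The failure mode described here (a value requested before registration has set it) can be sketched in a few lines of Python. This is only an illustration of the pattern, not the actual ovirt-hosted-engine-ha code: the class StatusBrokerSketch and the functions register and read_host_id are hypothetical, and only the exception name and message are taken from the traceback quoted later in this thread.

```python
class HostIdNotLockedError(Exception):
    """Raised when the host id is requested before it has been set."""


class StatusBrokerSketch:
    """Hypothetical stand-in for a broker that needs a host id."""

    def __init__(self):
        self._host_id = None  # not known until registration happens

    def register(self, host_id):
        """Record the host id; storage operations are valid after this."""
        self._host_id = host_id

    @property
    def host_id(self):
        # Any caller that arrives before register() hits this branch.
        if self._host_id is None:
            raise HostIdNotLockedError("Host id is not set")
        return self._host_id


def read_host_id(broker):
    """Return the host id, or an error string if it is not set yet."""
    try:
        return broker.host_id
    except HostIdNotLockedError as exc:
        return "error: %s" % exc


if __name__ == "__main__":
    broker = StatusBrokerSketch()
    print(read_host_id(broker))   # request arrives too early
    broker.register(1)
    print(read_host_id(broker))   # request after registration
```

If the two calls come from different threads or processes, their order depends on timing, which is consistent with the error showing up only rarely.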


I am also seeing these events on my setup, randomly and seldom.
versions:
ovirt-release42-4.2.7.1-1.el7.noarch
ovirt-hosted-engine-ha-2.2.18-1.el7.noarch
ovirt-host-4.2.3-1.el7.x86_64
...
I have emails from 15 Nov, 21 Nov, 01 Dec, 05 Dec...
The last event was on 09 Dec at 16:35:

-
MainThread::INFO::2018-12-09 
16:34:05,223::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) 
Current state EngineUp (score: 3400)
MainThread::INFO::2018-12-09 
16:34:15,360::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) 
Current state EngineUp (score: 3400)

... ALL OK HERE UNTIL..
MainThread::ERROR::2018-12-09 
16:34:25,373::hosted_engine::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) 
Unhandled monitoring loop exception

Traceback (most recent call last):
  File 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", 
line 428, in start_monitoring

self._monitoring_loop()
.
File 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
line 135, in get_stats_from_storage

result = self._proxy.get_stats()
.
error: [Errno 2] No such file or directory
.
-
MainThread::ERROR::2018-12-09 
16:34:25,380::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) 
Trying to restart agent

.
MainThread::INFO::2018-12-09 
16:34:46,353::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) 
Starting monitor storage-domain, options {'sd_uuid': 
'0571ac7b-a28e-4e20-9cd8-4803e40ec602'}
MainThread::INFO::2018-12-09 
16:34:46,354::brokerlink::85::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) 
Success, id storage-domain

.
MainThread::INFO::2018-12-09 
16:35:25,062::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) 
Success, was notification of state_transition (EngineStarting-EngineUp) 
sent? sent

.
MainThread::INFO::2018-12-09 
16:35:31,209::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) 
Current state EngineUp (score: 3400)

-
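
When collecting logs for a bug report, it can help to filter agent.log down to the ERROR entries. The sketch below assumes that each entry sits on one line in the `Thread::LEVEL::timestamp::module::line::logger::(function) message` layout seen in the excerpt above (the archive has wrapped those lines). The regular expression and the function names are my own, not part of ovirt-hosted-engine-ha.

```python
import re

# Field layout inferred from the excerpt; treat it as an assumption.
LOG_LINE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<lineno>\d+)::"
    r"(?P<logger>[\w.]+)::\((?P<func>\w+)\) (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None for lines that do not match
    (for example traceback lines)."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def errors_only(lines):
    """Return the parsed entries whose level is ERROR."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "ERROR"]


SAMPLE = [
    "MainThread::INFO::2018-12-09 16:34:15,360::hosted_engine::491::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(_monitoring_loop) Current state EngineUp (score: 3400)",
    "MainThread::ERROR::2018-12-09 16:34:25,373::hosted_engine::431::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(start_monitoring) Unhandled monitoring loop exception",
    "Traceback (most recent call last):",
]

if __name__ == "__main__":
    for entry in errors_only(SAMPLE):
        print(entry["timestamp"], entry["func"], entry["message"])
```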


I would recommend opening a bug report with all the logs we talked about 
and all the RPM versions (ovirt-hosted-engine-ha and 
ovirt-hosted-engine-setup packages). Use this link to go directly to the 
right component: 
https://bugzilla.redhat.com/enter_bug.cgi?product=ovirt-hosted-engine-ha


Has anybody opened this bug report? I do not have an account in RH Bugzilla.

(Incidentally, this log file (9 Dec) was rotated while I was writing this email...)




On Fri, Dec 14, 2018 at 12:35 PM fsoyer wrote:


In broker.log I found this, just before 05:59am:

Thread-3::INFO::2018-12-13
05:58:45,634::mem_free::51::mem_free.MemFree::(action) memFree:
82101
Thread-1::INFO::2018-12-13
05:58:46,322::ping::60::ping.Ping::(action) Successfully pinged
10.0.1.254
Thread-5::INFO::2018-12-13

05:58:46,611::engine_health::241::engine_health.EngineHealth::(_result_from_stats)
VM is up on this host with healthy engine
Thread-2::INFO::2018-12-13
05:58:49,144::mgmt_bridge::62::mgmt_bridge.MgmtBridge::(action)
Found bridge ovirtmgmt with ports
StatusStorageThread::ERROR::2018-12-13 05:58:54,935::status_broker::90::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 82, in run
    if (self._status_broker._inquire_whiteboard_lock() or
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 190, in _inquire_whiteboard_lock
    self.host_id, self._lease_file)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 128, in host_id
    raise ex.HostIdNotLockedError("Host id is not set")
HostIdNotLockedError: Host id is not set
StatusStorageThread::ERROR::2018-12-13 05:58:54,937::status_broker::70::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(trigger_restart) Trying to restart the broker

"Host is not set" ???
--

Regards,

*Frank*



On Friday, December 14, 2018 12:27 CET, Martin Sivak
<msi...@redhat.com> wrote:

Hi,
check the broker.log as well. The connect 

[ovirt-users] Re: Cloud-init reset network configuration to default dhcp after reboot and regular run

2018-12-06 Thread Mike Lykov

05.12.2018 13:04, Eitan Raviv wrote:
After further investigation I would like to share one more important 
piece of information that explains the "reset" behaviour of the network 
configuration:


Thanks for that detailed clarification.

When a VM is started in 'Run once' mode, the initialization parameters 
supplied for that run are always passed by engine to cloud-init in the 
guest for application.


That is clear and works as intended.

But if a VM is started in 'Run' mode, the initialization parameters are 
passed to cloud-init on the guest only if this is the first run (be it 
'Run' or 'Run once'). On every consecutive run in 'Run' mode no 
parameters are passed to the guest, and therefore (as I quoted from the 
cloud-init documentation earlier in this thread) cloud-init falls back 
to DHCP configuration on the guest.
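
The rule described in the two paragraphs above can be written as a small decision function. This is only a sketch of the behaviour as explained in this thread, not engine code. The function name and the mode strings "run" and "run_once" are hypothetical.

```python
def init_params_sent(mode, is_first_run):
    """Return True if engine passes the initialization parameters to
    cloud-init in the guest, following the rule described above.

    mode: "run" or "run_once" (hypothetical labels for the two UI actions)
    is_first_run: True if the VM has never been started before
    """
    if mode not in ("run", "run_once"):
        raise ValueError("unknown mode: %r" % (mode,))
    # 'Run once' always sends the parameters; 'Run' only on the first run.
    return mode == "run_once" or is_first_run


if __name__ == "__main__":
    for mode in ("run_once", "run"):
        for first in (True, False):
            print(mode, first, init_params_sent(mode, first))
```

The only combination that sends nothing is a regular 'Run' that is not the first run, which is exactly the case where cloud-init falls back to DHCP.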



When this behaviour was introduced into engine the reasoning was that 
after the initial configuration of the VM, there is no reason to resend 
the configuration on every 'Run' but only on 'Run once'.


Yes, there is no reason, but for a new user it looks very confusing:
 - create a VM
 - configure the cloud-init parameters (such as the IP address)
 - Run the VM (not once); the parameters are applied (all OK, as it seems)
 - work with/in the VM
 - Stop the VM for some maintenance (maybe a while after the VM counts 
as 'configured successfully')
 - Run the VM and try to access it (it was 'configured successfully' 
some time ago, wasn't it?)
 - NO ACCESS to the VM, and the configuration is LOST
 - try to access it via the console with a 'WTF?' exclamation

Why does cloud-init not turn itself off/disable itself after a 
successful configuration in this scenario (by touching that file, for 
example)?


Instead, cloud-init + oVirt force the user to:
 - figure out why the configuration was lost and learn the "custom 
script" format
 - write a "custom script" that touches /etc/cloud/cloud-init.disabled, 
and reboot
 - Run Once the VM to apply the parameters
 - run the VM as usual?


Due to the behaviour of the current cloud-init package, this causes an 
unexpected side effect that should be dealt with by disabling cloud-init 
in one of the methods I described earlier in this thread.


Are there use cases where cloud-init (as a service) needs to start 
again on subsequent regular VM runs?
If the user needs to change the parameters, they can use "Run Once" 
with the new parameters; in that case the 'disabled' file could be 
ignored.

But what is the reason to look for a configuration on a regular run at all?

--
BR, Mike




On Wed, Nov 28, 2018 at 10:12 AM Eitan Raviv <era...@redhat.com> wrote:


On Wed, Nov 28, 2018 at 7:29 AM Mike Lykov <co...@ya.ru> wrote:
 >
 > 27.11.2018 16:15, Eitan Raviv wrote:
 > > According to cloud-init 0.7.9 documentation cloud-init is configured
 > > to run by default on each boot [1] and to render the user-selected
 > > network configuration on first boot [2]. Also, in absence of a data
 > > source to configure the network, it will fall back to configuring DHCP
 > > on eth0 [2].
 > >
 > > As you noted, if you run a VM once, and then in the next regular run
 > > the cloud-init flag is not selected in the VM configuration in engine,
 > > there is no data-source and cloud-init falls back to dhcp as
 > > documented.
 >
 > Thanks for the explanation. What is the intended use of this
 > subsystem/feature?
 >
 > My setup is not in a cloud; it is local and uses static IP addresses
 > for VMs.
 > I do not want to configure each VM's network by hand in the console.
 > So I create a VM from a template (the template has the cloud-init
 > package installed), configure the cloud-init hostname/eth0 network in
 > engine, and as a "custom script" (at the same moment) I set "touch
 > /etc/cloud/cloud-init.disabled"?

Either that or add custom script to disable just cloud-init network
re-config as you did manually.
Please consult the documentation for the custom script syntax and format
(e.g. search for 'runcmd' in
https://cloudinit.readthedocs.io/en/0.7.9/topics/examples.html)

 > Then I "Run once" the VM, stop it, and run it as usual, without a data
 > source and without the fallback.
 > Or do I name the network interface something other than "eth0" and
 > thereby avoid the need for a custom script?

I did not test the outcome of assigning the static IP to another NIC.
Just sharing a thought...

 >
 >
 > > The 'marker' files you refer to are also documented as follows:
 > >
 > > * disabling cloud-init altogether [1] with: touch
 > > /etc/cloud/cloud-init.disabled
 > > * preventing cloud-init from configuring the network [2] with: echo
 > > 'network: {config: disabled}' >> /etc/cloud/cloud.cfg
 > &

[ovirt-users] Re: oVirt 4.2.8 First Release Candidate is now available

2018-11-29 Thread Mike Lykov

29.11.2018 12:59, Simone Tiraboschi wrote:

* Read more about the oVirt 4.2.8 release highlights: 
http://www.ovirt.org/release/4.2.8/


There is a broken link in the "Upgrade Hosted Engine" section.

The current link points to https://www.ovirt.org/upgrade-guide/upgrade-guide/

The correct link is https://www.ovirt.org/documentation/upgrade-guide/

(Many links in the docs are broken in the same way: 404 + search.)

--
Mike
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XICQW6YK37KZQNICNXTDVWM5AGNIQM5F/


[ovirt-users] Re: Cloud-init reset network configuration to default dhcp after reboot and regular run

2018-11-27 Thread Mike Lykov

27.11.2018 16:15, Eitan Raviv wrote:

According to cloud-init 0.7.9 documentation cloud-init is configured
to run by default on each boot [1] and to render the user-selected
network configuration on first boot [2]. Also, in absence of a data
source to configure the network, it will fall back to configuring DHCP
on eth0 [2].

As you noted, if you run a VM once, and then in the next regular run
the cloud-init flag is not selected in the VM configuration in engine,
there is no data-source and cloud-init falls back to dhcp as
documented.


Thanks for the explanation. What is the intended use of this 
subsystem/feature?


My setup is not in a cloud; it is local and uses static IP addresses for VMs.
I do not want to configure each VM's network by hand in the console.
So I create a VM from a template (the template has the cloud-init 
package installed), configure the cloud-init hostname/eth0 network in 
engine, and as a "custom script" (at the same moment) I set "touch 
/etc/cloud/cloud-init.disabled"?
Then I "Run once" the VM, stop it, and run it as usual, without a data 
source and without the fallback.
Or do I name the network interface something other than "eth0" and 
thereby avoid the need for a custom script?




The 'marker' files you refer to are also documented as follows:

* disabling cloud-init altogether [1] with: touch /etc/cloud/cloud-init.disabled
* preventing cloud-init from configuring the network [2] with: echo
'network: {config: disabled}' >> /etc/cloud/cloud.cfg

Whichever scenario is used to run a VM, this can be accomplished by
adding the above commands to the custom_script that cloud-init runs at
the last stage of its operation [3].
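
For illustration, the two commands above do nothing more than create an empty file and append one line to a config file. A minimal Python sketch of the same two actions follows. The function names and the `root` parameter are my own additions so that the sketch can be tried in a scratch directory instead of a real guest; the paths are the ones quoted above.

```python
import os
import tempfile


def disable_cloud_init(root="/"):
    """Equivalent of: touch /etc/cloud/cloud-init.disabled (under root)."""
    path = os.path.join(root, "etc/cloud/cloud-init.disabled")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a"):
        pass  # creating the empty marker file is all that is needed
    return path


def disable_network_config(root="/"):
    """Equivalent of:
    echo 'network: {config: disabled}' >> /etc/cloud/cloud.cfg (under root).
    """
    path = os.path.join(root, "etc/cloud/cloud.cfg")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a") as f:
        f.write("network: {config: disabled}\n")
    return path


if __name__ == "__main__":
    # Try it in a temporary directory rather than on a real system.
    scratch = tempfile.mkdtemp()
    print(disable_cloud_init(scratch))
    print(disable_network_config(scratch))
```

Like the shell version with `>>`, the second function appends on every call, so running it repeatedly would duplicate the line.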

There is possibly a third 'hack' that would not require any marker file:
* assign your static IP to a NIC not named 'eth0'
I have not tested it myself but it looks like a corollary of [2]

HTH

[1] https://cloudinit.readthedocs.io/en/0.7.9/topics/boot.html#generator
[2] https://cloudinit.readthedocs.io/en/0.7.9/topics/boot.html#local
[3] https://cloudinit.readthedocs.io/en/0.7.9/topics/boot.html#final

On Wed, Nov 21, 2018 at 10:51 AM Mike Lykov  wrote:


20.11.2018 15:30, Mike Lykov wrote:


"cloud-init used to use a "marker" file that it created on initial
execution. If that "marker" file existed it would not rerun on reboot. "
- is it not working in oVirt / this cloud-init version?


new restart:

--
2018-11-21 12:40:53,314 - main.py[DEBUG]: Checking to see if files that
we need already exist from a previous run that would allow us to stop early.
2018-11-21 12:40:53,315 - main.py[DEBUG]: Execution continuing, no
previous run detected that would allow us to stop early.
-

Which files is it trying to find?

--
Mike

List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CDKFFGGNQ33RVKXOW7APVD7CMX77L7AM/


[ovirt-users] Re: Cloud-init reset network configuration to default dhcp after reboot and regular run

2018-11-21 Thread Mike Lykov

20.11.2018 15:30, Mike Lykov wrote:

"cloud-init used to use a "marker" file that it created on initial 
execution. If that "marker" file existed it would not rerun on reboot. " 
- is it not working in oVirt / this cloud-init version?


new restart:

--
2018-11-21 12:40:53,314 - main.py[DEBUG]: Checking to see if files that 
we need already exist from a previous run that would allow us to stop early.
2018-11-21 12:40:53,315 - main.py[DEBUG]: Execution continuing, no 
previous run detected that would allow us to stop early.

-

Which files is it trying to find?

--
Mike

List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ACNLTDH55L4YX5DWNRQZ3VPRWPFYMOLT/


[ovirt-users] Re: Cloud-init reset network configuration to default dhcp after reboot and regular run

2018-11-20 Thread Mike Lykov

20.11.2018 15:30, Mike Lykov wrote:

DataSourceNotFoundException: Did not find any data source, searched 
classes: (DataSourceNoCloudNet)
2018-11-20 13:53:24,194 - stages.py[DEBUG]: applying net config names 
for {'version': 1, 'config': [{'subnets': [{'type': 'dhcp'}],

'type': 'physical', 'name': 'eth0', 'mac_address': '56:6f:21:4a:00:04'}]}


Full log (when cloud-init drops the previously configured static IP and 
falls back to the default DHCP):


2018-11-20 14:35:00,334 - stages.py[DEBUG]: applying net config names 
for {'version': 1, 'config': [{'subnets': [{'type': 'dhcp'}], 'type': 
'physical', 'name': 'eth0', 'mac_address': '56:6f:21:4a:00:05'}]}
2018-11-20 14:35:00,334 - util.py[DEBUG]: Reading from 
/sys/class/net/lo/addr_assign_type (quiet=False)
2018-11-20 14:35:00,334 - util.py[DEBUG]: Read 2 bytes from 
/sys/class/net/lo/addr_assign_type
2018-11-20 14:35:00,334 - util.py[DEBUG]: Reading from 
/sys/class/net/lo/address (quiet=False)
2018-11-20 14:35:00,334 - util.py[DEBUG]: Read 18 bytes from 
/sys/class/net/lo/address
2018-11-20 14:35:00,334 - util.py[DEBUG]: Reading from 
/sys/class/net/eth0/addr_assign_type (quiet=False)
2018-11-20 14:35:00,334 - util.py[DEBUG]: Read 2 bytes from 
/sys/class/net/eth0/addr_assign_type
2018-11-20 14:35:00,334 - util.py[DEBUG]: Reading from 
/sys/class/net/eth0/address (quiet=False)
2018-11-20 14:35:00,335 - util.py[DEBUG]: Read 18 bytes from 
/sys/class/net/eth0/address
2018-11-20 14:35:00,335 - util.py[DEBUG]: Reading from 
/sys/class/net/lo/operstate (quiet=False)
2018-11-20 14:35:00,335 - util.py[DEBUG]: Read 8 bytes from 
/sys/class/net/lo/operstate
2018-11-20 14:35:00,335 - util.py[DEBUG]: Reading from 
/sys/class/net/eth0/operstate (quiet=False)
2018-11-20 14:35:00,335 - util.py[DEBUG]: Read 3 bytes from 
/sys/class/net/eth0/operstate
2018-11-20 14:35:00,335 - util.py[DEBUG]: Running command ['ip', '-6', 
'addr', 'show', 'permanent', 'scope', 'global'] with allowed return 
codes [0] (shell=False, capture=True)
2018-11-20 14:35:00,338 - util.py[DEBUG]: Running command ['ip', '-4', 
'addr', 'show'] with allowed return codes [0] (shell=False, capture=True)
2018-11-20 14:35:00,340 - __init__.py[DEBUG]: no work necessary for 
renaming of [['56:6f:21:4a:00:05', 'eth0']]
2018-11-20 14:35:00,341 - stages.py[INFO]: Applying network 
configuration from fallback bringup=True: {'version': 1, 'config': 
[{'subnets': [{'type': 'dhcp'}], 'type': 'physical', 'name': 'eth0', 
'mac_address': '56:6f:21:4a:00:05'}]}
2018-11-20 14:35:00,343 - util.py[DEBUG]: Writing to 
/etc/sysconfig/network-scripts/ifcfg-eth0 - wb: [420] 159 bytes


The last action rewrites the config ...
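
A side note on the "Writing to ... - wb: [420] 159 bytes" line: the number in brackets appears to be the file mode printed in decimal (this is my reading of the log format, not something stated in the log itself). Converting it shows ordinary permissions for the rewritten ifcfg-eth0:

```python
import stat

# Value taken from the "[420]" field of the log line above.
mode_from_log = 420

# 420 decimal is 644 octal: 6*64 + 4*8 + 4 = 420.
as_octal = oct(mode_from_log)

# Render it the way `ls -l` would for a regular file.
as_ls = stat.filemode(stat.S_IFREG | mode_from_log)

if __name__ == "__main__":
    print(as_octal, as_ls)
```

So the file is rewritten with mode 0644 (rw-r--r--), and 159 bytes is the length of the new DHCP configuration.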

--
Mike



List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JACYAEFFSMWS3XIJ5TQG5S6E4RBFUBYV/


[ovirt-users] Cloud-init reset network configuration to default dhcp after reboot and regular run

2018-11-20 Thread Mike Lykov

Hi All!

I'm trying to configure the network in VMs created from a template, and 
I see strange behaviour from cloud-init.

cloud-init is installed and enabled in the template.
version: cloud-init-0.7.9-24.el7.centos.1.x86_64
guest: CentOS 7.5
oVirt 4.2.7 from the ovirt-releases-42-pre repo

1. I use "Run once" with Initial Run - Use cloud-init - Networks:
 in-guest network interface: eth0
 add new - static
 enter address, mask, gateway
 IPv6: none

I run it (once) and cloud-init configures ifcfg-eth0 with that address 
(successfully).


2. I shut down that VM and use "Run" (regular) without "use cloud init" 
in the VM properties, expecting the above configuration to be kept (and 
the VM to boot with it).


But because "use cloud init" is not checked and the cloud-init service 
is enabled, it starts, cannot find a datasource, and resets the 
configuration to the default (DHCP).


In cloud-init.log

2018-11-20 13:53:24,153 - util.py[WARNING]: No instance datasource 
found! Likely bad things to come!
2018-11-20 13:53:24,153 - util.py[DEBUG]: No instance datasource found! 
Likely bad things to come!

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cloudinit/cmd/main.py", line 236, in main_init
    init.fetch(existing=existing)
  File "/usr/lib/python2.7/site-packages/cloudinit/stages.py", line 343, in fetch
    return self._get_data_source(existing=existing)
  File "/usr/lib/python2.7/site-packages/cloudinit/stages.py", line 253, in _get_data_source
    pkg_list, self.reporter)
  File "/usr/lib/python2.7/site-packages/cloudinit/sources/__init__.py", line 320, in find_source
    raise DataSourceNotFoundException(msg)
DataSourceNotFoundException: Did not find any data source, searched classes: (DataSourceNoCloudNet)
2018-11-20 13:53:24,194 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'subnets': [{'type': 'dhcp'}], 'type': 'physical', 'name': 'eth0', 'mac_address': '56:6f:21:4a:00:04'}]}

It reverts the config, as described here:
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/NE27UO4WNZIC27GZY4D2DCFX4DIYFBQP/

3. Then I found this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1439373#c5

and during "Run once" I disabled the network config as described in that comment.
Shutdown, Run (not once), and voila! The IP address is static!

2018-11-20 15:07:31,601 - handlers.py[DEBUG]: finish: 
init-network/search-NoCloudNet: SUCCESS: no network data found from 
DataSource NoCloudNet

.
2018-11-20 15:07:31,602 - util.py[WARNING]: No instance datasource 
found! Likely bad things to come!
2018-11-20 15:07:31,602 - util.py[DEBUG]: No instance datasource found! 
Likely bad things to come!


2018-11-20 15:07:31,639 - stages.py[DEBUG]: network config disabled by 
system_cfg
2018-11-20 15:07:31,639 - stages.py[INFO]: network config is disabled by 
system_cfg
2018-11-20 15:07:31,639 - main.py[DEBUG]: [net] Exiting without 
datasource in local mode
2018-11-20 15:07:31,640 - util.py[DEBUG]: Reading from /proc/uptime 
(quiet=False)

2018-11-20 15:07:31,640 - util.py[DEBUG]: Read 12 bytes from /proc/uptime
2018-11-20 15:07:31,640 - util.py[DEBUG]: cloud-init mode 'init' took 
0.287 seconds (0.29)
2018-11-20 15:07:31,640 - handlers.py[DEBUG]: finish: init-network: 
SUCCESS: searching for network datasources
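
The two log excerpts in this message (fallback to DHCP in step 2, "network config disabled by system_cfg" in step 3) suggest a selection order that can be modelled roughly as below. This is a simplified model written from the logs, not cloud-init's actual code: the function names are hypothetical and real cloud-init consults more sources than the two shown here. Only the shape of the fallback dictionary is copied from the log.

```python
def fallback_config(mac_address, name="eth0"):
    """Build the DHCP fallback config in the shape seen in the log."""
    return {
        "version": 1,
        "config": [{
            "subnets": [{"type": "dhcp"}],
            "type": "physical",
            "name": name,
            "mac_address": mac_address,
        }],
    }


def is_disabled(cfg):
    """True for a config like {'config': 'disabled'}."""
    return isinstance(cfg, dict) and cfg.get("config") == "disabled"


def select_network_config(system_cfg, datasource_cfg, mac_address):
    """Return (config, source) following the simplified model.

    system_cfg: the 'network' key from cloud.cfg, or None
    datasource_cfg: network config from a datasource, or None
    """
    for source, cfg in (("system_cfg", system_cfg), ("ds", datasource_cfg)):
        if is_disabled(cfg):
            return None, source      # nothing is rendered, ifcfg is kept
        if cfg:
            return cfg, source
    return fallback_config(mac_address), "fallback"


if __name__ == "__main__":
    mac = "56:6f:21:4a:00:05"
    print(select_network_config(None, None, mac))                    # step 2
    print(select_network_config({"config": "disabled"}, None, mac))  # step 3
```

In this model the static address survives step 3 simply because no network configuration is rendered at all when the system config says "disabled".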


"cloud-init used to use a "marker" file that it created on initial 
execution. If that "marker" file existed it would not rerun on reboot. " 
- is it not working in oVirt / this cloud-init version?


--
Mike
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/SNJ6IAZ2C6VJR6IAVZHTL6LM5DIWSXJR/


[ovirt-users] Re: Difference between Template and Clone in oVirt

2018-11-19 Thread Mike Lykov

19.11.2018 14:36, Hari Prasanth Loganathan wrote:

Hi Team,

Could someone let me know the difference between creating VM from clone 
and template in oVirt?

It really looks the same.


I think you can create a "clone based on a template" (independent of 
the template) or a "thin machine" (dependent on the template: if the 
template breaks, the machine breaks too)?


How do you "create a VM from a clone"?
I'm also new to oVirt and am investigating it.

By the way, the oVirt/Red Hat docs say:
---
 When you create a template, you specify the format of the disk to be 
raw or QCOW2:


QCOW2 disks are thin provisioned.
Raw disks on file storage are thin provisioned.
Raw disks on block storage are preallocated.
---
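
The three rules quoted above form a small lookup. The sketch below encodes exactly those three statements and nothing else. The function name and the "file"/"block" labels are my own, and anything not covered by the quoted text is rejected rather than guessed.

```python
def effective_allocation(disk_format, storage_type):
    """Return 'thin' or 'preallocated' according to the quoted rules.

    disk_format: "qcow2" or "raw"
    storage_type: "file" (e.g. NFS, GlusterFS) or "block" (e.g. iSCSI, FC)
    """
    fmt = disk_format.lower()
    storage = storage_type.lower()
    if storage not in ("file", "block"):
        raise ValueError("unknown storage type: %r" % (storage_type,))
    if fmt == "qcow2":
        return "thin"                       # QCOW2 disks are thin provisioned
    if fmt == "raw":
        # Raw is thin on file storage and preallocated on block storage.
        return "thin" if storage == "file" else "preallocated"
    raise ValueError("unknown disk format: %r" % (disk_format,))


if __name__ == "__main__":
    for fmt in ("qcow2", "raw"):
        for storage in ("file", "block"):
            print(fmt, storage, effective_allocation(fmt, storage))
```

According to these rules, a template disk on file storage comes out thin provisioned whichever format is chosen.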

I have a hyperconverged install with GlusterFS file storage, and when I 
create a template, it automatically gets "allocation policy = thin 
provision".


But when I create a new machine without a template, I can create a new 
disk image and select the allocation policy: thin or preallocated.


Is there a way to create preallocated VMs from a template on file 
storage? I think they would be more reliable/safe and less error-prone?




List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Q4W6T2MCIWBEXRZGB6F2G3OAQOANXIMX/


[ovirt-users] Re: GlusterFS LibgfApiSupported and High Availability

2018-11-18 Thread Mike Lykov

18.11.2018 8:16, Shawn Weeks wrote:
Currently when LibgfApiSupported is enabled it looks like the startup 
command for the VM has the Gluster hostname always set to the same host. 
How does that work if that host is down? In my case GlusterFS has 3x 
replication and distribution enabled but if the first host is down the 
VMs don’t work. I’m on GlusterFS 4.1 and oVirt 4.2 latest.


Please see my post of 08 Nov 2018 on this list, with the subject 
"libgfapi support are "false" by default in ovirt 4.2 ?"


It contains a link to
https://bugzilla.redhat.com/show_bug.cgi?id=1484227

and some discussion about updates in libvirt.

--
Mike
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IHBE4AYDWYZ3BNKBTL4SGDBLERNFGGSA/


[ovirt-users] Re: Error start VM with "XML: Multiple 'scsi' controllers with index '0' " - how to workaround?

2018-11-18 Thread Mike Lykov

18.11.2018 20:02, Sharon Gratch wrote:

Hi,

This was an old bug, solved in 4.2.2, as you mentioned.


But my oVirt is 4.2.7 (installed at the beginning of November 2018 from
the ovirt-release42.rpm repository and then updated to 42-pre).


1. Did you run the VM with "run once" or a regular running operation?


No, I did not use "run once" with this VM, only "run" (I edited the 
boot order to boot from CD).


2. Can you please send the vdsm.log and engine log from the time range 
of creating the VM till restarting the VM with a failure?


OK, I found those logs. The engine log shows that the first 
creation/run was successful; the scsi controller was doubled later.

I will send the logs to your personal email.

VM created on 9 Nov at 16:02: one scsi controller
First error encountered on 13 Nov at 13:15: multiple scsi controllers

On Thu, Nov 15, 2018 at 1:04 PM Mike Lykov <co...@ya.ru> wrote:


Hi all.

I'm testing oVirt (it's new to me also) and create some VM with
nearly-all-default parameters.
(ovirt 4.2.7, with last updates)

I create one disk for VM (via webui) :
General -> create -> new image with default interface virtio-scsi
(boot,
50 Gb size)
resource allocation - virtio-scsi enabled by default
IO threads enabled by default
I run it with cdrom+iso for install centos.

When i try to reboot it this VM can start or not can start with error
(one time start, next reboot - not start, sometimes after it start...)

[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ForkJoinPool-1-worker-3) [] EVENT_ID: VM_DOWN_ERROR(119), VM test_rh1
is down with error. Exit message: Ошибка XML: Обнаружено несколько контроллеров «scsi» с индексом «0».

(By the way, why are the errors localised? It means "Multiple 'scsi'
controllers with index '0'".)

In engine log here is a config xml dump:

      
        
      
       
        
        
      

It is really contain two controllers:

      
      

But I do NOT add two this controllers by hand!
And sometimes it can start! (without configuration change)

So there is a Question: How to workaround this and start the VM?
In "vm devices" I have two controllers:

virtio-scsi {ioThreadId=1}
scsi {type=pci, slot=0x05, bus=0x00, domain=0x, function=0x0
{index=0}

But all checkboxes are greyed

bug like in this list post:
https://lists.ovirt.org/pipermail/users/2018-February/086860.html

There is a "CLOSED" bugs about this
https://bugzilla.redhat.com/show_bug.cgi?id=1543833#c9
https://bugzilla.redhat.com/show_bug.cgi?id=1563769
https://bugzilla.redhat.com/show_bug.cgi?id=1535961


--
Mike
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OSR2FP7E6X56DNEOAEXHWHNNE5S2Z5J2/


[ovirt-users] Error start VM with "XML: Multiple 'scsi' controllers with index '0' " - how to workaround?

2018-11-15 Thread Mike Lykov

Hi all.

I'm testing oVirt (it is new to me as well) and created a VM with 
nearly all default parameters.

(oVirt 4.2.7, with the latest updates)

I created one disk for the VM (via the web UI):
General -> create -> new image with the default interface virtio-scsi 
(boot, 50 GB size)

resource allocation - virtio-scsi enabled by default
IO threads enabled by default
I ran it with a CD-ROM + ISO to install CentOS.

When I try to reboot this VM, it sometimes starts and sometimes fails to start with an error
(it starts once, fails on the next reboot, sometimes starts again after that...)

[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-3) [] EVENT_ID: VM_DOWN_ERROR(119), VM test_rh1 
is down with error. Exit message: Ошибка XML: Обнаружено несколько
контроллеров «scsi» с индексом «0».

(By the way, why are the errors localized? The message means "XML error:
Multiple 'scsi' controllers with index '0'".)


The engine log contains a dump of the VM's XML configuration:


  type="pci"/>


 
  
  


It really does contain two controllers:




But I did NOT add these two controllers by hand!
And sometimes the VM does start (without any configuration change)!

So the question is: how can I work around this and start the VM?
Under "VM devices" I see two controllers:

virtio-scsi {ioThreadId=1}
scsi {type=pci, slot=0x05, bus=0x00, domain=0x, function=0x0 {index=0}

But all the checkboxes are greyed out.

This looks like the bug in this list post:
https://lists.ovirt.org/pipermail/users/2018-February/086860.html

There are "CLOSED" bugs about this:
https://bugzilla.redhat.com/show_bug.cgi?id=1543833#c9
https://bugzilla.redhat.com/show_bug.cgi?id=1563769
https://bugzilla.redhat.com/show_bug.cgi?id=1535961
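
To make the failure condition concrete, here is a minimal Python sketch that looks for controllers sharing the same type and index in a libvirt-style domain XML. The sample XML is hypothetical (the real dump from engine.log is not reproduced here), and the function name duplicate_controllers is made up for illustration. It only detects the condition that the error message complains about; it is not a fix.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment, NOT the real dump from engine.log.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <controller type='scsi' index='0' model='virtio-scsi'/>
    <controller type='scsi' index='0'/>
    <controller type='usb' index='0'/>
  </devices>
</domain>
"""


def duplicate_controllers(domain_xml):
    """Return sorted (type, index) pairs that appear more than once."""
    root = ET.fromstring(domain_xml)
    counts = Counter(
        (ctrl.get("type", ""), ctrl.get("index", ""))
        for ctrl in root.iter("controller")
    )
    return sorted(key for key, seen in counts.items() if seen > 1)


if __name__ == "__main__":
    for ctype, index in duplicate_controllers(SAMPLE_DOMAIN_XML):
        print(f"duplicate controller: type={ctype} index={index}")
```

Running the same check against the XML dumped in engine.log for a failed start and for a successful start might show whether the second 'scsi' controller is only present in the failing case.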


--
Mike


[ovirt-users] Re: libgfapi support are "false" by default in ovirt 4.2 ?

2018-11-08 Thread Mike Lykov

08.11.2018 18:50, Sahina Bose wrote:

> On Thu, Nov 8, 2018 at 8:13 PM Simone Tiraboschi wrote:
>
>> Hi,
>> adding also Sahina here.
>> AFAIK it should be enabled by default in hyper-converged deployments.
>>
>> Can you please grep your deployment logs for ENABLE_LIBGFAPI?
>
> No, libgfapi access is disabled by default due to lack of HA
> (https://bugzilla.redhat.com/show_bug.cgi?id=1484227)


Is this bug still relevant for the versions listed below?
Comment 13 from 2018-08-31, which asks for a workaround, is still unanswered.

More than a year has passed since then, and the versions have been updated.
I now have:
qemu-kvm-ev-2.10.0-21.el7_5.7.1.x86_64
vdsm-4.20.43-1.el7.x86_64
glusterfs-server-3.12.15-1.el7.x86_64

I cannot reproduce it on my test install because I have not fully configured
the storage yet (see my post 'set up ovirt 4.2 hyperconverged with glusterfs
"storage network" over infiniband' on this list).

But I can try if needed.






[ovirt-users] libgfapi support are "false" by default in ovirt 4.2 ?

2018-11-08 Thread Mike Lykov

Hi All

I'm trying to set up the latest oVirt version from the ovirt-release42-pre.rpm repository.
I then installed these RPMs (and many dependencies):
ovirt-hosted-engine-setup-2.2.30-1.el7.noarch
ovirt-engine-appliance-4.2-20181026.1.el7.noarch
vdsm-4.20.43-1.el7.x86_64
vdsm-gluster-4.20.43-1.el7.x86_64
vdsm-network-4.20.43-1.el7.x86_64
All of them come from that repository. I used the web UI installer to create
the GlusterFS volumes (the suggested defaults: engine, data, vmstore) and then
installed the hosted engine on the "engine" volume.


In the cluster I want to use the libgfapi access method for Gluster storage,
but when I import the storage domains created by the installer in the first
step, VDSM mounts them on the hosts via FUSE.


For example:
ovirtstor1.miac:/engine on 
/rhev/data-center/mnt/glusterSD/ovirtstor1.miac:_engine type 
fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)


ovirtnode1.miac:/data on 
/rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_data type 
fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
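
The check above (looking at mount output for fuse.glusterfs entries) can be scripted. The following is a minimal Python sketch, assuming mount-style lines of the form "SOURCE on TARGET type FSTYPE (OPTIONS)"; the function name gluster_fuse_mounts is hypothetical, and the sample lines are the two mounts quoted above, joined back onto single lines.

```python
import re

# One line of `mount` output: "<source> on <target> type <fstype> (<options>)"
MOUNT_RE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<options>[^)]*)\)$"
)

OPTIONS = "rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072"
SAMPLE_MOUNT_OUTPUT = (
    "ovirtstor1.miac:/engine on "
    "/rhev/data-center/mnt/glusterSD/ovirtstor1.miac:_engine "
    f"type fuse.glusterfs ({OPTIONS})\n"
    "ovirtnode1.miac:/data on "
    "/rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_data "
    f"type fuse.glusterfs ({OPTIONS})\n"
)


def gluster_fuse_mounts(mount_output):
    """Return (source, target) pairs for every fuse.glusterfs mount line."""
    found = []
    for line in mount_output.splitlines():
        match = MOUNT_RE.match(line.strip())
        if match and match.group("fstype") == "fuse.glusterfs":
            found.append((match.group("source"), match.group("target")))
    return found


if __name__ == "__main__":
    for source, target in gluster_fuse_mounts(SAMPLE_MOUNT_OUTPUT):
        print(f"{source} is FUSE-mounted at {target}")
```

This only reports which Gluster volumes are FUSE-mounted on a host; it says nothing about how a running VM actually opens its disks.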


In the engine configuration I see that it is disabled (false):
[root@ovirtengine ~]# engine-config -a | grep -i libgf
LibgfApiSupported: false version: 3.6
LibgfApiSupported: false version: 4.0
LibgfApiSupported: false version: 4.1
LibgfApiSupported: false version: 4.2
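
For anyone who wants to check this across several engines, here is a small Python sketch that turns the output above into a mapping from compatibility version to value. It assumes the "Key: value version: X" line layout shown in this post; the function name parse_engine_config is hypothetical and the sample text is copied from the output above.

```python
SAMPLE_OUTPUT = """\
LibgfApiSupported: false version: 3.6
LibgfApiSupported: false version: 4.0
LibgfApiSupported: false version: 4.1
LibgfApiSupported: false version: 4.2
"""


def parse_engine_config(output, key):
    """Map compatibility version -> value for lines 'key: value version: X'."""
    prefix = key + ":"
    values = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        value, sep, version = line[len(prefix):].partition(" version:")
        if sep:  # skip lines that carry no version suffix
            values[version.strip()] = value.strip()
    return values


if __name__ == "__main__":
    settings = parse_engine_config(SAMPLE_OUTPUT, "LibgfApiSupported")
    disabled = sorted(v for v, value in settings.items() if value == "false")
    print("LibgfApiSupported is false for versions:", ", ".join(disabled))
```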

Why? Does it need to be enabled by hand?

As in this presentation?
https://www.slideshare.net/DenisChapligin/improving-hyperconverged-performance



[ovirt-users] Re: set up ovirt 4.2 hyperconverged with glusterfs "storage network" over infiniband

2018-11-08 Thread Mike Lykov

07.11.2018 12:27, Mike Lykov wrote:

> 4. Gluster volumes are in a strange state:
>
> When I try to create a volume (Storage -> Volumes -> New) and press "Add
> Bricks", a similar drop-down box, "Bricks Host", contains only the
> "ovirtnode" names, not the "ovirtstor" IB interfaces.
> If I try to use it, it fails with an error like "This host not in
> trusted pool". That is true: the trusted pool uses the other interface.
>
> What is the right way to configure this?


Update:

I created a network named "storage", unchecked "required", and checked the
"migrate" and "gluster" roles.
Then I attached this network to interface ib0 on the hosts in "Setup Host
Networks" (but its state is out-of-sync because the MTU in the data center
configuration differs from the one on the real host; see my previous post).


But in engine.log I see this message:

2018-11-08 17:19:23,406+04 WARN 
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] 
(DefaultQuartzScheduler7) [70a2eb8a] Could not associate brick 
'ovirtstor1.miac:/gluster_bricks/engine/engine' of volume 
'77d6bcb1-244d-4319-b3f0-e4eb73a9206c' with correct network as no 
gluster network found in cluster 'ea3c5a62-de76-11e8-9238-00163e062063'


Why "no gluster network found in cluster"?
Is it because the network is out-of-sync?
The network is shown as UP in the web UI...










[ovirt-users] set up ovirt 4.2 hyperconverged with glusterfs "storage network" over infiniband

2018-11-07 Thread Mike Lykov

Hi All!

I'm trying to set up the latest oVirt version from the ovirt-release42-pre.rpm
repository (because, for example, bug
https://bugzilla.redhat.com/show_bug.cgi?id=1637468 is not fixed in the
stable release42).


I then installed these RPMs (and many dependencies):
ovirt-hosted-engine-setup-2.2.30-1.el7.noarch
ovirt-engine-appliance-4.2-20181026.1.el7.noarch
vdsm-4.20.43-1.el7.x86_64
vdsm-gluster-4.20.43-1.el7.x86_64
vdsm-network-4.20.43-1.el7.x86_64
All of them come from that repository. I used the web UI installer to create
the GlusterFS volumes (the suggested defaults: engine, data, vmstore) and then
installed the hosted engine on the "engine" volume.


In my case I am also trying to set up an additional "storage network" (for
example, as described here:
https://ovirt.org/develop/release-management/features/gluster/select-network-for-gluster/
).
The screenshots there are old and, as far as I can see, the UI has changed in
4.2, but the idea is the same.


I have two interfaces on each host: one Ethernet (enp59s0f0, with an address
from 172.16.10.0/24 and the default gateway) and one "InfiniBand" (no
default gateway, reachable only between cluster nodes, no routing, no external
access). In reality it is an Intel Omni-Path fabric:

---
[root@ovirtnode1 log]# hfi1_control -i
Driver Version: 10.8-0
Opa Version: 10.8.0.0.204
0: BoardId: Intel Corporation Omni-Path HFI Silicon 100 Series [integrated]
0,1: Status: 5: LinkUp 4: ACTIVE
-

It shows up as an IP-over-IB interface:
6: ib0:  mtu 65520 qdisc pfifo_fast state UP group default qlen 256
    link/infiniband 80:00:00:02:fe:80:00:00:00:00:00:00:00:11:75:09:01:1a:ee:ea brd 00:ff:ff:ff:ff:12:40:1b:80:01:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.16.100.1/24 brd 172.16.100.255 scope global noprefixroute ib0
       valid_lft forever preferred_lft forever

It has these properties in the ifcfg-ib0 file:
CONNECTED_MODE=yes
MTU=65520

All IP addresses on both interfaces have records in an external DNS:
the Ethernet (management network) addresses have ovirtnode{N} names and the
InfiniBand (storage network) addresses have ovirtstor{N} names.


During the web UI GlusterFS setup I used the ovirtstor host names, and the
trusted pool was created:

5a9a0a5f-12f4-48b1-bfbe-24c172adc65c    ovirtstor5.miac    Connected
41350da9-c944-41c5-afdc-46ff51ab93f6    ovirtstor6.miac    Connected
0f50175e-7e47-4839-99c7-c7ced21f090c    localhost          Connected

Then I logged in to the web administration console and added the two other
hosts by name:

Name        Hostname/IP  Cluster  Data Center  Status  SPM
ovirtnode1  ovirtnode1   Default  Default      Up      SPM
ovirtnode5  ovirtnode5   Default  Default      Up      Normal

For this setup I have some questions:

1. Where in the web UI can I configure that I want to use the "storage
network"?
I tried to create a second network (Network -> Networks -> New), but VDSM
overwrote the ifcfg-ib0 file without those properties, as if it were an
Ethernet-like interface:


# Generated by VDSM version 4.20.43-1.el7
DEVICE=ib0
ONBOOT=yes
IPADDR=172.16.100.5
NETMASK=255.255.255.0
BOOTPROTO=none
MTU=65520
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no

I entered the MTU by hand in the General -> MTU -> Custom field, but it
cannot be applied without the "CONNECTED_MODE=yes" property, so in
Networks -> "storage" -> Hosts the network always shows as "out-of-sync".
"Custom properties" is greyed out and unavailable.
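
To show exactly what was lost when the file was rewritten, here is a minimal Python sketch that compares the settings I want on ib0 with the VDSM-generated ifcfg content quoted above. The helper names parse_ifcfg and missing_settings are made up for illustration, and the sketch only compares KEY=VALUE pairs; it does not touch the files or the network configuration.

```python
# Settings from the original hand-written ifcfg-ib0 (as quoted in this post).
WANTED = """\
CONNECTED_MODE=yes
MTU=65520
"""

# Body of the file after VDSM rewrote it (as quoted in this post).
GENERATED = """\
DEVICE=ib0
ONBOOT=yes
IPADDR=172.16.100.5
NETMASK=255.255.255.0
BOOTPROTO=none
MTU=65520
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no
"""


def parse_ifcfg(text):
    """Parse KEY=VALUE lines, ignoring blanks, comments and other text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_settings(wanted, generated):
    """Return wanted settings that are absent or different in generated."""
    return {k: v for k, v in wanted.items() if generated.get(k) != v}


if __name__ == "__main__":
    lost = missing_settings(parse_ifcfg(WANTED), parse_ifcfg(GENERATED))
    for key, value in sorted(lost.items()):
        print(f"missing from generated file: {key}={value}")
```

With the two listings from this post, the only difference reported is CONNECTED_MODE=yes; the MTU value itself is written to the file.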


2. If I check "VM network" when creating the network and then use "Setup
Host Networks" to assign this network to the ib0 interface, the whole engine
hangs. I think this is because it tries to bridge the InfiniBand interface,
which cannot be done (I see only "1 task running", which never ends, and no
other part of the interface shows any details).


3. oVirt also tries to send LLDP TLVs on interface ib0, but that cannot be
done:
Nov  6 17:30:01 ovirtnode5 systemd: Starting Link Layer Discovery 
Protocol Agent Daemon
Nov  6 17:30:01 ovirtnode5 kernel: bnx2x: 
[bnx2x_dcbnl_set_dcbx:2383(enp59s0f0)]Requested DCBX mode 5 is beyond 
advertised capabilities

Nov  6 17:30:02 ovirtnode5 systemd: Started /sbin/ifup ib0.
Nov  6 17:30:02 ovirtnode5 systemd: Starting /sbin/ifup ib0.
Nov  6 17:30:02 ovirtnode5 kernel: IPv6: ADDRCONF(NETDEV_UP): ib0: link 
is not ready
Nov  6 17:30:02 ovirtnode5 NetworkManager[1650]:  
[1541511002.9642] device (ib0): carrier: link connected
Nov  6 17:30:02 ovirtnode5 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ib0: 
link becomes ready
Nov  6 17:30:02 ovirtnode5 lldpad: setsockopt nearest_bridge: Invalid 
argument
Nov  6 17:30:41 ovirtnode5 vdsm[127585]: ERROR Internal server 
error#012Traceback (most recent call last):#012  File 
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", lin
e 606, in _handle_request#012res = method(**params)#012  File 
"/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 193, in 
_dynamicMethod#012result = fn(*methodArg
s)#012  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1561, 
in getLldp#012info=supervdsm.getProxy().get_lldp_info(filter))#012 
File "/usr/lib/python2.7/site-pack
ages/vdsm/common/supervdsm.py", line 55, in __call__#012return 
callMethod()#012