[ovirt-users] Can't upgrade Cluster Compatibility Version

2022-08-31 Thread Gillingham, Eric J (US 393D) via Users
When trying to edit my default cluster's compatibility version from 4.5 to anything 
newer, it fails with "Error while executing action Edit Cluster properties: 
Internal Engine Error".

engine.log contains the following (very long) stack trace:
##
2022-08-31 23:43:08,647Z ERROR [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-26) [590a6527] Error during ValidateFailure.: java.lang.NullPointerException
    at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.UpdateVmCommand.validate(UpdateVmCommand.java:1015)
    at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.internalValidateInTransaction(CommandBase.java:824)
    at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:140)
    at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:157)
    at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.internalValidate(CommandBase.java:803)
    at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:417)
    at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13)
    at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.Backend.runAction(Backend.java:450)
    at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:432)
    at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:638)
    at jdk.internal.reflect.GeneratedMethodAccessor380.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.jboss.as.ee@24.0.1.Final//org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
    at org.jboss.invocation@1.6.0.Final//org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
    at org.jboss.invocation@1.6.0.Final//org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509)
    at org.jboss.as.weld.common@24.0.1.Final//org.jboss.as.weld.interceptors.Jsr299BindingsInterceptor.delegateInterception(Jsr299BindingsInterceptor.java:79)
    at org.jboss.as.weld.common@24.0.1.Final//org.jboss.as.weld.interceptors.Jsr299BindingsInterceptor.doMethodInterception(Jsr299BindingsInterceptor.java:89)
    at org.jboss.as.weld.common@24.0.1.Final//org.jboss.as.weld.interceptors.Jsr299BindingsInterceptor.processInvocation(Jsr299BindingsInterceptor.java:102)
    at org.jboss.as.ee@24.0.1.Final//org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63)
    at org.jboss.invocation@1.6.0.Final//org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
    at org.jboss.as.ejb3@24.0.1.Final//org.jboss.as.ejb3.component.invocationmetrics.ExecutionTimeInterceptor.processInvocation(ExecutionTimeInterceptor.java:43)
    at org.jboss.invocation@1.6.0.Final//org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
    at org.jboss.as.ee@24.0.1.Final//org.jboss.as.ee.concurrent.ConcurrentContextInterceptor.processInvocation(ConcurrentContextInterceptor.java:45)
    at org.jboss.invocation@1.6.0.Final//org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
    at org.jboss.invocation@1.6.0.Final//org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:40)
    at org.jboss.invocation@1.6.0.Final//org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
    at org.jboss.invocation@1.6.0.Final//org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:53)
    at org.jboss.as.ee@24.0.1.Final//org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:52)
    at org.jboss.invocation@1.6.0.Final//org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
    at org.jboss.as.ejb3@24.0.1.Final//org.jboss.as.ejb3.component.singleton.SingletonComponentInstanceAssociationInterceptor.processInvocation(SingletonComponentInstanceAssociationInterceptor.java:53)
    at org.jboss.invocation@1.6.0.Final//org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
    at org.jboss.as.ejb3@24.0.1.Final//org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:232)
    at

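Since the failure is a NullPointerException inside UpdateVmCommand.validate (UpdateVmCommand.java:1015), one way to narrow down which VM trips the validation before retrying the cluster edit is to walk the cluster's VMs with the Python SDK and look at their per-VM compatibility overrides. This is only a sketch: it assumes ovirtsdk4 is installed, the engine URL and credentials are placeholders, and the custom_compatibility_version attribute is taken from the SDK's Vm type rather than from anything confirmed in this thread.

# Sketch only: list VMs in the Default cluster and print any per-VM
# compatibility override. Connection details below are placeholders.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',                           # placeholder
    password='secret',                                   # placeholder
    insecure=True,  # or ca_file='/etc/pki/ovirt-engine/ca.pem'
)
try:
    vms_service = connection.system_service().vms_service()
    for vm in vms_service.list(search='cluster=Default'):
        ccv = vm.custom_compatibility_version
        override = '%s.%s' % (ccv.major, ccv.minor) if ccv else 'none'
        print('%s: custom compatibility override: %s' % (vm.name, override))
finally:
    connection.close()
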
[ovirt-users] Backporting of Fixes

2020-11-12 Thread Gillingham, Eric J (US 393D) via Users
I'm still running oVirt 4.3 because some of my hardware will require extra effort 
to move to 4.4 that we're not quite ready to spend yet, and I'm currently hitting 
what I believe to be https://bugzilla.redhat.com/show_bug.cgi?id=1820998, which is 
fixed in 4.4. I'm wondering if there's a process to request a backport, or should 
I just open a new bug against 4.3?

Thank You
- Eric

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6CNDJDE4RG6M24O5IKNNLNHELVBWVLPW/


[ovirt-users] Re: [EXTERNAL] Re: Storage Domain won't activate

2020-09-04 Thread Gillingham, Eric J (US 393D) via Users
On 9/4/20, 2:26 PM, "Nir Soffer"  wrote:
On Fri, Sep 4, 2020 at 5:43 PM Gillingham, Eric J (US 393D) via Users  wrote:
>
> On 9/4/20, 4:50 AM, "Vojtech Juranek"  wrote:
>
> On Thursday, September 3, 2020 22:49:17 CEST Gillingham, Eric J (US 393D) via Users wrote:
>
> How did you remove the first host? Did you put it into maintenance first? I
> wonder how this situation (two lockspaces with conflicting names) can occur.
>
> You can try to re-initialize the lockspace directly using the sanlock command
> (see man sanlock), but it would be good to understand the situation first.
>
>
> Just as you said: put it into maintenance mode, shut it down, removed it via
> the engine UI.

Eric, is it possible that you shut down the host too quickly, before it had 
actually disconnected from the lockspace?

When the engine moves a host to maintenance, it does not wait until the host has 
actually moved into maintenance. This is actually a bug, so it would be a good 
idea to file a bug about this.


That is a possibility; from the UI view it usually takes a bit for the host to 
show as in maintenance, so I assumed it was an accurate representation of the 
state. Unfortunately, all hosts have since been completely wiped and 
re-installed; this issue brought down the entire cluster for over a day, so I 
needed to get everything up again ASAP.

I did not archive/back up the sanlock logs beforehand, so I can't check for the 
sanlock events David mentioned. When I cleared the sanlock locks there were no s 
or r entries listed in sanlock client status, and there were no other running 
hosts to obtain other locks, but I don't fully grok sanlock, so I can't rule out 
some lock that existed only on the iSCSI storage, separate from any current or 
past hosts.
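
For reference, one way to avoid the "powered off before leaving the lockspace" scenario described above is to confirm on the host that sanlock has actually released its lockspaces before shutting it down. A minimal sketch, assuming the sanlock CLI is present on the host and that the check runs as root; the thread above notes that lockspace and resource entries show up as "s" and "r" lines in sanlock client status:

# Sketch: wait until `sanlock client status` no longer reports any
# lockspace ("s ...") lines before powering the host off.
import subprocess
import time

def lockspaces_still_held():
    out = subprocess.run(
        ['sanlock', 'client', 'status'],
        capture_output=True, text=True, check=True,
    ).stdout
    # lockspace entries start with "s ", resources with "r "
    return [line for line in out.splitlines() if line.startswith('s ')]

deadline = time.time() + 300  # give it up to 5 minutes
while lockspaces_still_held():
    if time.time() > deadline:
        raise SystemExit('host still holds a sanlock lockspace; not powering off')
    time.sleep(5)
print('no lockspaces held; safer to shut down')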


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/O7LLWCIC76RPOXA4DCE2NTPWAZEBE6FK/


[ovirt-users] Re: [EXTERNAL] Re: Storage Domain won't activate

2020-09-04 Thread Gillingham, Eric J (US 393D) via Users
This is using iSCSI storage. I stopped the oVirt broker/agents/vdsm and used 
sanlock to remove the locks it was complaining about, but as soon as I started the 
oVirt tools back up and the engine came online again, the same messages 
reappeared.

After spending more than a day trying to resolve this nicely, I gave up. I 
installed ovirt-node on the host I originally removed, added that to the cluster, 
then removed and nuked the misbehaving host and did a clean install there. I did 
run into an issue where the first host had an empty hosted-engine.conf (it only 
had the cert and the id settings in it), so it wouldn't connect properly, but I 
worked around that by copying the fully populated one from the semi-working host 
and changing the id to match.
No idea if this is the right solution, but it _seems_ to be working and my VMs are 
back to running; I just got too frustrated trying to debug through the normal 
methods and to find solutions offered via the oVirt tools and documentation.
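
For anyone hitting the same empty hosted-engine.conf, the workaround described above boils down to reusing a fully populated config and rewriting this host's id. A rough sketch only: the path and the host_id key are assumptions based on a typical /etc/ovirt-hosted-engine/hosted-engine.conf rather than anything confirmed in this thread, so check your own file first.

# Sketch of the workaround above: reuse a populated hosted-engine.conf from a
# working host and rewrite this host's id. Path and host_id key are assumptions.
import re
import shutil

SRC = 'hosted-engine.conf.copied-from-working-host'   # fetched manually (e.g. scp)
DST = '/etc/ovirt-hosted-engine/hosted-engine.conf'   # assumed location
NEW_HOST_ID = '1'                                     # this host's unique id

shutil.copy(SRC, DST)
with open(DST) as f:
    conf = f.read()
# replace the host_id=<n> line with this host's id
conf = re.sub(r'(?m)^host_id=.*$', 'host_id=' + NEW_HOST_ID, conf)
with open(DST, 'w') as f:
    f.write(conf)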

- Eric

On 9/4/20, 10:59 AM, "Strahil Nikolov"  wrote:

Is this an HCI setup?
If yes, check the gluster status (I prefer the CLI, but the UI is also valid).

gluster pool list
gluster volume status

gluster volume heal  info summary

Best Regards,
Strahil Nikolov

On Friday, September 4, 2020, 00:38:13 GMT+3, Gillingham, Eric J (US 393D) via Users wrote:

I recently removed a host from my cluster to upgrade it to 4.4. After I removed 
the host from the datacenter, VMs started to pause on the second system they had 
all migrated to. Investigating via the engine showed the storage domain as 
"unknown"; when I try to activate it via the engine, it cycles to locked and then 
to unknown again.

/var/log/sanlock.log contains a repeating message:
add_lockspace e1270474-108c-4cae-83d6-51698cffebbf:1:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0 conflicts with name of list1 s1 e1270474-108c-4cae-83d6-51698cffebbf:3:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0


vdsm.log contains these (maybe related) snippets:
---
2020-09-03 20:19:53,483+ INFO  (jsonrpc/6) [vdsm.api] FINISH getAllTasksStatuses error=Secured object is not in safe state from=:::137.79.52.43,36326, flow_id=18031a91, task_id=8e92f059-743a-48c8-aa9d-e7c4c836337b (api:52)
2020-09-03 20:19:53,483+ ERROR (jsonrpc/6) [storage.TaskManager.Task] (Task='8e92f059-743a-48c8-aa9d-e7c4c836337b') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "", line 2, in getAllTasksStatuses
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2201, in getAllTasksStatuses
    allTasksStatus = self._pool.getAllTasksStatuses()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
    raise SecureError("Secured object is not in safe state")
SecureError: Secured object is not in safe state
2020-09-03 20:19:53,483+ INFO  (jsonrpc/6) [storage.TaskManager.Task] (Task='8e92f059-743a-48c8-aa9d-e7c4c836337b') aborting: Task is aborted: u'Secured object is not in safe state' - code 100 (task:1181)
2020-09-03 20:19:53,483+ ERROR (jsonrpc/6) [storage.Dispatcher] FINISH getAllTasksStatuses error=Secured object is not in safe state (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, in prepare
    raise self.error
SecureError: Secured object is not in safe state
---
2020-09-03 20:44:23,252+ INFO  (tasks/2) [storage.ThreadPool.WorkerThread] START task 76415a77-9d29-4b72-ade1-53207cfc503b (cmd=>, args=None) (threadPool:208)
2020-09-03 20:44:23,266+ INFO  (tasks/2) [storage.SANLock] Acquiring host id for domain e1270474-108c-4cae-83d6-51698cffebbf (id=1, wait=True) (clusterlock:313)
2020-09-03 20:44:23,267+ ERROR (tasks/2) [storage.TaskManager.Task] (Task='76415a77-9d29-4b72-ade1-53207cfc503b') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7

[ovirt-users] Re: [EXTERNAL] Re: Storage Domain won't activate

2020-09-04 Thread Gillingham, Eric J (US 393D) via Users
On 9/4/20, 4:50 AM, "Vojtech Juranek"  wrote:

On Thursday, September 3, 2020 22:49:17 CEST Gillingham, Eric J (US 393D) via Users wrote:

> I recently removed a host from my cluster to upgrade it to 4.4. After I
> removed the host from the datacenter, VMs started to pause on the second
> system they had all migrated to. Investigating via the engine showed the
> storage domain as "unknown"; when I try to activate it via the engine, it
> cycles to locked and then to unknown again.
>
> /var/log/sanlock.log contains a repeating message:
> add_lockspace e1270474-108c-4cae-83d6-51698cffebbf:1:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0 conflicts with name of list1 s1 e1270474-108c-4cae-83d6-51698cffebbf:3:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0

How did you remove the first host? Did you put it into maintenance first? I 
wonder how this situation (two lockspaces with conflicting names) can occur.

You can try to re-initialize the lockspace directly using the sanlock command 
(see man sanlock), but it would be good to understand the situation first.


Just as you said: put it into maintenance mode, shut it down, removed it via the 
engine UI.
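
For completeness, re-initializing the lockspace directly, as suggested above, would look roughly like the sketch below. Treat it as an assumption of the intended procedure, not a confirmed recipe: the exact syntax should be taken from man sanlock, it is destructive, and it should only be attempted with VDSM and the hosted-engine agents stopped on all hosts and the storage domain inactive.

# Sketch only, following the "re-initialize the lockspace directly" suggestion.
# Destructive; verify the invocation against `man sanlock` before running.
import subprocess

SD_UUID = 'e1270474-108c-4cae-83d6-51698cffebbf'   # storage domain UUID from the logs
IDS_PATH = '/dev/%s/ids' % SD_UUID                 # ids logical volume of the domain

# Inspect current sanlock state first.
print(subprocess.run(['sanlock', 'client', 'status'],
                     capture_output=True, text=True).stdout)

# Assumed re-initialization command:
#   sanlock direct init -s <sd_uuid>:0:<ids_path>:0
subprocess.run(['sanlock', 'direct', 'init', '-s',
                '%s:0:%s:0' % (SD_UUID, IDS_PATH)], check=True)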



___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CDLMSS336Z46BNZ4K4IAWO6JBYAHAFDO/


[ovirt-users] Storage Domain won't activate

2020-09-03 Thread Gillingham, Eric J (US 393D) via Users
I recently removed a host from my cluster to upgrade it to 4.4. After I removed 
the host from the datacenter, VMs started to pause on the second system they had 
all migrated to. Investigating via the engine showed the storage domain as 
"unknown"; when I try to activate it via the engine, it cycles to locked and then 
to unknown again.

/var/log/sanlock.log contains a repeating message:
add_lockspace e1270474-108c-4cae-83d6-51698cffebbf:1:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0 conflicts with name of list1 s1 e1270474-108c-4cae-83d6-51698cffebbf:3:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0


vdsm.log contains these (maybe related) snippets:
---
2020-09-03 20:19:53,483+ INFO  (jsonrpc/6) [vdsm.api] FINISH getAllTasksStatuses error=Secured object is not in safe state from=:::137.79.52.43,36326, flow_id=18031a91, task_id=8e92f059-743a-48c8-aa9d-e7c4c836337b (api:52)
2020-09-03 20:19:53,483+ ERROR (jsonrpc/6) [storage.TaskManager.Task] (Task='8e92f059-743a-48c8-aa9d-e7c4c836337b') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "", line 2, in getAllTasksStatuses
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2201, in getAllTasksStatuses
    allTasksStatus = self._pool.getAllTasksStatuses()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
    raise SecureError("Secured object is not in safe state")
SecureError: Secured object is not in safe state
2020-09-03 20:19:53,483+ INFO  (jsonrpc/6) [storage.TaskManager.Task] (Task='8e92f059-743a-48c8-aa9d-e7c4c836337b') aborting: Task is aborted: u'Secured object is not in safe state' - code 100 (task:1181)
2020-09-03 20:19:53,483+ ERROR (jsonrpc/6) [storage.Dispatcher] FINISH getAllTasksStatuses error=Secured object is not in safe state (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, in prepare
    raise self.error
SecureError: Secured object is not in safe state
---
2020-09-03 20:44:23,252+ INFO  (tasks/2) [storage.ThreadPool.WorkerThread] START task 76415a77-9d29-4b72-ade1-53207cfc503b (cmd=>, args=None) (threadPool:208)
2020-09-03 20:44:23,266+ INFO  (tasks/2) [storage.SANLock] Acquiring host id for domain e1270474-108c-4cae-83d6-51698cffebbf (id=1, wait=True) (clusterlock:313)
2020-09-03 20:44:23,267+ ERROR (tasks/2) [storage.TaskManager.Task] (Task='76415a77-9d29-4b72-ade1-53207cfc503b') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 317, in startSpm
    self.masterDomain.acquireHostId(self.id)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 957, in acquireHostId
    self._manifest.acquireHostId(hostId, wait)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 501, in acquireHostId
    self._domainLock.acquireHostId(hostId, wait)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 344, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('e1270474-108c-4cae-83d6-51698cffebbf', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
---

Another symptom: in the hosts view of the engine, SPM bounces between "Normal" and 
"Contending". When it's Normal, if I select Management -> Select as SPM I get 
"Error while executing action: Cannot force select SPM. Unknown Data Center 
status."

I've tried rebooting the one remaining host in the cluster to no avail; 
hosted-engine --reinitialize-lockspace also does not seem to solve the issue.

I'm kind of stumped as to what else to try and would appreciate any guidance on 
how to resolve this.
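
One additional low-level check, offered only as a hedged suggestion rather than anything from this thread: dump the delta leases recorded in the domain's ids volume to see which host ids are still registered in the lockspace that add_lockspace is conflicting with. The path below is taken from the sanlock.log excerpt above, and the sanlock direct dump invocation should be verified against man sanlock.

# Sketch: dump the delta leases stored in the storage domain's "ids" volume.
# Assumes the sanlock CLI is installed and the path matches sanlock.log above.
import subprocess

IDS_PATH = '/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids'  # from sanlock.log

dump = subprocess.run(['sanlock', 'direct', 'dump', IDS_PATH],
                      capture_output=True, text=True, check=True).stdout
print(dump)  # rows list offsets, lockspace names and owner/generation info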

Thank You

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FMJZV2OEKHPTSTROSPLCQ3WJUIPB6CKL/