On 22-03-2019 12:04, Arif Ali wrote:

> On 21-03-2019 17:47, Simone Tiraboschi wrote: 
> 
> On Thu, Mar 21, 2019 at 3:47 PM Arif Ali <[email protected]> wrote:
>
> Hi all,
> 
> Recently deployed oVirt version 4.3.1
> 
> It's in a self-hosted engine environment
> 
> Used the steps via cockpit to install the engine, and was able to add 
> the rest of the oVirt nodes without any specific problems
> 
> We tested the HA of the hosted-engine without a problem, and then at one 
> point turned off the machine that was hosting the engine, to mimic a 
> failure and see how it goes; the VM was able to move over successfully, 
> but some of the oVirt hosts started to go into Unassigned. Out of a total 
> of 6 oVirt hosts, I have 4 of them in this state.
> 
> Clicking on the host, I see the following message in the events. I can 
> get to the hosts via the engine, and ping the machine, so I am not sure 
> why it is no longer working
> 
> VDSM <snip> command Get Host Capabilities failed: Message timeout which 
> can be caused by communication issues
> 
> Mind you, I have been trying to resolve this issue since Monday, and 
> have tried various things, like rebooting and re-installing the oVirt 
> hosts, without having much luck
> 
> So I would be grateful for any assistance on this; maybe I've missed 
> something really simple and am overlooking it 
> 
> Can you please check that VDSM is correctly running on those nodes? 
> Are you able to correctly reach those nodes from the engine VM?

So, I have gone back and re-installed the whole solution, this time with
4.3.2, and I have the same issue again 

Checking the vdsm logs, I see the warning below. The host is either
Unassigned or Connecting. I don't have the option to Activate the host
or put it into Maintenance mode. I have tried rebooting the node with
no luck 

Mar 22 10:53:27 scvirt02 vdsm[32481]: WARN Worker blocked: <Worker name=periodic/2 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor object at 0x7efed4180610> at 0x7efed4180650> timeout=15, duration=30.00 at 0x7efed4180810> task#=2 at 0x7efef41987d0>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 186, in __call__
  self._func()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 481, in __call__
  stats = hostapi.get_stats(self._cif, self._samples.stats())
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 79, in get_stats
  ret['haStats'] = _getHaInfo()
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 177, in _getHaInfo
  stats = instance.get_all_stats()
File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 94, in get_all_stats
  stats = broker.get_stats_from_storage()
File: "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 143, in get_stats_from_storage
  result = self._proxy.get_stats()
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
  return self.__send(self.__name, args)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
  verbose=self.__verbose
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
  return self.single_request(host, handler, request_body, verbose)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
  response = h.getresponse(buffering=True)
File: "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
  response.begin()
File: "/usr/lib64/python2.7/httplib.py", line 444, in begin
  version, status, reason = self._read_status()
File: "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
  line = self.fp.readline(_MAXLINE + 1)
File: "/usr/lib64/python2.7/socket.py", line 476, in readline
  data = self._sock.recv(self._rbufsize)

On the engine host, I continuously get the following messages too 

Mar 22 11:02:32 <snip> ovsdb-server[4724]: ovs|01900|jsonrpc|WARN|Dropped 3 log messages in last 14 seconds (most recently, 7 seconds ago) due to excessive rate
Mar 22 11:02:32 <snip> ovsdb-server[4724]: ovs|01901|jsonrpc|WARN|ssl:[::ffff:192.168.203.205]:55658: send error: Protocol error
Mar 22 11:02:32 <snip> ovsdb-server[4724]: ovs|01902|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55658: connection dropped (Protocol error)
Mar 22 11:02:34 <snip> ovsdb-server[4724]: ovs|01903|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:34 <snip> ovsdb-server[4724]: ovs|01904|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49504: connection dropped (Protocol error)
Mar 22 11:02:40 <snip> ovsdb-server[4724]: ovs|01905|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:40 <snip> ovsdb-server[4724]: ovs|01906|jsonrpc|WARN|Dropped 1 log messages in last 5 seconds (most recently, 5 seconds ago) due to excessive rate
Mar 22 11:02:40 <snip> ovsdb-server[4724]: ovs|01907|jsonrpc|WARN|ssl:[::ffff:192.168.203.203]:34114: send error: Protocol error
Mar 22 11:02:40 <snip> ovsdb-server[4724]: ovs|01908|reconnect|WARN|ssl:[::ffff:192.168.203.203]:34114: connection dropped (Protocol error)
Mar 22 11:02:41 <snip> ovsdb-server[4724]: ovs|01909|reconnect|WARN|ssl:[::ffff:192.168.203.204]:52034: connection dropped (Protocol error)
Mar 22 11:02:48 <snip> ovsdb-server[4724]: ovs|01910|stream_ssl|WARN|Dropped 1 log messages in last 7 seconds (most recently, 7 seconds ago) due to excessive rate
Mar 22 11:02:48 <snip> ovsdb-server[4724]: ovs|01911|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:48 <snip> ovsdb-server[4724]: ovs|01912|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55660: connection dropped (Protocol error)
Mar 22 11:02:50 <snip> ovsdb-server[4724]: ovs|01913|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:50 <snip> ovsdb-server[4724]: ovs|01914|jsonrpc|WARN|Dropped 2 log messages in last 9 seconds (most recently, 2 seconds ago) due to excessive rate
Mar 22 11:02:50 <snip> ovsdb-server[4724]: ovs|01915|jsonrpc|WARN|ssl:[::ffff:192.168.203.202]:49506: send error: Protocol error
Mar 22 11:02:50 <snip> ovsdb-server[4724]: ovs|01916|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49506: connection dropped (Protocol error)
Mar 22 11:02:56 <snip> ovsdb-server[4724]: ovs|01917|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:02:56 <snip> ovsdb-server[4724]: ovs|01918|reconnect|WARN|ssl:[::ffff:192.168.203.203]:34116: connection dropped (Protocol error)
Mar 22 11:02:57 <snip> ovsdb-server[4724]: ovs|01919|reconnect|WARN|ssl:[::ffff:192.168.203.204]:52036: connection dropped (Protocol error)
Mar 22 11:03:04 <snip> ovsdb-server[4724]: ovs|01920|stream_ssl|WARN|Dropped 1 log messages in last 7 seconds (most recently, 7 seconds ago) due to excessive rate
Mar 22 11:03:04 <snip> ovsdb-server[4724]: ovs|01921|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:03:04 <snip> ovsdb-server[4724]: ovs|01922|jsonrpc|WARN|Dropped 2 log messages in last 9 seconds (most recently, 7 seconds ago) due to excessive rate
Mar 22 11:03:04 <snip> ovsdb-server[4724]: ovs|01923|jsonrpc|WARN|ssl:[::ffff:192.168.203.205]:55662: send error: Protocol error
Mar 22 11:03:04 <snip> ovsdb-server[4724]: ovs|01924|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55662: connection dropped (Protocol error)
Mar 22 11:03:06 <snip> ovsdb-server[4724]: ovs|01925|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49508: connection dropped (Protocol error)
Mar 22 11:03:12 <snip> ovsdb-server[4724]: ovs|01926|stream_ssl|WARN|Dropped 1 log messages in last 5 seconds (most recently, 5 seconds ago) due to excessive rate
Mar 22 11:03:12 <snip> ovsdb-server[4724]: ovs|01927|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
Mar 22 11:03:12 <snip> ovsdb-server[4724]: ovs|01928|reconnect|WARN|ssl:[::ffff:192.168.203.203]:34118: connection dropped (Protocol error)
Mar 22 11:03:13 <snip> ovsdb-server[4724]: ovs|01929|reconnect|WARN|ssl:[::ffff:192.168.203.204]:52038: connection dropped (Protocol error)
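
As an illustration, the "connection dropped" messages above can be tallied
by peer address to see which hosts keep reconnecting. This is only a rough
sketch: the function name count_dropped_peers and the SAMPLE text are made
up for the example, and the pattern only covers the message format shown
in this excerpt.

```python
import re
from collections import Counter

# Matches e.g. "ssl:[::ffff:192.168.203.205]:55658: connection dropped".
# \s+ tolerates hard line wraps such as those added by a mail client.
DROPPED = re.compile(
    r"ssl:\[::ffff:(\d{1,3}(?:\.\d{1,3}){3})\]:\d+:\s+connection\s+dropped"
)


def count_dropped_peers(log_text):
    """Return a Counter mapping peer IPv4 address to dropped-connection count."""
    return Counter(DROPPED.findall(log_text))


# A few entries copied from the excerpt, kept in their wrapped form.
SAMPLE = """\
Mar 22 11:02:32 <snip> ovsdb-server[4724]:
ovs|01902|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55658: connection
dropped (Protocol error)
Mar 22 11:02:34 <snip> ovsdb-server[4724]:
ovs|01904|reconnect|WARN|ssl:[::ffff:192.168.203.202]:49504: connection
dropped (Protocol error)
Mar 22 11:02:48 <snip> ovsdb-server[4724]:
ovs|01912|reconnect|WARN|ssl:[::ffff:192.168.203.205]:55660: connection
dropped (Protocol error)
"""

if __name__ == "__main__":
    for peer, n in sorted(count_dropped_peers(SAMPLE).items()):
        print(peer, n)
```

In the full excerpt the peers are 192.168.203.202 through 192.168.203.205,
four addresses in total. That would be consistent with the four hosts
reported as Unassigned, although the log alone does not prove they are the
same machines.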

I found my issue and managed to resolve it; there was nothing wrong with oVirt 

The ovirtmgmt network is 10G, and by default I set the MTU to 9000, as I
normally would for this type of network. I found out later that the
network team at this site does not support an MTU of 9000, so I went back
to 1500 and everything worked without a problem 
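
An MTU mismatch like this can usually be spotted with a ping that has the
don't-fragment flag set and a payload sized to the MTU under test. The
sketch below only does the arithmetic and builds the command line. It
assumes IPv4 without options and the Linux iputils ping flags (-M do, -s,
-c); the function names and the host name "host.example" are placeholders,
not anything taken from this setup.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(host, mtu, count=3):
    """Build a Linux iputils ping command with don't-fragment set."""
    return ["ping", "-M", "do", "-c", str(count),
            "-s", str(icmp_payload_for_mtu(mtu)), host]


if __name__ == "__main__":
    # "host.example" is a placeholder, not a host from this thread.
    for mtu in (1500, 9000):
        print(" ".join(ping_command("host.example", mtu)))
```

If the 1472-byte payload gets through but the 8972-byte one does not, then
something on the path is not passing jumbo frames.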

Thanks to everyone for their assistance

-- 
regards,

Arif Ali
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/XI2QV54A4WNQ5RLE6YHWUC7GLVPQSBK5/