On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis <[email protected]>
wrote:

> Hi Dominik
>
> When these commands are used on the ovirt-engine host the output is the
> one depicted in your email.
> For your reference see also below:
>
> [root@ath01-ovirt01 certs]# ovn-nbctl get-ssl
> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
> CA Certificate: /etc/pki/ovirt-engine/ca.pem
> Bootstrap: false
> [root@ath01-ovirt01 certs]# ovn-nbctl get-connection
> ptcp:6641
>
> [root@ath01-ovirt01 certs]# ovn-sbctl get-ssl
> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
> CA Certificate: /etc/pki/ovirt-engine/ca.pem
> Bootstrap: false
> [root@ath01-ovirt01 certs]# ovn-sbctl get-connection
> read-write role="" ptcp:6642
>
>
^^^ the line above points to the problem: ovn-central is configured to use
plain TCP without SSL.
engine-setup usually configures ovn-central to use SSL. The fact that the
files /etc/pki/ovirt-engine/keys/ovn-* exist shows that engine-setup was
triggered correctly. It looks like the OVN DB was dropped somehow; this
should not happen.
This can be fixed manually by executing the following commands on the
engine's machine:
ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem
ovn-sbctl set-connection pssl:6642
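To double-check the result of the commands above: the broken state is
recognizable by a "ptcp:" target in the get-connection output, the fixed
state by "pssl:". A tiny helper to classify that output (purely
illustrative, not part of oVirt or OVN):

```shell
#!/bin/sh
# Classify the target printed by "ovn-nbctl get-connection" or
# "ovn-sbctl get-connection": "ptcp:" is plain TCP (the broken state
# quoted above), "pssl:" is SSL (what engine-setup normally configures).
classify_connection() {
    case "$1" in
        *pssl:*) echo "ssl" ;;
        *ptcp:*) echo "plain-tcp" ;;
        *)       echo "unknown" ;;
    esac
}

classify_connection "ptcp:6642"                      # the broken southbound
classify_connection 'read-write role="" pssl:6642'   # after the fix
```

After running the set-connection commands, get-connection should print a
pssl:6642 target (possibly with a read-write role prefix, as in the
reference output further down in this thread).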

Afterwards, /var/log/openvswitch/ovn-controller.log on the hosts should
report that br-int.mgmt is connected.
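Incidentally, the southbound log excerpts quoted below already encode the
diagnosis: 0x16 (U+0016) is the TLS record-type byte for a handshake, so
"invalid character U+0016" and "received SSL data on JSON-RPC channel"
mean the hosts sent TLS ClientHellos to a plain-TCP listener. A small
sketch of spotting that pattern (the sample lines are copied from the
ovsdb-server-sb.log excerpt in this thread):

```shell
#!/bin/sh
# 0x16 is the first byte of a TLS handshake record; ovsdb-server logs it
# as "invalid character U+0016" when a TLS client talks to a ptcp: listener.
# Sample lines taken from the ovsdb-server-sb.log excerpt in this thread:
log='2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: error parsing stream: line 0, column 0, byte 0: invalid character U+0016
2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: received SSL data on JSON-RPC channel'

if printf '%s\n' "$log" | grep -q 'invalid character U+0016'; then
    echo "TLS clients hitting a plain-TCP listener: fix with set-ssl + set-connection pssl:"
fi
```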



> [root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08
> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
> -rw-------. 1 root root      2893 Jun 25 11:08
> /etc/pki/ovirt-engine/keys/ovn-ndb.p12
> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08
> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
> -rw-------. 1 root root      2893 Jun 25 11:08
> /etc/pki/ovirt-engine/keys/ovn-sdb.p12
>
> When I try the above commands on the node hosts, the following happens:
> ovn-nbctl get-ssl / get-connection
> ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database connection
> failed (No such file or directory)
> The above is, I believe, expected, since no northbound connections should
> be established from the host nodes.
>
> ovn-sbctl get-ssl / get-connection
> The output hangs until I terminate it.
>
>
Yes, the ovn-* commands work only on the engine's machine, which has the
ovn-central role.
On the hosts, there is only ovn-controller, which connects the OVN
southbound database to Open vSwitch on the host.
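On a host, the southbound endpoint that ovn-controller dials is stored in
the Open vSwitch database under external-ids:ovn-remote (the same key
queried with ovs-vsctl further down in this thread). A minimal sketch for
checking it against the value oVirt is expected to configure;
expected_remote is a hypothetical helper, and 6642 is the default
southbound SSL port:

```shell
#!/bin/sh
# Build the southbound URL that vdsm-tool ovn-config is expected to set
# (SSL to the engine on port 6642), for comparison against the live value.
expected_remote() {
    echo "ssl:$1:6642"
}

# On a real host, compare against the configured value, e.g.:
#   actual=$(ovs-vsctl --no-wait get open . external-ids:ovn-remote)
expected_remote "192.0.2.10"
```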


> For the requested logs the below are found in the ovsdb-server-sb.log
>
> 2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146:
> connection dropped (Protocol error)
> 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188:
> connection dropped (Protocol error)
> 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044:
> connection dropped (Protocol error)
> 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148:
> connection dropped (Protocol error)
> 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log messages in
> last 12 seconds (most recently, 4 seconds ago) due to excessive rate
> 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: error
> parsing stream: line 0, column 0, byte 0: invalid character U+0016
> 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log messages in
> last 12 seconds (most recently, 4 seconds ago) due to excessive rate
> 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190:
> received SSL data on JSON-RPC channel
> 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190:
> connection dropped (Protocol error)
> 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046:
> connection dropped (Protocol error)
> 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150:
> connection dropped (Protocol error)
> 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192:
> connection dropped (Protocol error)
> 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log messages in
> last 8 seconds (most recently, 1 seconds ago) due to excessive rate
> 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: error
> parsing stream: line 0, column 0, byte 0: invalid character U+0016
> 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log messages in
> last 8 seconds (most recently, 1 seconds ago) due to excessive rate
> 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048:
> received SSL data on JSON-RPC channel
> 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048:
> connection dropped (Protocol error)
> 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152:
> connection dropped (Protocol error)
> 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194:
> connection dropped (Protocol error)
> 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050:
> connection dropped (Protocol error)
> 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154:
> connection dropped (Protocol error)
> 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log messages in
> last 12 seconds (most recently, 4 seconds ago) due to excessive rate
> 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: error
> parsing stream: line 0, column 0, byte 0: invalid character U+0016
> 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log messages in
> last 12 seconds (most recently, 4 seconds ago) due to excessive rate
> 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196:
> received SSL data on JSON-RPC channel
> 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196:
> connection dropped (Protocol error)
> 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052:
> connection dropped (Protocol error)
>
>
> How can we fix these SSL errors?
>

I addressed this above.


> I thought vdsm did the certificate provisioning on the host nodes so that
> they can communicate with the engine host node.
>
>
Yes, this seems to work in your scenario; only the SSL configuration on
ovn-central was lost.


> On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler <[email protected]> wrote:
>
>> Looks still like the ovn-controller on the host has problems
>> communicating with ovn-southbound.
>>
>> Are there any hints in /var/log/openvswitch/*.log,
>> especially in /var/log/openvswitch/ovsdb-server-sb.log ?
>>
>> Can you please check the output of
>>
>> ovn-nbctl get-ssl
>> ovn-nbctl get-connection
>> ovn-sbctl get-ssl
>> ovn-sbctl get-connection
>> ls -l /etc/pki/ovirt-engine/keys/ovn-*
>>
>> it should be similar to
>>
>> [root@ovirt-43 ~]# ovn-nbctl get-ssl
>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
>> CA Certificate: /etc/pki/ovirt-engine/ca.pem
>> Bootstrap: false
>> [root@ovirt-43 ~]# ovn-nbctl get-connection
>> pssl:6641:[::]
>> [root@ovirt-43 ~]# ovn-sbctl get-ssl
>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
>> CA Certificate: /etc/pki/ovirt-engine/ca.pem
>> Bootstrap: false
>> [root@ovirt-43 ~]# ovn-sbctl get-connection
>> read-write role="" pssl:6642:[::]
>> [root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14  2019
>> /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
>> -rw-------. 1 root root      2709 Oct 14  2019
>> /etc/pki/ovirt-engine/keys/ovn-ndb.p12
>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14  2019
>> /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
>> -rw-------. 1 root root      2709 Oct 14  2019
>> /etc/pki/ovirt-engine/keys/ovn-sdb.p12
>>
>>
>>
>>
>> On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis <[email protected]>
>> wrote:
>>
>>> I restarted the ovn-controller; this is the output of
>>> ovn-controller.log:
>>>
>>> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file
>>> /var/log/openvswitch/ovn-controller.log
>>> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock:
>>> connecting...
>>> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock:
>>> connected
>>> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL reconnected, force
>>> recompute.
>>> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>>> connecting...
>>> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL reconnected, force
>>> recompute.
>>> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: unexpected
>>> SSL connection close
>>> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>>> connection attempt failed (Protocol error)
>>> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>>> connecting...
>>> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: unexpected
>>> SSL connection close
>>> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>>> connection attempt failed (Protocol error)
>>> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>>> waiting 2 seconds before reconnect
>>> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>>> connecting...
>>> 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: unexpected
>>> SSL connection close
>>> 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>>> connection attempt failed (Protocol error)
>>> 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>>> waiting 4 seconds before reconnect
>>> 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>>> connecting...
>>> 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: unexpected
>>> SSL connection close
>>> 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>>> connection attempt failed (Protocol error)
>>> 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642:
>>> continuing to reconnect in the background but suppressing further logging
>>>
>>>
>>> I have also done the vdsm-tool ovn-config OVIRT_ENGINE_IP
>>> OVIRTMGMT_NETWORK_DC
>>> This is how the OVIRT_ENGINE_IP is provided to the ovn-controller; I
>>> can redo it if you want.
>>>
>>> After the restart of the ovn-controller, the oVirt engine still shows
>>> only two geneve connections, with DC01-host02 and DC02-host01.
>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144"
>>>     hostname: "dc02-host01"
>>>     Encap geneve
>>>         ip: "DC02-host01_IP"
>>>         options: {csum="true"}
>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c"
>>>     hostname: "DC01-host02"
>>>     Encap geneve
>>>         ip: "DC01-host02"
>>>         options: {csum="true"}
>>>
>>> I've re-done the vdsm-tool command and, again, nothing changed, with
>>> the same errors as after systemctl restart ovn-controller.
>>>
>>> On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler <[email protected]>
>>> wrote:
>>>
>>>> Please include ovirt-users list in your reply, to share the knowledge
>>>> and experience with the community!
>>>>
>>>> On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis <
>>>> [email protected]> wrote:
>>>>
>>>>> Ok below the output per node and DC
>>>>> DC01
>>>>> node01
>>>>>
>>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open .
>>>>> external-ids:ovn-remote
>>>>> "ssl:*OVIRT_ENGINE_IP*:6642"
>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open .
>>>>> external-ids:ovn-encap-type
>>>>> geneve
>>>>> [root@ dc01-node01 ~]# ovs-vsctl --no-wait get open .
>>>>> external-ids:ovn-encap-ip
>>>>>
>>>>> "*OVIRTMGMT_IP_DC01-NODE01*"
>>>>>
>>>>> node02
>>>>>
>>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open .
>>>>> external-ids:ovn-remote
>>>>> "ssl:*OVIRT_ENGINE_IP*:6642"
>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open .
>>>>> external-ids:ovn-encap-type
>>>>> geneve
>>>>> [root@ dc01-node02 ~]# ovs-vsctl --no-wait get open .
>>>>> external-ids:ovn-encap-ip
>>>>>
>>>>> "*OVIRTMGMT_IP_DC01-NODE02*"
>>>>>
>>>>> DC02
>>>>> node01
>>>>>
>>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open .
>>>>> external-ids:ovn-remote
>>>>> "ssl:*OVIRT_ENGINE_IP*:6642"
>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open .
>>>>> external-ids:ovn-encap-type
>>>>> geneve
>>>>> [root@ dc02-node01 ~]# ovs-vsctl --no-wait get open .
>>>>> external-ids:ovn-encap-ip
>>>>>
>>>>> "*OVIRTMGMT_IP_DC02-NODE01*"
>>>>>
>>>>>
>>>> Looks good.
>>>>
>>>>
>>>>> DC01 node01 and node02 share the same VM networks, and VMs deployed
>>>>> on top of them cannot talk to VMs on the other hypervisor.
>>>>>
>>>>
>>>> Maybe there is a hint in ovn-controller.log on dc01-node02? Maybe
>>>> restarting ovn-controller creates more helpful log messages?
>>>>
>>>> You can also try to reset the OVN configuration on all hosts by
>>>> executing
>>>> vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP
>>>> on each host; this would trigger
>>>>
>>>> https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup_ovn_controller.sh
>>>> internally.
>>>>
>>>>
>>>>> So I would expect the output to show a geneve tunnel from node01 to
>>>>> node02, and vice versa.
>>>>>
>>>>>
>>>> Me too.
>>>>
>>>>
>>>>> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Dominik
>>>>>>>
>>>>>>> OVN is selected as the default network provider on the clusters and
>>>>>>> the hosts.
>>>>>>>
>>>>>>>
>>>>>> sounds good.
>>>>>> This configuration is already required when the host is added to
>>>>>> oVirt Engine, because OVN is configured during that step.
>>>>>>
>>>>>>
>>>>>>> The "ovn-sbctl show" works on the ovirt engine and shows only two
>>>>>>> hosts, 1 per DC.
>>>>>>>
>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144"
>>>>>>>     hostname: "dc01-node02"
>>>>>>>     Encap geneve
>>>>>>>         ip: "X.X.X.X"
>>>>>>>         options: {csum="true"}
>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c"
>>>>>>>     hostname: "dc02-node1"
>>>>>>>     Encap geneve
>>>>>>>         ip: "A.A.A.A"
>>>>>>>         options: {csum="true"}
>>>>>>>
>>>>>>>
>>>>>>> The new node is not listed (dc01-node1).
>>>>>>>
>>>>>>> When executed on the nodes, the same command (ovn-sbctl show)
>>>>>>> times out on all of them.
>>>>>>>
>>>>>>> /var/log/openvswitch/ovn-controller.log on all nodes repeatedly
>>>>>>> lists
>>>>>>>
>>>>>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect:
>>>>>>> unexpected SSL connection close
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Can you please compare the output of
>>>>>>
>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-remote
>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
>>>>>>
>>>>>> of the working hosts, e.g. dc01-node02, and the failing host
>>>>>> dc01-node1?
>>>>>> This should point us the relevant difference in the configuration.
>>>>>>
>>>>>> Please include the ovirt-users list in your reply, to share the
>>>>>> knowledge and experience with the community.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thank you
>>>>>>> Best regards
>>>>>>> Konstantinos Betsis
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all
>>>>>>>>>
>>>>>>>>> We have a small installation based on OVIRT 4.3.
>>>>>>>>> One cluster is based on CentOS 7 and the other on the oVirt Node
>>>>>>>>> NG image.
>>>>>>>>>
>>>>>>>>> The environment was stable till an upgrade took place a couple of
>>>>>>>>> months ago.
>>>>>>>>> As such we had to re-install one of the CentOS 7 nodes and start
>>>>>>>>> from scratch.
>>>>>>>>>
>>>>>>>>
>>>>>>>> To trigger the automatic configuration of the host, it is required
>>>>>>>> to configure ovirt-provider-ovn as the default network provider for the
>>>>>>>> cluster before adding the host to oVirt.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Even though the installation completed successfully and VMs are
>>>>>>>>> created, the following are not working as expected:
>>>>>>>>> 1. ovn geneve tunnels are not established with the other Centos 7
>>>>>>>>> node in the cluster.
>>>>>>>>> 2. Centos 7 node is configured by ovirt engine however no geneve
>>>>>>>>> tunnel is established when "ovn-sbctl show" is issued on the engine.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Does "ovn-sbctl show" list the hosts?
>>>>>>>>
>>>>>>>>
>>>>>>>>> 3. no flows are shown on the engine on port 6642 for the ovs db.
>>>>>>>>>
>>>>>>>>> Does anyone have any experience on how to troubleshoot OVN on
>>>>>>>>> ovirt?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> /var/log/openvswitch/ovn-controller.log on the host should
>>>>>>>> contain a helpful hint.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thank you
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list -- [email protected]
>>>>>>>>> To unsubscribe send an email to [email protected]
>>>>>>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>>>>>>> oVirt Code of Conduct:
>>>>>>>>> https://www.ovirt.org/community/about/community-guidelines/
>>>>>>>>> List Archives:
>>>>>>>>> https://lists.ovirt.org/archives/list/[email protected]/message/LBVGLQJBWJF3EKFITPR72LBPA5A43WWW/
>>>>>>>>>
>>>>>>>>
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/IMCJCWTODPBZMAAPZXRX2E3NHJTHWIE5/
