On Mon, Sep 14, 2020 at 9:25 AM Konstantinos Betsis <[email protected]> wrote:
> Hi Dominik
>
> When these commands are used on the ovirt-engine host the output is the
> one depicted in your email. For your reference see also below:
>
> [root@ath01-ovirt01 certs]# ovn-nbctl get-ssl
> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
> CA Certificate: /etc/pki/ovirt-engine/ca.pem
> Bootstrap: false
> [root@ath01-ovirt01 certs]# ovn-nbctl get-connection
> ptcp:6641
>
> [root@ath01-ovirt01 certs]# ovn-sbctl get-ssl
> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
> CA Certificate: /etc/pki/ovirt-engine/ca.pem
> Bootstrap: false
> [root@ath01-ovirt01 certs]# ovn-sbctl get-connection
> read-write role="" ptcp:6642

^^^ the line above points to the problem: ovn-central is configured to use
plain TCP without SSL. engine-setup usually configures ovn-central to use
SSL. That the files /etc/pki/ovirt-engine/keys/ovn-* exist shows that
engine-setup was triggered correctly. It looks like the OVN database was
dropped somehow; this should not happen.

This can be fixed manually by executing the following commands on the
engine's machine:

ovn-nbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/ca.pem
ovn-nbctl set-connection pssl:6641
ovn-sbctl set-ssl /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/ca.pem
ovn-sbctl set-connection pssl:6642

Afterwards, /var/log/openvswitch/ovn-controller.log on the hosts should
tell that br-int.mgmt is connected.

> [root@ath01-ovirt01 certs]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
> -rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-ndb.p12
> -rw-r-----. 1 root hugetlbfs 1828 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
> -rw-------. 1 root root 2893 Jun 25 11:08 /etc/pki/ovirt-engine/keys/ovn-sdb.p12
>
> When I try the above commands on the node hosts the following happens:
>
> ovn-nbctl get-ssl / get-connection
> ovn-nbctl: unix:/var/run/openvswitch/ovnnb_db.sock: database connection failed (No such file or directory)
>
> The above I believe is expected, since no northbound connections should be
> established from the host nodes.
>
> ovn-sbctl get-ssl / get-connection
> The output is stuck till I terminate it.

Yes, the ovn-* commands work only on the engine's machine, which has the
role ovn-central. On the hosts, there is only the ovn-controller, which
connects the OVN southbound to Open vSwitch on the host.

> For the requested logs, the below are found in the ovsdb-server-sb.log:
>
> 2020-09-14T07:18:38.187Z|219636|reconnect|WARN|tcp:DC02-host01:33146: connection dropped (Protocol error)
> 2020-09-14T07:18:41.946Z|219637|reconnect|WARN|tcp:DC01-host01:51188: connection dropped (Protocol error)
> 2020-09-14T07:18:43.033Z|219638|reconnect|WARN|tcp:DC01-host02:37044: connection dropped (Protocol error)
> 2020-09-14T07:18:46.198Z|219639|reconnect|WARN|tcp:DC02-host01:33148: connection dropped (Protocol error)
> 2020-09-14T07:18:50.069Z|219640|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate
> 2020-09-14T07:18:50.069Z|219641|jsonrpc|WARN|tcp:DC01-host01:51190: error parsing stream: line 0, column 0, byte 0: invalid character U+0016
> 2020-09-14T07:18:50.069Z|219642|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate
> 2020-09-14T07:18:50.069Z|219643|jsonrpc|WARN|tcp:DC01-host01:51190: received SSL data on JSON-RPC channel
> 2020-09-14T07:18:50.070Z|219644|reconnect|WARN|tcp:DC01-host01:51190: connection dropped (Protocol error)
> 2020-09-14T07:18:51.147Z|219645|reconnect|WARN|tcp:DC01-host02:37046: connection dropped (Protocol error)
> 2020-09-14T07:18:54.209Z|219646|reconnect|WARN|tcp:DC02-host01:33150: connection dropped (Protocol error)
> 2020-09-14T07:18:58.192Z|219647|reconnect|WARN|tcp:DC01-host01:51192: connection dropped (Protocol error)
> 2020-09-14T07:18:59.262Z|219648|jsonrpc|WARN|Dropped 3 log messages in last 8 seconds (most recently, 1 seconds ago) due to excessive rate
> 2020-09-14T07:18:59.262Z|219649|jsonrpc|WARN|tcp:DC01-host02:37048: error parsing stream: line 0, column 0, byte 0: invalid character U+0016
> 2020-09-14T07:18:59.263Z|219650|jsonrpc|WARN|Dropped 3 log messages in last 8 seconds (most recently, 1 seconds ago) due to excessive rate
> 2020-09-14T07:18:59.263Z|219651|jsonrpc|WARN|tcp:DC01-host02:37048: received SSL data on JSON-RPC channel
> 2020-09-14T07:18:59.263Z|219652|reconnect|WARN|tcp:DC01-host02:37048: connection dropped (Protocol error)
> 2020-09-14T07:19:02.220Z|219653|reconnect|WARN|tcp:DC02-host01:33152: connection dropped (Protocol error)
> 2020-09-14T07:19:06.316Z|219654|reconnect|WARN|tcp:DC01-host01:51194: connection dropped (Protocol error)
> 2020-09-14T07:19:07.386Z|219655|reconnect|WARN|tcp:DC01-host02:37050: connection dropped (Protocol error)
> 2020-09-14T07:19:10.232Z|219656|reconnect|WARN|tcp:DC02-host01:33154: connection dropped (Protocol error)
> 2020-09-14T07:19:14.439Z|219657|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate
> 2020-09-14T07:19:14.439Z|219658|jsonrpc|WARN|tcp:DC01-host01:51196: error parsing stream: line 0, column 0, byte 0: invalid character U+0016
> 2020-09-14T07:19:14.439Z|219659|jsonrpc|WARN|Dropped 4 log messages in last 12 seconds (most recently, 4 seconds ago) due to excessive rate
> 2020-09-14T07:19:14.439Z|219660|jsonrpc|WARN|tcp:DC01-host01:51196: received SSL data on JSON-RPC channel
> 2020-09-14T07:19:14.440Z|219661|reconnect|WARN|tcp:DC01-host01:51196: connection dropped (Protocol error)
> 2020-09-14T07:19:15.505Z|219662|reconnect|WARN|tcp:DC01-host02:37052: connection dropped (Protocol error)
>
> How can we fix these SSL errors?

I addressed this above.

> I thought vdsm did the certificate provisioning on the host nodes as to
> communicate to the engine host node.

Yes, this seems to work in your scenario; just the SSL configuration on
the ovn-central was lost.

> On Fri, Sep 11, 2020 at 6:39 PM Dominik Holler <[email protected]> wrote:
>
>> Looks still like the ovn-controller on the host has problems
>> communicating with the OVN southbound.
>>
>> Are there any hints in /var/log/openvswitch/*.log,
>> especially in /var/log/openvswitch/ovsdb-server-sb.log ?
>>
>> Can you please check the output of
>>
>> ovn-nbctl get-ssl
>> ovn-nbctl get-connection
>> ovn-sbctl get-ssl
>> ovn-sbctl get-connection
>> ls -l /etc/pki/ovirt-engine/keys/ovn-*
>>
>> It should be similar to
>>
>> [root@ovirt-43 ~]# ovn-nbctl get-ssl
>> Private key: /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
>> Certificate: /etc/pki/ovirt-engine/certs/ovn-ndb.cer
>> CA Certificate: /etc/pki/ovirt-engine/ca.pem
>> Bootstrap: false
>> [root@ovirt-43 ~]# ovn-nbctl get-connection
>> pssl:6641:[::]
>> [root@ovirt-43 ~]# ovn-sbctl get-ssl
>> Private key: /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
>> Certificate: /etc/pki/ovirt-engine/certs/ovn-sdb.cer
>> CA Certificate: /etc/pki/ovirt-engine/ca.pem
>> Bootstrap: false
>> [root@ovirt-43 ~]# ovn-sbctl get-connection
>> read-write role="" pssl:6642:[::]
>> [root@ovirt-43 ~]# ls -l /etc/pki/ovirt-engine/keys/ovn-*
>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-ndb.key.nopass
>> -rw-------. 1 root root 2709 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-ndb.p12
>> -rw-r-----. 1 root hugetlbfs 1828 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-sdb.key.nopass
>> -rw-------. 1 root root 2709 Oct 14 2019 /etc/pki/ovirt-engine/keys/ovn-sdb.p12
>>
>> On Fri, Sep 11, 2020 at 1:10 PM Konstantinos Betsis <[email protected]> wrote:
>>
>>> I did a restart of the ovn-controller; this is the output of
>>> ovn-controller.log:
>>>
>>> 2020-09-11T10:54:07.566Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovn-controller.log
>>> 2020-09-11T10:54:07.568Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
>>> 2020-09-11T10:54:07.568Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
>>> 2020-09-11T10:54:07.570Z|00004|main|INFO|OVS IDL reconnected, force recompute.
>>> 2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
>>> 2020-09-11T10:54:07.571Z|00006|main|INFO|OVNSB IDL reconnected, force recompute.
>>> 2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
>>> 2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
>>> 2020-09-11T10:54:08.685Z|00009|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
>>> 2020-09-11T10:54:08.800Z|00010|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
>>> 2020-09-11T10:54:08.800Z|00011|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
>>> 2020-09-11T10:54:08.800Z|00012|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: waiting 2 seconds before reconnect
>>> 2020-09-11T10:54:10.802Z|00013|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
>>> 2020-09-11T10:54:10.917Z|00014|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
>>> 2020-09-11T10:54:10.917Z|00015|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
>>> 2020-09-11T10:54:10.917Z|00016|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: waiting 4 seconds before reconnect
>>> 2020-09-11T10:54:14.921Z|00017|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connecting...
>>> 2020-09-11T10:54:15.036Z|00018|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
>>> 2020-09-11T10:54:15.036Z|00019|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: connection attempt failed (Protocol error)
>>> 2020-09-11T10:54:15.036Z|00020|reconnect|INFO|ssl:OVIRT_ENGINE_IP:6642: continuing to reconnect in the background but suppressing further logging
>>>
>>> I have also done the vdsm-tool ovn-config OVIRT_ENGINE_IP OVIRTMGMT_NETWORK_DC.
>>> This is how the OVIRT_ENGINE_IP is provided to the ovn-controller; I can
>>> redo it if you want.
>>>
>>> After the restart of the ovn-controller the oVirt Engine still shows
>>> only two geneve connections, one with DC01-host02 and one with DC02-host01:
>>>
>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144"
>>>     hostname: "dc02-host01"
>>>     Encap geneve
>>>         ip: "DC02-host01_IP"
>>>         options: {csum="true"}
>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c"
>>>     hostname: "DC01-host02"
>>>     Encap geneve
>>>         ip: "DC01-host02"
>>>         options: {csum="true"}
>>>
>>> I've re-done the vdsm-tool command and nothing changed, again, with
>>> the same errors as the systemctl restart ovn-controller.
>>>
>>> On Fri, Sep 11, 2020 at 1:49 PM Dominik Holler <[email protected]> wrote:
>>>
>>>> Please include the ovirt-users list in your reply, to share the knowledge
>>>> and experience with the community!
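A side note on the "invalid character U+0016" entries in the ovsdb-server-sb.log excerpt further up: 0x16 is the record-type byte of a TLS handshake, i.e. the first byte of every ClientHello. When a host speaks SSL to a southbound database that listens in plain TCP, the JSON-RPC parser reads that byte and reports exactly this error. A self-contained way to see the byte, no OVN needed:

```shell
# 0x16 (U+0016) is the TLS "handshake" record type; a TLS 1.x
# ClientHello starts with the bytes 16 03 01. A JSON-RPC parser fed
# these bytes fails at byte 0 with "invalid character U+0016".
printf '\x16\x03\x01' | od -An -tx1 | xargs
# prints: 16 03 01
```

So "received SSL data on JSON-RPC channel" and "invalid character U+0016" are two symptoms of the same mismatch: SSL clients, plain-TCP listener.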
>>>>
>>>> On Fri, Sep 11, 2020 at 12:12 PM Konstantinos Betsis <[email protected]> wrote:
>>>>
>>>>> Ok, below the output per node and DC.
>>>>>
>>>>> DC01
>>>>> node01
>>>>>
>>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote
>>>>> "ssl:OVIRT_ENGINE_IP:6642"
>>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
>>>>> geneve
>>>>> [root@dc01-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
>>>>> "OVIRTMGMT_IP_DC01-NODE01"
>>>>>
>>>>> node02
>>>>>
>>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote
>>>>> "ssl:OVIRT_ENGINE_IP:6642"
>>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
>>>>> geneve
>>>>> [root@dc01-node02 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
>>>>> "OVIRTMGMT_IP_DC01-NODE02"
>>>>>
>>>>> DC02
>>>>> node01
>>>>>
>>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-remote
>>>>> "ssl:OVIRT_ENGINE_IP:6642"
>>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
>>>>> geneve
>>>>> [root@dc02-node01 ~]# ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
>>>>> "OVIRTMGMT_IP_DC02-NODE01"
>>>>>
>>>> Looks good.
>>>>
>>>>> DC01 node01 and node02 share the same VM networks, and VMs deployed on
>>>>> top of them cannot talk to VMs on the other hypervisor.
>>>>
>>>> Maybe there is a hint in ovn-controller.log on dc01-node02? Maybe
>>>> restarting ovn-controller creates more helpful log messages?
>>>>
>>>> You can also try to restart the OVN configuration on all hosts by executing
>>>> vdsm-tool ovn-config OVIRT_ENGINE_IP LOCAL_OVIRTMGMT_IP
>>>> on each host; this triggers
>>>> https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/setup_ovn_controller.sh
>>>> internally.
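The three external-ids checked in the exchange above can be validated mechanically. Here is a small sketch of what "looks good" means; the helper name and the 192.0.2.x example addresses are mine for illustration, not part of vdsm or oVirt:

```shell
# Hypothetical helper (not an oVirt tool): validate the three
# external-ids that "vdsm-tool ovn-config" writes into Open vSwitch.
# ovn-remote must point at the engine's southbound SSL port 6642,
# the encap type must be geneve, and the encap IP must be non-empty.
verify_ovn_config() {
    remote=$1 encap_type=$2 encap_ip=$3
    case "$remote" in
        ssl:*:6642) ;;
        *) echo "bad ovn-remote: $remote"; return 1 ;;
    esac
    [ "$encap_type" = "geneve" ] || { echo "bad ovn-encap-type: $encap_type"; return 1; }
    [ -n "$encap_ip" ] || { echo "missing ovn-encap-ip"; return 1; }
    echo "config-ok"
}

# Example with placeholder documentation addresses:
verify_ovn_config "ssl:192.0.2.10:6642" "geneve" "192.0.2.11"
# prints: config-ok
```

Note that a remote of the form "tcp:...:6642" would also fail this check; in the thread above the hosts are correctly configured for SSL, and it is the ovn-central side that fell back to plain TCP.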
>>>>
>>>>> So I would expect to see the same output for node01, i.e. a geneve
>>>>> tunnel to node02 and vice versa.
>>>>
>>>> Me too.
>>>>
>>>>> On Fri, Sep 11, 2020 at 12:14 PM Dominik Holler <[email protected]> wrote:
>>>>>
>>>>>> On Fri, Sep 11, 2020 at 10:53 AM Konstantinos Betsis <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Dominik
>>>>>>>
>>>>>>> OVN is selected as the default network provider on the clusters and
>>>>>>> the hosts.
>>>>>>
>>>>>> Sounds good.
>>>>>> This configuration is required already while the host is added to
>>>>>> oVirt Engine, because OVN is configured during this step.
>>>>>>
>>>>>>> The "ovn-sbctl show" works on the ovirt-engine and shows only two
>>>>>>> hosts, one per DC:
>>>>>>>
>>>>>>> Chassis "c4b23834-aec7-4bf8-8be7-aa94a50a6144"
>>>>>>>     hostname: "dc01-node02"
>>>>>>>     Encap geneve
>>>>>>>         ip: "X.X.X.X"
>>>>>>>         options: {csum="true"}
>>>>>>> Chassis "be3abcc9-7358-4040-a37b-8d8a782f239c"
>>>>>>>     hostname: "dc02-node1"
>>>>>>>     Encap geneve
>>>>>>>         ip: "A.A.A.A"
>>>>>>>         options: {csum="true"}
>>>>>>>
>>>>>>> The new node is not listed (dc01-node1).
>>>>>>>
>>>>>>> When executed on the nodes, the same command (ovn-sbctl show)
>>>>>>> times out on all nodes.
>>>>>>>
>>>>>>> The output of /var/log/openvswitch/ovn-controller.log lists on
>>>>>>> all hosts:
>>>>>>>
>>>>>>> 2020-09-11T08:46:55.197Z|07361|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
>>>>>>
>>>>>> Can you please compare the output of
>>>>>>
>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-remote
>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-type
>>>>>> ovs-vsctl --no-wait get open . external-ids:ovn-encap-ip
>>>>>>
>>>>>> of the working hosts, e.g. dc01-node02, and the failing host dc01-node1?
>>>>>> This should point us to the relevant difference in the configuration.
>>>>>>
>>>>>> Please include the ovirt-users list in your reply, to share the
>>>>>> knowledge and experience with the community.
>>>>>>
>>>>>>> Thank you
>>>>>>> Best regards
>>>>>>> Konstantinos Betsis
>>>>>>>
>>>>>>> On Fri, Sep 11, 2020 at 11:01 AM Dominik Holler <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Thu, Sep 10, 2020 at 6:26 PM Konstantinos B <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi all
>>>>>>>>>
>>>>>>>>> We have a small installation based on oVirt 4.3.
>>>>>>>>> One cluster is based on CentOS 7 and the other on the oVirt NG Node
>>>>>>>>> image.
>>>>>>>>>
>>>>>>>>> The environment was stable till an upgrade took place a couple of
>>>>>>>>> months ago. As such we had to re-install one of the CentOS 7 nodes
>>>>>>>>> and start from scratch.
>>>>>>>>
>>>>>>>> To trigger the automatic configuration of the host, it is required
>>>>>>>> to configure ovirt-provider-ovn as the default network provider for
>>>>>>>> the cluster before adding the host to oVirt.
>>>>>>>>
>>>>>>>>> Even though the installation completed successfully and VMs are
>>>>>>>>> created, the following are not working as expected:
>>>>>>>>> 1. OVN geneve tunnels are not established with the other CentOS 7
>>>>>>>>> node in the cluster.
>>>>>>>>> 2. The CentOS 7 node is configured by ovirt-engine, however no geneve
>>>>>>>>> tunnel is established when "ovn-sbctl show" is issued on the engine.
>>>>>>>>
>>>>>>>> Does "ovn-sbctl show" list the hosts?
>>>>>>>>
>>>>>>>>> 3. No flows are shown on the engine on port 6642 for the OVS DB.
>>>>>>>>>
>>>>>>>>> Does anyone have any experience on how to troubleshoot OVN on
>>>>>>>>> oVirt?
>>>>>>>>
>>>>>>>> /var/log/openvswitch/ovn-controller.log on the host should contain a
>>>>>>>> helpful hint.
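To make "should contain a helpful hint" concrete: the lines to watch in that log are the reconnect state changes towards the southbound database. A sketch that pulls out the latest state; the sample lines stand in for a real /var/log/openvswitch/ovn-controller.log, and 192.0.2.10 is a placeholder engine address:

```shell
# Sketch: summarize ovn-controller's latest southbound connection state.
# On a healthy host the last reconnect line reads "connected"; on a
# broken one it loops through "connection attempt failed".
cat > /tmp/ovn-controller.sample <<'EOF'
2020-09-11T10:54:07.571Z|00005|reconnect|INFO|ssl:192.0.2.10:6642: connecting...
2020-09-11T10:54:07.685Z|00007|stream_ssl|WARN|SSL_connect: unexpected SSL connection close
2020-09-11T10:54:07.685Z|00008|reconnect|INFO|ssl:192.0.2.10:6642: connection attempt failed (Protocol error)
EOF

grep 'reconnect' /tmp/ovn-controller.sample | tail -n 1 \
    | grep -o 'connection attempt failed\|connected'
# prints: connection attempt failed
```

After the set-ssl/set-connection fix on the engine, the same check against the real log should print "connected".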
>>>>>>>>
>>>>>>>>> Thank you
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list -- [email protected]
>>>>>>>>> To unsubscribe send an email to [email protected]
>>>>>>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>>>>>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>>>>>>>> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/LBVGLQJBWJF3EKFITPR72LBPA5A43WWW/
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/IMCJCWTODPBZMAAPZXRX2E3NHJTHWIE5/

