On Sun, Oct 4, 2020 at 2:07 AM Gianluca Cecchi <[email protected]> wrote:
> On Sat, Oct 3, 2020 at 9:42 PM Amit Bawer <[email protected]> wrote:
>
>> On Sat, Oct 3, 2020 at 10:24 PM Amit Bawer <[email protected]> wrote:
>>
>>> For the gluster bricks being filtered out in 4.4.2, this seems like [1].
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1883805
>>
>> Maybe remove the lvm filter from /etc/lvm/lvm.conf while in 4.4.2
>> maintenance mode
>> if the fs is mounted as read only, try
>>
>> mount -o remount,rw /
>>
>> sync and try to reboot 4.4.2.
>
> Indeed if I run, when in emergency shell in 4.4.2, the command:
>
> lvs --config 'devices { filter = [ "a|.*|" ] }'
>
> I see also all the gluster volumes, so I think the update injected the
> nasty filter.
> Possibly during update the command
> # vdsm-tool config-lvm-filter -y
> was executed and erroneously created the filter?

Since there wasn't a filter set on the node, the 4.4.2 update added the
default filter for the root-lv PV. If some filter had been set before the
upgrade, the 4.4.2 update would not have added one.

> Anyway remounting read write the root filesystem and removing the filter
> line from lvm.conf and rebooting worked and 4.4.2 booted ok and I was able
> to exit global maintenance and have the engine up.
>
> Thanks Amit for the help and all the insights.
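
To check a node for this before rebooting into 4.4.2, something along these
lines could help. It is only a sketch: the helper name active_lvm_filters is
made up for illustration, and it assumes the filter sits in a single
uncommented "filter = ..." or "global_filter = ..." line of
/etc/lvm/lvm.conf. The lvs commands in the trailing comments are the ones
already used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) filter and
# global_filter lines of an lvm.conf file.
# Usage: active_lvm_filters [path]   (path defaults to /etc/lvm/lvm.conf)
active_lvm_filters() {
    conf="${1:-/etc/lvm/lvm.conf}"
    # Keep only lines whose first non-blank token is "filter" or
    # "global_filter"; commented-out examples start with "#" and are skipped.
    grep -E '^[[:space:]]*(global_)?filter[[:space:]]*=' "$conf"
}

# On the affected host (as root) one could then compare:
#   active_lvm_filters
#   lvs                                               # with the configured filter
#   lvs --config 'devices { filter = [ "a|.*|" ] }'   # accepting every device
```

If the second lvs call lists the gluster LVs and the first one does not, the
configured filter is what hides them.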
>
> Right now only two problems:
>
> 1) a long running problem that from engine web admin all the volumes are
> seen as up and also the storage domains up, while only the hosted engine
> one is up, while "data" and "vmstore" are down, as I can verify from the
> host, only one /rhev/data-center/ mount:
>
> [root@ovirt01 ~]# df -h
> Filesystem                                              Size  Used Avail Use% Mounted on
> devtmpfs                                                 16G     0   16G   0% /dev
> tmpfs                                                    16G   16K   16G   1% /dev/shm
> tmpfs                                                    16G   18M   16G   1% /run
> tmpfs                                                    16G     0   16G   0% /sys/fs/cgroup
> /dev/mapper/onn-ovirt--node--ng--4.4.2--0.20200918.0+1  133G  3.9G  129G   3% /
> /dev/mapper/onn-tmp                                    1014M   40M  975M   4% /tmp
> /dev/mapper/gluster_vg_sda-gluster_lv_engine            100G  9.0G   91G   9% /gluster_bricks/engine
> /dev/mapper/gluster_vg_sda-gluster_lv_data              500G  126G  375G  26% /gluster_bricks/data
> /dev/mapper/gluster_vg_sda-gluster_lv_vmstore            90G  6.9G   84G   8% /gluster_bricks/vmstore
> /dev/mapper/onn-home                                   1014M   40M  975M   4% /home
> /dev/sdb2                                               976M  307M  603M  34% /boot
> /dev/sdb1                                               599M  6.8M  593M   2% /boot/efi
> /dev/mapper/onn-var                                      15G  263M   15G   2% /var
> /dev/mapper/onn-var_log                                 8.0G  541M  7.5G   7% /var/log
> /dev/mapper/onn-var_crash                                10G  105M  9.9G   2% /var/crash
> /dev/mapper/onn-var_log_audit                           2.0G   79M  2.0G   4% /var/log/audit
> ovirt01st.lutwyn.storage:/engine                        100G   10G   90G  10% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_engine
> tmpfs                                                   3.2G     0  3.2G   0% /run/user/1000
> [root@ovirt01 ~]#
>
> I can also wait 10 minutes and no change. The way I use to exit from this
> stalled situation is to power on a VM, so that obviously it fails:
> VM f32 is down with error. Exit message: Unable to get volume size for
> domain d39ed9a3-3b10-46bf-b334-e8970f5deca1 volume
> 242d16c6-1fd9-4918-b9dd-0d477a86424c.
> 10/4/20 12:50:41 AM
>
> and suddenly all the data storage domains are deactivated (from engine
> point of view, because actually they were not active...):
> Storage Domain vmstore (Data Center Default) was deactivated by system
> because it's not visible by any of the hosts.
> 10/4/20 12:50:31 AM
>
> and I can go in Data Centers --> Default --> Storage and activate
> "vmstore" and "data" storage domains and suddenly I get them activated and
> filesystems mounted.
>
> [root@ovirt01 ~]# df -h | grep rhev
> ovirt01st.lutwyn.storage:/engine   100G   10G   90G  10% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_engine
> ovirt01st.lutwyn.storage:/data     500G  131G  370G  27% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_data
> ovirt01st.lutwyn.storage:/vmstore   90G  7.8G   83G   9% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_vmstore
> [root@ovirt01 ~]#
>
> and VM starts ok now.
>
> I already reported this, but I don't know if there is yet a bugzilla open
> for it.

Did you get any response to the original mail? I haven't seen it on the
users list.

> 2) I see that I cannot connect to cockpit console of node.
>
> In firefox (version 80) in my Fedora 31 I get:
> "
> Secure Connection Failed
>
> An error occurred during a connection to ovirt01.lutwyn.local:9090.
> PR_CONNECT_RESET_ERROR
>
> The page you are trying to view cannot be shown because the
> authenticity of the received data could not be verified.
> Please contact the website owners to inform them of this problem.
>
> Learn more…
> "
> In Chrome (build 85.0.4183.121)
>
> "
> Your connection is not private
> Attackers might be trying to steal your information from
> ovirt01.lutwyn.local (for example, passwords, messages, or credit cards).
> Learn more
> NET::ERR_CERT_AUTHORITY_INVALID
> "
> Click Advanced and select to go to the site
>
> "
> This server could not prove that it is ovirt01.lutwyn.local; its security
> certificate is not trusted by your computer's operating system.
> This may be caused by a misconfiguration or an attacker intercepting your
> connection."
>
> If I select
>
> "
> This page isn’t working ovirt01.lutwyn.local didn’t send any data.
> ERR_EMPTY_RESPONSE
> "
>
> NOTE: the host is not resolved by DNS but I put an entry in my client's
> hosts file.

It might be required to set up DNS for authenticity; maybe other members on
the list can tell better.

> On host:
>
> [root@ovirt01 ~]# systemctl status cockpit.socket --no-pager
> ● cockpit.socket - Cockpit Web Service Socket
>    Loaded: loaded (/usr/lib/systemd/system/cockpit.socket; disabled; vendor preset: enabled)
>    Active: active (listening) since Sun 2020-10-04 00:36:36 CEST; 25min ago
>      Docs: man:cockpit-ws(8)
>    Listen: [::]:9090 (Stream)
>   Process: 1425 ExecStartPost=/bin/ln -snf active.motd /run/cockpit/motd (code=exited, status=0/SUCCESS)
>   Process: 1417 ExecStartPost=/usr/share/cockpit/motd/update-motd localhost (code=exited, status=0/SUCCESS)
>     Tasks: 0 (limit: 202981)
>    Memory: 1.6M
>    CGroup: /system.slice/cockpit.socket
>
> Oct 04 00:36:36 ovirt01.lutwyn.local systemd[1]: Starting Cockpit Web Service Socket.
> Oct 04 00:36:36 ovirt01.lutwyn.local systemd[1]: Listening on Cockpit Web Service Socket.
> [root@ovirt01 ~]#
>
> [root@ovirt01 ~]# systemctl status cockpit.service --no-pager
> ● cockpit.service - Cockpit Web Service
>    Loaded: loaded (/usr/lib/systemd/system/cockpit.service; static; vendor preset: disabled)
>    Active: active (running) since Sun 2020-10-04 00:58:09 CEST; 3min 30s ago
>      Docs: man:cockpit-ws(8)
>   Process: 19260 ExecStartPre=/usr/sbin/remotectl certificate --ensure --user=root --group=cockpit-ws --selinux-type=etc_t (code=exited, status=0/SUCCESS)
>  Main PID: 19263 (cockpit-tls)
>     Tasks: 1 (limit: 202981)
>    Memory: 1.4M
>    CGroup: /system.slice/cockpit.service
>            └─19263 /usr/libexec/cockpit-tls
>
> Oct 04 00:59:59 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(http-redirect.sock) failed: Permission denied
> Oct 04 00:59:59 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(http-redirect.sock) failed: Permission denied
> Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
> Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
> Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
> Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
> Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
> Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
> Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
> Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
> [root@ovirt01 ~]#
>
> Gianluca
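
For the first problem (domains reported as up while not mounted), a quick
host-side check could look like the sketch below. It is only an
illustration: the helper name gluster_sd_mounts is made up, and it assumes
the gluster storage domains appear in /proc/mounts with a mount point under
/rhev/data-center/mnt/glusterSD/, as in the df output quoted above.

```shell
#!/bin/sh
# Hypothetical helper: list the gluster storage domains actually mounted on
# this host, one "device mountpoint" pair per line.
# Usage: gluster_sd_mounts [mounts-file]   (defaults to /proc/mounts)
gluster_sd_mounts() {
    mounts="${1:-/proc/mounts}"
    # Field 1 is the device, field 2 the mount point; keep only mount
    # points below the glusterSD directory used for storage domains.
    awk '$2 ~ "^/rhev/data-center/mnt/glusterSD/" { print $1, $2 }' "$mounts"
}

# Example on the host:
#   gluster_sd_mounts
```

If only the engine volume is listed while the web admin shows "data" and
"vmstore" as up, the host and the engine disagree, which matches the
behaviour described above.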

