On Sun, Oct 4, 2020 at 2:07 AM Gianluca Cecchi <[email protected]>
wrote:

> On Sat, Oct 3, 2020 at 9:42 PM Amit Bawer <[email protected]> wrote:
>
>>
>>
>> On Sat, Oct 3, 2020 at 10:24 PM Amit Bawer <[email protected]> wrote:
>>
>>>
>>>
>>> For the gluster bricks being filtered out in 4.4.2, this seems like [1].
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1883805
>>>
>>
>> Maybe remove the LVM filter from /etc/lvm/lvm.conf while in 4.4.2
>> maintenance mode. If the root filesystem is mounted read-only, try
>>
>> mount -o remount,rw /
>>
>> then sync and try to reboot into 4.4.2.
>>
>>
> Indeed, if I run the following command in the 4.4.2 emergency shell:
>
> lvs --config 'devices { filter = [ "a|.*|" ] }'
>
> I also see all the gluster volumes, so I think the update injected the
> nasty filter.
> Possibly during the update the command
> # vdsm-tool config-lvm-filter -y
> was executed and erroneously created the filter?
>
Since there wasn't a filter set on the node, the 4.4.2 update added the
default filter for the root-lv PV. If a filter had already been set
before the upgrade, the 4.4.2 update would not have added one.
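
To check whether a filter is active on a node, something like the
following might help. It is only a sketch: the function name
active_lvm_filter is made up for illustration, and the pattern assumes
the setting is written on a single line as "filter = [...]" or
"global_filter = [...]".

```shell
# Print the uncommented filter lines from an lvm.conf-style file.
# The function name is illustrative; the default path is the one
# mentioned in this thread.
active_lvm_filter() {
    conf="${1:-/etc/lvm/lvm.conf}"
    # Commented-out examples start with '#', so they do not match.
    grep -E '^[[:space:]]*(global_)?filter[[:space:]]*=' "$conf"
}

# On the host (as root), compare what LVM reports with the configured
# filter and with the filter overridden to accept every device:
#   active_lvm_filter
#   lvs
#   lvs --config 'devices { filter = [ "a|.*|" ] }'
```

If the second lvs call lists volumes that the first one hides, the
configured filter is what is excluding them.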


> Anyway, remounting the root filesystem read-write, removing the filter
> line from lvm.conf, and rebooting worked: 4.4.2 booted OK and I was able
> to exit global maintenance and bring the engine up.
>
> Thanks Amit for the help and all the insights.
>
> Right now only two problems remain:
>
> 1) A long-running problem: in the engine web admin all the volumes and
> storage domains are shown as up, while only the hosted engine one is
> actually up and "data" and "vmstore" are down. I can verify this from
> the host, which has only one /rhev/data-center/ mount:
>
> [root@ovirt01 ~]# df -h
> Filesystem                                              Size  Used Avail Use% Mounted on
> devtmpfs                                                 16G     0   16G   0% /dev
> tmpfs                                                    16G   16K   16G   1% /dev/shm
> tmpfs                                                    16G   18M   16G   1% /run
> tmpfs                                                    16G     0   16G   0% /sys/fs/cgroup
> /dev/mapper/onn-ovirt--node--ng--4.4.2--0.20200918.0+1  133G  3.9G  129G   3% /
> /dev/mapper/onn-tmp                                    1014M   40M  975M   4% /tmp
> /dev/mapper/gluster_vg_sda-gluster_lv_engine            100G  9.0G   91G   9% /gluster_bricks/engine
> /dev/mapper/gluster_vg_sda-gluster_lv_data              500G  126G  375G  26% /gluster_bricks/data
> /dev/mapper/gluster_vg_sda-gluster_lv_vmstore            90G  6.9G   84G   8% /gluster_bricks/vmstore
> /dev/mapper/onn-home                                   1014M   40M  975M   4% /home
> /dev/sdb2                                               976M  307M  603M  34% /boot
> /dev/sdb1                                               599M  6.8M  593M   2% /boot/efi
> /dev/mapper/onn-var                                      15G  263M   15G   2% /var
> /dev/mapper/onn-var_log                                 8.0G  541M  7.5G   7% /var/log
> /dev/mapper/onn-var_crash                                10G  105M  9.9G   2% /var/crash
> /dev/mapper/onn-var_log_audit                           2.0G   79M  2.0G   4% /var/log/audit
> ovirt01st.lutwyn.storage:/engine                        100G   10G   90G  10% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_engine
> tmpfs                                                   3.2G     0  3.2G   0% /run/user/1000
> [root@ovirt01 ~]#
>
> I can also wait 10 minutes and nothing changes. The way I get out of this
> stalled situation is to power on a VM, which obviously fails:
> VM f32 is down with error. Exit message: Unable to get volume size for
> domain d39ed9a3-3b10-46bf-b334-e8970f5deca1 volume
> 242d16c6-1fd9-4918-b9dd-0d477a86424c.
> 10/4/20 12:50:41 AM
>
> and suddenly all the data storage domains are deactivated (from the
> engine's point of view, because they were not actually active):
> Storage Domain vmstore (Data Center Default) was deactivated by system
> because it's not visible by any of the hosts.
> 10/4/20 12:50:31 AM
>
> Then I can go to Data Centers --> Default --> Storage and activate the
> "vmstore" and "data" storage domains; they become active immediately and
> the filesystems are mounted.
>
> [root@ovirt01 ~]# df -h | grep rhev
> ovirt01st.lutwyn.storage:/engine                        100G   10G   90G  10% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_engine
> ovirt01st.lutwyn.storage:/data                          500G  131G  370G  27% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_data
> ovirt01st.lutwyn.storage:/vmstore                        90G  7.8G   83G   9% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_vmstore
> [root@ovirt01 ~]#
>
> and the VM starts OK now.
>
> I already reported this, but I don't know whether a bugzilla has been
> opened for it yet.
>
Did you get any response to the original mail? I haven't seen it on the
users list.
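
For comparing the engine's view with what the host really has mounted,
a small helper along these lines could be used. This is a sketch only:
the function name mounted_gluster_domains is made up here, and it
assumes the storage domains are mounted under
/rhev/data-center/mnt/glusterSD/ as in the df output above.

```shell
# List the Gluster volumes currently mounted as oVirt storage domains.
# Reads /proc/mounts by default; pass another file to inspect saved output.
mounted_gluster_domains() {
    # Field 1 is the source (server:/volume), field 2 is the mount point.
    awk '$2 ~ "^/rhev/data-center/mnt/glusterSD/" { print $1 }' \
        "${1:-/proc/mounts}"
}
```

Going by the df output quoted above, this would print only the engine
volume while the host is in the stalled state, and all three volumes
once "data" and "vmstore" have been activated.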


> 2) I cannot connect to the Cockpit console of the node.
>
> In Firefox (version 80) on my Fedora 31 client I get:
> "
> Secure Connection Failed
>
> An error occurred during a connection to ovirt01.lutwyn.local:9090.
> PR_CONNECT_RESET_ERROR
>
>     The page you are trying to view cannot be shown because the
> authenticity of the received data could not be verified.
>     Please contact the website owners to inform them of this problem.
>
> Learn more…
> "
> In Chrome (build 85.0.4183.121)
>
> "
> Your connection is not private
> Attackers might be trying to steal your information from
> ovirt01.lutwyn.local (for example, passwords, messages, or credit cards).
> Learn more
> NET::ERR_CERT_AUTHORITY_INVALID
> "
> After clicking Advanced, I see the option to proceed to the site:
>
> "
> This server could not prove that it is ovirt01.lutwyn.local; its security
> certificate is not trusted by your computer's operating system. This may be
> caused by a misconfiguration or an attacker intercepting your connection."
>
> If I select that option, I get:
>
> "
> This page isn’t working ovirt01.lutwyn.local didn’t send any data.
> ERR_EMPTY_RESPONSE
> "
>
> NOTE: the host is not resolved by DNS, but I put an entry in my
> client's hosts file.
>
A DNS entry might be required for the authenticity check; maybe other
members of the list can say more.
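
To see which cockpit-tls errors dominate, the journal output could be
condensed with something like the sketch below. The function name
summarize_cockpit_tls_errors is invented for illustration, and it
assumes unwrapped journal lines of the form
"... cockpit-tls[PID]: cockpit-tls: MESSAGE".

```shell
# Count distinct cockpit-tls messages read from stdin, most frequent first.
summarize_cockpit_tls_errors() {
    # Keep only the text after the last "cockpit-tls: " prefix,
    # then count identical messages and sort by count.
    sed -n 's/.*cockpit-tls: //p' \
        | awk '{ count[$0]++ } END { for (m in count) print count[m] "\t" m }' \
        | sort -rn
}

# Possible usage on the host:
#   journalctl -u cockpit.service --no-pager | summarize_cockpit_tls_errors
```

The "Permission denied" lines refer to connect() calls on local sockets
on the host, so they may be worth checking separately from the DNS
question.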


> On host:
>
> [root@ovirt01 ~]# systemctl status cockpit.socket --no-pager
> ● cockpit.socket - Cockpit Web Service Socket
>    Loaded: loaded (/usr/lib/systemd/system/cockpit.socket; disabled; vendor preset: enabled)
>    Active: active (listening) since Sun 2020-10-04 00:36:36 CEST; 25min ago
>      Docs: man:cockpit-ws(8)
>    Listen: [::]:9090 (Stream)
>   Process: 1425 ExecStartPost=/bin/ln -snf active.motd /run/cockpit/motd (code=exited, status=0/SUCCESS)
>   Process: 1417 ExecStartPost=/usr/share/cockpit/motd/update-motd  localhost (code=exited, status=0/SUCCESS)
>     Tasks: 0 (limit: 202981)
>    Memory: 1.6M
>    CGroup: /system.slice/cockpit.socket
>
> Oct 04 00:36:36 ovirt01.lutwyn.local systemd[1]: Starting Cockpit Web Service Socket.
> Oct 04 00:36:36 ovirt01.lutwyn.local systemd[1]: Listening on Cockpit Web Service Socket.
> [root@ovirt01 ~]#
>
> [root@ovirt01 ~]# systemctl status cockpit.service --no-pager
> ● cockpit.service - Cockpit Web Service
>    Loaded: loaded (/usr/lib/systemd/system/cockpit.service; static; vendor preset: disabled)
>    Active: active (running) since Sun 2020-10-04 00:58:09 CEST; 3min 30s ago
>      Docs: man:cockpit-ws(8)
>   Process: 19260 ExecStartPre=/usr/sbin/remotectl certificate --ensure --user=root --group=cockpit-ws --selinux-type=etc_t (code=exited, status=0/SUCCESS)
>  Main PID: 19263 (cockpit-tls)
>     Tasks: 1 (limit: 202981)
>    Memory: 1.4M
>    CGroup: /system.slice/cockpit.service
>            └─19263 /usr/libexec/cockpit-tls
>
> Oct 04 00:59:59 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(http-redirect.sock) failed: Permission denied
> Oct 04 00:59:59 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(http-redirect.sock) failed: Permission denied
> Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
> Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
> Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
> Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
> Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
> Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
> Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
> Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
> [root@ovirt01 ~]#
>
>
> Gianluca
>
>
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/QAFFLVSQZ47DFJUTFU6PDAWNPSH3YXCQ/
