[ovirt-users] Re: ERROR (mailbox-spm) [storage.MailBox.SpmMailMonitor] mailbox 65 checksum failed, not clearing mailbox, clearing new mail

2022-02-14 Thread Petr Kyselák
Hi,
Thank you for the quick reply. It is a recurring issue, so I filed a bug:

https://bugzilla.redhat.com/show_bug.cgi?id=2054209
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LWT72ISG5NJACF2BMLK4LRKQ3CNUGBIL/


[ovirt-users] Re: ERROR (mailbox-spm) [storage.MailBox.SpmMailMonitor] mailbox 65 checksum failed, not clearing mailbox, clearing new mail

2022-02-14 Thread Nir Soffer
On Mon, Feb 14, 2022 at 10:51 AM Petr Kyselák  wrote:
>
> Hi,
> I see a lot of errors in vdsm.log
>
> 2022-02-14 08:42:52,086+0100 ERROR (mailbox-spm) 
> [storage.MailBox.SpmMailMonitor] mailbox 65 checksum failed, not clearing 
> mailbox, clearing new mail (data=b'\xff\xff\xff\xff\  \x00\x00', 
> checksum=, expected=b'\xbfG\x00\x00') 
> (mailbox:602)
> 2022-02-14 08:42:52,087+0100 ERROR (mailbox-spm) 
> [storage.MailBox.SpmMailMonitor] mailbox 66 checksum failed, not clearing 
> mailbox, clearing new mail (data=b'\x00\x00\x00\x00\  \xff\xff', 
> checksum=, expected=b'\x04\xf0\x0b\x00') 
> (mailbox:602)

This may be a real checksum error, meaning a random failure on storage,
but it is more likely a race in oVirt itself. We had a lot of these in the past,
and I think we fixed them, but it is possible that more remain due to the way
this code works.
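To illustrate why a race can look exactly like a checksum failure, here is a minimal Python sketch. It assumes, purely for illustration, that the checksum is the sum of the payload bytes modulo 2**32, stored little-endian in the last 4 bytes of the mailbox; the function names are hypothetical and this is not vdsm's actual code. A reader that sees the first half of a new write and the second half of the old contents computes a checksum that matches neither version.

```python
import struct

# Assumption for illustration only: checksum = sum of payload bytes mod 2**32,
# stored little-endian in the last CHECKSUM_BYTES bytes. Not vdsm's real code.
CHECKSUM_BYTES = 4


def compute_checksum(payload: bytes) -> bytes:
    """Return the 4-byte little-endian sum of the payload bytes modulo 2**32."""
    return struct.pack("<I", sum(payload) % (1 << (8 * CHECKSUM_BYTES)))


def seal(payload: bytes) -> bytes:
    """Append the checksum to the payload, as a writer would."""
    return payload + compute_checksum(payload)


def verify(mailbox: bytes) -> bool:
    """Compare the trailing stored checksum with one computed from the payload."""
    payload, stored = mailbox[:-CHECKSUM_BYTES], mailbox[-CHECKSUM_BYTES:]
    return compute_checksum(payload) == stored


old = seal(b"\x00" * 60)
new = seal(b"\xff" * 60)

# A reader racing with the writer may see the first half of the new contents
# and the second half of the old ones (a "torn" read).
torn = new[:32] + old[32:]

print(verify(old), verify(new), verify(torn))  # True True False
```

Neither the old nor the new mailbox is corrupt on its own; the mismatch comes only from reading while the write is in progress, which is why a race is a plausible explanation.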

> We are running the latest oVirt engine and hosts:
> Hosts: ovirt-node-ng-installer-4.4.10-2022020214.el8.iso
> engine: ovirt-engine-4.4.10.6-1.el8.noarch
>
> We have 3 hosts and 8 iSCSI domains. I found a similar issue from 2018:
> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/FJ6KIEOXEEFFZSJOT2ZF4TRKQ5NCP4OQ/#L7WD2FY25XJCNMB3YMTA4ASKMZGKCDZM
> I am not sure how to determine which mailbox I should try to "clean". Can
> anybody help me, please?

You don't need to do anything; the mailbox has already been cleaned up.
This message means that the SPM found a bad checksum and dropped the
messages in the mailbox.

Processes that sent mail to the SPM will resend the dropped mail within 2-3 seconds,
so the system should recover from the issue automatically.
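The drop-and-resend behavior described above can be modeled roughly as follows. This is a hypothetical Python sketch with simulated time, not vdsm code: the SPM processes and acknowledges only mail whose checksum verifies, and the sender resends anything unacknowledged after a timeout. The 3-second `RESEND_TIMEOUT` is an assumed value taken from the "2-3 seconds" estimate, and all names are illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical model, not vdsm code. RESEND_TIMEOUT is an assumed value based
# on the "2-3 seconds" estimate; time is simulated, so nothing actually sleeps.
RESEND_TIMEOUT = 3.0


@dataclass
class Spm:
    processed: list = field(default_factory=list)

    def receive(self, message: str, checksum_ok: bool) -> bool:
        """Process and acknowledge the message only if its checksum verified."""
        if not checksum_ok:
            return False  # dropped: not processed and not acknowledged
        self.processed.append(message)
        return True


def send_with_resend(spm: Spm, message: str, checksum_results: list,
                     max_attempts: int = 5) -> float:
    """Resend until acknowledged; return the simulated time of success.

    checksum_results[i] says whether attempt i arrives with a good checksum.
    """
    now = 0.0
    for ok in checksum_results[:max_attempts]:
        if spm.receive(message, ok):
            return now
        now += RESEND_TIMEOUT  # no acknowledgement: wait, then resend
    raise RuntimeError("message was never acknowledged")


spm = Spm()
print(send_with_resend(spm, "hypothetical-request", [False, True]))  # 3.0
print(spm.processed)  # ['hypothetical-request']
```

Because a dropped message is simply never acknowledged, one bad checksum costs only a single resend delay, while a repeating error shows up as repeated delays and repeated log entries.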

I would monitor your logs to check whether this is a recurring issue or a one-time
incident. If this error repeats, please file a vdsm bug and attach the complete
log since this host was started.
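For the monitoring step, a small script like the following can count checksum errors per mailbox number in vdsm.log, which also shows which mailboxes are affected. It matches the message format quoted above; the function name is hypothetical, and the log path in the usage comment is an assumption about the default vdsm log location.

```python
import re
from collections import Counter

# Matches the error quoted above, e.g.
# "... [storage.MailBox.SpmMailMonitor] mailbox 65 checksum failed, ..."
PATTERN = re.compile(
    r"\[storage\.MailBox\.SpmMailMonitor\] mailbox (\d+) checksum failed")


def count_checksum_errors(lines):
    """Map each mailbox number to how many checksum errors it logged."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "2022-02-14 08:42:52,086+0100 ERROR (mailbox-spm) "
    "[storage.MailBox.SpmMailMonitor] mailbox 65 checksum failed, "
    "not clearing mailbox, clearing new mail",
    "2022-02-14 08:42:52,087+0100 ERROR (mailbox-spm) "
    "[storage.MailBox.SpmMailMonitor] mailbox 66 checksum failed, "
    "not clearing mailbox, clearing new mail",
]
print(count_checksum_errors(sample))  # Counter({65: 1, 66: 1})

# Usage on a real host (the path is an assumption about the default location):
# with open("/var/log/vdsm/vdsm.log", errors="replace") as log:
#     for mailbox, n in sorted(count_checksum_errors(log).items()):
#         print(f"mailbox {mailbox}: {n} errors")
```

Running it periodically, or over rotated logs as well, shows whether the counts keep growing (a recurring issue worth a bug report) or stay flat (a one-time incident).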

Nir
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UJSPPHF4GGFT2JG3QAMWVIVI6XE4LY24/