[ovirt-users] Re: Constantly XFS in memory corruption inside VMs

2020-12-01 Thread Vinícius Ferrão via Users
Can this be related to this case?
https://bugzilla.redhat.com/show_bug.cgi?id=810082

On 1 Dec 2020, at 10:25, Vinícius Ferrão <fer...@versatushpc.com.br> wrote:

ECC RAM everywhere: hosts and storage.

I even ran Memtest86 on both hypervisor hosts just to be sure. No errors. I 
haven’t had the opportunity to run it on the storage yet.

After I sent that message yesterday, the engine VM crashed again; the 
filesystem went offline. There were some discards (again) on the switch, 
probably due to the “boot storm” of other VMs. But this time a simple reboot 
fixed the filesystem and the hosted engine VM was back.

Since it happened in an extremely small window of time, I checked everything 
again, and only the discards issue came up; there are ~90k discards on Po2 
(which is the LACP interface of the hypervisor). Since then, I have enabled 
hardware flow control on the ports of the switch, but discards are still 
happening:

Port    Align-Err    FCS-Err    Xmit-Err    Rcv-Err    UnderSize    OutDiscards
Po1          0          0          0          0          0              0
Po2          0          0          0          0          0           3650
Po3          0          0          0          0          0              0
Po4          0          0          0          0          0              0
Po5          0          0          0          0          0              0
Po6          0          0          0          0          0              0
Po7          0          0          0          0          0              0
Po20         0          0          0          0          0          13788

I think this may be related… but it’s just a guess.
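As a side note, the counter dump above can be filtered for ports with nonzero OutDiscards. A minimal sketch, using a trimmed copy of the sample above as a here-doc; on a real switch you would pipe the live `show interfaces counters errors` output instead, and the column position may differ by platform:

```shell
# Print ports whose OutDiscards counter (7th column) is nonzero.
# Input is a trimmed copy of the counters pasted above.
awk 'NR > 1 && $7 > 0 { print $1, $7 }' <<'EOF'
Port  Align-Err  FCS-Err  Xmit-Err  Rcv-Err  UnderSize  OutDiscards
Po1   0          0        0         0        0          0
Po2   0          0        0         0        0          3650
Po20  0          0        0         0        0          13788
EOF
# prints:
# Po2 3650
# Po20 13788
```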

Thanks,


On 1 Dec 2020, at 05:06, Strahil Nikolov <hunter86...@yahoo.com> wrote:

Could it be faulty RAM?
Do you use ECC RAM?

Best Regards,
Strahil Nikolov






On Tuesday, 1 December 2020, 06:17:10 GMT+2, Vinícius Ferrão via Users 
<users@ovirt.org> wrote:






Hi again,



I had to shut down everything because of a power outage in the office. When 
trying to get the infra up again, even the Engine was corrupted:



[  772.466982] XFS (dm-4): Invalid superblock magic number
mount: /var: wrong fs type, bad option, bad superblock on /dev/mapper/ovirt-var, missing codepage or helper program, or other error.
[  772.472885] XFS (dm-3): Mounting V5 Filesystem
[  773.629700] XFS (dm-3): Starting recovery (logdev: internal)
[  773.731104] XFS (dm-3): Metadata CRC error detected at xfs_agfl_read_verify+0xa1/0xf0 [xfs], xfs_agfl block 0xf3
[  773.734352] XFS (dm-3): Unmount and run xfs_repair
[  773.736216] XFS (dm-3): First 128 bytes of corrupted metadata buffer:
[  773.738458] : 23 31 31 35 36 35 35 34 29 00 2d 20 52 65 62 75  #1156554).- Rebu
[  773.741044] 0010: 69 6c 74 20 66 6f 72 20 68 74 74 70 73 3a 2f 2f  ilt for https://
[  773.743636] 0020: 66 65 64 6f 72 61 70 72 6f 6a 65 63 74 2e 6f 72  fedoraproject.or
[  773.746191] 0030: 67 2f 77 69 6b 69 2f 46 65 64 6f 72 61 5f 32 33  g/wiki/Fedora_23
[  773.748818] 0040: 5f 4d 61 73 73 5f 52 65 62 75 69 6c 64 00 2d 20  _Mass_Rebuild.- 
[  773.751399] 0050: 44 72 6f 70 20 6f 62 73 6f 6c 65 74 65 20 64 65  Drop obsolete de
[  773.753933] 0060: 66 61 74 74 72 20 73 74 61 6e 7a 61 73 20 28 23  fattr stanzas (#
[  773.756428] 0070: 31 30 34 37 30 33 31 29 00 2d 20 49 6e 73 74 61  1047031).- Insta
[  773.758873] XFS (dm-3): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0xf3 len 1 error 74
[  773.763756] XFS (dm-3): xfs_do_force_shutdown(0x8) called from line 446 of file fs/xfs/libxfs/xfs_defer.c. Return address = 962bd5ee
[  773.769363] XFS (dm-3): Corruption of in-memory data detected.  Shutting down filesystem
[  773.772643] XFS (dm-3): Please unmount the filesystem and rectify the problem(s)
[  773.776079] XFS (dm-3): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[  773.779113] XFS (dm-3): xlog_recover_clear_agi_bucket: failed to clear agi 3. Continuing.
[  773.783039] XFS (dm-3): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[  773.785698] XFS (dm-3): xlog_recover_clear_agi_bucket: failed to clear agi 3. Continuing.
[  773.790023] XFS (dm-3): Ending recovery (logdev: internal)
[  773.792489] XFS (dm-3): Error -5 recovering leftover CoW allocations.
mount: /var/log: can't read superblock on /dev/mapper/ovirt-log.
mount: /var/log/audit: mount point does not exist.




/var seems to be completely trashed.




The only time that I’ve seen something like this was with faulty hardware. But 
nothing shows up in the logs, as far as I know.




After forcing repairs with -L I’ve got other issues:




mount -a
[  326.170941] XFS (dm-4): Mounting V5 Filesystem
[  326.404788] XFS (dm-4): Ending clean mount
[  326.415291] XFS (dm-3): Mounting V5 Filesystem
[  326.611673] XFS (dm-3): Ending clean mount
[  326.621705] XFS (dm-2): Mounting V5 Filesystem
[  326.784067] XFS 

[ovirt-users] Re: vdsm ProtocolDetector.SSLHandshakeDispatcher ERROR Error during handshake: no certificate returned

2020-12-01 Thread erin . sims
https://microdevsys.com/wp/get-host-capabilities-failed-general-sslengine-problem/
So I followed some of the steps on this page to fix my issue: I rebuilt the 
engine cert, which brought up all the nodes for me, and I also rebuilt the 
apache.p12 cert.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OEFRWUMGJD3LUTRD5WCQUR6LAXXQYDUH/


[ovirt-users] Re: Unable to move or copy disks

2020-12-01 Thread suporte
Thanks 

Did you use the command cp to copy data between gluster volumes? 

Regards 

José 


From: "Strahil Nikolov"  
To: supo...@logicworks.pt 
Cc: users@ovirt.org 
Sent: Tuesday, 1 December 2020 08:05:17 
Subject: Re: [ovirt-users] Re: Unable to move or copy disks 

This looks like a bug I reported a long time ago. 
The only fix I found was to create a new gluster volume and "cp -a" all data 
from the old to the new volume. 

Do you have spare space for a new Gluster volume? 
If yes, create the new volume and add it to oVirt, then dd the file and move 
the disk to that new storage. 
Once you move all the VMs' disks you can get rid of the old Gluster volume and 
reuse the space. 

P.S.: Sadly I didn't have time to look at your logs. 
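The migration suggested above can be sketched as a dry run. The new volume name (gfs2data) and brick path are hypothetical placeholders, and the commands are echoed rather than executed so the sequence can be reviewed before running it for real:

```shell
# Dry-run sketch of the old-volume -> new-volume migration described above.
# Names (gfs2data, brick2) are assumptions; remove the echo wrapper (or
# replace it with eval) to actually execute the steps.
step() { echo "WOULD RUN: $*"; }

step gluster volume create gfs2data gfs1.server.pt:/home/brick2
step gluster volume start gfs2data
# After adding gfs2data to oVirt as a new storage domain, copy the image
# trees preserving ownership, timestamps, and sparse files:
step sudo -u vdsm cp -a \
  /rhev/data-center/mnt/glusterSD/gfs1.server.pt:_gfs1data \
  /rhev/data-center/mnt/glusterSD/gfs1.server.pt:_gfs2data
```

`cp -a` is the key part: it preserves the vdsm:kvm ownership (uid/gid 36) that oVirt requires on the storage domain.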


Best Regards, 
Strahil Nikolov 






On Monday, 30 November 2020, 01:22:46 GMT+2, supo...@logicworks.pt wrote: 





No errors 

# sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/gfs1.server.pt:_gfs1data/0e8de531-ac5e-4089-b390-cfc0adc3e79a/images/a847beca-7ed0-4ff1-8767-fc398379d85b/61d85180-65a4-452d-8773-db778f56e242 of=/dev/null bs=4M status=progress 
107336433664 bytes (107 GB) copied, 245.349334 s, 437 MB/s 
25600+0 records in 
25600+0 records out 
107374182400 bytes (107 GB) copied, 245.682 s, 437 MB/s 

After this I tried to move the disk again and, surprisingly, it succeeded. 
I couldn't believe it, so I tried to move another disk, and the same error 
came back. 
I did a dd on this other disk, tried to move it again, and again it 
succeeded! 

 
From: "Strahil Nikolov"  
To: supo...@logicworks.pt 
Cc: users@ovirt.org 
Sent: Sunday, 29 November 2020 20:22:36 
Subject: Re: [ovirt-users] Re: Unable to move or copy disks 

Usually distributed volumes are supported on a single-node setup, but that 
shouldn't be the problem. 


As you know the affected VMs , you can easily find the disks of a VM. 

Then try to read the VM's disk: 

sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/gfs1.server.pt:_gfs1data//images// of=/dev/null bs=4M status=progress 

Does it give errors ? 


Best Regards, 
Strahil Nikolov 



On Sunday, 29 November 2020, 20:06:42 GMT+2, supo...@logicworks.pt wrote: 





No heals pending. 
For some VMs I can move the disk, but for some other VMs I cannot. 


It's a simple gluster volume: 
# gluster volume info 

Volume Name: gfs1data 
Type: Distribute 
Volume ID: 7e6826b9-1220-49d4-a4bf-e7f50f38c42c 
Status: Started 
Snapshot Count: 0 
Number of Bricks: 1 
Transport-type: tcp 
Bricks: 
Brick1: gfs1.server.pt:/home/brick1 
Options Reconfigured: 
diagnostics.brick-log-level: INFO 
performance.client-io-threads: off 
server.event-threads: 4 
client.event-threads: 4 
cluster.choose-local: yes 
user.cifs: off 
features.shard: on 
cluster.shd-wait-qlength: 1 
cluster.locking-scheme: granular 
cluster.data-self-heal-algorithm: full 
cluster.server-quorum-type: server 
cluster.quorum-type: auto 
cluster.eager-lock: enable 
network.remote-dio: enable 
performance.low-prio-threads: 32 
performance.io-cache: off 
performance.read-ahead: off 
performance.quick-read: off 
storage.owner-gid: 36 
storage.owner-uid: 36 
transport.address-family: inet 
nfs.disable: on 



 
From: "Strahil Nikolov"  
To: supo...@logicworks.pt 
Cc: users@ovirt.org 
Sent: Sunday, 29 November 2020 17:27:04 
Subject: Re: [ovirt-users] Re: Unable to move or copy disks 

Are you sure you don't have any heals pending? 
I must admit I have never seen this type of error. 

Is it happening for all VMs or only for specific ones? 


Best Regards, 
Strahil Nikolov 






On Sunday, 29 November 2020, 15:37:04 GMT+2, supo...@logicworks.pt wrote: 





Sorry, I found this error in the gluster logs: 

[MSGID: 113040] [posix-helpers.c:1929:__posix_fd_ctx_get] 0-gfs1data-posix: 
Failed to get anonymous fd for real_path: 
/home/brick1/.glusterfs/bc/57/bc57653e-b08c-417b-83f3-bf234a97e30f. [No such 
file or directory] 

 
From: supo...@logicworks.pt 
To: "Strahil Nikolov"  
Cc: users@ovirt.org 
Sent: Sunday, 29 November 2020 13:13:00 
Subject: [ovirt-users] Re: Unable to move or copy disks 

I don't find any errors in the gluster logs; I just find this error in the 
vdsm log: 

2020-11-29 12:57:45,528+ INFO (tasks/1) [storage.SANLock] Successfully released Lease(name='61d85180-65a4-452d-8773-db778f56e242', path=u'/rhev/data-center/mnt/node2.server.pt:_home_node2data/ab4855be-0edd-4fac-b062-bded661e20a1/images/a847beca-7ed0-4ff1-8767-fc398379d85b/61d85180-65a4-452d-8773-db778f56e242.lease', offset=0) (clusterlock:524) 
2020-11-29 12:57:45,528+ ERROR (tasks/1) [root] Job 
u'cc8ea210-df4b-4f0b-a385-5bc3adc825f6' failed (jobs:221) 
Traceback (most recent call last): 
File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 157, in run 
self._run() 
File "/usr/lib/python2.7/site-packages/vdsm/storage/sdm/api/copy_data.py", line 
86, in _run 

[ovirt-users] [CFP] Virtualization & IaaS Devroom

2020-12-01 Thread Piotr Kliczewski
We are excited to announce that the call for proposals is now open for the
Virtualization & IaaS devroom at the upcoming FOSDEM 2021, to be hosted
virtually on February 6th 2021.

This year will mark FOSDEM’s 21st anniversary as one of the longest-running
free and open source software developer events, attracting thousands of
developers and users from all over the world. Due to Covid-19, FOSDEM will
be held virtually this year on February 6th & 7th, 2021.

About the Devroom

The Virtualization & IaaS devroom will feature session topics such as open
source hypervisors and virtual machine managers such as Xen Project, KVM,
bhyve, and VirtualBox, and Infrastructure-as-a-Service projects such as
KubeVirt, Apache CloudStack, Foreman, OpenStack, oVirt, QEMU and OpenNebula.

This devroom will host presentations that focus on topics of shared
interest, such as KVM; libvirt; shared storage; virtualized networking;
cloud security; clustering and high availability; interfacing with multiple
hypervisors; hyperconverged deployments; and scaling across hundreds or
thousands of servers.

Presentations in this devroom will be aimed at users or developers working
on these platforms who are looking to collaborate and improve shared
infrastructure or solve common problems. We seek topics that encourage
dialog between projects and continued work post-FOSDEM.

Important Dates

Submission deadline: 20th of December

Acceptance notifications: 25th of December

Final schedule announcement: 31st of December

Recorded presentations upload deadline: 15th of January

Devroom: 6th February 2021

Submit Your Proposal

All submissions must be made via the Pentabarf event planning site[1]. If
you have not used Pentabarf before, you will need to create an account. If
you submitted proposals for FOSDEM in previous years, you can use your
existing account.

After creating the account, select Create Event to start the submission
process. Make sure to select Virtualization and IaaS devroom from the Track
list. Please fill out all the required fields, and provide a meaningful
abstract and description of your proposed session.

Submission Guidelines

We expect more proposals than we can possibly accept, so it is vitally
important that you submit your proposal on or before the deadline. Late
submissions are unlikely to be considered.

All presentation slots are 30 minutes, with 20 minutes planned for the
presentation and 10 minutes for Q&A.

All presentations will need to be pre-recorded and put into our system at
least a couple of weeks before the event.

The presentations should be uploaded by 15th of January and made available
under Creative Commons licenses. In the Submission notes field, please
indicate that you
agree that your presentation will be licensed under the CC-By-SA-4.0 or
CC-By-4.0 license and that you agree to have your presentation recorded.
For example:

"If my presentation is accepted for FOSDEM, I hereby agree to license all
recordings, slides, and other associated materials under the Creative
Commons Attribution Share-Alike 4.0 International License. Sincerely,
."

In the Submission notes field, please also confirm that if your talk is
accepted, you will be able to attend the virtual FOSDEM event for the Q&A.
We will not consider proposals from prospective speakers who are unsure
whether they will be able to attend the FOSDEM virtual event.

If you are experiencing problems with Pentabarf, the proposal submission
interface, or have other questions, you can email our devroom mailing
list[2] and we will try to help you.


Code of Conduct

Following the release of the updated code of conduct for FOSDEM, we'd like
to remind all speakers and attendees that all of the presentations and
discussions in our devroom are held under the guidelines set in the CoC and
we expect attendees, speakers, and volunteers to follow the CoC at all
times.

If you submit a proposal and it is accepted, you will be required to
confirm that you accept the FOSDEM CoC. If you have any questions about the
CoC or wish to have one of the devroom organizers review your presentation
slides or any other content for CoC compliance, please email us and we will
do our best to assist you.

Call for Volunteers

We are also looking for volunteers to help run the devroom. We need
assistance with helping speakers to record the presentation as well as
helping with streaming and chat moderation for the devroom. Please contact
devroom mailing list [2] for more information.

Questions?

If you have any questions about this devroom, please send your questions to
our devroom mailing list. You can also subscribe to the list to receive
updates about important dates, session announcements, and to connect with
other attendees.

See you all at FOSDEM!

[1] https://penta.fosdem.org/submission/FOSDEM21

[2] iaas-virt-devroom at lists.fosdem.org

[ovirt-users] Re: Can't update cluster to 4.5

2020-12-01 Thread Klaas Demter

https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZPLFRE7TMLQDSVVTQYU7BYGNAIOKC5HF/

Needs CentOS 8.3, which is not out yet.

On 12/1/20 5:59 PM, Patrick Lomakin wrote:

I have a message:
"Data Center Default compatibility version is 4.4, which is lower than latest 
available version 4.5. Please upgrade your Data Center to latest version to 
successfully finish upgrade of your setup".
When I try to update the Default data center I get an error: "Error while 
executing action: Cannot update Data Center compatibility version to a value 
that is greater than its cluster's version. The following clusters should be 
upgraded: Default." The next step is to upgrade the compatibility of the 
cluster, but when I tried to do this by editing the cluster compatibility 
version from 4.4 to 4.5, I got another error: "Error while executing action: 
Cannot change Cluster Compatibility Version to higher version when there are 
active Hosts with lower version." Please give me some advice on how to fix 
this problem.

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/N4NORNPEV6KPBGPFMZ4MTSIS2OSR22WI/


[ovirt-users] Can't update cluster to 4.5

2020-12-01 Thread Patrick Lomakin
I have a message: 
"Data Center Default compatibility version is 4.4, which is lower than latest 
available version 4.5. Please upgrade your Data Center to latest version to 
successfully finish upgrade of your setup".
When I try to update the Default data center I get an error: "Error while 
executing action: Cannot update Data Center compatibility version to a value 
that is greater than its cluster's version. The following clusters should be 
upgraded: Default." The next step is to upgrade the compatibility of the 
cluster, but when I tried to do this by editing the cluster compatibility 
version from 4.4 to 4.5, I got another error: "Error while executing action: 
Cannot change Cluster Compatibility Version to higher version when there are 
active Hosts with lower version." Please give me some advice on how to fix 
this problem.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NUJPUVEBAR5ZDWVL3OMQJXGQY2FP5HPH/


[ovirt-users] Re: Failed upgrade from SHE 4.3.10 to 4.4.3 - Host set to Non-Operational - missing networks

2020-12-01 Thread Patrick Lomakin
Try these commands on the host: vdsm-tool remove-config, then vdsm-tool 
configure --force. If that does not work, you can stop your VMs with virsh, 
install a "clean" setup of the hosted engine, and import your storage with all 
VMs. I think that, if possible, installing a new HE and manually setting some 
parameters is, IMHO, a "little bit" faster than trying to restore HE across 
different major releases.  
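As a hedged sketch, the reset suggested above could look like the following dry run (commands are echoed for review; the vdsmd restart at the end is my assumption, not part of the original advice):

```shell
# Dry-run of the vdsm reconfiguration sequence; remove the echo wrapper
# to execute on the host, with VMs stopped first.
step() { echo "+ $*"; }

step vdsm-tool remove-config
step vdsm-tool configure --force
step systemctl restart vdsmd   # assumed follow-up, not in the original post
```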
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DZTGWTESHHFHEEHOPNIBBMFM2DAYG7Q4/


[ovirt-users] Failed upgrade from SHE 4.3.10 to 4.4.3 - Host set to Non-Operational - missing networks

2020-12-01 Thread Roberto Nunin
We are following both oVirt upgrade guide [1] and RHV 4.4 upgrade guide [2].

aps-te62-mng.corporate.it ---> host reinstalled with oVirt Node 4.4.3
aps-te61-mng.corporate.it ---> host where the previous ovirt-engine 4.3.10 VM
was running when the backup was taken.

hosted-engine --deploy --restore-from-file= fails with the
following errors in ovirt-hosted-engine-setup:

2020-12-01 15:53:37,534+0100 ERROR
otopi.ovirt_hosted_engine_setup.ansible_utils
ansible_utils._process_output:109 fatal: [localhost]: FAILED! =>
{"changed": false, "msg": "The host has been set in non_operational status,
please check engine logs, more info can be found in the engine logs, fix
accordingly and re-deploy."}
2020-12-01 15:56:30,414+0100 ERROR
otopi.ovirt_hosted_engine_setup.ansible_utils
ansible_utils._process_output:109 fatal: [localhost]: FAILED! =>
{"changed": false, "msg": "The system may not be provisioned according to
the playbook results: please check the logs for the issue, fix accordingly
or re-deploy from scratch.\n"}
2020-12-01 15:56:33,731+0100 ERROR otopi.context context._executeMethod:154
Failed to execute stage 'Closing up': Failed executing ansible-playbook
2020-12-01 15:57:08,663+0100 ERROR
otopi.ovirt_hosted_engine_setup.ansible_utils
ansible_utils._process_output:109 fatal: [localhost]: UNREACHABLE! =>
{"changed": false, "msg": "Failed to connect to the host via ssh: ssh:
connect to host itte1lv51-mng.comifar.it port 22: Connection timed out",
"skip_reason": "Host localhost is unreachable", "unreachable": true}
2020-12-01 15:58:22,179+0100 ERROR otopi.plugins.gr_he_common.core.misc
misc._terminate:167 Hosted Engine deployment failed: please check the logs
for the issue, fix accordingly or re-deploy from scratch.

while within the HostedEngineLocal engine.log:

2020-12-01 15:52:42,161+01 ERROR
[org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
(EE-ManagedThreadFactory-engine-Thread-96) [11f50ce0] Host '
aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the
following networks:
'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
2020-12-01 15:52:48,474+01 ERROR
[org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
(EE-ManagedScheduledExecutorService-engineScheduledTh
readPool-Thread-12) [41688fc7] Host 'aps-te62-mng.corporate.it' is set to
Non-Operational, it is missing the following networks:
'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
2020-12-01 15:52:53,734+01 ERROR
[org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-6)
[5fc7257] Host 'aps-te62-mng.corporate.it' is set to Non-Operational, it is
missing the following networks:
'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
2020-12-01 15:52:54,567+01 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring]
(ForkJoinPool-1-worker-13) [] Rerun VM
'f9249e06-237e-412c-91e9-7b0fa0b6ec2a'. Called from VDS '
aps-te62-mng.corprorate.it'
2020-12-01 15:52:54,676+01 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-361) [] EVENT_ID:
VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed  (VM:
external-HostedEngineLocal, Source: aps-te62-mng.corporate.it, Destination:
aps-te61-mng.corporate.it).

Why is the playbook trying to migrate HostedEngineLocal from the reinstalled
4.4.3 oVirt node to an existing host that is still running oVirt Node 4.3.x?
How can we manage this issue and proceed with the upgrade?

[1]
https://www.ovirt.org/documentation/upgrade_guide/#SHE_Upgrading_from_4-3
[2]
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html/upgrade_guide/she_upgrading_from_4-3


Thanks in advance for the support.
Best regards

-- 
Roberto Nunin
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EGANL54AX3URDKV5NDKFCYKRJAWRD3O7/


[ovirt-users] Re: Constantly XFS in memory corruption inside VMs

2020-12-01 Thread Vinícius Ferrão via Users
ECC RAM everywhere: hosts and storage.

I even ran Memtest86 on both hypervisor hosts just to be sure. No errors. I 
haven’t had the opportunity to run it on the storage yet.

After I sent that message yesterday, the engine VM crashed again; the 
filesystem went offline. There were some discards (again) on the switch, 
probably due to the “boot storm” of other VMs. But this time a simple reboot 
fixed the filesystem and the hosted engine VM was back.

Since it happened in an extremely small window of time, I checked everything 
again, and only the discards issue came up; there are ~90k discards on Po2 
(which is the LACP interface of the hypervisor). Since then, I have enabled 
hardware flow control on the ports of the switch, but discards are still 
happening:

Port    Align-Err    FCS-Err    Xmit-Err    Rcv-Err    UnderSize    OutDiscards
Po1          0          0          0          0          0              0
Po2          0          0          0          0          0           3650
Po3          0          0          0          0          0              0
Po4          0          0          0          0          0              0
Po5          0          0          0          0          0              0
Po6          0          0          0          0          0              0
Po7          0          0          0          0          0              0
Po20         0          0          0          0          0          13788

I think this may be related… but it’s just a guess.

Thanks,


> On 1 Dec 2020, at 05:06, Strahil Nikolov  wrote:
> 
> Could it be faulty ram ?
> Do you use ECC ram ?
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> On Tuesday, 1 December 2020, 06:17:10 GMT+2, Vinícius Ferrão via Users 
>  wrote: 
> 
> 
> 
> 
> 
> 
> Hi again,
> 
> 
> 
> I had to shutdown everything because of a power outage in the office. When 
> trying to get the infra up again, even the Engine have corrupted: 
> 
> 
> 
> [  772.466982] XFS (dm-4): Invalid superblock magic number
> mount: /var: wrong fs type, bad option, bad superblock on 
> /dev/mapper/ovirt-var, missing codepage or helper program, or other error.
> [  772.472885] XFS (dm-3): Mounting V5 Filesystem
> [  773.629700] XFS (dm-3): Starting recovery (logdev: internal)
> [  773.731104] XFS (dm-3): Metadata CRC error detected at 
> xfs_agfl_read_verify+0xa1/0xf0 [xfs], xfs_agfl block 0xf3 
> [  773.734352] XFS (dm-3): Unmount and run xfs_repair
> [  773.736216] XFS (dm-3): First 128 bytes of corrupted metadata buffer:
> [  773.738458] : 23 31 31 35 36 35 35 34 29 00 2d 20 52 65 62 75  
> #1156554).- Rebu
> [  773.741044] 0010: 69 6c 74 20 66 6f 72 20 68 74 74 70 73 3a 2f 2f  ilt 
> for https://
> [  773.743636] 0020: 66 65 64 6f 72 61 70 72 6f 6a 65 63 74 2e 6f 72  
> fedoraproject.or
> [  773.746191] 0030: 67 2f 77 69 6b 69 2f 46 65 64 6f 72 61 5f 32 33  
> g/wiki/Fedora_23
> [  773.748818] 0040: 5f 4d 61 73 73 5f 52 65 62 75 69 6c 64 00 2d 20  
> _Mass_Rebuild.- 
> [  773.751399] 0050: 44 72 6f 70 20 6f 62 73 6f 6c 65 74 65 20 64 65  
> Drop obsolete de
> [  773.753933] 0060: 66 61 74 74 72 20 73 74 61 6e 7a 61 73 20 28 23  
> fattr stanzas (#
> [  773.756428] 0070: 31 30 34 37 30 33 31 29 00 2d 20 49 6e 73 74 61  
> 1047031).- Insta
> [  773.758873] XFS (dm-3): metadata I/O error in "xfs_trans_read_buf_map" at 
> daddr 0xf3 len 1 error 74
> [  773.763756] XFS (dm-3): xfs_do_force_shutdown(0x8) called from line 446 of 
> file fs/xfs/libxfs/xfs_defer.c. Return address = 962bd5ee
> [  773.769363] XFS (dm-3): Corruption of in-memory data detected.  Shutting 
> down filesystem
> [  773.772643] XFS (dm-3): Please unmount the filesystem and rectify the 
> problem(s)
> [  773.776079] XFS (dm-3): xfs_imap_to_bp: xfs_trans_read_buf() returned 
> error -5.
> [  773.779113] XFS (dm-3): xlog_recover_clear_agi_bucket: failed to clear agi 
> 3. Continuing.
> [  773.783039] XFS (dm-3): xfs_imap_to_bp: xfs_trans_read_buf() returned 
> error -5.
> [  773.785698] XFS (dm-3): xlog_recover_clear_agi_bucket: failed to clear agi 
> 3. Continuing.
> [  773.790023] XFS (dm-3): Ending recovery (logdev: internal)
> [  773.792489] XFS (dm-3): Error -5 recovering leftover CoW allocations.
> mount: /var/log: can't read superblock on /dev/mapper/ovirt-log.
> mount: /var/log/audit: mount point does not exist.
> 
> 
> 
> 
> /var seems to be completely trashed.
> 
> 
> 
> 
> The only time that I’ve seem something like this was faulty hardware. But 
> nothing shows up on logs, as far as I know.
> 
> 
> 
> 
> After forcing repairs with -L I’ve got other issues:
> 
> 
> 
> 
> mount -a
> [  326.170941] XFS (dm-4): Mounting V5 Filesystem
> [  326.404788] XFS (dm-4): Ending clean mount
> [  326.415291] XFS (dm-3): Mounting V5 Filesystem
> [  326.611673] XFS (dm-3): Ending clean mount
> [  326.621705] XFS (dm-2): Mounting V5 Filesystem
> [  326.784067] XFS 

[ovirt-users] oVirt 4.4.3 async is now generally available

2020-12-01 Thread Lev Veyde
The oVirt project just released an oVirt 4.4.3 async update, as of December
1st, 2020.

This release fixes the following bugs:

1850939 Hosted engine deployment does not properly show iSCSI LUN errors

1895553 Add pre-flight in cockpit deployment flow to check for disk block
sizes used for bricks and LV cache to be identical

1895277 [DR] Remote data sync to the secondary site never completes

1895762 cockpit ovirt(downstream) docs links point to upstream docs

1835685 [Hosted-Engine] "Installation Guide" and "RHV Documents" didn't
jump to the correct pages in hosted engine page

1858248 Volume creation with 6 or more nodes in cluster, should allow users
to select the hosts for brick creation

1895356 Upgrade to 4.4.2 will fail due to dangling symlinks

-- 

Lev Veyde

Senior Software Engineer, RHCE | RHCVA | MCITP

Red Hat Israel



l...@redhat.com | lve...@redhat.com

TRIED. TESTED. TRUSTED. 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2XKKX2O763XKYUTHLDY5NON6ZUCE6Y3V/


[ovirt-users] Re: Constantly XFS in memory corruption inside VMs

2020-12-01 Thread Strahil Nikolov via Users
Could it be faulty RAM?
Do you use ECC RAM?

Best Regards,
Strahil Nikolov






On Tuesday, 1 December 2020, 06:17:10 GMT+2, Vinícius Ferrão via Users wrote: 






Hi again,



I had to shut down everything because of a power outage in the office. When 
trying to get the infra up again, even the Engine was corrupted: 



[  772.466982] XFS (dm-4): Invalid superblock magic number
mount: /var: wrong fs type, bad option, bad superblock on /dev/mapper/ovirt-var, missing codepage or helper program, or other error.
[  772.472885] XFS (dm-3): Mounting V5 Filesystem
[  773.629700] XFS (dm-3): Starting recovery (logdev: internal)
[  773.731104] XFS (dm-3): Metadata CRC error detected at xfs_agfl_read_verify+0xa1/0xf0 [xfs], xfs_agfl block 0xf3
[  773.734352] XFS (dm-3): Unmount and run xfs_repair
[  773.736216] XFS (dm-3): First 128 bytes of corrupted metadata buffer:
[  773.738458] : 23 31 31 35 36 35 35 34 29 00 2d 20 52 65 62 75  #1156554).- Rebu
[  773.741044] 0010: 69 6c 74 20 66 6f 72 20 68 74 74 70 73 3a 2f 2f  ilt for https://
[  773.743636] 0020: 66 65 64 6f 72 61 70 72 6f 6a 65 63 74 2e 6f 72  fedoraproject.or
[  773.746191] 0030: 67 2f 77 69 6b 69 2f 46 65 64 6f 72 61 5f 32 33  g/wiki/Fedora_23
[  773.748818] 0040: 5f 4d 61 73 73 5f 52 65 62 75 69 6c 64 00 2d 20  _Mass_Rebuild.- 
[  773.751399] 0050: 44 72 6f 70 20 6f 62 73 6f 6c 65 74 65 20 64 65  Drop obsolete de
[  773.753933] 0060: 66 61 74 74 72 20 73 74 61 6e 7a 61 73 20 28 23  fattr stanzas (#
[  773.756428] 0070: 31 30 34 37 30 33 31 29 00 2d 20 49 6e 73 74 61  1047031).- Insta
[  773.758873] XFS (dm-3): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0xf3 len 1 error 74
[  773.763756] XFS (dm-3): xfs_do_force_shutdown(0x8) called from line 446 of file fs/xfs/libxfs/xfs_defer.c. Return address = 962bd5ee
[  773.769363] XFS (dm-3): Corruption of in-memory data detected.  Shutting down filesystem
[  773.772643] XFS (dm-3): Please unmount the filesystem and rectify the problem(s)
[  773.776079] XFS (dm-3): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[  773.779113] XFS (dm-3): xlog_recover_clear_agi_bucket: failed to clear agi 3. Continuing.
[  773.783039] XFS (dm-3): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[  773.785698] XFS (dm-3): xlog_recover_clear_agi_bucket: failed to clear agi 3. Continuing.
[  773.790023] XFS (dm-3): Ending recovery (logdev: internal)
[  773.792489] XFS (dm-3): Error -5 recovering leftover CoW allocations.
mount: /var/log: can't read superblock on /dev/mapper/ovirt-log.
mount: /var/log/audit: mount point does not exist.




/var seems to be completely trashed.




The only time I’ve seen something like this was with faulty hardware. But nothing shows up in the logs, as far as I know.




After forcing repairs with xfs_repair -L, I got other issues:




mount -a
[  326.170941] XFS (dm-4): Mounting V5 Filesystem
[  326.404788] XFS (dm-4): Ending clean mount
[  326.415291] XFS (dm-3): Mounting V5 Filesystem
[  326.611673] XFS (dm-3): Ending clean mount
[  326.621705] XFS (dm-2): Mounting V5 Filesystem
[  326.784067] XFS (dm-2): Starting recovery (logdev: internal)
[  326.792083] XFS (dm-2): Metadata CRC error detected at xfs_agi_read_verify+0xc7/0xf0 [xfs], xfs_agi block 0x2
[  326.794445] XFS (dm-2): Unmount and run xfs_repair
[  326.795557] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
[  326.797055] : 4d 33 44 34 39 56 00 00 80 00 00 00 f0 cf 00 00  M3D49V..
[  326.799685] 0010: 00 00 00 00 02 00 00 00 23 10 00 00 3d 08 01 08  #...=...
[  326.802290] 0020: 21 27 44 34 39 56 00 00 00 d0 00 00 01 00 00 00  !'D49V..
[  326.804748] 0030: 50 00 00 00 00 00 00 00 23 10 00 00 41 01 08 08  P...#...A...
[  326.807296] 0040: 21 27 44 34 39 56 00 00 10 d0 00 00 02 00 00 00  !'D49V..
[  326.809883] 0050: 60 00 00 00 00 00 00 00 23 10 00 00 41 01 08 08  `...#...A...
[  326.812345] 0060: 61 2f 44 34 39 56 00 00 00 00 00 00 00 00 00 00  a/D49V..
[  326.814831] 0070: 50 34 00 00 00 00 00 00 23 10 00 00 82 08 08 04  P4..#...
[  326.817237] XFS (dm-2): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x2 len 1 error 74
mount: /var/log/audit: mount(2) system call failed: Structure needs cleaning.




But after more xfs_repair -L runs, the engine is up…




Now I need to scavenge other VMs and do the same thing.
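For reference, a minimal sketch of that per-VM repair sequence, assuming the guest filesystems sit on LVM volumes named like the ones above (device names are examples, not taken from this setup; xfs_repair -L zeroes the journal, so any metadata updates still only in the log are lost):

```shell
# Hypothetical sketch: force-repair a guest's XFS filesystems after
# log replay fails. Device names are examples only.
# WARNING: xfs_repair -L discards the journal, losing any metadata
# updates that were only recorded in the log.
for dev in /dev/mapper/ovirt-var /dev/mapper/ovirt-log; do
    umount "$dev" 2>/dev/null        # the filesystem must be unmounted
    xfs_repair -L "$dev"             # zero the log, then repair
done
mount -a                             # remount everything from /etc/fstab
```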




That’s it.




Thanks all,

V.




PS: For those interested, there’s a paste of the fixes: 
https://pastebin.com/jsMguw6j







>  
> On 29 Nov 2020, at 17:03, Strahil Nikolov  wrote:
> 
> 
>  
> Damn...
> 
> You are using EFI boot. Does this happen only to EFI machines ?
> Did you notice if only EL 8 is affected ?
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> On Sunday, 29 November 2020 at 19:36:09 GMT+2, Vinícius Ferrão 
>  wrote: 
> 
> 
> 
> 
> 
> Yes!
> 
> I have a live VM right now that will 

[ovirt-users] Re: Unable to move or copy disks

2020-12-01 Thread Strahil Nikolov via Users
This looks like a bug I reported a long time ago.
The only fix I found was to create a new Gluster volume and "cp -a" all the data from the old volume to the new one.

Do you have spare space for a new Gluster volume?
If yes, create the new volume and add it to oVirt, then dd the file and move the disk to that new storage.
Once you move all the VMs' disks you can get rid of the old Gluster volume and reuse the space.
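A rough sketch of that workaround, with made-up volume and brick names (the real layout, replica count, and the oVirt storage-domain steps depend on the actual setup):

```shell
# Hypothetical sketch: build a replacement volume and copy the data over.
# Names (gfs1data_new, /home/brick2, /mnt/newvol) are examples only.
gluster volume create gfs1data_new gfs1.server.pt:/home/brick2 force
gluster volume start gfs1data_new

# Copy everything across, preserving ownership/xattrs and sparseness.
mount -t glusterfs gfs1.server.pt:/gfs1data_new /mnt/newvol
cp -a --sparse=always /rhev/data-center/mnt/glusterSD/gfs1.server.pt:_gfs1data/. /mnt/newvol/
```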

P.S.: Sadly I didn't have the time to look at your logs.


Best Regards,
Strahil Nikolov






On Monday, 30 November 2020 at 01:22:46 GMT+2,  wrote: 





No errors

# sudo -u vdsm dd 
if=/rhev/data-center/mnt/glusterSD/gfs1.server.pt:_gfs1data/0e8de531-ac5e-4089-b390-cfc0adc3e79a/images/a847beca-7ed0-4ff1-8767-fc398379d85b/61d85180-65a4-452d-8773-db778f56e242
 of=/dev/null bs=4M status=progress
107336433664 bytes (107 GB) copied, 245.349334 s, 437 MB/s
25600+0 records in
25600+0 records out
107374182400 bytes (107 GB) copied, 245.682 s, 437 MB/s

After this I tried again to move the disk, and surprise, it succeeded.

I didn't believe it.
I tried to move another disk; the same error came back.
I did a dd on this other disk and tried again to move it; again it succeeded.

!!!


From: "Strahil Nikolov" 
To: supo...@logicworks.pt
Cc: users@ovirt.org
Sent: Sunday, 29 November 2020 20:22:36
Subject: Re: [ovirt-users] Re: Unable to move or copy disks

Usually distributed volumes are only supported on a single-node setup, but that shouldn't be the problem.


Since you know the affected VMs, you can easily find the disks of each VM.

Then try to read the VM's disk:

sudo -u vdsm dd 
if=/rhev/data-center/mnt/glusterSD/gfs1.server.pt:_gfs1data//images//
 of=/dev/null bs=4M status=progress

Does it give errors?


Best Regards,
Strahil Nikolov



On Sunday, 29 November 2020 at 20:06:42 GMT+2, supo...@logicworks.pt 
 wrote: 





No heals pending
For some VMs I can move the disk, but for some other VMs I cannot.


It's a simple Gluster volume:
# gluster volume info

Volume Name: gfs1data
Type: Distribute
Volume ID: 7e6826b9-1220-49d4-a4bf-e7f50f38c42c
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gfs1.server.pt:/home/brick1
Options Reconfigured:
diagnostics.brick-log-level: INFO
performance.client-io-threads: off
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: yes
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 1
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
transport.address-family: inet
nfs.disable: on




From: "Strahil Nikolov" 
To: supo...@logicworks.pt
Cc: users@ovirt.org
Sent: Sunday, 29 November 2020 17:27:04
Subject: Re: [ovirt-users] Re: Unable to move or copy disks

Are you sure you don't have any heals pending?
I should admit I have never seen this type of error.

Is it happening for all VMs or only specific ones?


Best Regards,
Strahil Nikolov






On Sunday, 29 November 2020 at 15:37:04 GMT+2, supo...@logicworks.pt 
 wrote: 





Sorry, I found this error in the gluster logs:

 [MSGID: 113040] [posix-helpers.c:1929:__posix_fd_ctx_get] 0-gfs1data-posix: Failed to get anonymous fd for real_path: /home/brick1/.glusterfs/bc/57/bc57653e-b08c-417b-83f3-bf234a97e30f. [No such file or directory]
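That real_path is the gfid hardlink Gluster keeps under the brick's .glusterfs directory: the first two hex characters of the gfid, then the next two, then the full gfid. A small sketch (gfid taken from the log line above) showing how that path is derived, so you can check on the brick whether the backing file is really gone:

```shell
# Derive the .glusterfs hardlink path from a gfid, the way Gluster does:
# <brick>/.glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>
GFID=bc57653e-b08c-417b-83f3-bf234a97e30f
echo ".glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID}"
# prints: .glusterfs/bc/57/bc57653e-b08c-417b-83f3-bf234a97e30f
# On the brick you would then check, for example:
#   ls -l /home/brick1/.glusterfs/bc/57/bc57653e-b08c-417b-83f3-bf234a97e30f
```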


From: supo...@logicworks.pt
To: "Strahil Nikolov" 
Cc: users@ovirt.org
Sent: Sunday, 29 November 2020 13:13:00
Subject: [ovirt-users] Re: Unable to move or copy disks

I don't find any errors in the gluster logs; I just find this error in the vdsm log:

2020-11-29 12:57:45,528+ INFO  (tasks/1) [storage.SANLock] Successfully 
released Lease(name='61d85180-65a4-452d-8773-db778f56e242', 
path=u'/rhev/data-center/mnt/node2.server.pt:_home_node2data/ab4855be-0edd-4fac-b062-bded661e20a1/images/a847beca-7ed0-4ff1-8767-fc398379d85b/61d85180-65a4-452d-8773-db778f56e242.lease',
 offset=0) (clusterlock:524)
2020-11-29 12:57:45,528+ ERROR (tasks/1) [root] Job 
u'cc8ea210-df4b-4f0b-a385-5bc3adc825f6' failed (jobs:221)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 157, in run
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdm/api/copy_data.py", 
line 86, in _run
    self._operation.run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 343, in 
run
    for data in self._operation.watch():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 106, 
in watch
    self._finalize(b"", err)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 179, 
in _finalize