On Sat, Mar 28, 2020 at 8:26 PM Nir Soffer <[email protected]> wrote:

[snip]

> > Hey Nir,
> > You are right ... This is just a theory based on my knowledge and it
> > might not be valid.
> > We need the libvirt logs to confirm or reject the theory, but I'm
> > convinced that is the reason.
> >
> > Yet, it's quite possible.
> > Qemu tries to write to the qcow disk on gluster.
> > Gluster is creating shards based on the offset, as it was not done
> > initially (a preallocated disk takes its full size on gluster and all
> > shards are created immediately). This takes time and needs to be done
> > on all bricks.
> > As the shard size is too small (default 64MB), gluster has to create
> > the next shard almost immediately, but if it can't do it as fast as
> > qemu is filling its qcow2 disk
>
> Gluster can block the I/O until it can write the data to a new shard.
> There is no reason
> to return an error unless a real error happened.
>
> Also the VMs mentioned here are using raw disks, not qcow2:
>
> [snip]

>             <target bus="scsi" dev="sda"/>
>             <source file="/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6">
>                 <seclabel model="dac" relabel="no" type="none"/>
>             </source>
>             <driver cache="none" error_policy="stop" io="threads"
> name="qemu" type="raw"/>
>
> [snip]

>
> Note type="raw"
>
> > - qemu will get an I/O error and we know what happens there.
> > Later gluster manages to create the shard(s), and the VM is unpaused.
> >
> > That's why the oVirt team made all gluster-based disks to be fully
> > preallocated.
>

Yes, in my disk definition I used the proposed defaults.
Possibly I only chose virtio-scsi (see the sda device name): I don't
remember whether, in 4.3.9 with Red Hat CoreOS as the OS type, virtio-scsi
would have been the default or not...
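
When I'm back on the system I can also double-check the actual format and
allocation of that image with something like the following (reusing the
path from the XML above); if I'm not mistaken, a sparse raw image should
report a "disk size" much smaller than its virtual size:

    qemu-img info /rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6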


> Gluster disks are thin (raw-sparse) by default, just like any other
> file-based storage.
>
> If this theory was correct, this would fail consistently on gluster:
>
> 1. create raw sparse image
>
>     truncate -s 100g /rhev/data-center/mnt/glusterSD/server:_path/test
>
> 2. Fill image quickly with data
>
>     dd if=/dev/zero bs=1M | tr "\0" "U" | dd
> of=/rhev/data-center/mnt/glusterSD/server:_path/test bs=1M count=12800
> iflag=fullblock oflag=direct conv=notrunc
>
> According to your theory gluster will fail to allocate shards fast
> enough and fail the I/O.
>
> Nir
>

I can also try the commands above, just to see the behavior, and report
back here as soon as I can connect to the system.
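
While the dd is running I could also watch the shards being created on one
of the bricks, something along these lines (the brick path below is just an
example, mine may differ):

    # count the shard files on a brick while the image is being filled
    watch -n 1 'ls /gluster_bricks/vmstore/vmstore/.shard | wc -l'

That should show whether shard creation keeps up with the writes.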

Gianluca