[ovirt-users] Re: Are people still experiencing issues with GlusterFS on 4.3x?

2019-03-16 Thread Nir Soffer
On Fri, Mar 15, 2019, 15:16 Sandro Bonazzola 
>
> Il giorno ven 15 mar 2019 alle ore 14:00 Simon Coter <
> simon.co...@oracle.com> ha scritto:
>
>> Hi,
>>
>> something that I’m seeing in the vdsm.log, that I think is gluster
>> related is the following message:
>>
>> 2019-03-15 05:58:28,980-0700 INFO  (jsonrpc/6) [root] managedvolume not
>> supported: Managed Volume Not Supported. Missing package os-brick.:
>> ('Cannot import os_brick',) (caps:148)
>>
>> os_brick seems something available by openstack channels but I didn’t
>> verify.
>>
>
> Fred, I see you introduced above error in vdsm
> commit 9646c6dc1b875338b170df2cfa4f41c0db8a6525 back in November 2018.
> I guess you are referring to python-os-brick.
> Looks like it's related to cinderlib integration.
> I would suggest to:
> - fix error message pointing to python-os-brick
> - add python-os-brick dependency in spec file if the dependency is not
> optional
> - if the dependency is optional as it seems to be, adjust the error
> message to say so. I feel nervous seeing errors on missing packages :-)
>
>

There is no error message here. This is an INFO level message, not an ERROR
or WARN, and it just explains why managed volumes will not be available on
this host.

Having this information in the log is extremely important for developers
and support.

I think we can improve the message to mention the actual package name, but
otherwise there is no issue in this info message.
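
If you do want managed volumes (the cinderlib integration) on a host, installing
the os-brick package should make this message go away. Something like this
(the exact package name depends on the repo providing it, most likely the
OpenStack repos):

    yum install python-os-brick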

Nir


>> Simon
>>
>> On Mar 15, 2019, at 1:54 PM, Sandro Bonazzola 
>> wrote:
>>
>>
>>
>> Il giorno ven 15 mar 2019 alle ore 13:46 Strahil Nikolov <
>> hunter86...@yahoo.com> ha scritto:
>>
>>>
>>> >I along with others had GlusterFS issues after 4.3 upgrades, the failed
>>> to dispatch handler issue with bricks going down intermittently.  After
>>> some time it seemed to have corrected itself (at least in my enviornment)
>>> and I >hadn't had any brick problems in a while.  I upgraded my three node
>>> HCI cluster to 4.3.1 yesterday and again I'm running in to brick issues.
>>> They will all be up running fine then all of a sudden a brick will randomly
>>> drop >and I have to force start the volume to get it back up.
>>> >
>>> >Have any of these Gluster issues been addressed in 4.3.2 or any other
>>> releases/patches that may be available to help the problem at this time?
>>> >
>>> >Thanks!
>>>
>>> Yep,
>>>
>>> sometimes a brick dies (usually my ISO domain ) and then I have to
>>> "gluster volume start isos force".
>>> Sadly I had several issues with 4.3.X - problematic OVF_STORE (0 bytes),
>>> issues with gluster , out-of-sync network - so for me 4.3.0 & 4.3.0 are
>>> quite unstable.
>>>
>>> Is there a convention indicating stability ? Is 4.3.xxx means unstable ,
>>> while 4.2.yyy means stable ?
>>>
>>
>> No, there's no such convention. 4.3 is supposed to be stable and
>> production ready.
>> The fact it isn't stable enough for all the cases means it has not been
>> tested for those cases.
>> In oVirt 4.3.1 RC cycle testing (
>> https://trello.com/b/5ZNJgPC3/ovirt-431-test-day-1 ) we got
>> participation of only 6 people and not even all the tests have been
>> completed.
>> Help testing during release candidate phase helps having more stable
>> final releases.
>> oVirt 4.3.2 is at its second release candidate, if you have time and
>> resource, it would be helpful testing it on an environment which is similar
>> to your production environment and give feedback / report bugs.
>>
>> Thanks
>>
>>
>>
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>> ___
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/ACQE2DCN2LP3RPIPZNXYSLCBXZ4VOPX2/
>>>
>>
>>
>> --
>> SANDRO BONAZZOLA
>>
>> MANAGER, SOFTWARE ENGINEERING, EMEA R RHV
>> Red Hat EMEA 
>>
>> sbona...@redhat.com
>> 
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/UPPMAKYNGWB6F4GPZTHOY4QC6GGO66CX/
>>
>>
>>
>
> --
>
> SANDRO BONAZZOLA
>
> MANAGER, SOFTWARE ENGINEERING, EMEA R RHV
>
> Red Hat EMEA 
>
> sbona...@redhat.com
> 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> 

[ovirt-users] Re: qemu-img info showed iscsi/FC lun size 0

2019-03-13 Thread Nir Soffer
On Wed, Mar 13, 2019 at 8:40 PM Jingjie Jiang 
wrote:

> Hi Nir,
>
> I had qcow2 on FC, but qemu-img still showed size is 0.
>
> # qemu-img info
> /rhev/data-center/mnt/blockSD/eaa6f641-6b36-4c1d-bf99-6ba77df3156f/images/38cdceea-45d9-4616-8eef-966acff2f7be/8a32c5af-f01f-48f4-9329-e173ad3483b1
>
> image:
> /rhev/data-center/mnt/blockSD/eaa6f641-6b36-4c1d-bf99-6ba77df3156f/images/38cdceea-45d9-4616-8eef-966acff2f7be/8a32c5af-f01f-48f4-9329-e173ad3483b1
> file format: qcow2
> virtual size: 20G (21474836480 bytes)
> *disk size: 0*
> cluster_size: 65536
> Format specific information:
> compat: 1.1
> lazy refcounts: false
> refcount bits: 16
> corrupt: false
>
> Is the behavior expected?
>
Yes, I explained this here a few weeks ago:
http://lists.nongnu.org/archive/html/qemu-block/2019-02/msg01040.html

>
> Thanks,
>
> Jingjie
>
>
> On 2/22/19 1:53 PM, Nir Soffer wrote:
>
> On Fri, Feb 22, 2019 at 7:14 PM Nir Soffer  wrote:
>
>> On Fri, Feb 22, 2019 at 5:00 PM Jingjie Jiang 
>> wrote:
>>
>>> What about qcow2 format?
>>>
>> qcow2 reports the real size regardless of the underlying storage, since
> qcow2 manages
> the allocations. However the size is reported in qemu-img check in the
> image-end-offset.
>
> $ dd if=/dev/zero bs=1M count=10 | tr "\0" "\1" > test.raw
>
> $ truncate -s 200m test.raw
>
> $ truncate -s 1g backing
>
> $ sudo losetup -f backing --show
> /dev/loop2
>
> $ sudo qemu-img convert -f raw -O qcow2 test.raw /dev/loop2
>
> $ sudo qemu-img info --output json /dev/loop2
> {
> "virtual-size": 209715200,
> "filename": "/dev/loop2",
> "cluster-size": 65536,
> "format": "qcow2",
> "actual-size": 0,
> "format-specific": {
> "type": "qcow2",
> "data": {
> "compat": "1.1",
> "lazy-refcounts": false,
> "refcount-bits": 16,
> "corrupt": false
> }
> },
> "dirty-flag": false
> }
>
> $ sudo qemu-img check --output json /dev/loop2
> {
> "image-end-offset": 10813440,
> "total-clusters": 3200,
> "check-errors": 0,
> "allocated-clusters": 160,
> "filename": "/dev/loop2",
> "format": "qcow2"
> }
>
> We use this for reducing volumes to optimal size after merging snapshots,
> but
> we don't report this value to engine.
>
> Is there a choice  to create vm disk with format qcow2 instead of raw?
>>>
>> Not for LUNs, only for images.
>>
>> The available formats in 4.3 are documented here:
>>
>> https://ovirt.org/develop/release-management/features/storage/incremental-backup.html#disk-format
>>
>> incremental means you checked the checkbox "Enable incremental backup"
>> when creating a disk.
>> But note that the fact that we will create qcow2 image is implementation
>> detail and the behavior
>> may change in the future. For example, qemu is expected to provide a way
>> to do incremental
>> backup with raw volumes, and in this case we may create a raw volume
>> instead of qcow2 volume.
>> (actually raw data volume and qcow2 metadata volume).
>>
>> If you want to control the disk format, the only way is via the REST API
>> or SDK, where you can
>> specify the format instead of allocation policy. However even if you
>> specify the format in the SDK
>> the system may chose to change the format when copying the disk to
>> another storage type. For
>> example if you copy qcow2/sparse image from block storage to file storage
>> the system will create
>> a raw/sparse image.
>>
>> If you desire to control the format both from the UI and REST API/SDK and
>> ensure that the system
>> will never change the selected format please file a bug explaining the
>> use case.
>>
>> On 2/21/19 5:46 PM, Nir Soffer wrote:
>>>
>>>
>>>
>>> On Thu, Feb 21, 2019, 21:48 >>
>>>> Hi,
>>>> Based on oVirt 4.3.0, I have data domain from FC lun, then I create new
>>>> vm on the disk from FC data domain.
>>>> After VM was created. According to qemu-img info, the disk size is 0.
>>>> # qemu-img info
>>>> /rhev/data-center/mnt/blockSD/eaa6f641-6b36-4c1d-bf99-6ba77df3156f/images/8d3b455b-1da4-49f3-ba57-8

[ovirt-users] Re: [ovirt-devel] new easy way to open issues for ovirt.org and documentation

2019-02-17 Thread Nir Soffer
On Sun, Feb 17, 2019 at 9:32 PM Greg Sheremeta  wrote:

> Hi,
>
> I created some footer links on ovirt.org to make it super easy to open
> GitHub issues. We often get questions about where to report documentation
> issues and things like that -- hopefully this eases the pain.
>
> When you are visiting any page on ovirt.org and you see a problem or have
> an idea for enhancement, simply scroll down and click 'Report an issue on
> GitHub'.
>
> [image: Selection_393.png]
>

Cool!

But this may lead to people reporting oVirt bugs against the ovirt.org site.

Maybe:

"Report an issue with this page"

The "on Github" part is not important (same for "Edit this page").

Nir



>
> Full demo (2 minutes):
> https://www.youtube.com/watch?v=TbORKknLVL4
>
> (and, of course, if you're so inclined, the Edit link starts a PR for you
> ...)
>
> Best wishes,
> Greg
>
>
> --
>
> GREG SHEREMETA
>
> SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX
>
> Red Hat NA
>
> 
>
> gsher...@redhat.comIRC: gshereme
> 
> ___
> Devel mailing list -- de...@ovirt.org
> To unsubscribe send an email to devel-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/de...@ovirt.org/message/WC42SBSGYAW6GIL4A6BSTZKG5UA7LALN/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EJOFCNYZNKO3A4FT5HIG5EWMPSHVXXUQ/


[ovirt-users] SURVEY: your NFS configuration (Bug 1666795 - SHE doesn't start after power-off, 4.1 to 4.3 upgrade - VolumeDoesNotExist: Volume does not exist )

2019-02-12 Thread Nir Soffer
Looking at
https://bugzilla.redhat.com/1666795

It seems that a change in vdsm/libvirt exposed an NFS configuration that may
have been needed in the past and is probably not needed now.

If you use NFS, I would like to see your /etc/exports (after sanitizing it
if needed). As a bonus, the output of "exportfs -v" would also be useful.

In particular, I want to know if you use root_squash, all_squash, or
no_root_squash.
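
For reference, the interesting part is the squash option at the end of each
export line. A typical export looks something like this (path and options are
only an example):

    /exports/data  *(rw,sync,no_subtree_check,anonuid=36,anongid=36,all_squash)

"exportfs -v" shows the options actually in effect, including defaults such as
root_squash when nothing was set explicitly.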

Thanks,
Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7J3ZV25DP2X5TD6A4IV63W5PANKWERTO/


[ovirt-users] Re: performance issues with Windows 10

2019-02-06 Thread Nir Soffer
On Tue, Feb 5, 2019 at 3:58 PM Hetz Ben Hamo  wrote:

> Hi,
>
> I'm doing some very simple disk benchmarks on Windows 10, both with ESXI
> 6.7 and oVirt 4.3.0.
> Both Windows 10 Pro guests have all the driver installed.
> the "Storage" (Datastore in VMWare and storage domains in oVirt) comes
> from the same ZFS machine, both mounted as NFS *without* any parameters
> for NFS mount.
> The ESXI is running on HP DL360 G7 with E5620 CPU, while the oVirt node is
> running on IBM X3550 M3 with dual Xeon E5620. There are no memory issues as
> both machines have plenty of free memory and free CPU resources.
>
> Screenshots:
>
> - Windows 10 in vSphere 6.7 -  https://imgur.com/V75ep2n
> - Windows 10 in oVirt 4.3.0 - https://imgur.com/3JDrWLx
>
> As you can see, while oVirt lags a bit in 4K Read, the write performance
> is really bad.
>

382 MiB/s vs 54 MiB/s? Smells like someone is cheating :-)

Maybe VMware is using buffered I/O, so you are testing writes to the host
buffer cache, while oVirt is using cache=none and actually writing to the
remote storage. But this is only a wild guess; we need many more details.

Let's start by getting more details about your setup, so we can reproduce it
in our lab.

- What is the network topology?
- Spec of the NFS server?
- Configuration of the VM on both oVirt and VMware.
- How do you test? How much data is written during the test?
- How does it compare to running the same tests on the hypervisor?

For oVirt, getting the VM XML would be helpful - try:

Find the vm id:

sudo virsh -r list

Dump the xml:

sudo virsh -r dumpxml N

For testing, you should probably use fio:
https://bluestop.org/fio/
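
For example, something like this runs a 4k random write test with direct I/O
(file name, size and runtime are only examples; on the Windows build use
--ioengine=windowsaio instead of libaio):

    fio --name=4k-randwrite --filename=fio.test --size=1g --bs=4k \
        --rw=randwrite --direct=1 --ioengine=libaio --iodepth=16 \
        --runtime=60 --time_based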

I added people who can help diagnose this, or at least ask better questions.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PD63I247C7L7PAA5BXXE4UVG77RS2JJP/


[ovirt-users] Re: Unable to upload images

2019-02-19 Thread Nir Soffer
On Wed, Feb 20, 2019 at 1:01 AM  wrote:

> qemu-img info:
> LM-7.2.45.0.17004.RELEASE-Linux-KVM-XEN.disk
> image: LoadMaster-VLM-7.2.45.0.17004.RELEASE-Linux-KVM-XEN.disk
> file format: raw
> virtual size: 16G (17179869696 bytes)
> disk size: 16G
>

This is a raw image, so it may work, but

ls -l:
> -rwxrwxrwx 1 michael michael 17179869185 Jan  7 16:43
> LoadMaster-VLM-7.2.45.0.17004.RELEASE-Linux-KVM-XEN.disk
>

The image size is invalid. A raw image size must be aligned to 512 bytes,
which is why we block such images in the UI.

Is it possible that the image was truncated?

According to qemu-img info, the size is 17179869696. qemu-img lies about the
size by rounding it up to the next multiple of 512.
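
To illustrate with the sizes above (17179869184 bytes is exactly 16 GiB):

    $ expr 17179869185 % 512
    1
    $ expr 17179869696 % 512
    0

So the file is one byte past a 512 byte boundary, and 17179869696 is the next
multiple of 512 above it.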

I think this will fix your image:

truncate -s 17179869696
LoadMaster-VLM-7.2.45.0.17004.RELEASE-Linux-KVM-XEN.disk

After that uploading the image should work.

Downloading the image again and verifying the image checksum with the
vendor is probably
a good idea if you are not sure about the contents of this image.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/D3PMG77NT2I7DNX2PPPCZXO763VMXS4L/


[ovirt-users] Re: Unable to upload images

2019-02-19 Thread Nir Soffer
On Tue, Feb 19, 2019 at 11:29 PM Michael Blanchard 
wrote:

> I don't even get the that point.  I try to upload to he Kemp Roadmaster
> vm in .disk format and it says not in proper format and errors out.  No log
> entries anuwhere
>

What is "Kemp Roadmaster vm in .disk format"?

Can you run this on the machine with this file?

ls -l /path/to/disk

Also this can help to understand the issue:

qemu-img info /path/to/disk

Nir


>
> Get Outlook for Android <https://aka.ms/ghei36>
>
> --
> *From:* Shani Leviim 
> *Sent:* Tuesday, February 19, 2019 10:50:20 AM
> *To:* Michael Blanchard
> *Cc:* users; Nir Soffer; Daniel Erez
> *Subject:* Re: [ovirt-users] Unable to upload images
>
> Also, can you please share the output of 'ls -l' executed on the image
> you're trying to upload?
>
>
> *Regards, *
>
> *Shani Leviim *
>
>
> On Tue, Feb 19, 2019 at 3:52 PM Shani Leviim  wrote:
>
>> Hi,
>> Can you please share engine and UI logs?
>> Also, can you please attach a screenshot?
>>
>>
>> *Regards, *
>>
>> *Shani Leviim *
>>
>>
>> On Tue, Feb 19, 2019 at 1:54 AM  wrote:
>>
>>> I just updated my ovirt to the latest, and now I can't upload images
>>> that I used to be able to.  I can upload and see the nagio xi virtual
>>> appliance, but I can't upload .disk files anymore, I get a red error in GUI
>>> and it say image file not supported, but I used to be able to upload same
>>> file in previous version with no issue
>>> ___
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/EM3RUQQ6NJ5SXNFF6ZPFAHJKBIZT6UFG/
>>>
>> This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom they are
> addressed. If you have received this email in error please notify the
> system manager. This message contains confidential information and is
> intended only for the individual named. If you are not the named addressee
> you should not disseminate, distribute or copy this e-mail. Please notify
> the sender immediately by e-mail if you have received this e-mail by
> mistake and delete this e-mail from your system. If you are not the
> intended recipient you are notified that disclosing, copying, distributing
> or taking any action in reliance on the contents of this information is
> strictly prohibited.
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/W6DZZXFJ53HX2ZDGDNXVYYEGEVRSVYFR/


[ovirt-users] Re: 4.3.0 rc2 cannot mount glusterfs volumes on ovirt node ng

2019-01-25 Thread Nir Soffer
On Fri, Jan 25, 2019 at 3:18 PM Jorick Astrego  wrote:

> Hi,
>
> We're having problems mounting the preexisting 3.12 glusterfs storage
> domains in ovirt node ng 4.3.0 rc2.
>
> Getting
>
> There are no iptables blocks on the storage network, the ip's are pingable
> bothe ways. I can telnet to the glusterfs ports and I see no messages in
> the logs of the glusterfs servers.
>
> When I try the mount command manually it hangs for ever:
>
> /usr/bin/mount -t glusterfs -o backup-volfile-servers=*.*.*.*:*.*.*.*
> *.*.*.*:/sdd8 /mnt/temp
>
> I haven't submitted a bug yet
>
> from supervdsm.log
>
> MainProcess|jsonrpc/2::DEBUG::2019-01-25
> 13:42:45,282::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper)
> call volumeInfo with (u'sdd8', u'*.*.*.*') {}
> MainProcess|jsonrpc/2::DEBUG::2019-01-25
> 13:42:45,282::commands::198::root::(execCmd) /usr/bin/taskset --cpu-list
> 0-63 /usr/sbin/gluster --mode=script volume info --remote-host=*.*.*.* sdd8
> --xml (cwd None)
> MainProcess|jsonrpc/2::DEBUG::2019-01-25
> 13:44:45,399::commands::219::root::(execCmd) FAILED:  = '';  = 1
> MainProcess|jsonrpc/2::DEBUG::2019-01-25
> 13:44:45,399::logutils::319::root::(_report_stats) ThreadedHandler is ok
> in the last 120 seconds (max pending: 2)
>

This looks like
https://bugzilla.redhat.com/show_bug.cgi?id=1666123#c18

We should see "ThreadedHandler is ok" every 60 seconds when using debug log
level.

Looks like your entire supervdsmd process was hung for 120 seconds.


> MainProcess|jsonrpc/2::ERROR::2019-01-25
> 13:44:45,399::supervdsm_server::104::SuperVdsm.ServerCallback::(wrapper)
> Error in volumeInfo
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line
> 102, in wrapper
> res = func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/gluster/cli.py", line 529,
> in volumeInfo
> xmltree = _execGlusterXml(command)
>   File "/usr/lib/python2.7/site-packages/vdsm/gluster/cli.py", line 131,
> in _execGlusterXml
> return _getTree(rc, out, err)
>   File "/usr/lib/python2.7/site-packages/vdsm/gluster/cli.py", line 112,
> in _getTree
> raise ge.GlusterCmdExecFailedException(rc, out, err)
> GlusterCmdExecFailedException: Command execution failed
> error: E
> r
> r
> o
> r
>
> :
>
> R
> e
> q
> u
> e
> s
> t
>
> t
> i
> m
> e
> d
>
> o
> u
> t
>
Looks like a side effect of
https://gerrit.ovirt.org/c/94784/

GlusterException assumes that it accepts a list of lines, but we started to
raise strings. The class should be fixed to handle strings.

>
>
> return code: 1
> MainProcess|jsonrpc/2::DEBUG::2019-01-25
> 13:44:45,400::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper)
> call mount with ( 0x7f6eb8d0a2d0>, u'*.*.*.*:/sdd8',
> u'/rhev/data-center/mnt/glusterSD/*.*.*.*:_sdd8') {'vfstype': u'glusterfs',
> 'mntOpts': u'backup-volfile-servers=*.*.*.*:*.*.*.*', 'cgroup':
> 'vdsm-glusterfs'}
> MainProcess|jsonrpc/2::DEBUG::2019-01-25
> 13:44:45,400::commands::198::root::(execCmd) /usr/bin/taskset --cpu-list
> 0-63 /usr/bin/systemd-run --scope --slice=vdsm-glusterfs /usr/bin/mount -t
> glusterfs -o backup-volfile-servers=*.*.*.*:*.*.*.* *.*.*.*:/sdd8
> /rhev/data-center/mnt/glusterSD/*.*.*.*:_sdd8 (cwd None)
> MainProcess|jsonrpc/0::DEBUG::2019-01-25
> 13:45:02,884::commands::219::root::(execCmd) FAILED:  = 'Running scope
> as unit run-38676.scope.\nMount failed. Please check the log file for more
> details.\n';  = 1
> MainProcess|jsonrpc/0::ERROR::2019-01-25
> 13:45:02,884::supervdsm_server::104::SuperVdsm.ServerCallback::(wrapper)
> Error in mount
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line
> 102, in wrapper
> res = func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line
> 144, in mount
> cgroup=cgroup)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 277,
> in _mount
> _runcmd(cmd)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 305,
> in _runcmd
> raise MountError(rc, b";".join((out, err)))
> MountError: (1, ';Running scope as unit run-38676.scope.\nMount failed.
> Please check the log file for more details.\n')
>

The mount failure is probably related to glusterfs. There are glusterfs
logs on the host that
can give more info on this error.
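
On the hypervisor, the glusterfs client log for this mount is named after the
mount point, something like (server address and volume name will differ):

    less /var/log/glusterfs/rhev-data-center-mnt-glusterSD-<server>:_sdd8.log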

> MainProcess|jsonrpc/0::DEBUG::2019-01-25
> 13:45:02,894::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper)
> call volumeInfo with (u'ssd9', u'*.*.*.*') {}
> MainProcess|jsonrpc/0::DEBUG::2019-01-25
> 13:45:02,894::commands::198::root::(execCmd) /usr/bin/taskset --cpu-list
> 0-63 /usr/sbin/gluster --mode=script volume info --remote-host=*.*.*.* ssd9
> --xml (cwd None)
>
>
> from vdsm.log
>
> 2019-01-25 13:46:03,519+0100 WARN  (vdsm.Scheduler) [Executor] Worker
> blocked:  {u'connectionParams': [{u'mnt_options':
> u'backup-volfile-servers=*.*.*.*:*.*.*.*', u'id':
> u'6b6b7899-c82b-4417-b453-0b3b0ac11deb', u'connection': 

[ovirt-users] Re: 4.3.0 rc2 cannot mount glusterfs volumes on ovirt node ng

2019-01-25 Thread Nir Soffer
On Fri, Jan 25, 2019 at 3:55 PM Nir Soffer  wrote:

> On Fri, Jan 25, 2019 at 3:18 PM Jorick Astrego  wrote:
>
>> Hi,
>>
>> We're having problems mounting the preexisting 3.12 glusterfs storage
>> domains in ovirt node ng 4.3.0 rc2.
>>
>> Getting
>>
>> There are no iptables blocks on the storage network, the ip's are
>> pingable bothe ways. I can telnet to the glusterfs ports and I see no
>> messages in the logs of the glusterfs servers.
>>
>> When I try the mount command manually it hangs for ever:
>>
>> /usr/bin/mount -t glusterfs -o backup-volfile-servers=*.*.*.*:*.*.*.*
>> *.*.*.*:/sdd8 /mnt/temp
>>
>> I haven't submitted a bug yet
>>
>> from supervdsm.log
>>
>> MainProcess|jsonrpc/2::DEBUG::2019-01-25
>> 13:42:45,282::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper)
>> call volumeInfo with (u'sdd8', u'*.*.*.*') {}
>> MainProcess|jsonrpc/2::DEBUG::2019-01-25
>> 13:42:45,282::commands::198::root::(execCmd) /usr/bin/taskset --cpu-list
>> 0-63 /usr/sbin/gluster --mode=script volume info --remote-host=*.*.*.* sdd8
>> --xml (cwd None)
>> MainProcess|jsonrpc/2::DEBUG::2019-01-25
>> 13:44:45,399::commands::219::root::(execCmd) FAILED:  = '';  = 1
>> MainProcess|jsonrpc/2::DEBUG::2019-01-25
>> 13:44:45,399::logutils::319::root::(_report_stats) ThreadedHandler is ok
>> in the last 120 seconds (max pending: 2)
>>
>
> This looks like
> https://bugzilla.redhat.com/show_bug.cgi?id=1666123#c18
>
> We should see "ThreadedHandler is ok" every 60 seconds when using debug
> log level.
>
> Looks like your entire supervdsmd process was hang for 120 seconds.
>
>
>> MainProcess|jsonrpc/2::ERROR::2019-01-25
>> 13:44:45,399::supervdsm_server::104::SuperVdsm.ServerCallback::(wrapper)
>> Error in volumeInfo
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line
>> 102, in wrapper
>> res = func(*args, **kwargs)
>>   File "/usr/lib/python2.7/site-packages/vdsm/gluster/cli.py", line 529,
>> in volumeInfo
>> xmltree = _execGlusterXml(command)
>>   File "/usr/lib/python2.7/site-packages/vdsm/gluster/cli.py", line 131,
>> in _execGlusterXml
>> return _getTree(rc, out, err)
>>   File "/usr/lib/python2.7/site-packages/vdsm/gluster/cli.py", line 112,
>> in _getTree
>> raise ge.GlusterCmdExecFailedException(rc, out, err)
>> GlusterCmdExecFailedException: Command execution failed
>> error: E
>> r
>> r
>> o
>> r
>>
>> :
>>
>> R
>> e
>> q
>> u
>> e
>> s
>> t
>>
>> t
>> i
>> m
>> e
>> d
>>
>> o
>> u
>> t
>>
> Looks like side effect of
> https://gerrit.ovirt.org/c/94784/
>
> GlusterException assumes that it accept list of lines, but we started to
> raise
> strings. The class should be fixed to handle strings.
>

Fixed in https://gerrit.ovirt.org/c/97316/

I think we need this in 4.2.8.
Denis, please check.

>
>>
>> return code: 1
>> MainProcess|jsonrpc/2::DEBUG::2019-01-25
>> 13:44:45,400::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper)
>> call mount with (> 0x7f6eb8d0a2d0>, u'*.*.*.*:/sdd8',
>> u'/rhev/data-center/mnt/glusterSD/*.*.*.*:_sdd8') {'vfstype': u'glusterfs',
>> 'mntOpts': u'backup-volfile-servers=*.*.*.*:*.*.*.*', 'cgroup':
>> 'vdsm-glusterfs'}
>> MainProcess|jsonrpc/2::DEBUG::2019-01-25
>> 13:44:45,400::commands::198::root::(execCmd) /usr/bin/taskset --cpu-list
>> 0-63 /usr/bin/systemd-run --scope --slice=vdsm-glusterfs /usr/bin/mount -t
>> glusterfs -o backup-volfile-servers=*.*.*.*:*.*.*.* *.*.*.*:/sdd8
>> /rhev/data-center/mnt/glusterSD/*.*.*.*:_sdd8 (cwd None)
>> MainProcess|jsonrpc/0::DEBUG::2019-01-25
>> 13:45:02,884::commands::219::root::(execCmd) FAILED:  = 'Running scope
>> as unit run-38676.scope.\nMount failed. Please check the log file for more
>> details.\n';  = 1
>> MainProcess|jsonrpc/0::ERROR::2019-01-25
>> 13:45:02,884::supervdsm_server::104::SuperVdsm.ServerCallback::(wrapper)
>> Error in mount
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line
>> 102, in wrapper
>> res = func(*args, **kwargs)
>>   File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line
>> 144, in mount
>> cgroup=cgroup)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line
>> 277,

[ovirt-users] Re: Mounting ISO in subfolder

2019-02-01 Thread Nir Soffer
On Fri, Feb 1, 2019 at 10:21 AM Giulio Casella  wrote:

> Il 31/01/2019 18:14, Sandro Bonazzola ha scritto:
> > As far as I can tell, there are no tools that creates subdirectories
> > within storage domains.
> > Did you manually upload the iso into the nfs mount creating a
> > subdirectory there?
> > I think this layout is not supported at all.
>
> Yes, I did (sorry :-)). My ISOs are growing, and I'd like to have a
> hierarchical structure.
> To say the truth it was only a test, I wasn't sure to see ISOs in
> subdir. But when I've seen them (correctly listed in admin portal as
> "foo/bar.iso"), I'd expect to be able mount them.
>
> I also filed a bug
> (https://bugzilla.redhat.com/show_bug.cgi?id=1671046), if the answer
> will be NOTABUG, I'll try with a RFE.
>

Vdsm lists ISO files in subdirectories, and should be able to use them when
starting VMs. I don't know about the engine side, but this may be a regression
in 4.2, which made major changes in the way VMs are started.

You can try to create a 4.1 cluster and see if this works there. If you don't
need any of the features added in the 4.2 cluster version, using the 4.1
cluster version may be good enough.

But note that ISO domains are deprecated and will be removed in future
versions. Most likely 4.4 will not have them.


> Thanks,
> gc
>
>
> TL;DR
>
> The scenario I'm trying to implement is a DVD video store, provided by
> images in ISO domain, automatically mounted on VM on demand, via a
> backend python script. That's why in this case a hyerarchical structure
> would be much better than a flat one.
>

I would like to hear more about this use case.

Why do you need to start a VM connected to an ISO file in a DVD store?

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LGVQE3GOIOHYK3H67B3BDM3RXNCWC6UW/


[ovirt-users] Re: How can I set up a newly created virtual machine to use the qcow2 format disk by default?

2019-02-01 Thread Nir Soffer
On Fri, Feb 1, 2019 at 7:56 AM  wrote:

> How can I set up a newly created virtual machine to use the qcow2 format
> disk by default?
> My storage configuration is glusterfs non-ovirt managed
>

I don't think this is possible. The default format for file-based storage
domains like glusterfs and NFS is raw-sparse.

The only way to create a qcow2-sparse disk today is to create the disk via
the REST API, which gives you more control over the disk properties.
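
A rough sketch with curl (engine address, credentials, names and sizes are only
placeholders - see the REST API documentation for the full disk representation):

    curl -k -u admin@internal:PASSWORD \
        -H "Content-Type: application/xml" \
        -d '<disk>
              <name>mydisk</name>
              <format>cow</format>
              <sparse>true</sparse>
              <provisioned_size>10737418240</provisioned_size>
              <storage_domains>
                <storage_domain><name>my-gluster-domain</name></storage_domain>
              </storage_domains>
            </disk>' \
        https://engine.example.com/ovirt-engine/api/disks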

In 4.3 selecting "enable incremental backup" will always create qcow2
images, but
this is internal implementation detail that may change in the future.

Daniel, can we make the defaults configurable in engine.config?

Nir


> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/AIB5DOJD3IWUXVXUHVJOTZVKDIXEUA4B/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EQWJR2UPSAZBWPZK2ZAKGAV7MSM6PCT2/


[ovirt-users] Re: Sanlock volume corrupted on deployment

2019-01-31 Thread Nir Soffer
On Thu, Jan 31, 2019 at 2:52 PM Strahil Nikolov 
wrote:

> Dear Nir,
>
> the issue with the 'The method does not exist or is not available:
> {'method': u'GlusterHost.list'}, code = -32601' is not related to the
> sanlock. I don't know why the 'vdsm-gluster' package was not installed as a
> dependency.
>

Please file a bug about this.

> Can you share your sanlock log?
> >
> I'm attaching the contents of /var/log , but here is a short snippet:
>
> About the sanlock issue - it reappeared with errors like :
> 2019-01-31 13:33:10 27551 [17279]: leader1 delta_acquire_begin error -223
> lockspace hosted-engine host_id 1
>

As I said, the error is not -233 but -223, which makes sense - this error
means sanlock did not find the magic number for a delta lease area, which
means the area was not formatted, or was corrupted.
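
You can inspect the delta lease area with sanlock itself, for example (use the
path printed in the leader2 line above):

    sanlock direct dump /var/run/vdsm/storage/<sd_uuid>/<img_uuid>/<vol_uuid>

A healthy lockspace shows the lockspace name and the host ids; an unformatted
or corrupted area shows nothing useful.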


> 2019-01-31 13:33:10 27551 [17279]: leader2 path
> /var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fed8ac83b
> offset 0
> 2019-01-31 13:33:10 27551 [17279]: leader3 m 0 v 30003 ss 512 nh 0 mh 1 oi
> 0 og 0 lv 0
> 2019-01-31 13:33:10 27551 [17279]: leader4 sn hosted-engine rn  ts 0 cs
> 60346c59
> 2019-01-31 13:33:11 27551 [21482]: s6 add_lockspace fail result -223
> 2019-01-31 13:33:16 27556 [21482]: s7 lockspace
> hosted-engine:1:/var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74/2c74697a-8bd9-4472-8a98-bf624f3462d5/411b6cee-5b01-47ca-8c28-bb1fe
> d8ac83b:0
>
>
> I have managed to fix it by running the following immediately after the ha
> services were started by ansible:
>
> cd
> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain\:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/ha_agent/
>

This is not a path managed by vdsm, so I guess the issue is with the hosted
engine specific lockspace that is managed by hosted engine, not by vdsm.


> sanlock direct init -s hosted-engine:0:hosted-engine.lockspace:0
>

This formats the lockspace, and is expected to fix this issue.



> systemctl stop ovirt-ha-agent ovirt-ha-broker
> systemctl status vdsmd
> systemctl start ovirt-ha-broker ovirt-ha-agent
>
> Once the VM started - ansible managed to finish the deployment without any
> issues.
> I hope someone can check the sanlock init stuff , as it is really
> frustrating.
>

If I understand the flow correctly, you create a new environment from
scratch, so this is an issue with hosted engine deployment, not with
initializing the lockspace.

I think filing a bug with the info in this thread is the first step.

Simone, can you take a look at this?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JWFLLOQS7AWN6P4XZS3HC4PTUWU2G5SP/


[ovirt-users] Re: Sanlock volume corrupted on deployment

2019-01-31 Thread Nir Soffer
On Tue, Jan 29, 2019 at 2:00 PM Strahil  wrote:

> Dear Nir,
>
> According to redhat solution 1179163 'add_lockspace fail result -233'
> indicates corrupted ids lockspace.
>

Good work finding the solution!

Note that the page mentions error -223, not -233:

2014-08-27 14:26:42+ 2244 [14497]: s30 add_lockspace fail result
-223  #<-- corrupted ids lockspace



>
> During the install, the VM fails to get up.
> In order to fix it, I stop:
> ovirt-ha-agent, ovirt-ha-broker, vdsmd, supervdsmd, sanlock
> Then reinitialize the lockspace via 'sanlock direct init -s' (used
> bugreport 1116469 as guidance).
> Once the init is successful and all the services are up - the VM is
> started but the deployment was long over and the setup needs additional
> cleaning up.
>
> I will rebuild the gluster cluster and then will repeat the deployment.
>
> Can you guide me what information will be needed , as I'm quite new in
> ovirt/RHV ?
>
> Best Regards,
> Strahil Nikolov
>
> On Jan 28, 2019 20:34, Nir Soffer  wrote:
>
> On Sat, Jan 26, 2019 at 6:13 PM Strahil  wrote:
>
> Hey guys,
>
> I have noticed that with 4.2.8 the sanlock issue (during deployment) is
> still not fixed.
> Am I the only one with bad luck or there is something broken there ?
>
> The sanlock service reports code 's7 add_lockspace fail result -233'
> 'leader1 delta_acquire_begin error -233 lockspace hosted-engine host_id
> 1'.
>
>
> Sanlock does not have such error code - are you sure this is -233?
>
> Here sanlock return values:
> https://pagure.io/sanlock/blob/master/f/src/sanlock_rv.h
>
> Can you share your sanlock log?
>
>
>
>
> Best Regards,
> Strahil Nikolov
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/SZMF5KKHSXOUTLGX3LR2NBN7E6QGS6G3/
>
>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FMAWDZO5UO2HAGMHXT7AKGEKKXTIJS5S/


[ovirt-users] Re: Live storage migration is failing in 4.2.8

2019-04-12 Thread Nir Soffer
On Fri, Apr 12, 2019, 12:07 Ladislav Humenik 
wrote:

> Hello, we have recently updated few ovirts from 4.2.5 to 4.2.8 version
> (actually 9 ovirt engine nodes), where the live storage migration
> stopped to work, and leave auto-generated snapshot behind.
>
> If we power the guest VM down, the migration works as expected. Is there
> a known bug for this? Shall we open a new one?
>
> Setup:
> ovirt - Dell PowerEdge R630
>  - CentOS Linux release 7.6.1810 (Core)
>  - ovirt-engine-4.2.8.2-1.el7.noarch
>  - kernel-3.10.0-957.10.1.el7.x86_64
> hypervisors- Dell PowerEdge R640
>  - CentOS Linux release 7.6.1810 (Core)
>  - kernel-3.10.0-957.10.1.el7.x86_64
>  - vdsm-4.20.46-1.el7.x86_64
>  - libvirt-5.0.0-1.el7.x86_64
>

This is a known issue in libvirt < 5.2.

How did you get this version on CentOS 7.6?

On my CentOS 7.6 I have libvirt 4.5, which is not affected by this issue.
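
If you are not sure where libvirt 5.0 came from, something like this should
show the installed packages and the repository they were installed from:

    rpm -q libvirt libvirt-daemon
    yum list installed 'libvirt*'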

Nir

 - qemu-kvm-ev-2.12.0-18.el7_6.3.1.x86_64
> storage domain  - netapp NFS share
>
>
> logs are attached
>
> --
> Ladislav Humenik
>
> System administrator
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/VSKUEPUOPJDSRWYYMZEKAVTZ62YP6UK2/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3B3TLAJ7QPC6LLPBZYRD7WXUJZXQE5P6/


[ovirt-users] Re: Unable to start vdsm, upgrade 4.0 to 4.1

2019-04-13 Thread Nir Soffer
On Sat, Apr 13, 2019 at 7:02 AM Todd Barton 
wrote:

> Looking for some help/suggestions to correct an issue I'm having.  I have
> a 3 host HA setup running a hosted-engine and gluster storage.  The hosts
> are identical hardware configurations and have been running for several
> years very solidly.  I was performing an upgrade to 4.1.  1st host when
> fine.  The second upgrade didn't go well...On server reboot, it went into
> kernel panic and I had to load previous kernel to diagnose.
>
> I couldn't get it out of panic and I had to revert the system to the
> previous kernel which was a big PITA. I updated it to current and verified
> installation of ovirt/vdsm.  Everything seemed to be ok, but vdsm won't
> start. Gluster is working fine.  It appears I have a authentication issue
> with libvirt.  I'm getting the message "libvirt: XML-RPC error :
> authentication failed: authentication failed" which seems to be the core
> issue.
>
> I've looked at all the past issues/resolutions to this issue and tried
> them, but I can't get it to work.  For example, I do a vdsm-tool configure
> --force and I get this...
>
> Checking configuration status...
>
> abrt is already configured for vdsm
> lvm is configured for vdsm
> libvirt is already configured for vdsm
> SUCCESS: ssl configured to true. No conflicts
> Current revision of multipath.conf detected, preserving
>
> Running configure...
> Reconfiguration of abrt is done.
> Traceback (most recent call last):
>   File "/usr/bin/vdsm-tool", line 219, in main
> return tool_command[cmd]["command"](*args)
>   File "/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py", line 38,
> in wrapper
> func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line
> 141, in configure
> _configure(c)
>   File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line
> 88, in _configure
> getattr(module, 'configure', lambda: None)()
>   File
> "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/passwd.py", line
> 68, in configure
> configure_passwd()
>   File
> "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/passwd.py", line
> 98, in configure_passwd
> raise RuntimeError("Set password failed: %s" % (err,))
> RuntimeError: Set password failed: ['saslpasswd2: invalid parameter
> supplied']
>

I think libvirt changed the way SASL is configured about a year ago; maybe
that broke 4.1. I don't think anyone should use 4.1 at this point.

Maybe removing vdsm on the host and installing 4.2 or 4.3 will avoid this
issue.
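
Roughly something like this (untested - back up the host configuration first,
and use the release rpm matching the version you want, e.g. ovirt-release43):

    yum remove 'vdsm*'
    yum install https://resources.ovirt.org/pub/yum-repo/ovirt-release43.rpm
    yum install vdsm vdsm-client
    vdsm-tool configure --force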

Nir


>
> ...and help would be greatly appreciated.  I'm not a linux/ovirt expert by
> any means, but I desperately need to get this setup back to being stable.
> This happened many months ago and I gave up fixing, but I really need to
> get this back online again.
>
> Thank you
>
> *Todd Barton*
>
>
>
>
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/354QKTKZTQGJIXYM5Q4RDOFKLZK5ORBE/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Q2QE7XDFV7ICCUVZ6BX6T3XLT36YEMKH/


[ovirt-users] Re: [ovirt-announce] Re: [ANN] oVirt 4.3.4 First Release Candidate is now available

2019-05-16 Thread Nir Soffer
On Thu, May 16, 2019 at 10:12 PM Darrell Budic 
wrote:

> On May 16, 2019, at 1:41 PM, Nir Soffer  wrote:
>
>
> On Thu, May 16, 2019 at 8:38 PM Darrell Budic 
> wrote:
>
>> I tried adding a new storage domain on my hyper converged test cluster
>> running Ovirt 4.3.3.7 and gluster 6.1. I was able to create the new gluster
>> volume fine, but it’s not able to add the gluster storage domain (as either
>> a managed gluster volume or directly entering values). The created gluster
>> volume mounts and looks fine from the CLI. Errors in VDSM log:
>>
>> ...
>
>> 2019-05-16 10:25:09,584-0500 ERROR (jsonrpc/5) [storage.fileSD] Underlying
>> file system doesn't supportdirect IO (fileSD:110)
>> 2019-05-16 10:25:09,584-0500 INFO  (jsonrpc/5) [vdsm.api] FINISH
>> createStorageDomain error=Storage Domain target is unsupported: ()
>> from=:::10.100.90.5,44732, flow_id=31d993dd,
>> task_id=ecea28f3-60d4-476d-9ba8-b753b7c9940d (api:52)
>>
>
> The direct I/O check has failed.
>
>
> So something is wrong in the files system.
>
> To confirm, you can try to do:
>
> dd if=/dev/zero of=/path/to/mountoint/test bs=4096 count=1 oflag=direct
>
> This will probably fail with:
> dd: failed to open '/path/to/mountoint/test': Invalid argument
>
> If it succeeds, but oVirt fail to connect to this domain, file a bug and
> we will investigate.
>
> Nir
>
>
> Yep, it fails as expected. Just to check, it is working on pre-existing
> volumes, so I poked around at gluster settings for the new volume. It has
> network.remote-dio=off set on the new volume, but enabled on old volumes.
> After enabling it, I’m able to run the dd test:
>
> [root@boneyard mnt]# gluster vol set test network.remote-dio enable
> volume set: success
> [root@boneyard mnt]# dd if=/dev/zero of=testfile bs=4096 count=1
> oflag=direct
> 1+0 records in
> 1+0 records out
> 4096 bytes (4.1 kB) copied, 0.0018285 s, 2.2 MB/s
>
> I’m also able to add the storage domain in ovirt now.
>
> I see network.remote-dio=enable is part of the gluster virt group, so
> apparently it’s not getting set by ovirt duding the volume creation/optimze
> for storage?
>

I'm not sure who is responsible for changing these settings. oVirt always
required directio, and we
never had to change anything in gluster.

Sahina, maybe gluster changed the defaults?

Darrell, please file a bug, probably for RHHI.
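
In the meantime, you can check what the new volume got and apply the virt
group profile (which includes network.remote-dio) manually - volume name as in
your example:

    gluster volume get test network.remote-dio
    gluster volume set test group virt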

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XRE4XE5WJECVMCUFTS4Y2ADKGWQWJ5CE/


[ovirt-users] Re: [ovirt-announce] Re: [ANN] oVirt 4.3.4 First Release Candidate is now available

2019-05-16 Thread Nir Soffer
On Thu, May 16, 2019 at 10:02 PM Strahil  wrote:

> This is my previous e-mail:
>
> On May 16, 2019 15:23, Strahil Nikolov  wrote:
>
> It seems that the issue is within the 'dd' command as it stays waiting for
> input:
>
> [root@ovirt1 mnt]# /usr/bin/dd iflag=fullblock  of=file
> oflag=direct,seek_bytes seek=1048576 bs=256512 count=1
> conv=notrunc,nocreat,fsync  ^C0+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 19.3282 s, 0.0 kB/s
>
> Changing the dd command works and shows that the gluster is working:
>
> [root@ovirt1 mnt]# cat /dev/urandom |  /usr/bin/dd  of=file
> oflag=direct,seek_bytes seek=1048576 bs=256512 count=1
> conv=notrunc,nocreat,fsync  0+1 records in
> 0+1 records out
> 131072 bytes (131 kB) copied, 0.00705081 s, 18.6 MB/s
>
> Best Regards,
>
> Strahil Nikolov
>
> - Препратено съобщение -
>
> *От:* Strahil Nikolov 
>
> *До:* Users 
>
> *Изпратено:* четвъртък, 16 май 2019 г., 5:56:44 ч. Гринуич-4
>
> *Тема:* ovirt 4.3.3.7 cannot create a gluster storage domain
>
> Hey guys,
>
> I have recently updated (yesterday) my platform to latest available (v
> 4.3.3.7) and upgraded to gluster v6.1 .The setup is hyperconverged 3 node
> cluster with ovirt1/gluster1 & ovirt2/gluster2 as replica nodes (glusterX
> is for gluster communication) while ovirt3 is the arbiter.
>
> Today I have tried to add new domain storages but they fail with the
> following:
>
> 2019-05-16 10:15:21,296+0300 INFO  (jsonrpc/2) [vdsm.api] FINISH
> createStorageDomain error=Command ['/usr/bin/dd', 'iflag=fullblock',
> u'of=/rhev/data-center/mnt/glusterSD/gluster1:_data__fast2/591d9b61-5c7d-4388-a6b7-ab03181dff8a/dom_md/xleases',
> 'oflag=direct,seek_bytes', 'seek=1048576', 'bs=256512', 'count=1',
> 'conv=notrunc,nocreat,fsync'] failed with rc=1 out='[suppressed]'
> err="/usr/bin/dd: error writing
> '/rhev/data-center/mnt/glusterSD/gluster1:_data__fast2/591d9b61-5c7d-4388-a6b7-ab03181dff8a/dom_md/xleases':
> Invalid argument\n1+0 records in\n0+0 records out\n0 bytes (0 B) copied,
> 0.0138582 s, 0.0 kB/s\n" from=:::192.168.1.2,43864, flow_id=4a54578a,
> task_id=d2535d0f-c7f7-4f31-a10f-704923ce1790 (api:52)
>
>
This may be another issue. This command works only for storage with a
512-byte sector size.

Hyperconverged systems may use VDO, and it must be configured in
compatibility mode to support a 512-byte sector size.

I'm not sure how this is configured but Sahina should know.
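
To check, something like this should show the logical sector size the device
presents and whether VDO 512-byte emulation is enabled (device and volume
names are only examples):

    blockdev --getss /dev/mapper/vdo_sdb
    vdo status --name=vdo_sdb | grep -i 512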

Nir

> 2019-05-16 10:15:21,296+0300 ERROR (jsonrpc/2) [storage.TaskManager.Task]
> (Task='d2535d0f-c7f7-4f31-a10f-704923ce1790') Unexpected error (task:875)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882,
> in _run
> return fn(*args, **kargs)
>   File "", line 2, in createStorageDomain
>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in
> method
> ret = func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2614,
> in createStorageDomain
> storageType, domVersion, block_size, alignment)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py
> <http://nfssd.py/>", line 106, in create
> block_size)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py
> <http://filesd.py/>", line 466, in _prepareMetadata
> cls.format_external_leases(sdUUID, xleases_path)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 1255,
> in format_external_leases
> xlease.format_index(lockspace, backend)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/xlease.py", line
> 681, in format_index
> index.dump(file)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/xlease.py", line
> 843, in dump
> file.pwrite(INDEX_BASE, self._buf)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/xlease.py", line
> 1076, in pwr
>
>
> It seems that the 'dd' is having trouble checking the new gluster volume.
> The output is from the RC1 , but as you see Darell's situation is maybe
> the same.
> On May 16, 2019 21:41, Nir Soffer  wrote:
>
> On Thu, May 16, 2019 at 8:38 PM Darrell Budic 
> wrote:
>
> I tried adding a new storage domain on my hyper converged test cluster
> running Ovirt 4.3.3.7 and gluster 6.1. I was able to create the new
> gluster volume fine, but it’s not able to add the gluster storage domain
> (as either a managed gluster volume or directly entering values). The
> created gluster volume mounts and looks fine from the CLI. Errors in VDSM
> log:
>
> ...
>
> 2019-05-16 10:25:09,584-0500 ERROR (jsonrpc/5) [storage

[ovirt-users] Re: ovirt 4.3.3.7 cannot create a gluster storage domain

2019-05-17 Thread Nir Soffer
On Fri, May 17, 2019 at 2:47 PM Andreas Elvers <
andreas.elvers+ovirtfo...@solutions.work> wrote:

> Yeah. But I think this ist just an artefact of the current version. All
> images are in sync.
>  dom_md/ids is an obsolete file anyway as the docs say.
>

This page was correct about 10 years ago; the ids file is used for sanlock
delta leases, which are core oVirt infrastructure. Without this file, you
will not have any kind of storage.

Please use RHV documentation:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/

And the source:
https://github.com/ovirt

Anything else is not a reliable source of information.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/SZW46B6YFSUWG5CIJYHWLFMMA6N7HK6R/


[ovirt-users] Re: ovirt 4.3.3.7 cannot create a gluster storage domain

2019-05-17 Thread Nir Soffer
On Fri, May 17, 2019 at 6:13 PM Nir Soffer  wrote:

> On Fri, May 17, 2019 at 2:47 PM Andreas Elvers <
> andreas.elvers+ovirtfo...@solutions.work> wrote:
>
>> Yeah. But I think this ist just an artefact of the current version. All
>> images are in sync.
>>  dom_md/ids is an obsolete file anyway as the docs say.
>>
>
> This page was correct about 10 years ago, the ids file is used for sanlock
> delta leases, which are
> the core infrastructure of oVirt. Without this file, you will not have any
> kind of storage.
>

Should be fixed in:
https://github.com/oVirt/ovirt-site/pull/1994


>
> Please use RHV documentation:
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/
>
> And the source:
> https://github.com/ovirt
>
> Anything else is not reliable source for information.
>
> Nir
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TD7QYHMTT6JO5EMFMCRKWY4NO2EM632N/


[ovirt-users] Re: deprecating export domain?

2019-05-17 Thread Nir Soffer
On Wed, May 15, 2019 at 3:52 PM Andreas Elvers <
andreas.elvers+ovirtfo...@solutions.work> wrote:

> Maybe I overlooked the information, but in recent RHVE 4.3 docs the
> information how to use the export storage domain have been removed and
> there is no alternative to do so, but to detach a data domain and attach it
> somewhere else. But how can I move my VMs one by one to a new storage
> domain on a different datacenter without completely detaching the original
> storage domain?
>

It is not clear what you mean by a different datacenter.

The best way to decommission a storage domain is to move the disks to
another domain in the same DC.
You can do this while the VM is running, without any downtime. When you are
done, you can detach and
remove the old storage domain.

If you want to move the VM to a different storage domain on another oVirt
DC, move the domain to the same
DC until you finish the migration, and then move the domain back to another
DC and import the VM. If you
want to use the same domain for exporting and importing, you will need to
move the VM to another domain
on the target DC.

If you want to move the VM to another oVirt setup, you can attach a
temporary storage domain, move the disk
to that storage domain, detach the domain, attach it to the other setup,
and import the VM.

If you can replicate the storage using your storage server (e.g., take a
snapshot of a LUN), you can attach the new LUN to the new setup and import
the VMs (this is how oVirt DR works).

If you don't have shared storage between the two setups, maybe different
physical datacenters, you can:
- export an OVA and import it on the other setup
- download the VM disks, upload them to the other setup and recreate the VM

To minimize downtime while importing and exporting a VM using attach/detach
storage domain:

On the source setup:
1. Attach the temporary storage domain used for moving vms
2. While the VM is running, move the disks to the temporary storage domain
3. Stop the VM
4. Detach the temporary storage domain

On the destination setup:
5. Attach the temporary storage domain to other setup
6. Import the VM
7. Start the VM
8. While the VM is running, move the disks to the target storage domain

Steps 3-7 should take only 2-3 minutes and involve no data operations.
Exporting and importing big VMs using an export domain can take many minutes
or hours.

This can be automated using oVirt REST API, SDK, or Ansible.
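
For example, attaching the temporary storage domain to a data center is a
single REST API call (engine address, credentials and UUIDs are placeholders):

    curl -k -u admin@internal:PASSWORD \
        -H "Content-Type: application/xml" \
        -d '<storage_domain id="TEMP_SD_UUID"/>' \
        https://engine.example.com/ovirt-engine/api/datacenters/DC_UUID/storagedomains

Detaching it again is a DELETE on
.../api/datacenters/DC_UUID/storagedomains/TEMP_SD_UUID.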

> I don't want to bring down all of my VMs on the old storage domain for
> import. I want to export and import them one by one. When all VMs are moved
> to the new data center only then I want to decommission
> the old data center.


> What is the rationale to deprecate the export storage and already remove
> documentation when there seems to be no alternative available?
>

There are many alternatives for several versions, listed above. The main
alternative is attach/detach
storage domain.

This method has many advantages like:

- If you can keep the VMs on the same storage, it requires no data copies,
  minimizing the total time to move the VMs around.
- If you cannot keep the VMs on the same storage, it requires up to 2 data
  copies, like the export domain
- But unlike the export domain, you can do the copies in the background while
  the VM is running (see my example above).
- Does not require NFS storage on your high end iSCSI/FC setup
- Since you can use block storage, it is more reliable and performs better due
  to multipath
- Since we use a regular data domain, it is easier to maintain and less likely
  to break
- Works with any recent storage format (V3, V4, V5), while the export domain
  requires V1. Assuming that all future versions of a product will support all
  storage formats was never a good idea.

We are playing with a new way to move VMs with minimal downtime, using the
concept of an "external disk". With this you will be able to run a tool that
will shut down the VM on one setup and start it in seconds on the other setup.
While the VM is running, it will migrate the disks from the old storage to the
new storage.
This method does not require shared storage to be available to both setups,
only that we can expose
the source disks over the network, for example using NBD.

There is a proof of concept here:
https://gerrit.ovirt.org/c/98926/

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GFOK55O5N4SRU5PA32P3LATW74E7WKT6/


[ovirt-users] Re: oVirt on RHEL 8.

2019-06-05 Thread Nir Soffer
On Wed, Jun 5, 2019 at 6:34 PM Gianluca Cecchi 
wrote:

> On Wed, Jun 5, 2019 at 5:26 PM Nir Soffer  wrote:
>
>> On Wed, Jun 5, 2019 at 3:54 PM  wrote:
>>
>>> Hello
>>> Did anyone managed to install oVirt on RHEL8?
>>> If so, Can you please guide me?
>>> I'm trying to install but it throws an error as "Failed to synchronize
>>> repo for ovirt-*".
>>> Please help me.
>>>
>>
>> RHEL 8 system python is python 3.6, and oVirt is not ready yet for python
>> 3.
>>
>> We hope to complete python 3 support in oVirt 4.4.
>>
>> Nir
>>
>
> Actually in RHEL 8 you have no default python and you can choose between
> 2.7 or 3.6
>
> https://developers.redhat.com/blog/2019/05/07/what-no-python-in-red-hat-enterprise-linux-8/
>

But this is short-term support, unlike the system python (python 3.6), which
will be supported for many years.

But probably it makes sense to combine shift to RHEL 8 in oVirt with python
> 3 for longer term support
>

It may be possible to make oVirt work on RHEL 8 using python 2.7, but we
are working on
the long term solution.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HLWD7V7DOFTJDSPLUMM7JHTZW47NRNXC/


[ovirt-users] Re: oVirt on RHEL 8.

2019-06-05 Thread Nir Soffer
On Wed, Jun 5, 2019 at 3:54 PM  wrote:

> Hello
> Did anyone managed to install oVirt on RHEL8?
> If so, Can you please guide me?
> I'm trying to install but it throws an error as "Failed to synchronize
> repo for ovirt-*".
> Please help me.
>

RHEL 8 system python is python 3.6, and oVirt is not ready yet for python 3.

We hope to complete python 3 support in oVirt 4.4.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/O3YOLO3O5RTPK7YMUHVLKKQYEKHISYS7/


[ovirt-users] Re: ovirt-imagio-proxy upload speed slow

2019-06-05 Thread Nir Soffer
On Wed, Apr 24, 2019 at 12:13 AM Dev Ops  wrote:

> I am working on integrating a backup solution for our ovirt environment
> and having issues with the time it takes to backup the VM's. This backup
> solution is simply taking a snapshot and making a clone and backing the
> clone up to a backup server.
>
> A VM that is 100 gig takes 52 minutes to back up. The same VM doing a file
> backup using the same product, and bypassing their rhv plugin, takes 14
> minutes. So the throughput is there but the ovirt imageio-proxy process
> seems to be what manages how images are uploaded and is officially my
> bottle neck. Load is not high on the engine or kvm hosts.

> I had bumped up the Upload image size from 100MB to 10gig weeks ago and
> that didn't seem to help.
>
> [root@blah-lab-engine ~]# engine-config -a |grep Upload
> UploadImageChunkSizeKB: 1024 version: general
>

This will not help, 100 MiB should be big enough.

[root@bgl-vms-engine ~]# rpm -qa |grep ovirt-image
> ovirt-imageio-proxy-1.4.6-1.el7.noarch
> ovirt-imageio-common-1.4.6-1.el7.x86_64
> ovirt-imageio-proxy-setup-1.4.6-1.el7.noarch
>
> I have seen bugs reported to redhat about this but I am running above the
> affected releases.
>
> engine software is 4.2.8.2-1.el7
>
> Any idea what we can tweak to open up this bottleneck?
>

The proxy is not meant to be used for backup and restore. It was created to
allow easy upload and download from the UI.

For backup applications you should upload and download directly from the host.

When you create an image transfer, you should use the image transfer
"transfer_url" instead of the "proxy_url".

When you upload or download directly from the host, you can take advantage
of keep-alive connections and fast zero support.

If your upload or download program is running on the host, you can take
advantage of unix socket support
for even faster transfers.
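
As a rough illustration, here is a sketch of how a backup application creates
an image transfer and uses transfer_url instead of proxy_url. It assumes the
Python SDK; the engine URL, credentials, and disk id are hypothetical, and the
actual HTTP transfer loop is shown in the SDK examples linked below:

import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)

disk_id = '...'  # id of the disk to download (placeholder)

# Create the image transfer.
transfers_service = connection.system_service().image_transfers_service()
transfer = transfers_service.add(
    types.ImageTransfer(
        disk=types.Disk(id=disk_id),
        direction=types.ImageTransferDirection.DOWNLOAD,
    )
)
transfer_service = transfers_service.image_transfer_service(transfer.id)

# Wait until the transfer is ready.
while transfer.phase == types.ImageTransferPhase.INITIALIZING:
    time.sleep(1)
    transfer = transfer_service.get()

# transfer_url points directly at the imageio daemon on the host;
# proxy_url goes through the proxy on the engine and is slower.
print('download from:', transfer.transfer_url)

# ... perform the HTTP requests against transfer.transfer_url here ...

transfer_service.finalize()
connection.close()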

For upload, you should use the ovirt_imageio_common.client module, which uses
all available features:
https://github.com/oVirt/ovirt-imageio/blob/master/examples/upload

The client does not provide an easy-to-use helper for download yet. A future
version will provide a download helper.

Please see the documentation:
http://ovirt.github.io/ovirt-imageio/random-io.html
http://ovirt.github.io/ovirt-imageio/unix-socket.html

And SDK examples:
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/upload_disk.py
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/download_disk.py

If direct download is still too slow, please file a bug and provide the
imageio daemon logs.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TF5ULO52EDV3J5RLISL64HIBVKAH2OIS/


[ovirt-users] Re: 4.3 live migration creates wrong image permissions.

2019-06-13 Thread Nir Soffer
On Thu, Jun 13, 2019, 12:19 Alex McWhirter  wrote:

> after upgrading from 4.2 to 4.3, after a vm live migrates it's disk
> images are become owned by root:root. Live migration succeeds and the vm
> stays up, but after shutting down the VM from this point, starting it up
> again will cause it to fail. At this point i have to go in and change
> the permissions back to vdsm:kvm on the images, and the VM will boot
> again.
>

This is a known issue with early 4.3 releases; please upgrade to the latest 4.3.

Nir

___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/TSWRTC2E7XZSGSLA7NC5YGP7BIWQKMM3/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JTO5KUZC5J6TNSB4EJWSPEN4D5BF45GZ/


[ovirt-users] Re: 4.3 live migration creates wrong image permissions.

2019-06-14 Thread Nir Soffer
On Fri, Jun 14, 2019 at 7:05 PM Milan Zamazal  wrote:

> Alex McWhirter  writes:
>
> > In this case, i should be able to edit /etc/libvirtd/qemu.conf on all
> > the nodes to disable dynamic ownership as a temporary measure until
> > this is patched for libgfapi?
>
> No, other devices might have permission problems in such a case.
>

I wonder how libvirt can change the permissions for devices it does not
know about?

When using libgfapi, we pass libvirt a network disk definition instead of a
path on the file system.
So libvirt does not have the path to the file, and it cannot change the
permissions.

Alex, can you reproduce this flow and attach vdsm and engine logs from all
hosts
to the bug?

Nir

> On 2019-06-13 10:37, Milan Zamazal wrote:
> >> Shani Leviim  writes:
> >>
> >>> Hi,
> >>> It seems that you hit this bug:
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1666795
> >>>
> >>> Adding +Milan Zamazal , Can you please confirm?
> >>
> >> There may still be problems when using GlusterFS with libgfapi:
> >> https://bugzilla.redhat.com/1719789.
> >>
> >> What's your Vdsm version and which kind of storage do you use?
> >>
> >>> *Regards,*
> >>>
> >>> *Shani Leviim*
> >>>
> >>>
> >>> On Thu, Jun 13, 2019 at 12:18 PM Alex McWhirter 
> >>> wrote:
> >>>
>  after upgrading from 4.2 to 4.3, after a vm live migrates it's disk
>  images are become owned by root:root. Live migration succeeds and
>  the vm
>  stays up, but after shutting down the VM from this point, starting
>  it up
>  again will cause it to fail. At this point i have to go in and change
>  the permissions back to vdsm:kvm on the images, and the VM will boot
>  again.
>  ___
>  Users mailing list -- users@ovirt.org
>  To unsubscribe send an email to users-le...@ovirt.org
>  Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>  oVirt Code of Conduct:
>  https://www.ovirt.org/community/about/community-guidelines/
>  List Archives:
> 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/TSWRTC2E7XZSGSLA7NC5YGP7BIWQKMM3/
> 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/36Z6BB5NGYEEFMPRTDYKFJVVBPZFUCBL/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AKENVDWK3VE3COWVPWWYVJBQC2CIEAAY/


[ovirt-users] Re: Failed to activate Storage Domain --- ovirt 4.2

2019-06-10 Thread Nir Soffer
On Fri, Jun 7, 2019 at 5:03 PM  wrote:

> Hi
> Has anyone experiencing the following issue with Storage Domain -
>
> Failed to activate Storage Domain cLUN-R940-DC2-dstore01 --
> VDSM command ActivateStorageDomainVDS failed: Storage domain does not
> exist: (u'1b0ef853-fd71-45ea-8165-cc6047a267bc',)
>
> Currently, the storge Domain is Inactive and strangely, the VMs are
> running as normal. We can't manage or extend the volume size of this
> storage domain. The pvscan shows as:
> [root@uk1-ion-ovm-18  pvscan
>   /dev/mapper/36000d31005697814: Checksum error at offset
> 4397954425856
>   Couldn't read volume group metadata from
> /dev/mapper/36000d31005697814.
>   Metadata location on /dev/mapper/36000d31005697814 at
> 4397954425856 has invalid summary for VG.
>   Failed to read metadata summary from
> /dev/mapper/36000d31005697814
>   Failed to scan VG from /dev/mapper/36000d31005697814
>

This looks like corrupted vg metadata.

> I have tired the following steps:
> 1. Restarted ovirt-engine.service
> 2. tried to restore the metadata using vgcfgrestore but it failed with the
> following error:
>
> [root@uk1-ion-ovm-19 backup]# vgcfgrestore
> 36000d31005697814
>   Volume group 36000d31005697814 has active volume: .
>   WARNING: Found 1 active volume(s) in volume group
> "36000d31005697814".
>   Restoring VG with active LVs, may cause mismatch with its metadata.
> Do you really want to proceed with restore of volume group
> "36000d31005697814", while 1 volume(s) are active? [y/n]: y
>

This is not safe; you cannot fix the VG while it is being used by oVirt.

You need to migrate the running VMs to other storage, or shut down the VMs.
Then deactivate this storage domain. Only then can you try to restore the VG.

>   /dev/mapper/36000d31005697814: Checksum error at offset
> 4397954425856
>   Couldn't read volume group metadata from
> /dev/mapper/36000d31005697814.
>   Metadata location on /dev/mapper/36000d31005697814 at
> 4397954425856 has invalid summary for VG.
>   Failed to read metadata summary from
> /dev/mapper/36000d31005697814
>   Failed to scan VG from /dev/mapper/36000d31005697814
>   /etc/lvm/backup/36000d31005697814: stat failed: No such
> file or directory
>

Looks like you don't have a backup on this host. You may have the most
recent backup on another host.


>   Couldn't read volume group metadata from file.
>   Failed to read VG 36000d31005697814 from
> /etc/lvm/backup/36000d31005697814
>   Restore failed.
>
> Please let me know if anyone knows any possible resolution.
>

David, we keep 2 metadata copies on the first PV. Can we use one of the
copies on the PV to restore the metadata to the last good state?

David, how do you suggest to proceed?

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KA4TVUE775MMCQVD3YF7GSUZGEEOCQCF/


[ovirt-users] Re: Failed to activate Storage Domain --- ovirt 4.2

2019-06-10 Thread Nir Soffer
On Mon, Jun 10, 2019 at 11:22 PM David Teigland  wrote:

> On Mon, Jun 10, 2019 at 10:59:43PM +0300, Nir Soffer wrote:
> > > [root@uk1-ion-ovm-18  pvscan
> > >   /dev/mapper/36000d31005697814: Checksum error at
> offset
> > > 4397954425856
> > >   Couldn't read volume group metadata from
> > > /dev/mapper/36000d31005697814.
> > >   Metadata location on /dev/mapper/36000d31005697814 at
> > > 4397954425856 has invalid summary for VG.
> > >   Failed to read metadata summary from
> > > /dev/mapper/36000d31005697814
> > >   Failed to scan VG from /dev/mapper/36000d31005697814
> >
> > This looks like corrupted vg metadata.
>
> Yes, the second metadata area, at the end of the device is corrupted; the
> first metadata area is probably ok.  That version of lvm is not able to
> continue by just using the one good copy.


Can we copy the first metadata area into the second metadata area?

Last week I pushed out major changes to LVM upstream to be able to handle
> and repair most of these cases.  So, one option is to build lvm from the
> upstream master branch, and check if that can read and repair this
> metadata.
>

This sounds pretty risky for production.

> David, we keep 2 metadata copies on the first PV. Can we use one of the
> > copies on the PV to restore the metadata to the least good state?
>
> pvcreate with --restorefile and --uuid, and with the right backup metadata
>

What would be the right backup metadata?


> could probably correct things, but experiment with some temporary PVs
> first.
>

Aminur, can you copy and compress the metadata areas, and share them
somewhere?

To copy the first metadata area, use:

dd if=/dev/mapper/360014058ccaab4857eb40f393aaf0351 of=md1 bs=128M count=1
skip=4096 iflag=skip_bytes

To copy the second metadata area, you need to know the size of the PV. On
my setup with 100G
PV, I have 800 extents (128M each), and this works:

dd if=/dev/mapper/360014058ccaab4857eb40f393aaf0351 of=md2 bs=128M count=1
skip=799

gzip md1 md2

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RYQA4SXJQJJN7DV3U6KB2XQ3AOPLAHT6/


[ovirt-users] Re: [ovirt-devel] cannot add host (oVirt 4.3.3.7)

2019-06-09 Thread Nir Soffer
On Sun, Jun 9, 2019 at 9:15 PM Hetz Ben Hamo  wrote:

Moving to users list,  devel list is for discussion about oVirt
development, not troubleshooting.

I'm trying to add a new host to oVirt. The main host is Xeon E5 while the
> new host is AMD Ryzen 5.
>
> The main host is running oVirt 4.3.3 and the new node is a minimal install
> of CentOS 7.6 (1810) with all the latest updates.
>

Which engine version? On which OS?

I'm enclosing the log files. it complains that it cannot get the oVirt
> packages, perhaps wrong channel(?). Looking at the log, it's trying to use
> minidnf. I don't think CentOS 7 supports DNF..
>

Did you install ovirt-release43 rpm before adding the host?
https://ovirt.org/download/

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/E3CZAVHXQCSD37XHFO7DNYEK6EW3OGVK/


[ovirt-users] Re: [ovirt-devel] cannot add host (oVirt 4.3.3.7)

2019-06-09 Thread Nir Soffer
On Mon, Jun 10, 2019 at 12:23 AM Hetz Ben Hamo  wrote:

> I'm using version 4.3.3.7-1.el7
>
> No, I didn't install the RPM. From past experience (and according to the docs
> here )
> you don't need to as it takes care of it. Few months ago when I tested it,
> it worked well on a new CentOS install.
>

This is not required if you use oVirt Node:
https://ovirt.org/download/#download-ovirt-node

Otherwise you need to install that rpm:
https://ovirt.org/download/#or-setup-a-host

>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2K3LD3LCZOSSOAHA7IMFXFSLN75ROVAI/


[ovirt-users] Re: Can't import some VMs after storage domain detach and reattach to new datacenter.

2019-06-25 Thread Nir Soffer
On Tue, Jun 25, 2019 at 9:40 AM  wrote:

> That's interesting.
>
> Where can i find meta for block storage?
>

On block storage, metadata is kept in the "metadata" logical volume.

To find the metadata for a particular volume, you need to look at the logical
volume tags:

lvs -o tags vg-name/lv-name

This will output tags like MD_42.

The metadata for this volume is at slot number 42.

To find the metadata, you need to calculate the offset in the metadata
logical volume.

On storage domain V4 format, the offset is:

offset=$((42 * 512))

On storage domain V5 format, the offset is:

offset=$((1048576 + 42 * 8192))

You can read the metadata like this:

dd if=/dev/vg-name/metadata bs=512 count=1 skip=$offset iflag=direct,skip_bytes
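
Putting the offset calculation together (a small sketch based only on the
formulas above; the slot number is the N from the MD_N tag):

# Compute the metadata offset for a volume in the "metadata" LV.
def metadata_offset(slot, domain_format):
    if domain_format == "V4":
        # V4: 512-byte metadata blocks from the start of the metadata LV.
        return slot * 512
    if domain_format == "V5":
        # V5: 8192-byte slots after a 1 MiB header area.
        return 1024 * 1024 + slot * 8192
    raise ValueError("unknown storage domain format: %r" % domain_format)

print(metadata_offset(42, "V4"))   # 21504
print(metadata_offset(42, "V5"))   # 1392640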

If you need help you can copy the metadata logical volume and share it here:

dd if=/dev/vg-name/metadata bs=1M count=17 iflag=direct of=metadata-$(date
+%Y-%m-%d-%H-%M)
xz metadata-*

Nir

> On NFS storage these files are located next to the disk image.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/KX2MVSAKJOPAVTKQXOR76QSIISTMCJQE/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RWIWK5PKSPGMHHK2DR7IRR4FEGCDJIKC/


[ovirt-users] Re: Can't import some VMs after storage domain detach and reattach to new datacenter.

2019-06-25 Thread Nir Soffer
On Mon, Jun 24, 2019 at 7:47 PM Dmitry Filonov 
wrote:

>
> Take a look at the corresponding .meta file for the disks you can not
> import.
> I had the very same problem and it was caused by
> DISKTYPE=1 in .meta.
>

I want more info on this. We think that the only value ever used for
DISKTYPE is 2.

Do you have any info on how these disks were created? Maybe by some ancient
version?

Nir


> When changed to
> DISKTYPE=DATA I was able to import disks correctly.
> Not the whole VM though..
>
>
> --
> Dmitry Filonov
> Linux Administrator
> SBGrid Core | Harvard Medical School
> 250 Longwood Ave, SGM-114
> Boston, MA 02115
>
>
> On Sun, Jun 23, 2019 at 4:29 AM m black  wrote:
>
>> Hi.
>>
>> I have a problem with importing some VMs after importing storage domain
>> in new datacenter.
>>
>> I have 5 servers with oVirt version 4.1.7, hosted-engine setup and
>> datacenter with iscsi, fc and nfs storages. Also i have 3 servers with
>> oVirt 4.3.4, hosted-engine and nfs storage.
>>
>> I've set iscsi and fc storages to maintenance and detached them
>> successfully on 4.1.7 datacenter.
>> Then i've imported these storage domains via Import Domain in 4.3.4
>> datacenter successfully.
>>
>> After storage domains were imported to new 4.3.4 datacenter i've tried to
>> import VMs from VM Import tab on storages.
>>
>> On the FC storage it was good, all VMs imported and started, all VMs in
>> place.
>>
>> And with iSCSI storage i've got problems:
>> On the iSCSI storage some VMs imported and started, but some of them
>> missing, some of missing VMs disks are showing at Disk Import, i've tried
>> to import disks from Disk Import tab and got error - 'Failed to register
>> disk'.
>> Tried to scan disks with 'Scan Disks' in storage domain, also tried
>> 'Update OVF' - no result.
>>
>> What caused this? What can i do to recover missing VMs? What logs to
>> examine?
>> Can it be storage domain disk corruption?
>>
>> Please, help.
>>
>> Thank you.
>>
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/MF5IUXURKIQZNNG4YW6ELENFD4GZIDQZ/
>>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BMXCPVIHKKQ3T767KVGMB44BLJOKLP6K/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XI2QRQFHJ7LYJHXU5GQNMLDRDPEKEDK3/


[ovirt-users] Re: Can't import some VMs after storage domain detach and reattach to new datacenter.

2019-06-25 Thread Nir Soffer
On Tue, Jun 25, 2019 at 3:15 PM Dmitry Filonov 
wrote:

> Hi Nir -
>
>  in my case these VMs were migrated from VirtualBox to oVirt using some of
> the VMWare provided tool
> and then virt-v2v to convert images. Here's the example of the meta file -
>
> DOMAIN=92be9db3-eab4-47ed-9ee9-87b8616b7c8c
> VOLTYPE=LEAF
> CTIME=1529005629
> MTIME=1529005629
> IMAGE=f0d0b3b3-5a31-4c9f-b551-90586bf946a5
> DISKTYPE=1
> PUUID=----
> LEGALITY=LEGAL
> POOL_UUID=
> SIZE=41943040
> FORMAT=RAW
> TYPE=SPARSE
> DESCRIPTION=generated by virt-v2v 1.36.10rhel_7,release_6.el7_5.2,libvirt
> EOF
>
> These disks worked fine on 4.2.3.8 but I wasn't able to import them into
> 4.3.4.3 unless I changed DISKTYPE line manually.
>

Do you have engine and vdsm logs from the time you imported this vm?

Which engine version was used during the import?

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TFGH4E6MWZX5XBNBFDTGOVUWCLZI77EZ/


[ovirt-users] Re: Can't import some VMs after storage domain detach and reattach to new datacenter.

2019-06-25 Thread Nir Soffer
On Tue, Jun 25, 2019 at 8:39 PM Dmitry Filonov 
wrote:

> Sorry, don't have any logs from back then. That was some time ago and it
> was easy to fix so I didn't bother keeping logs.
> DISKTYPE and DESCRIPTION were the only two lines I had to fix to get disks
> imported nicely.
> If you like I can probably re-create situation by creating a VM, then
> unregistering it, changing the .meta file and try re-importing it back.
>

Sure, reproducing it is easy.

But we want to support only valid values created by older versions of oVirt.
If such volumes actually exist in the field, the system should handle the
import transparently, translating the disk type to "DATA".

Are you sure the metadata was not modified outside of oVirt?

Nir

> On Tue, Jun 25, 2019 at 10:42 AM Nir Soffer  wrote:
>
>> On Tue, Jun 25, 2019 at 3:15 PM Dmitry Filonov <
>> filo...@hkl.hms.harvard.edu> wrote:
>>
>>> Hi Nir -
>>>
>>>  in my case these VMs were migrated from VirtualBox to oVirt using some
>>> of the VMWare provided tool
>>> and then virt-v2v to convert images. Here's the example of the meta file
>>> -
>>>
>>> DOMAIN=92be9db3-eab4-47ed-9ee9-87b8616b7c8c
>>> VOLTYPE=LEAF
>>> CTIME=1529005629
>>> MTIME=1529005629
>>> IMAGE=f0d0b3b3-5a31-4c9f-b551-90586bf946a5
>>> DISKTYPE=1
>>> PUUID=----
>>> LEGALITY=LEGAL
>>> POOL_UUID=
>>> SIZE=41943040
>>> FORMAT=RAW
>>> TYPE=SPARSE
>>> DESCRIPTION=generated by virt-v2v 1.36.10rhel_7,release_6.el7_5.2,libvirt
>>> EOF
>>>
>>> These disks worked fine on 4.2.3.8 but I wasn't able to import them into
>>> 4.3.4.3 unless I changed DISKTYPE line manually.
>>>
>>
>> Do you have engine and vdsm logs from the time you imported this vm?
>>
>> Which engine version was used during the import?
>>
>> Nir
>>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ML7DK2FPFLCNI6ZPFB3Z52HWIJCSZDOH/


[ovirt-users] Re: VM has paused due to no storage space error

2019-05-14 Thread Nir Soffer
On Sun, Oct 2, 2016 at 12:06 PM, Sandvik Agustin
 wrote:
> Hi users,
>
> I have this problem that sometimes 1 to 3 VM just automatically paused with
> user interaction and getting this error "VM has paused due to no storage
> space error". any inputs from you guys are very appreciated.

This is expected - when there is no storage space :-)

The VM is paused when there are pending I/O requests that could not be
fulfilled because there is not enough space.

On a real machine the I/O requests would fail. In a VM, the VM can pause,
you can fix the issue (extend the storage domain), and resume the VM.

But I guess there is storage space available, otherwise you would
not spend the time sending this mail.

This can happen when using thin provisioned disks on block storage
(iSCSI, FC). We provision such a disk with 1G, and extend the disk (add 1G)
when it becomes too full (by default, when free space < 0.5G).

If we fail to extend the disk quickly enough, the VM will pause before the
extend is completed. Once the extend is completed, we resume the VM.

So you may see very short pauses, but they should be rare.

To understand the issue, we need to inspect vdsm logs from the host
running the vm that paused, showing the timeframe when the vm
was paused.

You should see this message in the log each time a vm pauses:

abnormal vm stop device  error ENOSPC

Nir
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5MAYP4SZZQC5BB2VVPQBXYWH4OOJ7LUW/


[ovirt-users] Re: ovirt-ha-agent cpu usage

2019-05-14 Thread Nir Soffer
On Wed, Oct 5, 2016 at 1:33 PM, Simone Tiraboschi 
wrote:

>
>
> On Wed, Oct 5, 2016 at 10:34 AM, Nir Soffer  wrote:
>
>> On Wed, Oct 5, 2016 at 10:24 AM, Simone Tiraboschi 
>> wrote:
>>
>>>
>>>
>>> On Wed, Oct 5, 2016 at 9:17 AM, gregor  wrote:
>>>
>>>> Hi,
>>>>
>>>> did you found a solution or cause for this high CPU usage?
>>>> I have installed the self hosted engine on another server and there is
>>>> no VM running but ovirt-ha-agent uses heavily the CPU.
>>>>
>>>
>>> Yes, it's due to the fact that ovirt-ha-agent periodically reconnects
>>> over json rpc and this is CPU intensive since the client has to parse the
>>> yaml API specification each time it connects.
>>>
>>
>> Simone, reusing the connection is good idea anyway, but what you describe
>> is
>> a bug in the client library. The library does *not* need to load and
>> parse the
>> schema at all for sending requests to vdsm.
>>
>> The schema is only needed if you want to verify request parameters,
>> or provide online help, these are not needed in a client library.
>>
>> Please file an infra bug about it.
>>
>
> Done, https://bugzilla.redhat.com/show_bug.cgi?id=1381899
>

Here is a patch that should eliminate most of the problem:
https://gerrit.ovirt.org/65230

It would be nice if it could be tested on a system showing this problem.

Cheers,
Nir



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZBONUMRRA2POLODZPFSRHZVG3YOTGHSV/


[ovirt-users] Re: VMs paused due to IO issues - Dell Equallogic controller failover

2019-05-14 Thread Nir Soffer
On Thu, Oct 6, 2016 at 10:19 AM, Gary Lloyd  wrote:

> I asked on the Dell Storage Forum and they recommend the following:
>
> *I recommend not using a numeric value for the "no_path_retry" variable
> within /etc/multipath.conf as once that numeric value is reached, if no
> healthy LUNs were discovered during that defined time multipath will
> disable the I/O queue altogether.*
>
> *I do recommend, however, changing the variable value from "12" (or even
> "60") to "queue" which will then allow multipathd to continue queing I/O
> until a healthy LUN is discovered (time of fail-over between controllers)
> and I/O is allowed to flow once again.*
>
> Can you see any issues with this recommendation as far as Ovirt is
> concerned ?
>
Yes, we cannot work with an unlimited queue. This will block vdsm for an
unlimited time when the next command tries to access storage. Because we don't
have good isolation between different storage domains, this may cause other
storage domains to become faulty. Engine flows that have a timeout will also
fail with a timeout.

If you are on 3.x, this will be very painful; on 4.0 it should be better,
but it is still not recommended.

Nir



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CVXQHQBMPCIII6YX67XIV6CWAJKYZYLK/


[ovirt-users] Re: VMs paused due to IO issues - Dell Equallogic controller failover

2019-05-14 Thread Nir Soffer
On Tue, Oct 4, 2016 at 10:51 AM, Gary Lloyd  wrote:

> Hi
>
> We have Ovirt 3.65 with a Dell Equallogic SAN and we use Direct Luns for
> all our VMs.
> At the weekend during early hours an Equallogic controller failed over to
> its standby on one of our arrays and this caused about 20 of our VMs to be
> paused due to IO problems.
>
> I have also noticed that this happens during Equallogic firmware upgrades
> since we moved onto Ovirt 3.65.
>
> As recommended by Dell disk timeouts within the VMs are set to 60 seconds
> when they are hosted on an EqualLogic SAN.
>
> Is there any other timeout value that we can configure in vdsm.conf to
> stop VMs from getting paused when a controller fails over ?
>

You can set the timeout in multipath.conf.

With the current multipath configuration (deployed by vdsm), when all paths
to a device are lost (e.g. you take down all ports on the server during an
upgrade), all I/O will fail immediately.

If you want to allow 60 seconds of grace time in such a case, you can configure:

no_path_retry 12

This will continue to monitor the paths 12 times, every 5 seconds
(assuming polling_interval=5). If some path recovers during this time, the I/O
can complete and the VM will not be paused.

If no path is available after these retries, I/O will fail and VMs with
pending I/O will pause.

Note that this will also cause delays in various vdsm flows, increasing the
chance of timeouts on the engine side, or delays in storage domain monitoring.

However, the 60-second delay is expected only the first time all paths become
faulty. Once the timeout has expired, any access to the device will fail
immediately.

To configure this, you must add the # VDSM PRIVATE tag on the second line of
multipath.conf, otherwise vdsm will override your configuration the next time
you run vdsm-tool configure.

multipath.conf should look like this:

# VDSM REVISION 1.3
# VDSM PRIVATE

defaults {
    polling_interval        5
    no_path_retry           12
    user_friendly_names     no
    flush_on_last_del       yes
    fast_io_fail_tmo        5
    dev_loss_tmo            30
    max_fds                 4096
}

devices {
    device {
        all_devs            yes
        no_path_retry       12
    }
}

This will use a 12-retry (60 second) timeout for any device. If you would like
to configure only your specific device, you can add a device section for your
specific server instead.


>
> Also is there anything that we can tweak to automatically unpause the VMs
> once connectivity with the arrays is re-established ?
>

Vdsm will resume the VMs when the storage monitor detects that storage became
available again.
However, we cannot guarantee that storage monitoring will detect that the
storage was down.
This should be improved in 4.0.


> At the moment we are running a customized version of storageServer.py, as
> Ovirt has yet to include iscsi multipath support for Direct Luns out of the
> box.
>

Would you like to share this code?

Nir



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4BNCWDHI62PCBS5RRZ46U2K6HEVNJEAO/


[ovirt-users] Re: DISCARD support?

2019-05-14 Thread Nir Soffer
On Tue, Oct 4, 2016 at 11:11 AM, Nicolas Ecarnot 
wrote:

> Hello,
>
> Sending this here to share knowledge.
>
> Here is what I learned from many BZ and mailing list posts readings. I'm
> not working at Redhat, so please correct me if I'm wrong.
>
> We are using thin-provisioned block storage LUNs (Equallogic), on which
> oVirt is creating numerous Logical Volumes, and we're very happy with it.
> When oVirt is removing a virtual disk, the SAN is not informed, because
> the LVM layer is not sending the "issue_discard" flag.
>
> /etc/lvm/lvm.conf is not the natural place to try to change this
> parameter, as VDSM is not using it.
>

> Efforts are presently made to include issue_discard setting support
> directly into vdsm.conf, first on a datacenter scope (4.0.x), then per
> storage domain (4.1.x) and maybe via a web GUI check-box. Part of the
> effort is to make sure every bit of a planned to be removed LV get wiped
> out. Part is to inform the block storage side about the deletion, in case
> of thin provisioned LUNs.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1342919
> https://bugzilla.redhat.com/show_bug.cgi?id=981626
>

This is already included in 4.0, added in:
https://gerrit.ovirt.org/58036

However it is disabled by default. To enable discard, you need to
enable the irs:discard_enable option.

The best way to do this is to create a dropin conf:
/etc/vdsm/vdsm.conf.d/50_discard.conf

[irs]
discard_enable = true

And restart vdsm.

You need to deploy this file on all hosts.

In the next version we want to enable this automatically if the storage
domain supports discard; no configuration on the host will be needed.

Nir


>
> --
> Nicolas ECARNOT
>
> On Mon, Oct 3, 2016 at 2:24 PM, Nicolas Ecarnot 
> wrote:
>
>> Yaniv,
>>
>> As a pure random way of web surfing, I found that you posted on twitter
>> an information about DISCARD support. (https://twitter.com/YanivKaul
>> /status/773513216664174592)
>>
>> I did not dig any further, but has it any relation with the fact that so
>> far, oVirt did not reclaim lost storage space amongst its logical volumes
>> of its storage domains?
>>
>> A BZ exist about this, but one was told no work would be done about it
>> until 4.x.y, so now we're there, I was wondering if you knew more?
>>
>
> Feel free to send such questions on the mailing list (ovirt users or
> devel), so other will be able to both chime in and see the response.
> We've supported a custom hook for enabling discard per disk (which is only
> relevant for virtio-SCSI and IDE) for some versions now (3.5 I believe).
> We are planning to add this via a UI and API in 4.1.
> In addition, we are looking into discard (instead of wipe after delete,
> when discard is also zero'ing content) as well as discard when removing LVs.
> See:
> http://www.ovirt.org/develop/release-management/features/
> storage/pass-discard-from-guest-to-underlying-storage/
> http://www.ovirt.org/develop/release-management/features/
> storage/wipe-volumes-using-blkdiscard/
> http://www.ovirt.org/develop/release-management/features/
> storage/discard-after-delete/
>
> Y.
>
>
>>
>> Best,
>>
>> --
>> Nicolas ECARNOT
>>
>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6PVCQ5XGGVA3GX4QRTP7XNS4NTQNUR5N/


[ovirt-users] Re: ovirt-ha-agent cpu usage

2019-05-14 Thread Nir Soffer
On Wed, Oct 5, 2016 at 10:24 AM, Simone Tiraboschi 
wrote:

>
>
> On Wed, Oct 5, 2016 at 9:17 AM, gregor  wrote:
>
>> Hi,
>>
>> did you found a solution or cause for this high CPU usage?
>> I have installed the self hosted engine on another server and there is
>> no VM running but ovirt-ha-agent uses heavily the CPU.
>>
>
> Yes, it's due to the fact that ovirt-ha-agent periodically reconnects over
> json rpc and this is CPU intensive since the client has to parse the yaml
> API specification each time it connects.
>

Simone, reusing the connection is a good idea anyway, but what you describe is
a bug in the client library. The library does *not* need to load and parse the
schema at all for sending requests to vdsm.

The schema is only needed if you want to verify request parameters,
or provide online help, these are not needed in a client library.

Please file an infra bug about it.

Nir


> The issue is tracked here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1349829 - ovirt-ha-agent
> should reuse json-rpc connections
> but it depends on:
> https://bugzilla.redhat.com/show_bug.cgi?id=1376843 - [RFE] Implement a
> keep-alive with reconnect if needed logic for the python jsonrpc client
>
>
>
>>
>> cheers
>> gregor
>>
>> On 08/08/16 15:09, Gianluca Cecchi wrote:
>> > On Mon, Aug 8, 2016 at 1:03 PM, Roy Golan > > > wrote:
>> >
>> > Does the spikes correlates with info messages on extracting the ovf?
>> >
>> >
>> >
>> >
>> >
>> >
>> > yes, it seems so and it happens every 14-15 seconds
>> >
>> > These are the lines I see scrolling in agent.log when I notice cpu
>> > spikes in ovirt-ha-agent...
>> >
>> > MainThread::INFO::2016-08-08
>> > 15:03:07,815::storage_server::212::ovirt_hosted_engine_ha.li
>> b.storage_server.StorageServer::(connect_storage_server)
>> > Connecting storage server
>> > MainThread::INFO::2016-08-08
>> > 15:03:08,144::storage_server::220::ovirt_hosted_engine_ha.li
>> b.storage_server.StorageServer::(connect_storage_server)
>> > Refreshing the storage domain
>> > MainThread::INFO::2016-08-08
>> > 15:03:08,705::hosted_engine::685::ovirt_hosted_engine_ha.age
>> nt.hosted_engine.HostedEngine::(_initialize_storage_images)
>> > Preparing images
>> > MainThread::INFO::2016-08-08
>> > 15:03:08,705::image::126::ovirt_hosted_engine_ha.lib.image.
>> Image::(prepare_images)
>> > Preparing images
>> > MainThread::INFO::2016-08-08
>> > 15:03:09,653::hosted_engine::688::ovirt_hosted_engine_ha.age
>> nt.hosted_engine.HostedEngine::(_initialize_storage_images)
>> > Reloading vm.conf from the shared storage domain
>> > MainThread::INFO::2016-08-08
>> > 15:03:09,653::config::205::ovirt_hosted_engine_ha.agent.host
>> ed_engine.HostedEngine.config::(refresh_local_conf_file)
>> > Trying to get a fresher copy of vm configuration from the OVF_STORE
>> > MainThread::INFO::2016-08-08
>> > 15:03:09,843::ovf_store::100::ovirt_hosted_engine_ha.lib.ovf
>> .ovf_store.OVFStore::(scan)
>> > Found OVF_STORE: imgUUID:223d26c2-1668-493c-a322-8054923d135f,
>> > volUUID:108a362c-f5a9-440e-8817-1ed8a129afe8
>> > MainThread::INFO::2016-08-08
>> > 15:03:10,309::ovf_store::100::ovirt_hosted_engine_ha.lib.ovf
>> .ovf_store.OVFStore::(scan)
>> > Found OVF_STORE: imgUUID:12ca2fc6-01f7-41ab-ab22-e75c822ac9b6,
>> > volUUID:1a18851e-6858-401c-be6e-af14415034b5
>> > MainThread::INFO::2016-08-08
>> > 15:03:10,652::ovf_store::109::ovirt_hosted_engine_ha.lib.ovf
>> .ovf_store.OVFStore::(getEngineVMOVF)
>> > Extracting Engine VM OVF from the OVF_STORE
>> > MainThread::INFO::2016-08-08
>> > 15:03:10,974::ovf_store::116::ovirt_hosted_engine_ha.lib.ovf
>> .ovf_store.OVFStore::(getEngineVMOVF)
>> > OVF_STORE volume path:
>> > /rhev/data-center/mnt/ovirt01.lutwyn.org:_SHE__DOMAIN/31a9e9
>> fd-8dcb-4475-aac4-09f897ee1b45/images/12ca2fc6-01f7-41ab-
>> ab22-e75c822ac9b6/1a18851e-6858-401c-be6e-af14415034b5
>> > MainThread::INFO::2016-08-08
>> > 15:03:11,494::config::225::ovirt_hosted_engine_ha.agent.host
>> ed_engine.HostedEngine.config::(refresh_local_conf_file)
>> > Found an OVF for HE VM, trying to convert
>> > MainThread::INFO::2016-08-08
>> > 15:03:11,497::config::230::ovirt_hosted_engine_ha.agent.host
>> ed_engine.HostedEngine.config::(refresh_local_conf_file)
>> > Got vm.conf from OVF_STORE
>> > MainThread::INFO::2016-08-08
>> > 15:03:11,675::hosted_engine::462::ovirt_hosted_engine_ha.age
>> nt.hosted_engine.HostedEngine::(start_monitoring)
>> > Current state EngineUp (score: 3400)
>> >
>> >
>> > ___
>> > Users mailing list
>> > Users@ovirt.org
>> > http://lists.ovirt.org/mailman/listinfo/users
>> >
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>


[ovirt-users] Re: Tracebacks in vdsm.log file

2019-05-14 Thread Nir Soffer
On Fri, Sep 30, 2016 at 3:58 PM, knarra  wrote:
> Hi,
>
> I see below trace back in my vdsm.log. Can some one help me understand
> why these are logged?
>
>
> is free, finding out if anyone is waiting for it.
> Thread-557::DEBUG::2016-09-30
> 18:20:25,064::resourceManager::661::Storage.ResourceManager::(releaseResource)
> No one is waiting for resource 'Storage.upgrade_57ee3a08-004b-02
> 7b-0395-01d6', Clearing records.
> Thread-557::ERROR::2016-09-30
> 18:20:25,064::utils::375::Storage.StoragePool::(wrapper) Unhandled exception
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 372, in
> wrapper
> return f(*a, **kw)
>   File "/usr/lib/python2.7/site-packages/vdsm/concurrent.py", line 177, in
> run
> return func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line
> 78, in wrapper
> return method(self, *args, **kwargs)
>   File "/usr/share/vdsm/storage/sp.py", line 207, in _upgradePoolDomain
> self._finalizePoolUpgradeIfNeeded()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line
> 76, in wrapper
> raise SecureError("Secured object is not in safe state")
> SecureError: Secured object is not in safe state

This means that when the domain upgrade thread finished, the SPM was already
stopped.

I'm seeing these errors from time to time on my development host using
master. I don't think you should worry about them.

Can you file a bug about this? We should clean this up at some point.

Nir
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7KOBWIMEWWW3LWB7AB2B4VCXDXPGKNCW/


[ovirt-users] Re: VMs paused due to IO issues - Dell Equallogic controller failover

2019-05-14 Thread Nir Soffer
On Tue, Oct 4, 2016 at 7:03 PM, Michal Skrivanek <
michal.skriva...@redhat.com> wrote:

>
> > On 4 Oct 2016, at 09:51, Gary Lloyd  wrote:
> >
> > Hi
> >
> > We have Ovirt 3.65 with a Dell Equallogic SAN and we use Direct Luns for
> all our VMs.
> > At the weekend during early hours an Equallogic controller failed over
> to its standby on one of our arrays and this caused about 20 of our VMs to
> be paused due to IO problems.
> >
> > I have also noticed that this happens during Equallogic firmware
> upgrades since we moved onto Ovirt 3.65.
> >
> > As recommended by Dell disk timeouts within the VMs are set to 60
> seconds when they are hosted on an EqualLogic SAN.
> >
> > Is there any other timeout value that we can configure in vdsm.conf to
> stop VMs from getting paused when a controller fails over ?
>
> not really. but things are not so different when you look at it from the
> guest perspective. If the intention is to hide the fact that there is a
> problem and the guest should just see a delay (instead of dealing with
> error) then pausing and unpausing is the right behavior. From guest point
> of view this is just a delay it sees.
>
> >
> > Also is there anything that we can tweak to automatically unpause the
> VMs once connectivity with the arrays is re-established ?
>
> that should happen when the storage domain monitoring detects error and
> then reactivate(http://gerrit.ovirt.org/16244). It may be that since you
> have direct luns it’s not working with those….dunno, storage people should
> chime in I guess...
>


We don't monitor direct LUNs, only storage domains, so we do not support
resuming VMs using direct LUNs.

multipath does monitor all devices, so we could monitor the device status
via multipath and resume paused VMs when a device moves from the faulty
state to the active state.

Maybe open an RFE for this?

Nir



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2N5M7XDVNPV7OVI7COFTQRMUV7UMOQU2/


[ovirt-users] Re: ovirt-ha-agent cpu usage

2019-05-15 Thread Nir Soffer
On Fri, Oct 7, 2016 at 3:52 PM, Michal Skrivanek <
michal.skriva...@redhat.com> wrote:

>
> On 7 Oct 2016, at 14:42, Nir Soffer  wrote:
>
> On Wed, Oct 5, 2016 at 1:33 PM, Simone Tiraboschi 
> wrote:
>
>>
>>
>> On Wed, Oct 5, 2016 at 10:34 AM, Nir Soffer  wrote:
>>
>>> On Wed, Oct 5, 2016 at 10:24 AM, Simone Tiraboschi 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Oct 5, 2016 at 9:17 AM, gregor  wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> did you found a solution or cause for this high CPU usage?
>>>>> I have installed the self hosted engine on another server and there is
>>>>> no VM running but ovirt-ha-agent uses heavily the CPU.
>>>>>
>>>>
>>>> Yes, it's due to the fact that ovirt-ha-agent periodically reconnects
>>>> over json rpc and this is CPU intensive since the client has to parse the
>>>> yaml API specification each time it connects.
>>>>
>>>
> wasn’t it suppose to be fixed to reuse the connection? Like all the other
> clients (vdsm migration code:-)
>

This is an orthogonal issue.


> Does schema validation matter then if there would be only one connection
> at the start up?
>

Loading once does not help command-line tools like vdsClient, hosted-engine,
and vdsm-tool.

Nir


>
>
>>> Simone, reusing the connection is good idea anyway, but what you
>>> describe is
>>> a bug in the client library. The library does *not* need to load and
>>> parse the
>>> schema at all for sending requests to vdsm.
>>>
>>> The schema is only needed if you want to verify request parameters,
>>> or provide online help, these are not needed in a client library.
>>>
>>> Please file an infra bug about it.
>>>
>>
>> Done, https://bugzilla.redhat.com/show_bug.cgi?id=1381899
>>
>
> Here is a patch that should eliminate most most of the problem:
> https://gerrit.ovirt.org/65230
>
> Would be nice if it can be tested on the system showing this problem.
>
> Cheers,
> Nir
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
>



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/SWDGG5TVQ54Q3SB2TAOKEQI6SVNTQS5W/


[ovirt-users] Re: Cleanup illegal snapshot

2019-05-15 Thread Nir Soffer
On Sun, Oct 9, 2016 at 8:33 PM, Markus Stockhausen
 wrote:
> Hi Ala,
>
> that did not help. VDSM log tells me that the delta qcow2 file is missing:
>
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 873, in _run
> return fn(*args, **kargs)
>   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
> res = f(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 3162, in getVolumeInfo
> volUUID=volUUID).getInfo()
>   File "/usr/share/vdsm/storage/sd.py", line 457, in produceVolume
> volUUID)
>   File "/usr/share/vdsm/storage/fileVolume.py", line 58, in __init__
> volume.Volume.__init__(self, repoPath, sdUUID, imgUUID, volUUID)
>   File "/usr/share/vdsm/storage/volume.py", line 181, in __init__
> self.validate()
>   File "/usr/share/vdsm/storage/volume.py", line 194, in validate
> self.validateVolumePath()
>   File "/usr/share/vdsm/storage/fileVolume.py", line 540, in
> validateVolumePath
> raise se.VolumeDoesNotExist(self.volUUID)
> VolumeDoesNotExist: Volume does not exist:
> (u'c277351d-e2b1-4057-aafb-55d4b607ebae',)
> ...
> Thread-196::ERROR::2016-10-09 19:31:07,037::utils::739::root::(wrapper)
> Unhandled exception
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 736, in
> wrapper
> return f(*a, **kw)
>   File "/usr/share/vdsm/virt/vm.py", line 5264, in run
> self.update_base_size()
>   File "/usr/share/vdsm/virt/vm.py", line 5257, in update_base_size
> self.drive.imageID, topVolUUID)
>   File "/usr/share/vdsm/virt/vm.py", line 5191, in _getVolumeInfo
> (domainID, volumeID))
> StorageUnavailableError: Unable to get volume info for domain
> 47202573-6e83-42fd-a274-d11f05eca2dd volume
> c277351d-e2b1-4057-aafb-55d4b607ebae

Hi Markus,

I'm sorry for this confusing log. Vdsm treating a missing volume as a fatal
error is a bug; a missing volume is an expected condition when you query
volume info.

We also log 2 exceptions instead of one.

Would you file a vdsm storage bug for these issues?

Thanks,
Nir
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BNSJD4IBEGVZWOQFYPXIH7X2AJT5STXR/


[ovirt-users] Re: [ovirt-announce] Re: [ANN] oVirt 4.3.4 First Release Candidate is now available

2019-05-18 Thread Nir Soffer
On Fri, May 17, 2019 at 7:54 AM Gobinda Das  wrote:

> From RHHI side default we are setting below volume options:
>
> { group: 'virt',
>  storage.owner-uid: '36',
>  storage.owner-gid: '36',
>  network.ping-timeout: '30',
>  performance.strict-o-direct: 'on',
>  network.remote-dio: 'off'
>

According to the user reports, this configuration is not compatible with
oVirt.

Was this tested?

   }
>
>
> On Fri, May 17, 2019 at 2:31 AM Strahil Nikolov 
> wrote:
>
>> Ok, setting 'gluster volume set data_fast4 network.remote-dio on'
>> allowed me to create the storage domain without any issues.
>> I set it on all 4 new gluster volumes and the storage domains were
>> successfully created.
>>
>> I have created bug for that:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1711060
>>
>> If someone else already opened - please ping me to mark this one as
>> duplicate.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>>
>> В четвъртък, 16 май 2019 г., 22:27:01 ч. Гринуич+3, Darrell Budic <
>> bu...@onholyground.com> написа:
>>
>>
>> On May 16, 2019, at 1:41 PM, Nir Soffer  wrote:
>>
>>
>> On Thu, May 16, 2019 at 8:38 PM Darrell Budic 
>> wrote:
>>
>> I tried adding a new storage domain on my hyper converged test cluster
>> running Ovirt 4.3.3.7 and gluster 6.1. I was able to create the new gluster
>> volume fine, but it’s not able to add the gluster storage domain (as either
>> a managed gluster volume or directly entering values). The created gluster
>> volume mounts and looks fine from the CLI. Errors in VDSM log:
>>
>> ...
>>
>> 2019-05-16 10:25:09,584-0500 ERROR (jsonrpc/5) [storage.fileSD] Underlying
>> file system doesn't supportdirect IO (fileSD:110)
>> 2019-05-16 10:25:09,584-0500 INFO  (jsonrpc/5) [vdsm.api] FINISH
>> createStorageDomain error=Storage Domain target is unsupported: ()
>> from=:::10.100.90.5,44732, flow_id=31d993dd,
>> task_id=ecea28f3-60d4-476d-9ba8-b753b7c9940d (api:52)
>>
>>
>> The direct I/O check has failed.
>>
>>
>> So something is wrong in the files system.
>>
>> To confirm, you can try to do:
>>
>> dd if=/dev/zero of=/path/to/mountoint/test bs=4096 count=1 oflag=direct
>>
>> This will probably fail with:
>> dd: failed to open '/path/to/mountoint/test': Invalid argument
>>
>> If it succeeds, but oVirt fail to connect to this domain, file a bug and
>> we will investigate.
>>
>> Nir
>>
>>
>> Yep, it fails as expected. Just to check, it is working on pre-existing
>> volumes, so I poked around at gluster settings for the new volume. It has
>> network.remote-dio=off set on the new volume, but enabled on old volumes.
>> After enabling it, I’m able to run the dd test:
>>
>> [root@boneyard mnt]# gluster vol set test network.remote-dio enable
>> volume set: success
>> [root@boneyard mnt]# dd if=/dev/zero of=testfile bs=4096 count=1
>> oflag=direct
>> 1+0 records in
>> 1+0 records out
>> 4096 bytes (4.1 kB) copied, 0.0018285 s, 2.2 MB/s
>>
>> I’m also able to add the storage domain in ovirt now.
>>
>> I see network.remote-dio=enable is part of the gluster virt group, so
>> apparently it’s not getting set by ovirt duding the volume creation/optimze
>> for storage?
>>
>>
>>
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>>
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/OPBXHYOHZA4XR5CHU7KMD2ISQWLFRG5N/
>>
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/B7K24XYG3M43CMMM7MMFARH52QEBXIU5/
>>
>
>
> --
>
>
> Thanks,
> Gobinda
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/72IJEAJ7RN42H4GDG7DC4JGCRACIGOOV/


[ovirt-users] Re: Wrong disk size in UI after expanding iscsi direct LUN

2019-05-18 Thread Nir Soffer
On Thu, May 16, 2019 at 6:10 PM Bernhard Dick  wrote:

> Hi,
>
> I've extended the size of one of my direct iSCSI LUNs. The VM is seeing
> the new size but in the webinterface there is still the old size
> reported. Is there a way to update this information? I already took a
> look into the list but there are only reports regarding updating the
> size the VM sees.
>

Sounds like you hit this bug:
https://bugzilla.redhat.com/1651939


The description mention a workaround using the REST API.

Nir


>
>Best regards
>  Bernhard
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/54YHISUA66227IAMI2UVPZRIXV54BAKA/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WTXNENWK47HH3BQ4ZV4GZKOA7XYHMX6D/


[ovirt-users] Re: ovirt 4.3.3.7 cannot create a gluster storage domain

2019-05-22 Thread Nir Soffer
On Mon, May 20, 2019 at 12:46 PM Andreas Elvers <
andreas.elvers+ovirtfo...@solutions.work> wrote:

>
> > Without this file [dom_md/ids], you will not have any
> > kind of storage.
>
> Ok. Sounds I'm kind of in trouble with that file being un-healable by
> gluster?
>

Yes, but the good news is that you can easily initialize this file if it got
corrupted, since it keeps only temporary host status.
https://lists.ovirt.org/pipermail/users/2016-February/038046.html


> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/FIR74ALG7WJFIPNIAHZH4PRBY7UI2QRO/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OSZGHT65OUUSGKTLYKJVZPVRFTBPI3EO/


[ovirt-users] Re: [ovirt-announce] Re: [ANN] oVirt 4.3.4 First Release Candidate is now available

2019-05-16 Thread Nir Soffer
On Thu, May 16, 2019 at 8:38 PM Darrell Budic 
wrote:

> I tried adding a new storage domain on my hyper converged test cluster
> running Ovirt 4.3.3.7 and gluster 6.1. I was able to create the new gluster
> volume fine, but it’s not able to add the gluster storage domain (as either
> a managed gluster volume or directly entering values). The created gluster
> volume mounts and looks fine from the CLI. Errors in VDSM log:
>
> ...

> 2019-05-16 10:25:09,584-0500 ERROR (jsonrpc/5) [storage.fileSD] Underlying
> file system doesn't supportdirect IO (fileSD:110)
> 2019-05-16 10:25:09,584-0500 INFO  (jsonrpc/5) [vdsm.api] FINISH
> createStorageDomain error=Storage Domain target is unsupported: ()
> from=:::10.100.90.5,44732, flow_id=31d993dd,
> task_id=ecea28f3-60d4-476d-9ba8-b753b7c9940d (api:52)
>

The direct I/O check has failed.

This is the code doing the check:

def validateFileSystemFeatures(sdUUID, mountDir):
    try:
        # Don't unlink this file, we don't have the cluster lock yet as it
        # requires direct IO which is what we are trying to test for. This
        # means that unlinking the file might cause a race. Since we don't
        # care what the content of the file is, just that we managed to
        # open it O_DIRECT.
        testFilePath = os.path.join(mountDir, "__DIRECT_IO_TEST__")
        oop.getProcessPool(sdUUID).directTouch(testFilePath)
    except OSError as e:
        if e.errno == errno.EINVAL:
            log = logging.getLogger("storage.fileSD")
            log.error("Underlying file system doesn't support"
                      "direct IO")
            raise se.StorageDomainTargetUnsupported()

        raise

The actual check is done in ioprocess, using:

fd = open(path->str, allFlags, mode);
if (fd == -1) {
    rv = fd;
    goto clean;
}

rv = futimens(fd, NULL);
if (rv < 0) {
    goto clean;
}
With:

allFlags = O_WRONLY | O_CREAT | O_DIRECT

See:
https://github.com/oVirt/ioprocess/blob/7508d23e19aeeb4dfc180b854a5a92690d2e2aaf/src/exported-functions.c#L291

According to the error message:
Underlying file system doesn't support direct IO

We got EINVAL, which is possible only from open(), and is likely an issue
opening the file with O_DIRECT.

So something is wrong in the files system.

To confirm, you can try to do:

dd if=/dev/zero of=/path/to/mountoint/test bs=4096 count=1 oflag=direct

This will probably fail with:
dd: failed to open '/path/to/mountoint/test': Invalid argument

If it succeeds, but oVirt fails to connect to this domain, file a bug and we
will investigate.
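
If you prefer checking from Python rather than dd, here is a minimal sketch
of the same O_DIRECT probe (an illustration only, not the actual
vdsm/ioprocess code; adjust the mountpoint path to your storage domain):

import errno
import os

def supports_direct_io(mountpoint):
    # Try to create a file with O_DIRECT, like the ioprocess check does.
    path = os.path.join(mountpoint, "__DIRECT_IO_TEST__")
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT)
    except OSError as e:
        if e.errno == errno.EINVAL:
            return False  # file system rejected O_DIRECT
        raise
    os.close(fd)
    return True

print(supports_direct_io("/rhev/data-center/mnt/your-mountpoint"))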

Nir


>
> On May 16, 2019, at 11:55 AM, Nir Soffer  wrote:
>
> On Thu, May 16, 2019 at 7:42 PM Strahil  wrote:
>
>> Hi Sandro,
>>
>> Thanks for the update.
>>
>> I have just upgraded to RC1 (using gluster v6 here)  and the issue  I
>> detected in 4.3.3.7 - where gluster Storage domain fails creation - is
>> still present.
>>
>
> What is this issue? Can you provide a link to the bug/mail about it?
>
> Can you check if the 'dd' command executed during creation has been
>> recently modified ?
>>
>> I've received update from Darrell  (also gluster v6) , but haven't
>> received an update from anyone who is using gluster v5 -> thus I haven't
>> opened a bug yet.
>>
>> Best Regards,
>> Strahil Nikolov
>> On May 16, 2019 11:21, Sandro Bonazzola  wrote:
>>
>> The oVirt Project is pleased to announce the availability of the oVirt
>> 4.3.4 First Release Candidate, as of May 16th, 2019.
>>
>> This update is a release candidate of the fourth in a series of
>> stabilization updates to the 4.3 series.
>> This is pre-release software. This pre-release should not be used
>> in production.
>>
>> This release is available now on x86_64 architecture for:
>> * Red Hat Enterprise Linux 7.6 or later
>> * CentOS Linux (or similar) 7.6 or later
>>
>> This release supports Hypervisor Hosts on x86_64 and ppc64le
>> architectures for:
>> * Red Hat Enterprise Linux 7.6 or later
>> * CentOS Linux (or similar) 7.6 or later
>> * oVirt Node 4.3 (available for x86_64 only)
>>
>> Experimental tech preview for x86_64 and s390x architectures for Fedora
>> 28 is also included.
>>
>> See the release notes [1] for installation / upgrade instructions and a
>> list of new features and bugs fixed.
>>
>> Notes:
>> - oVirt Appliance is already available
>> - oVirt Node is already available[2]
>>
>> Additional Resources:
>> * Read more about the oVirt 4.3.4 release highlights:
>> http://w

[ovirt-users] Re: [ANN] oVirt 4.3.4 First Release Candidate is now available

2019-05-16 Thread Nir Soffer
On Thu, May 16, 2019 at 7:42 PM Strahil  wrote:

> Hi Sandro,
>
> Thanks for the update.
>
> I have just upgraded to RC1 (using gluster v6 here)  and the issue  I
> detected in 4.3.3.7 - where gluster Storage domain fails creation - is
> still present.
>

What is this issue? Can you provide a link to the bug/mail about it?

Can you check if the 'dd' command executed during creation has been
> recently modified ?
>
> I've received update from Darrell  (also gluster v6) , but haven't
> received an update from anyone who is using gluster v5 -> thus I haven't
> opened a bug yet.
>
> Best Regards,
> Strahil Nikolov
> On May 16, 2019 11:21, Sandro Bonazzola  wrote:
>
> The oVirt Project is pleased to announce the availability of the oVirt
> 4.3.4 First Release Candidate, as of May 16th, 2019.
>
> This update is a release candidate of the fourth in a series of
> stabilization updates to the 4.3 series.
> This is pre-release software. This pre-release should not be used
> in production.
>
> This release is available now on x86_64 architecture for:
> * Red Hat Enterprise Linux 7.6 or later
> * CentOS Linux (or similar) 7.6 or later
>
> This release supports Hypervisor Hosts on x86_64 and ppc64le architectures
> for:
> * Red Hat Enterprise Linux 7.6 or later
> * CentOS Linux (or similar) 7.6 or later
> * oVirt Node 4.3 (available for x86_64 only)
>
> Experimental tech preview for x86_64 and s390x architectures for Fedora 28
> is also included.
>
> See the release notes [1] for installation / upgrade instructions and a
> list of new features and bugs fixed.
>
> Notes:
> - oVirt Appliance is already available
> - oVirt Node is already available[2]
>
> Additional Resources:
> * Read more about the oVirt 4.3.4 release highlights:
> http://www.ovirt.org/release/4.3.4/
> * Get more oVirt Project updates on Twitter: https://twitter.com/ovirt
> * Check out the latest project news on the oVirt blog:
> http://www.ovirt.org/blog/
>
> [1] http://www.ovirt.org/release/4.3.4/
> [2] http://resources.ovirt.org/pub/ovirt-4.3-pre/iso/
>
> --
>
> Sandro Bonazzola
>
> MANAGER, SOFTWARE ENGINEERING, EMEA R RHV
>
> Red Hat EMEA 
>
> sbona...@redhat.com
> 
> 
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/REDV54BH7CIIDRCRUPCUYN4TX5Z3SL6R/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ABFECS5ES4MVL3UZC34GLIDN5PNDTNOR/


[ovirt-users] Re: ovirt-ha-agent cpu usage

2019-05-15 Thread Nir Soffer
>
> And a bright new 3-minutes video here:
> https://drive.google.com/file/d/0BwoPbcrMv8mvVzBPUVRQa1pwVnc/
> view?usp=sharing
>

> It seems that now ovirt-ha-agent or is not present in top cpu process or
> at least has ranges between 5% and 12% and not more
>

5-12% is probably 10 times more than needed for the agent; profiling
should tell us where time is spent.

Since the agent depends on vdsm, we can reuse the vdsm cpu profiler, see
lib/vdsm/profiling/cpu.py. It is not yet ready to be used by other
applications, but making it more general should be easy.
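
In the meantime, a generic first pass is possible with the standard cProfile
module; a rough sketch (not the vdsm profiler), assuming you can wrap the
interesting code path in-process:

import cProfile
import pstats

def profile_call(func, *args, **kwargs):
    # Profile one call and print the 20 most expensive functions.
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        return func(*args, **kwargs)
    finally:
        profiler.disable()
        pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)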

Nir

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/A2JOBFOHETQIMVJHLK6CIYF4ECCJ65NI/


[ovirt-users] Re: Does Ovirt 4.3.4 have support for NFS 4/4.1/4.2 or pNFS

2019-07-13 Thread Nir Soffer
On Sat, Jul 13, 2019, 00:30 Erick Perez  wrote:

> I have read the archives and the most recent discussion was 5 years ago.
> So I better ask again.
> My NAS runs Centos with NFS4.2 (and I am testing Ganesha in another server)
> Does Ovirt 4.3.4 have support for NFS 4/4.1/4.2 or pNFS
>

We have supported NFS 4.2 for a while (since oVirt 4.2 or 4.3), but the
default is still auto (typically NFS 3).

Specially version 4.2 due to:
> Server-Side Copy: NFSv4.2 supports copy_file_range() system call, which
> allows the NFS client to efficiently copy data without wasting network
> resources.
>

qemu supports copy_file_range() since RHEL 7.6, but I think it has some
issues and does not perform well, so we don't enable this mode yet.

What works with NFS 4.2 is sparseness and fallocate support, speeding up
copy/move/clone/upload of sparse disks.

For example, creating a 100g preallocated image takes less than a second.
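
As an illustration of why, here is a rough sketch of this kind of
preallocation using posix_fallocate (not the vdsm code; the path below is a
made-up example):

import os

def preallocate(path, size):
    # Allocate the full size up front; on NFS 4.2 the server can do this
    # without the client sending zeroes over the network.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)

preallocate("/rhev/data-center/mnt/example/disk.raw", 100 * 1024**3)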


> But this will only happen if the Ovirt (which I know is Centos based)
> supports NFS 4.2.
> Not sure If I update the NFS toolset on the Ovirt install, it will break
> something or worst.
>

It should work, and work much better.

Nir

___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/Q4YOSOY4ZF2D6YTIORIPMUD6YJNACVB3/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YOBLWT6H4THVXOY2PXSDYGWXU6UYJWIM/


[ovirt-users] Re: Upgraded to 4.3: getting SpmStart failure

2019-07-07 Thread Nir Soffer
On Fri, Jul 5, 2019 at 6:45 PM Richard Chan 
wrote:

> Unfortunately, this is a separate engine running external to the oVirt
> cluster.
> The main error on the hosts is: any idea what this Volume is referring to?
> I have 3 existings storage domains
> ISO EXPORT and DATA all upgraded to 4.3
> None have this UUID.
>
>
> VolumeDoesNotExist: Volume does not exist:
> (u'1d1b6dfd-197a-4583-8398-6b941dbca854',)
>
> 2019-07-05 23:02:15,954+0800 ERROR (tasks/0) [storage.StoragePool] failed:
> Volume does not exist: (u'1d1b6dfd-197a-4583-8398-6b941dbca854',) (sp:384)
> 2019-07-05 23:02:15,954+0800 INFO  (tasks/0) [storage.SANLock] Releasing
> Lease(name='SDM', 
> path=u'/rhev/data-center/mnt/192.168.88.40:_genie_ovirt_data/f9aaa65a-0ac7-4c61-a27d-15af39016419/dom_md/leases',
> offset=1048576) (clusterlock:488)
> 2019-07-05 23:02:15,955+0800 INFO  (tasks/0) [storage.SANLock]
> Successfully released Lease(name='SDM',
> path=u'/rhev/data-center/mnt/192.168.88.40:_genie_ovirt_data/f9aaa65a-0ac7-4c61-a27d-15af39016419/dom_md/leases',
> offset=1048576) (clusterlock:497)
> 2019-07-05 23:02:15,955+0800 ERROR (tasks/0) [storage.TaskManager.Task]
> (Task='159b9026-7034-43c2-9587-8fec036e92ba') Unexpected error (task:875)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882,
> in _run
> return fn(*args, **kargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336,
> in run
> return self.cmd(*self.argslist, **self.argsdict)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 333, in
> startSpm
> self._upgradePool(expectedDomVersion, __securityOverride=True)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line
> 79, in wrapper
> return method(self, *args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 484, in
> _upgradePool
> str(targetDomVersion))
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1108,
> in _convertDomain
> targetFormat)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/formatconverter.py",
> line 447, in convert
> converter(repoPath, hostId, imageRepo, isMsd)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/formatconverter.py",
> line 405, in v5DomainConverter
> domain.convert_volumes_metadata(target_version)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line
> 813, in convert_volumes_metadata
> for vol in self.iter_volumes():
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 764, in
> iter_volumes
> yield self.produceVolume(img_id, vol_id)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 846, in
> produceVolume
> volUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line
> 817, in __init__
> self._manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line
> 71, in __init__
> volUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 86,
> in __init__
> self.validate()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line
> 112, in validate
> self.validateVolumePath()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line
> 131, in validateVolumePath
> raise se.VolumeDoesNotExist(self.volUUID)
>

This usually means the volume does not have the right permissions.

The check in validateVolumePath() is:

if not self.oop.fileUtils.pathExists(volPath):
    raise se.VolumeDoesNotExist(self.volUUID)

IOProcess.pathExists() is:

def pathExists(self, filename, writable=False):
    check = os.R_OK

    if writable:
        check |= os.W_OK

    if self.access(filename, check):
        return True

    return self.access(filename, check)

Bad permissions would of course make domain conversion fail, so the real fix
is to correct the bad permissions.
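
To see what vdsm sees for that volume, a quick sketch (the path is
hypothetical, use the failing volume path from the traceback; run it as the
vdsm user):

import os

# Path of the failing volume (adjust to the path from the traceback).
path = "/rhev/data-center/mnt/server:_export/sd-uuid/images/img-uuid/vol-uuid"

st = os.stat(path)
print("owner %d:%d, mode %o" % (st.st_uid, st.st_gid, st.st_mode & 0o777))

# This mirrors the os.R_OK check done by IOProcess.pathExists().
print("readable:", os.access(path, os.R_OK))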

Nir


>
> On Fri, Jul 5, 2019 at 11:31 PM Strahil  wrote:
>
>> Can you run the engine manually on another server ?
>> hosted-engine --vm-start
>>
>> Best Regards,
>> Strahil Nikolov
>> On Jul 5, 2019 18:05, Richard Chan  wrote:
>>
>> My engine is getting SpmStart failure.
>>
>> What is the best way to troubleshoot this?
>>
>>
>> --
>> Richard Chan
>>
>>
>
> --
> Richard Chan
> Chief Architect
>
> TreeBox Solutions Pte Ltd
> 1 Commonwealth Lane #03-01
> Singapore 149544
> Tel: 6570 3725
> http://www.treeboxsolutions.com
>
> Co.Reg.No. 201100585R
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> 

[ovirt-users] Re: Experience 4.1.x -> 4.2.8 -> 4.3.4 upgrade lessons

2019-07-07 Thread Nir Soffer
On Sat, Jul 6, 2019 at 4:58 AM Richard Chan 
wrote:

> Hi oVirters,
>
> Did a 4.1 - > 4.2.8 -> 4.3.4 upgrade in two hops. Here is my experience,
> hope it can save you some pain.
>
> Gotchas
> 1. For shutdown VMs, their disks were deactivated (though still attached)
> 2. For shutdown VMs, their NICs were "Up" but "Unplugged"
> 3. VM Custom Compatibility Version is hard to find. (If it is too old say
> 3.6, 4.1) they won't start up.  The location is:  VM -> Edit -> System ->
> Advanced Parameters (click this)
>
> "Advanced Parameters" is an expandable widget that is quite small when
> collapsed. When it is expanded then you will see "Custom Compatibility
> Version"
>
> 4. Storage Domain migration to V5: I use a NFS storoage domain. This is
> migrated to the "V5" format and a lot of cleanup is done. If you have stray
> files or images not owned by 36:36 (vdsm:kvm on my NFS server), the V5
> upgrade will fail, there will be no SPM, and the Data Center will oscillate
> between Non-responsive and Contending.
>
> Why would there be non 36:36 uid/gid files? This is a long story, but from
> the older 3.x days when there was less functionality, we would sometimes
> directly manipulate the filesystem and/or qcow2 images using guestfish or
> equivalent tools which would change the owner to qemu:qemu. Forgetting to
> chown back to vdsm:kvm is a recipe for pain.
>

Bad permissions are expected to cause failures when converting a domain,
since we need to modify domain and volume metadata. They will also cause
failures when using volumes or in some other storage operations.
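
If you want to audit an export for stray ownership before upgrading, a small
sketch along these lines can help (assuming vdsm:kvm maps to 36:36 on the NFS
server; the export path is an example):

import os

EXPORT = "/export/ovirt/data"
VDSM_UID = KVM_GID = 36

for root, dirs, files in os.walk(EXPORT):
    for name in dirs + files:
        path = os.path.join(root, name)
        st = os.lstat(path)
        if st.st_uid != VDSM_UID or st.st_gid != KVM_GID:
            print("%s %d:%d" % (path, st.st_uid, st.st_gid))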

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WAC4OPM73HWACWVHWZQOPX7RUWOWPJHK/


[ovirt-users] Re: VM snapshot - some info needed

2019-07-07 Thread Nir Soffer
On Wed, Jul 3, 2019 at 8:11 PM Strahil Nikolov 
wrote:

> I have noticed that if I want a VM snapshot with memory - I get a warning "The
> VM will be paused while saving the memory" .
> Is there a way to make snapshot without pausing the VM for the whole
> duration ?
>

No, this is expected.

Denis was working on background snapshot which will remove this limitation:
https://www.youtube.com/watch?v=fKj4j8lw8pU

Nir


> I have noticed that it doesn't matter if VM has qemu-guest-agent or
> ovirt-guest-agent.
>
> Thanks in advance for sharing your thoughts.
>
> Best Regards,
> Strahil  Nikolov
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/7VFMZRGF57WXTFTZ44W55MDX6BJES5TX/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GFG47UKCGCNP5R2P7RDSNY5X2KTIUCZG/


[ovirt-users] Re: [ovirt-devel] USB on host support?

2019-06-29 Thread Nir Soffer
On Sat, Jun 29, 2019 at 11:57 PM Hetz Ben Hamo  wrote:

> One of the requirements is USB host support (which will be on the nodes,
> not on the desktops). I haven't found any way to connect USB to a VM other
> then through SPICE, but this method won't help as these dongles won't be on
> desktops.
>
> Is this in the road map of oVirt?
>

Did you try Compute > Virtual machines > vm-name > Host devices?

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WPHJ6DXBAJNXNBM7HTLLUEAVXXTTGYKO/


[ovirt-users] Re: Disk Allocation policy changes after a snapshot

2019-08-15 Thread Nir Soffer
On Thu, Aug 15, 2019 at 8:30 PM Gianluca Cecchi 
wrote:

> On Thu, Aug 15, 2019 at 10:35 AM Evelina Shames 
> wrote:
>
>> Hey Kevin,
>> By design, when creating a snapshot, the new volume is created with
>> 'sparse' allocation policy.
>> I suggest you to open a bug since this operation should not crash the VM.
>> Add this description and please add all relevant logs and any relevant
>> information of your environment.
>>
>> Regards,
>> Evelina
>>
>> On Tue, Aug 13, 2019 at 4:10 PM Kevin Doyle 
>> wrote:
>>
>>> Hi I have linux VM with 2 disks one for OS is sparse the other is for a
>>> Database and is Preallocated. When I take a snapshot of the VM both disks
>>> change to sparse policy but the disks in the snapshot are 1 sparse and 1
>>> allocated. Before the snapshot the VM was running fine, now it crashes when
>>> data is written to the database. When I delete the snapshot the disks go
>>> back to 1 sparse and 1 allocated. Has anyone else seen this happen. Ovirt
>>> is 4.3.2.1-1.el7 and it is running on a hostedEngine
>>>
>>> Many thanks
>>> Kevin
>>> ___
>>>
>>
> Hi,
> some clarifications needed: what kind of storage are you using?
> If block based (iSCSI or FC-SAN) I verified problems on sparse allocated
> disks and databases (Oracle in my case) during high I/O on datafiles.
>

What kind of problems did you have? do we have a bug for this?


> So, as you did, I used preallocated for data based disks.
>

For best performance, we have always recommended preallocated disks. I think
you will get the best results with a direct LUN for applications that need
top performance.


> For fine tuning of automatic LVM extensions in case of block based storage
> domains, see also this 2017 thread:
>
> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/S3LXEJV3V4CIOTQXNGZYVZFUSDSQZQJS/
> not currently using it tohugh with recent versions of oVirt, so I have no
> "fresh" information about efficiency, depending on I/O load amount
> HIH anyway,
> Gianluca
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/Q6SKLJQHNYBOLDXJBWA6JYNL6DN6ITT4/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DPHKXBPEXMSRHUGPIOYISBQRZZ4H3FZ7/


[ovirt-users] Re: ovirt-imagio-proxy upload speed slow

2019-09-15 Thread Nir Soffer
On Sun, Sep 15, 2019 at 8:37 PM Mikael Öhman  wrote:

> > What do you mean by "small block sizes"?
>
> inside a VM, or directly on the mounted glusterfs;
> dd if=/dev/zero of=tmpfile  bs=1M count=100 oflag=direct
> of course, a terrible way to write data, but also things like compiling
> software inside one of the VMs was terrible slow, 5-10x slower than
> hardware, consisting of almost only idling.
>
> Uploading disk images never got above 30MB/s.
> (and I did try all options i could find; using upload_disk.py on one of
> the hosts, even through a unix socket or with -d option, tweaking buffer
> size, all of which made no difference).
> Adding an NFS volume and uploading to it I reach +200MB/s.
>
> I tried tuning a few parameters on glusterfs but saw no improvements until
> I got to network.remote-dio, which made everything listed above really fast.
>
> > Note that network.remote-dio is not the recommended configuration
> > for ovirt, in particular if on hyperconverge setup when it can be harmful
> > by delaying sanlock I/O.
> >
> >
> https://github.com/oVirt/ovirt-site/blob/4a9b28aac48870343c5ea4d1e83a63c1.
> ..
> > (Patch in discussion)
>
> Oh, I had seen this page, thanks. Is "remote-dio=enabled" harmful as in
> things breaking, or just worse performance?
>

I think the issue is delayed writes to storage until you call fsync() on
the node. The
kernel may try to flush too much data, which may cause delays in other
processes.

Many years ago we had such issues that caused sanlock failures in renewing
leases, which can end in terminating vdsm. This is the reason we always use
direct I/O when copying disks. If you have a hyperconverged setup and use
network.remote-dio you may have the same problems.

Sahina worked on a bug that was related to network.remote-dio, I hope she
can add more
details.


> I was a bit reluctant to turn it on, but after seeing it was part of the
> virt group I thought it must have been safe.
>

I think it should be safe for general virt usage, but you may need to tweak
some host settings to avoid large delays when using fsync().
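
For example, the kernel writeback knobs under /proc/sys/vm control how much
dirty data can accumulate before it must be flushed; a small sketch to
inspect the current values (tuning them is site-specific, so this only
prints):

KNOBS = [
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_bytes",
    "dirty_background_bytes",
    "dirty_expire_centisecs",
]

for knob in KNOBS:
    with open("/proc/sys/vm/" + knob) as f:
        print(knob, "=", f.read().strip())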


> Perhaps some of the other options like "performance.strict-o-direct on"
> would solve my performance issues in a nicer way (I will test it out first
> thing on monday)
>

This is not likely to improve performance, but it seems to be required
if you use
network.remote-dio = off. Without it direct I/O does not behave in a
predictable way
and may cause failures in qemu, vdsm and ovirt-imageio.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TXULQKGQ6CZDUW4D3XBTLKTS4I3TNSC4/


[ovirt-users] Re: ovirt-imagio-proxy upload speed slow

2019-09-15 Thread Nir Soffer
On Fri, Sep 13, 2019 at 3:54 PM Mikael Öhman  wrote:

> Perhaps it will help future sysadmins will learn from my mistake.
> I also saw very poor upload speeds (~30MB/s) no matter what I tried. I
> went through the whole route with unix-sockets and whatnot.
>
> But, in the end, it just turned out that the glusterfs itself was the
> bottleneck; abysmal performance for small block sizes.
>

What do you mean by "small block sizes"?

I found the list of suggested performance tweaks that RHEL suggests. In
> particular, it was the "network.remote-dio=on" setting that made all the
> difference. Almost 10x faster.
>

Note that network.remote-dio is not the recommended configuration for oVirt,
in particular on a hyperconverged setup, where it can be harmful by delaying
sanlock I/O.

https://github.com/oVirt/ovirt-site/blob/4a9b28aac48870343c5ea4d1e83a63c10a1cbfa2/source/documentation/admin-guide/chap-Working_with_Gluster_Storage.md#options-set-on-gluster-storage-volumes-to-store-virtual-machine-images
(Patch in discussion)

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6RIRLB6KIDEBYSIJCRJQFZ4RVHOHQAC7/


[ovirt-users] Re: CentOS 6 in oVirt

2019-09-15 Thread Nir Soffer
On Wed, Sep 11, 2019, 10:07 Dominik Holler  wrote:

> Hello,
> are the official CentOS 6 cloud images known to work in oVirt?
> I tried, but they seem not to work in oVirt [1], but on plain libvirt.
> Are there any pitfalls known during using them in oVirt?
>

Why do you want to use prehistoric images?

Thanks
> Dominik
>
>
> [1]
>   https://gerrit.ovirt.org/#/c/101117/
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/TN5SEECVFGCCMQ2QACYYEZXNLFGBYDZH/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YGC3UXJMWGJZAJ3RVJ3PTL3MGKMLHDGD/


[ovirt-users] Re: Failure while import VM from export domain

2019-09-07 Thread Nir Soffer
On Fri, Sep 6, 2019, 21:48 Timmi  wrote:

> Hi oVirt list,
>
> I'm planing to migrate my oVirt installation on a new HW.
> My current platform is running a 4.2.8. The new one a 4.3.5.
>
> I'm currently trying to import the VMs through the export domain. But
> some of my old VMs are failing.
> Which log file would be interesting to check?
>

vdsm log should help


> Also maybe there is a better way to copy more less all VMs from my
> current oVirt to the new one.
>


Yes, detach the domain from the old setup, attach it to the new setup, and
import all the VMs. This works when you want to move all VMs in a storage
domain. If you want to keep some VMs in the old setup, you can move them to
another storage domain before detaching.

Nir


> Best regards
> Christoph
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/72KKG3IBEYOBNMEDPKCXSVUVTDSLBODS/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/L5GORDOZRA3TTVIND5D62VQPMP7HHZJF/


[ovirt-users] Re: Stuck in "Finalizing" disk upload phase

2019-07-17 Thread Nir Soffer
On Wed, Jul 17, 2019, 19:20 Vrgotic, Marko 
wrote:

> Dear oVIrt,
>
>
>
> I initiated upload of qcow2 disk image for Centos 6.5:
>
> It reached finalizing phase and than started throwing following errors:
>
>
>
> 2019-07-17 14:40:51,480Z INFO
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-86)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Finalizing successful transfer for
> Upload disk 'av-07-centos-65-base' (disk id:
> '3452459d-aec6-430e-9509-1d9ca815b2d8', image id:
> 'b44659a9-607a-4eeb-a255-99532fd4fce4')
>
> 2019-07-17 14:40:51,480Z WARN
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-86)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Failed to stop image transfer
> session. Ticket does not exist for image
> '3452459d-aec6-430e-9509-1d9ca815b2d8'
>

Old versions of ovirt-imageio-daemon were failing to remove a ticket if the
ticket did not exist, and the engine did not handle this well.

This was fixed in 4.2. Are you running the latest version on the hosts?

Please update ovirt-imageio-daemon to latest version.

Nir

2019-07-17 14:41:01,572Z INFO
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-19)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Finalizing successful transfer for
> Upload disk 'av-07-centos-65-base' (disk id:
> '3452459d-aec6-430e-9509-1d9ca815b2d8', image id:
> 'b44659a9-607a-4eeb-a255-99532fd4fce4')
>
> 2019-07-17 14:41:01,574Z WARN
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-19)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Failed to stop image transfer
> session. Ticket does not exist for image
> '3452459d-aec6-430e-9509-1d9ca815b2d8'
>
> 2019-07-17 14:41:11,690Z INFO
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-7)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Finalizing successful transfer for
> Upload disk 'av-07-centos-65-base' (disk id:
> '3452459d-aec6-430e-9509-1d9ca815b2d8', image id:
> 'b44659a9-607a-4eeb-a255-99532fd4fce4')
>
> 2019-07-17 14:41:11,690Z WARN
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-7)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Failed to stop image transfer
> session. Ticket does not exist for image
> '3452459d-aec6-430e-9509-1d9ca815b2d8'
>
> 2019-07-17 14:41:21,781Z INFO
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-12)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Finalizing successful transfer for
> Upload disk 'av-07-centos-65-base' (disk id:
> '3452459d-aec6-430e-9509-1d9ca815b2d8', image id:
> 'b44659a9-607a-4eeb-a255-99532fd4fce4')
>
> 2019-07-17 14:41:21,782Z WARN
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-12)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Failed to stop image transfer
> session. Ticket does not exist for image
> '3452459d-aec6-430e-9509-1d9ca815b2d8'
>
>
>
> I can not cancel it, can not stop it, not via UI not via force option
> using ovirt_disk module.
>
>
>
> Help!
>
>
>
> oVIrt 4.3.4.3-1 version running with CentOS 7.6 Hosts.
>
>
>
> Kindly awaiting your reply.
>
>
>
>
>
> — — —
> Met vriendelijke groet / Kind regards,
>
> *Marko Vrgotic*
>
> ActiveVideo
>
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LJPWK5A3346ZCDSWEAG6WU3JLRXEGX22/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZUVWDVKQVDK36D2W2GCZK5ETGYICJXCZ/


[ovirt-users] Re: Stuck in "Finalizing" disk upload phase

2019-07-18 Thread Nir Soffer
On Thu, Jul 18, 2019 at 10:49 AM Vrgotic, Marko 
wrote:

> Dear Nir,
>
>
>
> None of my Hosts has any updated left to be added, they are as up to date
> as they can be.
>
> The imageio packages version installed is:
>
>
>
> Ovirt-imageio-common-1.5.1-0.el7.x86_64
>
> Ovirt-imageio-daemon-1.5.1-0.el7.x86_64
>

We need logs to understand the issue.

Can you share logs from the time the upload was started?
- engine log (/var/log/ovirt-engine/engine.log)
- vdsm log on the host that performed the upload (/var/log/vdsm/vdsm.log)
- daemon logs on that host (/var/log/ovirt-imageio-daemon/daemon.log)

To locate the right host you can grep for the transfer uuid that should be
mentioned in engine logs.


>
>
> Additional software info from Host
>
> OS Version:
>
> RHEL - 7 - 6.1810.2.el7.centos
>
> OS Description:
>
> CentOS Linux 7 (Core)
>
> Kernel Version:
>
> 3.10.0 - 957.21.3.el7.x86_64
>
> KVM Version:
>
> 2.12.0 - 18.el7_6.5.1
>
> LIBVIRT Version:
>
> libvirt-4.5.0-10.el7_6.12
>
> VDSM Version:
>
> vdsm-4.30.17-1.el7
>
> SPICE Version:
>
> 0.14.0 - 6.el7_6.1
>
> GlusterFS Version:
>
> [N/A]
>
> CEPH Version:
>
> librbd1-10.2.5-4.el7
>
> Open vSwitch Version:
>
> openvswitch-2.10.1-3.el7
>
> Kernel Features:
>
> PTI: 1, IBRS: 0, RETP: 1, SSBD: 3
>
>
>
> *From: *"Vrgotic, Marko" 
> *Date: *Thursday, 18 July 2019 at 08:03
> *To: *Nir Soffer 
> *Cc: *users , Daniel Erez 
> *Subject: *Re: [ovirt-users] Stuck in "Finalizing" disk upload phase
>
>
>
> Hi Nir,
>
>
>
> Sure, i will check.
>
>
>
> Is there a way for Adminstrator to view tickets or close them by force?
>
> Sent from my iPhone
>
>
> On 18 Jul 2019, at 00:05, Nir Soffer  wrote:
>
>
>
> On Wed, Jul 17, 2019, 19:20 Vrgotic, Marko 
> wrote:
>
> Dear oVIrt,
>
>
>
> I initiated upload of qcow2 disk image for Centos 6.5:
>
> It reached finalizing phase and than started throwing following errors:
>
>
>
> 2019-07-17 14:40:51,480Z INFO
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-86)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Finalizing successful transfer for
> Upload disk 'av-07-centos-65-base' (disk id:
> '3452459d-aec6-430e-9509-1d9ca815b2d8', image id:
> 'b44659a9-607a-4eeb-a255-99532fd4fce4')
>
> 2019-07-17 14:40:51,480Z WARN
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-86)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Failed to stop image transfer
> session. Ticket does not exist for image
> '3452459d-aec6-430e-9509-1d9ca815b2d8'
>
>
>
> Old versions of ovirt-imageio-daemon were failing to remove a ticket if
> the ticket does not exist, and engine did not handle this well.
>
>
>
> This was fixed in 4.2. Are you running the latest version on the hosts?
>
>
>
> Please update ovirt-imageio-daemon to latest version.
>
>
>
> Nir
>
>
>
> 2019-07-17 14:41:01,572Z INFO
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-19)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Finalizing successful transfer for
> Upload disk 'av-07-centos-65-base' (disk id:
> '3452459d-aec6-430e-9509-1d9ca815b2d8', image id:
> 'b44659a9-607a-4eeb-a255-99532fd4fce4')
>
> 2019-07-17 14:41:01,574Z WARN
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-19)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Failed to stop image transfer
> session. Ticket does not exist for image
> '3452459d-aec6-430e-9509-1d9ca815b2d8'
>
> 2019-07-17 14:41:11,690Z INFO
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-7)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Finalizing successful transfer for
> Upload disk 'av-07-centos-65-base' (disk id:
> '3452459d-aec6-430e-9509-1d9ca815b2d8', image id:
> 'b44659a9-607a-4eeb-a255-99532fd4fce4')
>
> 2019-07-17 14:41:11,690Z WARN
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-7)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Failed to stop image transfer
> session. Ticket does not exist for image
> '3452459d-aec6-430e-9509-1d9ca815b2d8'
>
> 2019-07-17 14:41:21,781Z INFO
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-12)
> [a43180ec-afc7-429e-9f30-9e851e

[ovirt-users] Re: Stuck in "Finalizing" disk upload phase

2019-07-18 Thread Nir Soffer
On Thu, Jul 18, 2019 at 1:44 PM Vrgotic, Marko 
wrote:

> Hi Nir,
>
>
>
> Sure: here is the ovirt-engine/engine.log related to transaction:
>
...

> 2019-07-17 13:36:11,394Z INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand]
> (default task-376) [a43180ec-afc7-429e-9f30-9e851eaf7ce7] START,
> AddImageTicketVDSCommand(HostName = ovirt-hv-01.avinity.tv,
> AddImageTicketVDSCommandParameters:{hostId='e7e3f1dc-8037-4e74-a44c-442bdb02197d',
> ticketId='10096b11-d10d-43aa-ad16-668b29a8c152', timeout='300',
> operations='[write]', size='47244640256',
> url='file:///rhev/data-center/mnt/172.17.28.5:_ovirt__production/644aacaa-12e1-4fcd-b3aa-941678cf95bd/images/3452459d-aec6-430e-9509-1d9ca815b2d8/b44659a9-607a-4eeb-a255-99532fd4fce4',
> filename='null', sparse='true', 
> transferId='28eda0c2-e36b-4e70-91ea-2ecf4a030d19'}),
> log id: 3bf6b943
>

The transfer uuid is mentioned in this log:

transferId='28eda0c2-e36b-4e70-91ea-2ecf4a030d19'

Please attach the complete engine.log starting from this line.

You can see that the host used for upload was ovirt-hv-01.avinity.tv.
Please attach vdsm logs
from this host from the time this ticket was added.

Please attach imageio daemon logs from the same time on this host.
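
If it helps, here is a throwaway sketch to pull the matching lines out of the
logs (default log paths assumed; run it on the relevant host and adjust as
needed):

IDS = [
    "28eda0c2-e36b-4e70-91ea-2ecf4a030d19",  # transferId (engine log)
    "10096b11-d10d-43aa-ad16-668b29a8c152",  # ticketId (vdsm/daemon logs)
]
LOGS = [
    "/var/log/ovirt-engine/engine.log",
    "/var/log/vdsm/vdsm.log",
    "/var/log/ovirt-imageio-daemon/daemon.log",
]

for log in LOGS:
    try:
        with open(log) as f:
            for line in f:
                if any(i in line for i in IDS):
                    print(log, line, end="")
    except IOError as e:
        print("skipping %s: %s" % (log, e))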

— — —
> Met vriendelijke groet / Kind regards,
>
> *Marko Vrgotic*
>
>
>
>
>
> *From: *Nir Soffer 
> *Date: *Thursday, 18 July 2019 at 12:20
> *To: *"Vrgotic, Marko" 
> *Cc: *users , Daniel Erez 
> *Subject: *Re: [ovirt-users] Stuck in "Finalizing" disk upload phase
>
>
>
> On Thu, Jul 18, 2019 at 10:49 AM Vrgotic, Marko 
> wrote:
>
> Dear Nir,
>
>
>
> None of my Hosts has any updated left to be added, they are as up to date
> as they can be.
>
> The imageio packages version installed is:
>
>
>
> Ovirt-imageio-common-1.5.1-0.el7.x86_64
>
> Ovirt-imageio-daemon-1.5.1-0.el7.x86_64
>
>
>
> We need logs to understand the issue.
>
>
>
> Can you share logs from the time the upload was started?
>
> - engine log (/var/log/ovirt-engine/engine.log)
>
> - vdsm log on the host that performed the upload (/var/log/vdsm/vdsm.log)
>
> - daemon logs on that host (/var/log/ovirt-imageio-daemon/daemon.log)
>
>
>
> To locate the right host you can grep for the transfer uuid that should be
> mentioned in engine logs.
>
>
>
>
>
> Additional software info from Host
>
> OS Version:
>
> RHEL - 7 - 6.1810.2.el7.centos
>
> OS Description:
>
> CentOS Linux 7 (Core)
>
> Kernel Version:
>
> 3.10.0 - 957.21.3.el7.x86_64
>
> KVM Version:
>
> 2.12.0 - 18.el7_6.5.1
>
> LIBVIRT Version:
>
> libvirt-4.5.0-10.el7_6.12
>
> VDSM Version:
>
> vdsm-4.30.17-1.el7
>
> SPICE Version:
>
> 0.14.0 - 6.el7_6.1
>
> GlusterFS Version:
>
> [N/A]
>
> CEPH Version:
>
> librbd1-10.2.5-4.el7
>
> Open vSwitch Version:
>
> openvswitch-2.10.1-3.el7
>
> Kernel Features:
>
> PTI: 1, IBRS: 0, RETP: 1, SSBD: 3
>
>
>
> *From: *"Vrgotic, Marko" 
> *Date: *Thursday, 18 July 2019 at 08:03
> *To: *Nir Soffer 
> *Cc: *users , Daniel Erez 
> *Subject: *Re: [ovirt-users] Stuck in "Finalizing" disk upload phase
>
>
>
> Hi Nir,
>
>
>
> Sure, i will check.
>
>
>
> Is there a way for Adminstrator to view tickets or close them by force?
>
> Sent from my iPhone
>
>
> On 18 Jul 2019, at 00:05, Nir Soffer  wrote:
>
>
>
> On Wed, Jul 17, 2019, 19:20 Vrgotic, Marko 
> wrote:
>
> Dear oVIrt,
>
>
>
> I initiated upload of qcow2 disk image for Centos 6.5:
>
> It reached finalizing phase and than started throwing following errors:
>
>
>
> 2019-07-17 14:40:51,480Z INFO
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-86)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Finalizing successful transfer for
> Upload disk 'av-07-centos-65-base' (disk id:
> '3452459d-aec6-430e-9509-1d9ca815b2d8', image id:
> 'b44659a9-607a-4eeb-a255-99532fd4fce4')
>
> 2019-07-17 14:40:51,480Z WARN
> [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-86)
> [a43180ec-afc7-429e-9f30-9e851eaf7ce7] Failed to stop image transfer
> session. Ticket does not exist for image
> '3452459d-aec6-430e-9509-1d9ca815b2d8'
>
>
>
> Old versions of ovirt-imageio-daemon were failing to remove a ticket if
> the ticket does not exist, and engine did not handle this well.
>
>
>
> This was fixed in 4.2. Are you running the latest version on the hosts?
>
>
>
> Please update ovirt-imageio-daemo

[ovirt-users] Re: [ANN] oVirt 4.3.6 is now generally available

2019-09-28 Thread Nir Soffer
On Sat, Sep 28, 2019 at 11:04 PM Rik Theys 
wrote:

> Hi Nir,
>
> Thank you for your time.
> On 9/27/19 4:27 PM, Nir Soffer wrote:
>
>
>
> On Fri, Sep 27, 2019, 12:37 Rik Theys  wrote:
>
>> Hi,
>>
>> After upgrading to 4.3.6, my storage domain can no longer be activated,
>> rendering my data center useless.
>>
>> My storage domain is local storage on a filesystem backed by VDO/LVM. It
>> seems 4.3.6 has added support for 4k storage.
>> My VDO does not have the 'emulate512' flag set.
>>
>
> This configuration is not supported before 4.3.6. Various operations may
> fail when
> reading or writing to storage.
>
> I was not aware of this when I set it up as I did not expect this to
> influence a setup where oVirt uses local storage (a file system location).
>
>
> 4.3.6 detects storage block size, creates compatible storage domain
> metadata, and
> consider the block size when accessing storage.
>
>
>> I've tried downgrading all packages on the host to the previous versions
>> (with ioprocess 1.2), but this does not seem to make any difference.
>>
>
> Downgrading should solve your issue, but without any logs we only guess.
>
> I was able to work around my issue by downgrading to ioprocess 1.1 (and
> vdsm-4.30.24). Downgrading to only 1.2 did not solve my issue. With
> ioprocess downgraded to 1.1, I did not have to downgrade the engine (still
> on 4.3.6).
>
ioprocess 1.1 is not recommended; you really want to use 1.3.0.

> I think I now have a better understanding what happened that triggered
> this.
>
> During a nightly yum-cron, the ioprocess and vdsm packages on the host
> were upgraded to 1.3 and vdsm 4.30.33. At this point, the engine log
> started to log:
>
> 2019-09-27 03:40:27,472+02 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand]
> (EE-ManagedThreadFactory-engine-Thread-384418) [695f38cc] Executing with
> domain map: {6bdf1a0d-274b-4195-8f
> f5-a5c002ea1a77=active}
> 2019-09-27 03:40:27,646+02 WARN
> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand]
> (EE-ManagedThreadFactory-engine-Thread-384418) [695f38cc] Unexpected return
> value: Status [code=348, message=Block size does not match storage block
> size: 'block_size=512, storage_block_size=4096']
>
This means that when activating the storage domain, vdsm detected that the
storage block size
is 4k, but the domain metadata reports block size of 512.

This combination may partly work for a localfs domain since we don't use
sanlock with local storage, vdsm does not use direct I/O when writing to
storage, and it always uses a 4k block size when reading metadata from
storage.

Note that with older ovirt-imageio (< 1.5.2), image uploads and downloads may
fail when using 4k storage. In recent ovirt-imageio we detect and use the
correct block size.
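
For reference, a rough sketch of how the block size can be probed with
O_DIRECT (an illustration only, not the exact vdsm/ioprocess code):

import errno
import mmap
import os

def probe_block_size(mountpoint):
    # Try writing 512 and 4096 bytes with O_DIRECT; the smallest size that
    # the file system accepts is its logical block size.
    path = os.path.join(mountpoint, "__BLKSIZE_TEST__")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT)
    try:
        for size in (512, 4096):
            buf = mmap.mmap(-1, size)  # page aligned, as O_DIRECT requires
            try:
                os.write(fd, buf)
                return size
            except OSError as e:
                if e.errno != errno.EINVAL:
                    raise
            finally:
                buf.close()
        raise RuntimeError("could not detect block size")
    finally:
        os.close(fd)
        os.unlink(path)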

> 2019-09-27 03:40:27,646+02 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand]
> (EE-ManagedThreadFactory-engine-Thread-384418) [695f38cc] FINISH,
> ConnectStoragePoolVDSCommand, return: , log id: 483c7a17
>
> I did not notice at first that this was a storage related issue and
> assumed it may get resolved by also upgrading the engine. So in the morning
> I upgraded the engine to 4.3.6 but this did not resolve my issue.
>
> I then found the above error in the engine log. In the release notes of
> 4.3.6 I read about the 4k support.
>
> I then downgraded ioprocess (and vdsm) to ioprocess 1.2 but that did also
> not solve my issue. This is when I contacted the list with my question.
>
> Afterwards I found in the ioprocess rpm changelog that (partial?) 4k
> support was also in 1.2. I kept on downgrading until I got ioprocess 1.1
> (without 4k support) and at this point I could re-attach my storage domain.
>
> You mention above that 4.3.6 will detect the block size and configure the
> metadata on the storage domain? I've checked the dom_md/metadata file and
> it shows:
>
> ALIGNMENT=1048576
> *BLOCK_SIZE=512*
> CLASS=Data
> DESCRIPTION=studvirt1-Local
> IOOPTIMEOUTSEC=10
> LEASERETRIES=3
> LEASETIMESEC=60
> LOCKPOLICY=
> LOCKRENEWALINTERVALSEC=5
> MASTER_VERSION=1
> POOL_DESCRIPTION=studvirt1-Local
> POOL_DOMAINS=6bdf1a0d-274b-4195-8ff5-a5c002ea1a77:Active
> POOL_SPM_ID=-1
> POOL_SPM_LVER=-1
> POOL_UUID=085f02e8-c3b4-4cef-a35c-e357a86eec0c
> REMOTE_PATH=/data/images
> ROLE=Master
> SDUUID=6bdf1a0d-274b-4195-8ff5-a5c002ea1a77
> TYPE=LOCALFS
> VERSION=5
> _SHA_CKSUM=9dde06bbc9f2316efc141565738ff32037b1ff66
>
So you have a v5 localfs storage domain - because we don't use leases, this
domain should work with 4.3.6 if you modify this line in the domain metadata.

BLOCK_SIZE=4096

T

[ovirt-users] Re: [ANN] oVirt 4.3.6 is now generally available

2019-09-27 Thread Nir Soffer
On Fri, Sep 27, 2019, 12:37 Rik Theys  wrote:

> Hi,
>
> After upgrading to 4.3.6, my storage domain can no longer be activated,
> rendering my data center useless.
>
> My storage domain is local storage on a filesystem backed by VDO/LVM. It
> seems 4.3.6 has added support for 4k storage.
> My VDO does not have the 'emulate512' flag set.
>

This configuration is not supported before 4.3.6. Various operations may
fail when
reading or writing to storage.

4.3.6 detects storage block size, creates compatible storage domain
metadata, and
consider the block size when accessing storage.


> I've tried downgrading all packages on the host to the previous versions
> (with ioprocess 1.2), but this does not seem to make any difference.
>

Downgrading should solve your issue, but without any logs we can only guess.


> Should I also downgrade the engine to 4.3.5 to get this to work again. I
> expected the downgrade of the host to be sufficient.
>
> As an alternative I guess I could enable the emulate512 flag on VDO but I
> can not find how to do this on an existing VDO volume. Is this possible?
>

Please share more data so we can understand the failure:

- complete vdsm log showing the failure to activate the domain
  - with 4.3.6
  - with 4.3.5 (after you downgraded)
- contents of /rhev/data-center/mnt/_/domain-uuid/dom_md/metadata
  (assuming your local domain mount is /domaindir)
- engine db dump

Nir


>
> Regards,
> Rik
>
>
> On 9/26/19 4:58 PM, Sandro Bonazzola wrote:
>
> The oVirt Project is pleased to announce the general availability of oVirt
> 4.3.6 as of September 26th, 2019.
>
>
>
> This update is the sixth in a series of stabilization updates to the 4.3
> series.
>
>
>
> This release is available now on x86_64 architecture for:
>
> * Red Hat Enterprise Linux 7.7 or later (but < 8)
>
> * CentOS Linux (or similar) 7.7 or later (but < 8)
>
>
>
> This release supports Hypervisor Hosts on x86_64 and ppc64le architectures
> for:
>
> * Red Hat Enterprise Linux 7.7 or later (but < 8)
>
> * CentOS Linux (or similar) 7.7 or later (but < 8)
>
> * oVirt Node 4.3 (available for x86_64 only)
>
>
>
> Due to Fedora 28 being now at end of life this release is missing
> experimental tech preview for x86_64 and s390x architectures for Fedora 28.
>
> We are working on Fedora 29 and 30 support and we may re-introduce
> experimental support for Fedora in next release.
>
>
>
> See the release notes [1] for installation / upgrade instructions and a
> list of new features and bugs fixed.
>
>
>
> Notes:
>
> - oVirt Appliance is already available
>
> - oVirt Node is already available[2]
>
> oVirt Node and Appliance have been updated including:
>
> - oVirt 4.3.6: http://www.ovirt.org/release/4.3.6/
>
> - Wildfly 17.0.1:
> https://wildfly.org/news/2019/07/07/WildFly-1701-Released/
>
> - Latest CentOS 7.7 updates including:
>
>-
>
>Release for CentOS Linux 7 (1908) on the x86_64 Architecture
>
> 
>-
>
>CEBA-2019:2601 CentOS 7 NetworkManager BugFix Update
>
> 
>
>-
>
>CEBA-2019:2023 CentOS 7 efivar BugFix Update
>
> 
>-
>
>CEBA-2019:2614 CentOS 7 firewalld BugFix Update
>
> 
>-
>
>CEBA-2019:2227 CentOS 7 grubby BugFix Update
>
> 
>-
>
>CESA-2019:2258 Moderate CentOS 7 http-parser Security Update
>
> 
>-
>
>CESA-2019:2600 Important CentOS 7 kernel Security Update
>
> 
>-
>
>CEBA-2019:2599 CentOS 7 krb5 BugFix Update
>
> 
>-
>
>CEBA-2019:2358 CentOS 7 libguestfs BugFix Update
>
> 
>-
>
>CEBA-2019:2679 CentOS 7 libvirt BugFix Update
>
> 
>-
>
>CEBA-2019:2501 CentOS 7 rsyslog BugFix Update
>
> 
>-
>
>CEBA-2019:2355 CentOS 7 selinux-policy BugFix Update
>
> 
>-
>
>CEBA-2019:2612 CentOS 7 sg3_utils BugFix Update
>
> 
>-
>
>CEBA-2019:2602 CentOS 7 sos BugFix Update
>
> 
>
>-
>
>

[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-11-05 Thread Nir Soffer
On Mon, Nov 4, 2019 at 9:18 PM Albl, Oliver  wrote:
>
> Hi all,
>   I run an oVirt 4.3.6.7-1.el7 installation (50+ hosts, 40+ FC storage 
> domains on two all-flash arrays) and experienced a problem accessing single 
> storage domains.

What was the last change in the system? upgrade? network change? storage change?

> As a result, hosts were taken “not operational” because they could not see 
> all storage domains, SPM started to move around the hosts.

This is expected if some domain is not accessible on all hosts.

> oVirt messages start with:
>
> 2019-11-04 15:10:22.739+01 | VDSM HOST082 command SpmStatusVDS failed: (-202, 
> 'Sanlock resource read failure', 'IO timeout')

This means sanlock timed out renewing the lockspace

> 2019-11-04 15:13:58.836+01 | Host HOST017 cannot access the Storage Domain(s) 
> HOST_LUN_204 attached to the Data Center . Setting Host state to 
> Non-Operational.

If a host cannot access all storage domains in the DC, the system sets it
to non-operational, and will probably try to reconnect it later.

> 2019-11-04 15:15:14.145+01 | Storage domain HOST_LUN_221 experienced a high 
> latency of 9.60953 seconds from host HOST038. This may cause performance and 
> functional issues. Please consult your Storage Administrator.

This means reading 4k from the start of the metadata lv took 9.6 seconds.
Something in the path to storage is bad (kernel, network, storage).
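
Storage monitoring does essentially this read; you can reproduce it and time
it yourself (the device path is a placeholder for the metadata lv of the
problem domain):

# time dd if=/dev/<domain-uuid>/metadata of=/dev/null bs=4096 count=1 iflag=direct

A result anywhere near 10 seconds will trip the monitoring timeout described
below.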

> The problem mainly affected two storage domains (on the same array) but I 
> also saw single messages for other storage domains (one the other array as 
> well).
> Storage domains stayed available to the hosts, all VMs continued to run.

We have 20 seconds (4 retries, 5 seconds per retry) grace time in multipath
when there are no active paths, before I/O fails, pausing the VM. We also
resume paused VMs when storage monitoring works again, so maybe the VMs were
paused and resumed.
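
These timings come from the multipath configuration vdsm manages; the
relevant settings look roughly like this (illustrative values matching the
numbers above - the vdsm-managed /etc/multipath.conf on your hosts is
authoritative):

defaults {
    polling_interval    5
    no_path_retry       4
}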

However, for storage monitoring we have a strict 10 second timeout. If
reading from the metadata lv times out or fails and does not return to
normal within 5 minutes, the domain will become inactive.

> When constantly reading from the storage domains (/bin/dd iflag=direct 
> if=  bs=4096 count=1 of=/dev/null) we got expected 20+ MBytes/s on 
> all but some storage domains. One of them showed “transfer rates” around 200 
> Bytes/s, but went up to normal performance from time to time. Transfer rate 
> to this domain was different between the hosts.

This can explain the read timeouts.

> /var/log/messages contain qla2xxx abort messages on almost all hosts. There 
> are no errors on SAN switches or storage array (but vendor is still 
> investigating). I did not see high load on the storage array.
> The system seemed to stabilize when I stopped all VMs on the affected storage 
> domain and this storage domain became “inactive”.

This looks like the right way to troubleshoot this.

> Currently, this storage domain still is inactive and we cannot place it in 
> maintenance mode (“Failed to deactivate Storage Domain”) nor activate it.

We need vdsm logs to understand this failure.

> OVF Metadata seems to be corrupt as well (failed to update OVF disks , 
> OVF data isn't updated on those OVF stores).

This does not mean the OVF is corrupted, only that we could not store new
data. The older data on the other OVFSTORE disk is probably fine. Hopefully
the system will not try to write to the other OVFSTORE disk, overwriting the
last good version.

> The first six 512 byte blocks of /dev//metadata seem to contain only 
> zeros.

This is normal, the first 2048 bytes are always zeroes. This area was
used for domain
metadata in older versions.

> Any advice on how to proceed here?
>
> Is there a way to recover this storage domain?

Please share more details:

- output of "lsblk"
- output of "multipath -ll"
- output of "/usr/libexec/vdsm/fc-scan -v"
- output of "vgs -o +tags problem-domain-id"
- output of "lvs -o +tags problem-domain-id"
- contents of /etc/multipath.conf
- contents of /etc/multipath.conf.d/*.conf
- /var/log/messages since the issue started
- /var/log/vdsm/vdsm.log* since the issue started on one of the hosts
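
If it helps, something like this collects most of the host-side items above
into one file (a rough sketch - replace <problem-domain-uuid> with the actual
domain UUID):

    {
        lsblk
        multipath -ll
        /usr/libexec/vdsm/fc-scan -v
        vgs -o +tags <problem-domain-uuid>
        lvs -o +tags <problem-domain-uuid>
        cat /etc/multipath.conf /etc/multipath.conf.d/*.conf
    } > storage-report-$(hostname).txt 2>&1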

A bug is probably the best place to keep these logs and makes them easy to track.

Thanks,
Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CLVTY3WNCTYDT2P4PQWQBXVCBTB5DCGX/


[ovirt-users] Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing

2019-11-21 Thread Nir Soffer
If this was an SELinux issue, the permission denied errors
will disappear.
If this is the case please provide the output of:

ausearch -m AVC -ts today

If the issue still exists, we have eliminated SELinux, and you can enable it
again:

setenforce 1
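
For reference, the whole test sequence looks roughly like this: switch
SELinux to permissive mode, retry starting the VM, check for denials, then
re-enable enforcing:

# setenforce 0
# ausearch -m AVC -ts today
# setenforce 1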

Nir


> I even replaced one of the disks and healed , but the result is the same
>> for all my VMs.
>>
>
> Have you checked the permission for user/group are set correctly across
> all the bricks in the cluster?
> What does ls -la on the images directory from mount of the volume show you.
>
> Adding Krutika and Rafi as they ran into a similar issue in the past.
>
>
>> Best Regards,
>> Strahil Nikolov
>>
>>
>> В сряда, 20 ноември 2019 г., 18:17:18 ч. Гринуич+2, Strahil Nikolov <
>> hunter86...@yahoo.com> написа:
>>
>>
>> Hello All,
>>
>> my engine is back online , but I'm still having difficulties to make vdsm
>> powerup the systems.
>> I think that the events generated today can lead me to the right
>> direction(just an example , many more are there):
>>
>> VDSM ovirt3.localdomain command SpmStatusVDS failed: Cannot inquire
>> Lease(name='SDM',
>> path=u'/rhev/data-center/mnt/glusterSD/gluster1:_data__fast3/ecc3bf0e-8214-45c1-98a6-0afa642e591f/dom_md/leases',
>> offset=1048576): (2, 'Sanlock get hosts failure', 'No such file or
>> directory')
>>
>> I will try to collect a fresh log and see what is it complaining about
>> this time.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> >Hi Sahina,
>>
>> >I have a strange situation:
>> >1. When I try to access the file via 'sudo -u vdsm dd if=disk of=test
>> bs=4M' the command fails on aprox 60MB.
>> >2. If I run same command as root , remove the file and then run again
>> via vdsm user -> this time no i/o error reported.
>>
>> >My guess is that I need to check what's going on the bricks themselve ...
>>
>> >Best Regards,
>> >Strahil Nikolov
>>
>>
>> В вторник, 19 ноември 2019 г., 0:02:16 ч. Гринуич-5, Sahina Bose <
>> sab...@redhat.com> написа:
>>
>>
>>
>>
>> On Tue, Nov 19, 2019 at 10:10 AM Strahil Nikolov 
>> wrote:
>>
>> Hi Sahina,
>>
>> Sadly engine logs have no errors.
>> I've got only an I/O error, but in the debug of the vdsm I can clearly
>> see that "qemu-img" is giving an "OK".
>> During the upgrade I got some metadata files pending heal, but I have
>> recovered the conflict manually and should be OK.
>> Today I have defined one of the VMs manually (virsh define) and then
>> started it , but the issue is the same.
>> It seems to be storage-related issue,as VMs that are on specific domain
>> can be started , but most of my VMs are on the fast storage domains and
>> none of them can be started.
>>
>> After the gluster snapshot restore , the engine is having issues and I
>> have to separately investigate that (as I poweroff my HostedEngine before
>> creating the snapshot).
>>
>> The logs can be find at :
>> https://drive.google.com/open?id=1VAZFZWWrpimDeVuZT0sWFVXy76scr4NM
>>
>>
>> Any ideas where to look at , as I can definitely read (using "dd if=disk"
>> or qemu-img info) the disks of the rhel7 VM ?
>>
>>
>> The vdsm logs have this:
>> 2019-11-17 10:21:23,892+0200 INFO  (libvirt/events) [virt.vm]
>> (vmId='b3c4d84a-9784-470c-b70e-7ad7cc45e913') abnormal vm stop device
>> ua-94f763e9-fd96-4bee-a6b2-31af841a918b error eother (vm:5075)
>> 2019-11-17 10:21:23,892+0200 INFO  (libvirt/events) [virt.vm]
>> (vmId='b3c4d84a-9784-470c-b70e-7ad7cc45e913') CPU stopped: onIOError
>> (vm:6062)
>> 2019-11-17 10:21:23,893+0200 DEBUG (libvirt/events)
>> [jsonrpc.Notification] Sending event {"params": {"notify_time": 4356025830,
>> "b3c4d84a-9784-470c-b70e-7ad7cc45e913": {"status": "WaitForLaunch",
>> "ioerror": {"alias": "ua-94f763e9-fd96-4bee-a6b2-31af841a918b", "name":
>> "sda", "path":
>> "/rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b/5b1d3113-5cca-4582-9029-634b16338a2f"},
>> "pauseCode": "EOTHER"}}, "jsonrpc": "2.0", "method":
>> "|virt|VM_status|b3c4d84a-9784-470c-b70e-7ad7cc45e913"} (__init__:181)
>>
>> Can you check the permissions of the file
>> /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/

[ovirt-users] Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing

2019-11-22 Thread Nir Soffer
On Fri, Nov 22, 2019 at 10:41 PM Strahil Nikolov 
wrote:

> On Thu, Nov 21, 2019 at 8:20 AM Sahina Bose  wrote:
>
>
>
> On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov 
> wrote:
>
> Hi All,
>
> another clue in the logs :
> [2019-11-21 00:29:50.536631] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (----) [Permission denied]
> [2019-11-21 00:29:50.536798] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (----) [Permission denied]
> [2019-11-21 00:29:50.536959] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (----) [Permission denied]
> [2019-11-21 00:29:50.537007] E [MSGID: 133010]
> [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on
> shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae
> [Permission denied]
> [2019-11-21 00:29:50.537066] W [fuse-bridge.c:2830:fuse_readv_cbk]
> 0-glusterfs-fuse: 12458: READ => -1
> gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fc63c00fe18 (Permission
> denied)
> [2019-11-21 00:30:01.177665] I [MSGID: 133022]
> [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of
> gfid=eb103fbf-80dc-425d-882f-1e4efe510db5 from backend
> [2019-11-21 00:30:13.132756] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
> remote operation failed. Path:
> /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79
> (----) [Permission denied]
> [2019-11-21 00:30:13.132824] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
> remote operation failed. Path:
> /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79
> (----) [Permission denied]
> [2019-11-21 00:30:13.133217] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2:
> remote operation failed. Path:
> /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79
> (----) [Permission denied]
> [2019-11-21 00:30:13.133238] E [MSGID: 133010]
> [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on
> shard 79 failed. Base file gfid = 17c663c2-f582-455b-b806-3b9d01fb2c6c
> [Permission denied]
> [2019-11-21 00:30:13.133264] W [fuse-bridge.c:2830:fuse_readv_cbk]
> 0-glusterfs-fuse: 12660: READ => -1
> gfid=17c663c2-f582-455b-b806-3b9d01fb2c6c fd=0x7fc63c007038 (Permission
> denied)
> [2019-11-21 00:30:38.489449] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
> remote operation failed. Path:
> /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6
> (----) [Permission denied]
> [2019-11-21 00:30:38.489520] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
> remote operation failed. Path:
> /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6
> (----) [Permission denied]
> [2019-11-21 00:30:38.489669] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2:
> remote operation failed. Path:
> /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6
> (----) [Permission denied]
> [2019-11-21 00:30:38.489717] E [MSGID: 133010]
> [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on
> shard 6 failed. Base file gfid = a10a5ae8-108b-4d78-9e65-cca188c27fc4
> [Permission denied]
> [2019-11-21 00:30:38.489777] W [fuse-bridge.c:2830:fuse_readv_cbk]
> 0-glusterfs-fuse: 12928: READ => -1
> gfid=a10a5ae8-108b-4d78-9e65-cca188c27fc4 fd=0x7fc63c01a058 (Permission
> denied)
>
>
> Anyone got an idea why is it happening?
> I checked user/group and selinux permissions - all OK
>
>
> >Can you share the commands (and output) used to check this?
> I first thought that the file is cached in memory and that's why vdsm user
> can read the file , but the following shows opposite:
>
> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# ll
> total 562145
> -rw-rw. 1 vdsm kvm 5368709120 Nov 12 23:29
> 5b1d3113-5cca-4582-9029-634b16338a2f
> -rw-rw. 1 vdsm kvm1048576 Nov 11 14:11
> 5b1d3113-5cca-4582-9029-634b16338a2f.lease
> -rw-r--r--. 1 vdsm kvm313 Nov 11 14:11
> 5b1d3113-5cca-4582-9029-634b16338a2f.meta
> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# pwd
>
> /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b
> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 >
> /proc/sys/vm/drop_caches
>

I would use iflag=direct 
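
For example, using the image file from the directory listed above - direct
I/O bypasses the page cache, so the read actually hits the bricks:

# sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M iflag=direct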

[ovirt-users] Re: Possible sources of cpu steal and countermeasures

2019-12-04 Thread Nir Soffer
On Wed, Dec 4, 2019 at 6:15 PM  wrote:
>
> Hi,
>
> I'm having performance issues with a ovirt installation. It is showing
> high steal (5-10%) for a cpu intensive VM. The hypervisor however has
> more than 65% of his resources idle while the steal is seen inside of
> the VM.
>
> Even when placing only a single VM on a hypervisor it still receives
> steal (0-2%), even though the hypervisor is not overcommited.
>
>
> Hypervisor:
>
> 2 Socket system in total 2*28(56HT) cores
>
>
> VM:
>
> 30vCPUs (ovirt seems to think its a good idea to make that 15 sockets *
> 2 cores)

I think you can control this in oVirt.

> My questions are:
>
> a) Could it be that the hypervisor is trying to schedule all 30 cores on
> a single numa node, ie using the HT cores instead of "real" ones and
> this shows up as steal?
>
> b) Do I need to make VMs this big numa-aware and spread the vm over both
> numa nodes?
>
> c) Would using the High Performance VM type help in this kind of situation?
>
> d) General advise: how do I reduce steal in an environment where the
> hypervisor has idle resources
>
>
> Any advise would be appreciated.

These questions are mainly about qemu, so adding qemu-discuss.

I think it will help if you share your VM's qemu command line, found in:
/var/log/libvirt/qemu/vm-name.log
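
A few host-side commands that may also help while looking into this (the VM
name is a placeholder):

# lscpu | grep -i numa         # host NUMA layout
# numactl --hardware           # memory and CPUs per NUMA node
# virsh -r vcpuinfo vm-name    # which host CPUs the vCPUs currently run on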

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4RBWYLKGCXAGOSC7FM3UMPE5T3JHOQKV/


[ovirt-users] Re: Still having NFS issues. (Permissions)

2019-12-12 Thread Nir Soffer
On Thu, Dec 12, 2019 at 6:36 PM Milan Zamazal  wrote:
>
> Strahil  writes:
>
> > Why do you use  'all_squash' ?
> >
> > all_squashMap all uids and gids to the anonymous user. Useful for
> > NFS-exported public FTP directories, news spool directories, etc. The
> > opposite option is no_all_squash, which is the default setting.
>
> AFAIK all_squash,anonuid=36,anongid=36 is the recommended NFS setting
> for oVirt and the only one guaranteed to work.

Any user that is not vdsm or in the kvm group should not have access to
storage, so all_squash is not needed.

anonuid=36,anongid=36 is required only for root_squash, I think because libvirt
is accessing storage as root.

We probably need to add libvirt to the kvm group like we do with sanlock,
so we don't have to allow root access to storage. This is how we allow
sanlock access to vdsm managed storage.
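
So in principle an export along these lines should be enough (illustrative -
the path and client spec are placeholders; anonuid/anongid are only needed
because of the libvirt root access mentioned above):

/data/ovirt *(rw,sync,no_subtree_check,root_squash,anonuid=36,anongid=36)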

> Regards,
> Milan
>
> > Best Regards,
> > Strahil NikolovOn Dec 10, 2019 07:46, Tony Brian Albers  wrote:
> >>
> >> On Mon, 2019-12-09 at 18:43 +, Robert Webb wrote:
> >> > To add, the 757 permission does not need to be on the .lease or the
> >> > .meta files.
> >> >
> >> > https://lists.ovirt.org/archives/list/users@ovirt.org/message/KZF6RCSRW2QV3PUEJCJW5DZ54DLAOGAA/
> >>
> >> Good morning,
> >>
> >> Check SELinux just in case.
> >>
> >> Here's my config:
> >>
> >> NFS server:
> >> /etc/exports:
> >> /data/ovirt
> >> *(rw,sync,no_subtree_check,all_squash,anonuid=36,anongid=36)
> >>
> >> Folder:
> >> [root@kst001 ~]# ls -ld /data/ovirt
> >> drwxr-xr-x 3 vdsm kvm 76 Jun  1  2017 /data/ovirt
> >>
> >> Subfolders:
> >> [root@kst001 ~]# ls -l /data/ovirt/*
> >> -rwxr-xr-x 1 vdsm kvm  0 Dec 10 06:38 /data/ovirt/__DIRECT_IO_TEST__
> >>
> >> /data/ovirt/a597d0aa-bf22-47a3-a8a3-e5cecf3e20e0:
> >> total 4
> >> drwxr-xr-x  2 vdsm kvm  117 Jun  1  2017 dom_md
> >> drwxr-xr-x 56 vdsm kvm 4096 Dec  2 14:51 images
> >> drwxr-xr-x  4 vdsm kvm   42 Jun  1  2017 master
> >> [root@kst001 ~]#
> >>
> >>
> >> The user:
> >> [root@kst001 ~]# id vdsm
> >> uid=36(vdsm) gid=36(kvm) groups=36(kvm)
> >> [root@kst001 ~]#
> >>
> >> And output from 'mount' on a host:
> >> kst001:/data/ovirt on /rhev/data-center/mnt/kst001:_data_ovirt type nfs
> >> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nolock,
> >> nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr= >> server-
> >> ip>,mountvers=3,mountport=20048,mountproto=udp,local_lock=all,addr= >> -server-ip>)
> >>
> >>
> >> HTH
> >>
> >> /tony
> >> ___
> >> Users mailing list -- users@ovirt.org
> >> To unsubscribe send an email to users-le...@ovirt.org
> >> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> >> oVirt Code of Conduct: 
> >> https://www.ovirt.org/community/about/community-guidelines/
> >> List Archives:
> >> https://lists.ovirt.org/archives/list/users@ovirt.org/message/T6S32XNRB6S67PH5TOZZ6ZAD6KMVA3G6/
> > ___
> > Users mailing list -- users@ovirt.org
> > To unsubscribe send an email to users-le...@ovirt.org
> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> > oVirt Code of Conduct: 
> > https://www.ovirt.org/community/about/community-guidelines/
> > List Archives:
> > https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z5XPTK5B4KTITNDRFKR3C7TQYUXQTC4A/
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/TSSPIUYPPGSAS5TUV3GUWMWNIGGIB2NF/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CO4UFLVDTSLO5S3XPA4PYXG3OGUSHSVP/


[ovirt-users] Re: Still having NFS issues. (Permissions)

2019-12-12 Thread Nir Soffer
On Tue, Dec 10, 2019 at 4:35 PM Robert Webb  wrote:

...
> >https://ovirt.org/develop/troubleshooting-nfs-storage-issues.html
> >
> >Generally speaking:
> >
> >Files there are created by vdsm (vdsmd), but are used (when running VMs)
> >by qemu. So both of them need access.
>
> So the link to the NFS storage troubleshooting page is where I found that the 
> perms needed to be 755.

I think this is an error in the troubleshooting page. There is no
reason to allow access to
other users except vdsm:kvm.

...
> Like this:
>
> drwxr-xr-x+ 2 vdsm kvm4096 Dec 10 09:03 .
> drwxr-xr-x+ 3 vdsm kvm4096 Dec 10 09:02 ..
> -rw-rw  1 vdsm kvm 53687091200 Dec 10 09:02 
> 5a514067-82fb-42f9-b436-f8f93883fe27
> -rw-rw  1 vdsm kvm 1048576 Dec 10 09:03 
> 5a514067-82fb-42f9-b436-f8f93883fe27.lease
> -rw-r--r--  1 vdsm kvm 298 Dec 10 09:03 
> 5a514067-82fb-42f9-b436-f8f93883fe27.meta
>
>
> So, with all that said, I cleaned everything up and my directory permissions 
> look like what Tony posted for his. I have added in his export options to my 
> setup and rebooted my host.
>
> I created a new VM from scratch and the files under images now look like this:
>
> drwxr-xr-x+ 2 vdsm kvm4096 Dec 10 09:03 .
> drwxr-xr-x+ 3 vdsm kvm4096 Dec 10 09:02 ..
> -rw-rw  1 vdsm kvm 53687091200 Dec 10 09:02 
> 5a514067-82fb-42f9-b436-f8f93883fe27
> -rw-rw  1 vdsm kvm 1048576 Dec 10 09:03 
> 5a514067-82fb-42f9-b436-f8f93883fe27.lease
> -rw-r--r--  1 vdsm kvm 298 Dec 10 09:03 
> 5a514067-82fb-42f9-b436-f8f93883fe27.meta
>
>
> Still not the 755 as expected,

It is not expected, the permissions look normal.

These are the permissions used for volumes on file based storage:

lib/vdsm/storage/constants.py:FILE_VOLUME_PERMISSIONS = 0o660

> but I am guessing with the addition of the "anonuid=36,anongid=36" to
> the exports, everything is now working as expected. The VM will boot
> and run as expected. There was nothing in the any of the documentation
> which alluded to possibly needed the additional options in the NFS
> export options.

I think this is a libvirt issue: it tries to access volumes as root, and
without anonuid=36,anongid=36 it will be squashed to nobody and fail.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/D6MXQGZB2SHJ2WCKBWYXD5CQ2WBJGT5B/


[ovirt-users] Re: Still having NFS issues. (Permissions)

2019-12-12 Thread Nir Soffer
On Fri, Dec 13, 2019 at 1:39 AM Nir Soffer  wrote:
>
> On Tue, Dec 10, 2019 at 4:35 PM Robert Webb  wrote:
>
> ...
> > >https://ovirt.org/develop/troubleshooting-nfs-storage-issues.html
> > >
> > >Generally speaking:
> > >
> > >Files there are created by vdsm (vdsmd), but are used (when running VMs)
> > >by qemu. So both of them need access.
> >
> > So the link to the NFS storage troubleshooting page is where I found that 
> > the perms needed to be 755.
>
> I think this is an error in the troubleshooting page. There is no
> reason to allow access to
> other users except vdsm:kvm.

The page mentions other daemons:

>> In principle, the user vdsm, with uid 36 and gid 36, must have read and 
>> write permissions on
>> all NFS exports. However, some daemons on the hypervisor hosts (for example, 
>> sanlock)
>> use a different uid but need access to the directory too.

But other daemons that should have access to vdsm storage are in the
kvm group (vdsm configures this during installation):

$ id sanlock
uid=179(sanlock) gid=179(sanlock) groups=179(sanlock),6(disk),36(kvm),107(qemu)

> ...
> > Like this:
> >
> > drwxr-xr-x+ 2 vdsm kvm4096 Dec 10 09:03 .
> > drwxr-xr-x+ 3 vdsm kvm4096 Dec 10 09:02 ..
> > -rw-rw  1 vdsm kvm 53687091200 Dec 10 09:02 
> > 5a514067-82fb-42f9-b436-f8f93883fe27
> > -rw-rw  1 vdsm kvm 1048576 Dec 10 09:03 
> > 5a514067-82fb-42f9-b436-f8f93883fe27.lease
> > -rw-r--r--  1 vdsm kvm 298 Dec 10 09:03 
> > 5a514067-82fb-42f9-b436-f8f93883fe27.meta
> >
> >
> > So, with all that said, I cleaned everything up and my directory 
> > permissions look like what Tony posted for his. I have added in his export 
> > options to my setup and rebooted my host.
> >
> > I created a new VM from scratch and the files under images now look like 
> > this:
> >
> > drwxr-xr-x+ 2 vdsm kvm4096 Dec 10 09:03 .
> > drwxr-xr-x+ 3 vdsm kvm4096 Dec 10 09:02 ..
> > -rw-rw  1 vdsm kvm 53687091200 Dec 10 09:02 
> > 5a514067-82fb-42f9-b436-f8f93883fe27
> > -rw-rw  1 vdsm kvm 1048576 Dec 10 09:03 
> > 5a514067-82fb-42f9-b436-f8f93883fe27.lease
> > -rw-r--r--  1 vdsm kvm 298 Dec 10 09:03 
> > 5a514067-82fb-42f9-b436-f8f93883fe27.meta
> >
> >
> > Still not the 755 as expected,
>
> It is not expected, the permissions look normal.
>
> These are the permissions used for volumes on file based storage:
>
> lib/vdsm/storage/constants.py:FILE_VOLUME_PERMISSIONS = 0o660
>
> but I am guessing with the addition of the "anonuid=36,anongid=36" to
> the exports, everything is now working as expected. The VM will boot
> and run as expected. There was nothing in the any of the documentation
> which alluded to possibly needed the additional options in the NFS
> export options.
>
> I this is a libvirt issue, it tries to access volumes as root, and
> without anonuid=36,anongid=36
> it will be squashed to nobody and fail.
>
> Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3KZII244RKMFPKSYD5WJ47IES4XLT2LI/


[ovirt-users] Re: Managed Block Storage: ceph detach_volume failing after migration

2019-09-25 Thread Nir Soffer
On Wed, Sep 25, 2019 at 8:02 PM Dan Poltawski 
wrote:

> Hi,
>
> On Wed, 2019-09-25 at 15:42 +0300, Amit Bawer wrote:
> > According to resolution of [1] it's a multipathd/udev configuration
> > issue. Could be worth to track this issue.
> >
> > [1] https://tracker.ceph.com/issues/12763
>
> Thanks, that certainly looks like a smoking gun to me, in the logs:
>
> Sep 25 12:27:45 mario multipathd: rbd29: add path (uevent)
> Sep 25 12:27:45 mario multipathd: rbd29: spurious uevent, path already
> in pathvec
> Sep 25 12:27:45 mario multipathd: rbd29: HDIO_GETGEO failed with 25
> Sep 25 12:27:45 mario multipathd: rbd29: failed to get path uid
> Sep 25 12:27:45 mario multipathd: uevent trigger error
>

Please file an oVirt bug. Vdsm manages the multipath configuration and I
don't think we have a blacklist for rbd devices.

If this is the issue, you can fix this locally by installing a multipath
drop-in configuration:

# cat /etc/multipath.conf.d/rbd.conf
blacklist {
   devnode "^(rbd)[0-9]*"
}

Vdsm should include this configuration in /etc/multipath.conf that vdsm
manages.
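
After adding the drop-in file, reloading the multipath configuration should
be enough to apply it:

# multipathd reconfigure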

Nir



>
>
> Dan
>
> >
> > On Wed, Sep 25, 2019 at 3:18 PM Dan Poltawski <
> > dan.poltaw...@tnp.net.uk> wrote:
> > > On ovirt 4.3.5 we are seeing various problems related to the rbd
> > > device staying mapped after a guest has been live migrated. This
> > > causes problems migrating the guest back, as well as rebooting the
> > > guest when it starts back up on the original host. The error
> > > returned is ‘nrbd: unmap failed: (16) Device or resource busy’.
> > > I’ve pasted the full vdsm log below.
> > >
> > > As far as I can tell this isn’t happening 100% of the time, and
> > > seems to be more prevalent on busy guests.
> > >
> > > (Not sure if I should create a bug for this, so thought I’d start
> > > here first)
> > >
> > > Thanks,
> > >
> > > Dan
> > >
> > >
> > > Sep 24 19:26:18 mario vdsm[5485]: ERROR FINISH detach_volume
> > > error=Managed Volume Helper failed.: ('Error executing helper:
> > > Command [\'/usr/libexec/vdsm/managedvolume-helper\', \'detach\']
> > > failed with rc=1 out=\'\' err=\'oslo.privsep.daemon: Running
> > > privsep helper: [\\\'sudo\\\', \\\'privsep-helper\\\', \\\'
> > > --privsep_context\\\', \\\'os_brick.privileged.default\\\', \\\'
> > > --privsep_sock_path\\\',
> > > \\\'/tmp/tmptQzb10/privsep.sock\\\']\\noslo.privsep.daemon: Spawned
> > > new privsep daemon via rootwrap\\noslo.privsep.daemon: privsep
> > > daemon starting\\noslo.privsep.daemon: privsep process running with
> > > uid/gid: 0/0\\noslo.privsep.daemon: privsep process running with
> > > capabilities (eff/prm/inh):
> > > CAP_SYS_ADMIN/CAP_SYS_ADMIN/none\\noslo.privsep.daemon: privsep
> > > daemon running as pid 76076\\nTraceback (most recent call
> > > last):\\n  File "/usr/libexec/vdsm/managedvolume-helper", line 154,
> > > in \\nsys.exit(main(sys.argv[1:]))\\n  File
> > > "/usr/libexec/vdsm/managedvolume-helper", line 77, in main\\n
> > > args.command(args)\\n  File "/usr/libexec/vdsm/managedvolume-
> > > helper", line 149, in detach\\nignore_errors=False)\\n  File
> > > "/usr/lib/python2.7/site-packages/vdsm/storage/nos_brick.py", line
> > > 121, in disconnect_volume\\nrun_as_root=True)\\n  File
> > > "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52,
> > > in _execute\\nresult = self.__execute(*args, **kwargs)\\n  File
> > > "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py",
> > > line 169, in execute\\nreturn execute_root(*cmd, **kwargs)\\n
> > > File "/usr/lib/python2.7/site-
> > > packages/oslo_privsep/priv_context.py",  line 241, in _wrap\\n
> > > return self.channel.remote_call(name, args, kwargs)\\n  File
> > > "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line
> > > 203, in remote_call\\nraise
> > > exc_type(*result[2])\\noslo_concurrency.processutils.ProcessExecuti
> > > onError: Unexpected error while running command.\\nCommand: rbd
> > > unmap /dev/rbd/rbd/volume-0e8c1056-45d6-4740-934d-eb07a9f73160 --
> > > conf /tmp/brickrbd_LCKezP --id ovirt --mon_host 172.16.10.13:3300
> > > --mon_host 172.16.10.14:3300 --mon_host 172.16.10.12:6789\\nExit
> > > code: 16\\nStdout: u\\\'\\\'\\nStderr: u\\\'rbd: sysfs write
> > > failednrbd: unmap failed: (16) Device or resource
> > > busyn\\\'\\n\'',)#012Traceback (most recent call last):#012
> > > File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line
> > > 124, in method#012ret = func(*args, **kwargs)#012  File
> > > "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1766, in
> > > detach_volume#012return
> > > managedvolume.detach_volume(vol_id)#012  File
> > > "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py",
> > > line 67, in wrapper#012return func(*args, **kwargs)#012  File
> > > "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py",
> > > line 135, in detach_volume#012run_helper("detach",
> > > vol_info)#012  File "/usr/lib/python2.7/site-
> > > 

[ovirt-users] Re: vdsmd 4.4.0 throws an exception in asyncore.py while updating OVF data

2019-12-18 Thread Nir Soffer
On Wed, Dec 18, 2019 at 8:58 PM  wrote:
Thanks for testing oVirt 4.4. You are a brave man :-)

...
> -- vdsm.log --
>
> 2019-12-17 16:36:58,393-0600 ERROR (Reactor thread) [vds.dispatcher] 
> uncaptured python exception, closing channel 
>  0, 0) at 0x7fbda865ed30> (:object of type 'NoneType' has 
> no len() [/usr/lib64/python3.6/asyncore.py|readwrite|108] 
> [/usr/lib64/python3.6/asyncore.py|handle_read_event|423] 
> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|handle_read|71] 
> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|_delegate_call|168]
>  [/usr/lib/python3.6/site-packages/vdsm/protocoldetector.py|handle_read|115]) 
> (betterAsyncore:179)

Please file a vdsm bug for this, and attach the complete vdsm log.

If you can please change the log level to DEBUG (see /etc/vdsm/logger.conf)
and attach log to the bug.
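
A rough sketch of that change, assuming the file uses the standard Python
logging config layout (the exact section names can differ between versions):

    [logger_root]
    level=DEBUG

(and similarly for other [logger_*] sections of interest), then restart vdsm
to apply it:

    # systemctl restart vdsmd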

Also attach the engine log; the issue may be a bad request from engine
that is not validated properly in vdsm.

> - engine.log --
>
> 2019-12-17 16:36:58,395-06 ERROR 
> [org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand] 
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-95) 
> [6c787bf3] Command 
> 'org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand' failed: 
> EngineException: 
> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: 
> java.net.SocketException: Connection reset (Failed with error 
> VDS_NETWORK_ERROR and code 5022)

One line from a log file is rarely useful.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/D6QDJ5O5HTHBI3HO7Z4M4WUNI5IOJSMN/


[ovirt-users]Re: qcow on LVM‽ Why‽

2020-03-05 Thread Nir Soffer
On Wed, Mar 4, 2020 at 6:13 PM Thorsten Glaser  wrote:
...
> I was shrinking a hard disc: first the filesystems inside the VM,
> then the partitions inside the VM, then the LV

This is the point where you probably corrupted your image.

oVirt does not support shrinking existing disks. If you want to do this
you must know what you are doing.

> … then I wanted to
> convert the LV to a compressed qcow2 file for transport, and it
> told me that the source is corrupted. Huh?

You corrupted it by shrinking the LV without checking the end of the image.

Next time try:

$ qemu-img check /dev/vg/lv
...
Image end offset: 123456789

You must not shrink the LV to less than the image end offset.
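
So a safer shrink flow looks roughly like this (VG/LV names and the target
size are only an example):

$ qemu-img check /dev/VG/LV | grep 'Image end offset'
Image end offset: 123456789

$ lvreduce -L 120m VG/LV    # any size >= the end offset, rounded up to extents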

> I had already wondered why I was unable to inspect the LV on the
> host the usual way (kpartx -v -a /dev/VG/LV after finding out,
> with “virsh --readonly -c qemu:///system domblklist VM_NAME”,
> which LV is the right one).
>
> Turns out that ovirt stores qcow on LVs instead of raw images ☹

I think this is documented. Did you read the storage admin guide before
playing with the underlying logical volumes?

> Well, vgcfgrestore to my rescue:
> - vgcfgrestore -l VG_NAME
> - vgcfgrestore -f /etc/… VG_NAME

This may be too late if another disk is using segments you removed from the
original lv, but it seems you were lucky this time.

> The image was still marked as corrupted, but exported fine. I
> could not write it back to the LV as preallocated,

You cannot change the image format of an existing disk. But you can delete
the VM disk, upload the modified disk (e.g. via the UI or SDK), and attach
the new disk to the VM.

Or you can create a new empty preallocated disk, copy the image
directly to the disk
using qemu-img, and then attach the disk to the VM.
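
For the second option, the copy itself is a plain qemu-img convert into the
logical volume of the new disk (paths here are placeholders, and the new
disk must be at least as large as the image's virtual size):

$ qemu-img convert -p -f qcow2 -O raw image.qcow2 /dev/VG/new-disk-lv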

> which seems
> to be what ovirt does, because qemu-img doesn’t wish to do that
> when the target is a special device (not a regular file). Meh.

qemu-img convert works with block devices. You can enable the DEBUG log
level in vdsm to check how vdsm runs qemu-img.

> Does ovirt handle raw images on LV, and if so, how can we enable
> this for new VMs? If not, whyever the hell not? And whose “great”
> idea was this anyway?

oVirt supports raw format of course, and this is the default format
for disks on iSCSI/FC storage domains.

You probably chose "thin" when you created the disk. This means qcow2 format.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BDHVWXG4EZAFYLBPEZFUE5ZDYUHS5G3K/


[ovirt-users] Re: The problem with create VM

2020-03-05 Thread Nir Soffer
On Thu, Mar 5, 2020 at 1:16 PM  wrote:
>
> I began use ovirt recently. I tried create one VM on ovirt plateform. But I 
> nerver start VM. I had the message like:
>
> L'hôte x..fr n'a pas pu satisfaire le filtre Network de type interne 
> car réseau d'affichage ${DisplayNames} manquant.

Google translated this to:
The host x..fr could not satisfy the Network filter of
internal type because display network ${DisplayNames} missing.

There are two issues:
- The missing display network - maybe Dominic can help
- The placeholder "${DisplayNames}" in the message. This is probably a
  missing translation or a bug in the translation code; please file an
  ovirt-engine bug for this.

Nir

>
> I don't understand where is the problem.
>
> Could you help me?
>
> Thanks
>
> Anne
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/RB66ESCX7EN7PQI2TIMJ6FAWVEYYK5G6/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7MOOWJ3ZZS3YCDVB67KUQT5CKLI2RKEA/


[ovirt-users] Re: oVirt behavior with thin provision/deduplicated block storage

2020-02-24 Thread Nir Soffer
On Mon, Feb 24, 2020 at 3:03 PM Gorka Eguileor  wrote:
>
> On 22/02, Nir Soffer wrote:
> > On Sat, Feb 22, 2020, 13:02 Alan G  wrote:
> > >
> > > I'm not really concerned about the reporting aspect, I can look in the 
> > > storage vendor UI to see that. My concern is: will oVirt stop 
> > > provisioning storage in the domain because it *thinks* the domain is 
> > > full. De-dup is currently running at about 2.5:1 so I'm concerned that 
> > > oVirt will think the domain is full way before it actually is.
> > >
> > > Not clear if this is handled natively in oVirt or by the underlying lvs?
> >
> > Because oVirt does not know about deduplication or actual allocation
> > on the storage side,
> > it will let you allocate up the size of the LUNs that you added to the
> > storage domain, minus
> > the size oVirt uses for its own metadata.
> >
> > oVirt uses about 5G for its own metadata on the first LUN in a storage
> > domain. The rest of
> > the space can be used by user disks. Disks are LVM logical volumes
> > created in the VG created
> > from the LUN.
> >
> > If you create a storage domain with 4T LUN, you will be able to
> > allocate about 4091G on this
> > storage domain. If you use preallocated disks, oVirt will stop when
> > you allocated all the space
> > in the VG. Actually it will stop earlier based on the minimal amount
> > of free space configured for
> > the storage domain when creating the storage domain.
> >
> > If you use thin disks, oVirt will allocate only 1G per disk (by
> > default), so you can allocate
> > more storage than you actually have, but when VMs will write to the
> > disk, oVirt will extend
> > the disks. Once you use all the available space in this VG, you will
> > not be able to allocate
> > more without extending the storage domain with new LUN, or resizing
> > the  LUN on storage.
> >
> > If you use Managed Block Storage (cinderlib) every disk is a LUN with
> > the exact size you
> > ask when you create the disk. The actual allocation of this LUN
> > depends on your storage.
> >
> > Nir
> >
>
> Hi,
>
> I don't know anything about the oVirt's implementation, so I'm just
> going to provide some information from cinderlib's point of view.
>
> Cinderlib was developed as a dumb library to abstract access to storage
> backends, so all the "smart" functionality is pushed to the user of the
> library, in this case oVirt.
>
> In practice this means that cinderlib will NOT limit the number of LUNs
> or over-provisioning done in the backend.
>
> Cinderlib doesn't care if we are over-provisioning because we have dedup
> and decompression or because we are using thin volumes where we don't
> consume all the allocated space, it doesn't even care if we cannot do
> over-provisioning because we are using thick volumes.  If it gets a
> request to create a volume, it will try to do so.
>
> From oVirt's perspective this is dangerous if not controlled, because we
> could end up consuming all free space in the backend and then running
> VMs will crash (I think) when they could no longer write to disks.
>
> oVirt can query the stats of the backend [1] to see how much free space
> is available (free_capacity_gb) at any given time in order to provide
> over-provisioning limits to its users.  I don't know if oVirt is already
> doing that or something similar.
>
> If is important to know that stats gathering is an expensive operation
> for most drivers, and that's why we can request cached stats (cache is
> lost as the process exits) to help users not overuse it.  It probably
> shouldn't be gathered more than once a minute.
>
> I hope this helps.  I'll be happy to answer any cinderlib questions. :-)

Thanks Gorka, good to know we already have an API to get backend
allocation info. Hopefully we will use this in a future version.

Nir

>
> Cheers,
> Gorka.
>
> [1]: https://docs.openstack.org/cinderlib/latest/topics/backends.html#stats
>
> > >  On Fri, 21 Feb 2020 21:35:06 + Nir Soffer  
> > > wrote 
> > >
> > >
> > >
> > > On Fri, Feb 21, 2020, 17:14 Alan G  wrote:
> > >
> > > Hi,
> > >
> > > I have an oVirt cluster with a storage domain hosted on a FC storage 
> > > array that utilises block de-duplication technology. oVirt reports the 
> > > capacity of the domain as though the de-duplication factor was 1:1, which 
> > > of course is not the case. So what I would like to understand is the 
> > > likely behavior of oVirt when the used space appr

[ovirt-users] Re: Can't connect vdsm storage: Command StorageDomain.getInfo with args failed: (code=350, message=Error in storage domain action

2020-02-01 Thread Nir Soffer
On Sat, Feb 1, 2020 at 5:39 PM  wrote:

> Ok, i will try to set 777 permissoin on NFS storage.


This is an invalid configuration. See the RHV docs for the proper configuration:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/html/administration_guide/sect-preparing_and_adding_nfs_storage#Preparing_NFS_Storage_storage_admin


> But, why this issue starting from updating  4.30.32-1 to  4.30.33-1?
> Withowt any another changes.
>

I guess you had wrong permissions and ownership on the storage before, but
vdsm was not detecting the issue because it was missing validations in older
versions. The current version validates that creating and deleting files and
using direct I/O work with the storage when creating and activating a
storage domain.
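
The linked guide boils down to the exported directory being owned by
vdsm:kvm (36:36), for example on the NFS server (the path is a placeholder):

# chown 36:36 /exports/data
# chmod 0755 /exports/data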

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NWMDXNEBA62FD6QKKYFQJWEKXBD55VBD/


[ovirt-users] Re: Hyperconverged solution

2020-01-23 Thread Nir Soffer
On Wed, Jan 22, 2020, 17:54 Benedetto Vassallo 
wrote:

> Hello Guys,
> Here at University of Palermo (Italy) we are planning to switch from
> vmware to ovirt using the hyperconverged solution.
> Our design is a 6 nodes cluster, each node with this configuration:
>
> - 1x Dell PowerEdge R7425 server;
> - 2x AMD EPYC 7301 Processor;
> - 512GB of RAM (8x 64GB LRDIMM, 2666MT/s, Quad Rank);
> - 2x Broadcom 57412 Dual Port 10Gb SFP+ ethernet card;
> - 3x 600GB 10K RPM SAS for the OS (Raid1 + hotspare);
> - 5x 1.2TB 10K RPM SAS for the hosted storage domain (Raid5 + hotspare);
>
The hosted engine storage domain is small and should run only one VM, so you
probably don't need 1.2T disks for it.

> - 11x 2.4TB 10KRPM SAS for the vm data domain (Raid6 + hotspare);
> - 4x 960GB SSD SAS for an additional SSD storage domain (Raid5 + hotspare);
>
Hyperconverged uses gluster, and gluster uses replication (replica 3 or
replica 2 + arbiter), so adding RAID below may not be needed.

You may use the SSDs for lvm cache for the gluster setup.

I would try to ask on Gluster mailing list about this.

Sahina, what do you think?

Nir

Is this configuration supported or I have to change something?
> Thank you and Best Regards.
> --
> Benedetto Vassallo
> Responsabile U.O. Sviluppo e manutenzione dei sistemi
> Sistema Informativo di Ateneo
> Università degli studi di Palermo
>
> Phone: +3909123860056
> Fax: +3909123860880
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/JU7PVYPSNUASWZAU2VG2DRCLSWHK5XRX/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/N3GTOJ5ZHELKWPEI7SM4WL427SKQU2KM/


[ovirt-users] Re: What is this error message from?

2020-02-17 Thread Nir Soffer
On Mon, Feb 17, 2020, 16:53  wrote:

> I have seen this error message repeatedly when reviewing events.
>
> VDSM vmh.cyber-range.lan command HSMGetAllTasksStatusesVDS failed: low
> level Image copy failed: ("Command ['/usr/bin/qemu-img', 'convert', '-p',
> '-t', 'none', '-T', 'none', '-f', 'raw',
> u'/rhev/data-center/mnt/glusterSD/storage.cyber-range.lan:_vmstore/dd69364b-2c02-4165-bc4b-2f2a3b7fc10d/images/c651575f-75a0-492e-959e-8cfee6b6a7b5/9b5601fe-9627-4a8a-8a98-4959f68fb137',
> '-O', 'qcow2', '-o', 'compat=1.1',
> u'/rhev/data-center/mnt/glusterSD/storage.cyber-range.lan:_vmstore/dd69364b-2c02-4165-bc4b-2f2a3b7fc10d/images/6a2ce11a-deec-41e0-a726-9de6ba6d4ddd/6d738c08-0f8c-4a10-95cd-eeaa2d638db5']
> failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading
> sector 24117243: No such file or directory\\n')",)
>

Looks like copying the image failed with ENOENT while reading
offset 12348028416 (11.49 GiB).

I have never seen such a failure; typically, after opening a file, reads
never fail with this error, but with gluster this may be possible.

Please share the vdsm log showing this error, it may add useful info.

Also glusterfs client logs from
/var/log/glusterfs*/*storage.cyber-range.lan*.log

Kevin, Krutika, do you have an idea about this error?

Nir


> Can someone explain what this is?  How do I get this cleared up/resolved?
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/KBNSTWBZFN7PGWW74AGAGQVPNJ2DIZ6S/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/67N6VIO47YSMQ6VHWKZZWLYETPK4HC2I/


[ovirt-users] Re: paused vm's will not resume

2020-02-18 Thread Nir Soffer
On Tue, Feb 18, 2020 at 8:46 PM  wrote:
>
> When I used virsh it always asks for credentials.

This is why I use the -r flag.

$ man virsh
...
   · -r, --readonly

   Make the initial connection read-only, as if by the --readonly
option of the connect command.

> -Original Message-
> From: Nir Soffer 
> Sent: Tuesday, February 18, 2020 12:00 PM
> To: eev...@digitaldatatechs.com
> Cc: users 
> Subject: [ovirt-users] Re: paused vm's will not resume
>
> On Tue, Feb 18, 2020 at 6:56 AM  wrote:
> >
> > I have 2 vm's, which are the most important in my world, that paused and 
> > will not resume. I have googled this to death but no solution. It stated a 
> > lack of space but none of the drives on my hosts are using more than 30% or 
> > there space and these 2 have ran on kvm host for several years and always 
> > had at least 50% free space.
>
> Can you share the VM XML of these VMs?
>
> The easier way is:
>
> # virsh -r list
>
> # virsh -r dumpxml vm-id
>
> Also having vdsm.log from the time the vm was paused would help to understand 
> why the vm was paused.
>
> For block storage, paused vms are expected to be resumed once the vm disk is 
> extended, or if the vm paused because storage was not accessible temporarily, 
> once the storage becomes accessible again.
>
> For file based storage, we don't support yet resuming paused vms.
>
> Nir
>
> > I like ovirt and want to use it but I cannot tolerate the down time. If I 
> > cannot get this resolved, I'm going back to kvm hosts. I am pulling my hair 
> > out here.
> > If anyone can help with this issue, please let me know.
> > ___
> > Users mailing list -- users@ovirt.org
> > To unsubscribe send an email to users-le...@ovirt.org Privacy
> > Statement: https://www.ovirt.org/site/privacy-policy/
> > oVirt Code of Conduct:
> > https://www.ovirt.org/community/about/community-guidelines/
> > List Archives:
> > https://lists.ovirt.org/archives/list/users@ovirt.org/message/JBXNV3WT
> > 2W72I2E7EXM2KY4YN37STIMC/
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: 
> https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/KFU4CMDLWBPARNOI2GVXNI62E7YPT4R2/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7II42Y7ZMXE62OTF26WXHCVXZMVLRR4O/


[ovirt-users] Re: oVirt behavior with thin provision/deduplicated block storage

2020-02-21 Thread Nir Soffer
On Fri, Feb 21, 2020, 17:14 Alan G  wrote:

> Hi,
>
> I have an oVirt cluster with a storage domain hosted on a FC storage array
> that utilises block de-duplication technology. oVirt reports the capacity
> of the domain as though the de-duplication factor was 1:1, which of course
> is not the case. So what I would like to understand is the likely behavior
> of oVirt when the used space approaches the reported capacity. Particularly
> around the critical action space blocker.
>

oVirt does not know about the underlying block storage thin provisioning
implementation, so it cannot help with this.

You will have to use the underlying storage separately to learn about the
actual allocation.

This is unlikely to change for legacy storage, but for Managed Block
Storage (cinderlib) we may have a way to access such info.

Gorka, do we have any support in cinderlib for getting info about storage
allocation and deduplication?

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BKCQYELRW5XP5BVDPSJJ76YZD3N37LVF/


[ovirt-users] Re: oVirt behavior with thin provision/deduplicated block storage

2020-02-22 Thread Nir Soffer
On Sat, Feb 22, 2020, 13:02 Alan G  wrote:
>
> I'm not really concerned about the reporting aspect, I can look in the 
> storage vendor UI to see that. My concern is: will oVirt stop provisioning 
> storage in the domain because it *thinks* the domain is full. De-dup is 
> currently running at about 2.5:1 so I'm concerned that oVirt will think the 
> domain is full way before it actually is.
>
> Not clear if this is handled natively in oVirt or by the underlying lvs?

Because oVirt does not know about deduplication or actual allocation on the
storage side, it will let you allocate up to the size of the LUNs that you
added to the storage domain, minus the size oVirt uses for its own metadata.

oVirt uses about 5G for its own metadata on the first LUN in a storage
domain. The rest of
the space can be used by user disks. Disks are LVM logical volumes
created in the VG created
from the LUN.

If you create a storage domain with a 4T LUN, you will be able to allocate
about 4091G on this storage domain. If you use preallocated disks, oVirt
will stop when you have allocated all the space in the VG. Actually it will
stop earlier, based on the minimal amount of free space configured for the
storage domain when creating it.

If you use thin disks, oVirt will allocate only 1G per disk (by default), so
you can allocate more storage than you actually have, but when VMs write to
the disks, oVirt will extend them. Once you use all the available space in
this VG, you will not be able to allocate more without extending the storage
domain with a new LUN, or resizing the LUN on storage.
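
You can check how much of the domain's VG is still free with plain LVM tools
on a host (the VG name is the storage domain UUID):

# vgs -o vg_name,vg_size,vg_free <storage-domain-uuid>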

If you use Managed Block Storage (cinderlib) every disk is a LUN with
the exact size you
ask when you create the disk. The actual allocation of this LUN
depends on your storage.

Nir

>  On Fri, 21 Feb 2020 21:35:06 + Nir Soffer  wrote 
> 
>
>
>
> On Fri, Feb 21, 2020, 17:14 Alan G  wrote:
>
> Hi,
>
> I have an oVirt cluster with a storage domain hosted on a FC storage array 
> that utilises block de-duplication technology. oVirt reports the capacity of 
> the domain as though the de-duplication factor was 1:1, which of course is 
> not the case. So what I would like to understand is the likely behavior of 
> oVirt when the used space approaches the reported capacity. Particularly 
> around the critical action space blocker.
>
>
> oVirt does not know about the underlying block storage thin provisioning 
> implemention so it cannot help with this.
>
> You will have to use the underlying storage separately to learn about the 
> actual allocation.
>
> This is unlikely to change for legacy storage, but for Managed Block Storage 
> (conderlib) we may have a way to access such info.
>
> Gorka, do we have any support in cinderlib for getting info about storage 
> alllocation and deduplication?
>
> Nir
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BKCQYELRW5XP5BVDPSJJ76YZD3N37LVF/
>
>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NVBOY24VWN6TKWJKRN345WOGJRF55XJB/


[ovirt-users] Re: paused vm's will not resume

2020-02-22 Thread Nir Soffer
On Wed, Feb 19, 2020 at 8:23 AM  wrote:
>
> I had several vm's pause again. The message in the event log is:
> VDSM command SetVolumeDescriptionVDS failed: Volume does not exist: 
> (u'e3f79840-8355-45b0-ad2b-440c877be637',)
> I restarted nfs on that node and clicked run  and they all resumed.
> This is local storage on the node with over 200GB of free space but they were 
> paused due to a storage error.
> Do you still want the xml files? It seems ovirt is losing contact with the 
> export.

Yes, having the VM XML is the first step to understanding your deployment.

>
> Eric Evans
> Digital Data Services LLC.
> 304.660.9080
>
>
> -Original Message-
> From: Nir Soffer 
> Sent: Tuesday, February 18, 2020 12:00 PM
> To: eev...@digitaldatatechs.com
> Cc: users 
> Subject: [ovirt-users] Re: paused vm's will not resume
>
> On Tue, Feb 18, 2020 at 6:56 AM  wrote:
> >
> > I have 2 vm's, which are the most important in my world, that paused and 
> > will not resume. I have googled this to death but no solution. It stated a 
> > lack of space but none of the drives on my hosts are using more than 30% of 
> > their space and these 2 have run on a kvm host for several years and always 
> > had at least 50% free space.
>
> Can you share the VM XML of these VMs?
>
> The easier way is:
>
> # virsh -r list
>
> # virsh -r dumpxml vm-id
>
> Also having vdsm.log from the time the vm was paused would help to understand 
> why the vm was paused.
>
> For block storage, paused vms are expected to be resumed once the vm disk is 
> extended, or if the vm paused because storage was not accessible temporarily, 
> once the storage becomes accessible again.
>
> For file based storage, we don't support yet resuming paused vms.
>
> Nir
>
> > I like ovirt and want to use it but I cannot tolerate the down time. If I 
> > cannot get this resolved, I'm going back to kvm hosts. I am pulling my hair 
> > out here.
> > If anyone can help with this issue, please let me know.
> > ___
> > Users mailing list -- users@ovirt.org
> > To unsubscribe send an email to users-le...@ovirt.org Privacy
> > Statement: https://www.ovirt.org/site/privacy-policy/
> > oVirt Code of Conduct:
> > https://www.ovirt.org/community/about/community-guidelines/
> > List Archives:
> > https://lists.ovirt.org/archives/list/users@ovirt.org/message/JBXNV3WT
> > 2W72I2E7EXM2KY4YN37STIMC/
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: 
> https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/KFU4CMDLWBPARNOI2GVXNI62E7YPT4R2/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/547UTXHY3HKRDJEX4FIK7PTBGTXKG37U/


[ovirt-users] Re: What is this error message from?

2020-02-18 Thread Nir Soffer
On Tue, Feb 18, 2020 at 4:13 PM Jeremy Tourville
 wrote:
>
> I don't recall running any convert operations on the host and certainly not 
> at the time/date listed.  *If* any conversions were run, they were run 
> from a laptop and then I moved the converted disk to this host.  I definitely 
> didn't make any volume changes.  Is this image conversion part of the 
> template process?  I have been creating quite a few templates lately.  I have 
> noted that several of them failed and I had to rerun the process.

This may be an error from template creation.

> Is this some sort of process that just keeps trying over and over because it 
> thinks it failed?

We don't have such jobs.

The log you posted contains output from getAllTasksStatuses:

2020-02-17 06:19:47,782-0600 INFO  (jsonrpc/5) [vdsm.api] FINISH
getAllTasksStatuses return={'allTasksStatus':
{'1cbc63d7-2310-4291-8f08-df5bf58376bb': {'code': 0, 'message': '1
jobs completed successfully', 'taskState': 'finished', 'taskResult':
'success', 'taskID': '1cbc63d7-2310-4291-8f08-df5bf58376bb'},
'9db209be-8e33-4c35-be8a-a58b4819812a': {'code': 261, 'message': 'low
level Image copy failed: ("Command [\'/usr/bin/qemu-img\',
\'convert\', \'-p\', \'-t\', \'none\', \'-T\', \'none\', \'-f\',
\'raw\', 
u\'/rhev/data-center/mnt/glusterSD/storage.cyber-range.lan:_vmstore/dd69364b-2c02-4165-bc4b-2f2a3b7fc10d/images/c651575f-75a0-492e-959e-8cfee6b6a7b5/9b5601fe-9627-4a8a-8a98-4959f68fb137\',
\'-O\', \'qcow2\', \'-o\', \'compat=1.1\',
u\'/rhev/data-center/mnt/glusterSD/storage.cyber-range.lan:_vmstore/dd69364b-2c02-4165-bc4b-2f2a3b7fc10d/images/6a2ce11a-deec-41e0-a726-9de6ba6d4ddd/6d738c08-0f8c-4a10-95cd-eeaa2d638db5\']
failed with rc=1 out=\'\' err=bytearray(b\'qemu-img: error while
reading sector 24117243: No such file or directoryn\')",)',
'taskState': 'finished', 'taskResult': 'cleanSuccess', 'taskID':
'9db209be-8e33-4c35-be8a-a58b4819812a'},
'bd494f24-ca73-4e89-8ad0-629ad32bb2c1': {'code': 0, 'message': '1 jobs
completed successfully', 'taskState': 'finished', 'taskResult':
'success', 'taskID': 'bd494f24-ca73-4e89-8ad0-629ad32bb2c1'}}}
from=:::172.30.50.4,33302,
task_id=8a8c6402-4e1d-46b8-a8fd-454fde7151d7 (api:54)

The failing task was:

\'9db209be-8e33-4c35-be8a-a58b4819812a\': {
\'code\': 261,
\'message\': \'low level Image copy failed: ("Command
[\'/usr/bin/qemu-img\', \'convert\', \'-p\', \'-t\', \'none\', \'-T\',
\'none\', \'-f\', \'raw\',
u\'/rhev/data-center/mnt/glusterSD/storage.cyber-range.lan:_vmstore/dd69364b-2c02-4165-bc4b-2f2a3b7fc10d/images/c651575f-75a0-492e-959e-8cfee6b6a7b5/9b5601fe-9627-4a8a-8a98-4959f68fb137\',
\'-O\', \'qcow2\', \'-o\', \'compat=1.1\',
u\'/rhev/data-center/mnt/glusterSD/storage.cyber-range.lan:_vmstore/dd69364b-2c02-4165-bc4b-2f2a3b7fc10d/images/6a2ce11a-deec-41e0-a726-9de6ba6d4ddd/6d738c08-0f8c-4a10-95cd-eeaa2d638db5\']
failed with rc=1 out=\'\' err=bytearray(b\'qemu-img: error while
reading sector 24117243: No such file or directoryn\')",)\',
\'taskState\': \'finished\',
\'taskResult\': \'cleanSuccess\',
\'taskID\': \'9db209be-8e33-4c35-be8a-a58b4819812a\'
}

You may grep for this task id in all logs and share the matching logs.

Note that you have binary data in your logs, so you need to use "grep -a".
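
If you prefer Python over "grep -a", a small equivalent sketch (the task id
and log glob below are just examples):

import glob

task_id = "9db209be-8e33-4c35-be8a-a58b4819812a"

for path in sorted(glob.glob("/var/log/vdsm/vdsm.log*")):
    # Rotated logs may be compressed (.xz); decompress those first.
    if path.endswith(".xz"):
        continue
    # errors="replace" keeps the scan going over binary data in the logs.
    with open(path, errors="replace") as f:
        for lineno, line in enumerate(f, 1):
            if task_id in line:
                print("%s:%d: %s" % (path, lineno, line.rstrip()))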

That's the only theory I can come up with.
>
> 
> From: Kevin Wolf 
> Sent: Tuesday, February 18, 2020 3:01 AM
> To: Nir Soffer 
> Cc: jeremy_tourvi...@hotmail.com ; users 
> ; Krutika Dhananjay 
> Subject: Re: [ovirt-users] What is this error message from?
>
> Am 17.02.2020 um 16:16 hat Nir Soffer geschrieben:
> > On Mon, Feb 17, 2020, 16:53  wrote:
> >
> > > I have seen this error message repeatedly when reviewing events.
> > >
> > > VDSM vmh.cyber-range.lan command HSMGetAllTasksStatusesVDS failed: low
> > > level Image copy failed: ("Command ['/usr/bin/qemu-img', 'convert', '-p',
> > > '-t', 'none', '-T', 'none', '-f', 'raw',
> > > u'/rhev/data-center/mnt/glusterSD/storage.cyber-range.lan:_vmstore/dd69364b-2c02-4165-bc4b-2f2a3b7fc10d/images/c651575f-75a0-492e-959e-8cfee6b6a7b5/9b5601fe-9627-4a8a-8a98-4959f68fb137',
> > > '-O', 'qcow2', '-o', 'compat=1.1',
> > > u'/rhev/data-center/mnt/glusterSD/storage.cyber-range.lan:_vmstore/dd69364b-2c02-4165-bc4b-2f2a3b7fc10d/images/6a2ce11a-deec-41e0-a726-9de6ba6d4ddd/6d738c08-0f8c-4a10-95cd-eeaa2d638db5']
> > > failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading
> > > sector 24117243: No such file or directory\\n')",)
> > >
> >
> > Looks like copying image failed with ENOENT while reading
> > offset 12348028416 (11.49 GiB).
> >
> > I never seen such failure, typically after opening a file read will

[ovirt-users] Re: paused vm's will not resume

2020-02-18 Thread Nir Soffer
On Tue, Feb 18, 2020 at 6:56 AM  wrote:
>
> I have 2 vm's, which are the most important in my world, that paused and will 
> not resume. I have googled this to death but no solution. It stated a lack of 
> space but none of the drives on my hosts are using more than 30% of their 
> space and these 2 have run on a kvm host for several years and always had at 
> least 50% free space.

Can you share the VM XML of these VMs?

The easier way is:

# virsh -r list

# virsh -r dumpxml vm-id

Also having vdsm.log from the time the vm was paused would help to understand
why the vm was paused.

For block storage, paused VMs are expected to be resumed once the VM
disk is extended, or, if the VM paused because storage was temporarily
not accessible, once the storage becomes accessible again.

For file-based storage, we don't yet support resuming paused VMs.

Nir

> I like ovirt and want to use it but I cannot tolerate the down time. If I 
> cannot get this resolved, I'm going back to kvm hosts. I am pulling my hair 
> out here.
> If anyone can help with this issue, please let me know.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/JBXNV3WT2W72I2E7EXM2KY4YN37STIMC/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KFU4CMDLWBPARNOI2GVXNI62E7YPT4R2/


[ovirt-users] Re: ovirt_disk Ansible module creating disks of wrong size but why?

2020-01-16 Thread Nir Soffer
On Mon, Jan 13, 2020 at 11:18 AM  wrote:

> Hello Jan,
>
> maybe I didn't word my findings clearly enough, sorry. oVirt is showing
> that the disk I create is 500GiB (like in your test) but inside the CentOS
> VM I have only an 8GiB disk, which I can see with lsblk and fdisk.
>

Can you share the output of this command on the image being uploaded?

qemu-img info /path/to/image

The oVirt SDK lets you create a disk of any size when you upload an image.
You must create a disk large enough to hold the uploaded image, but oVirt
cannot prevent you from creating a bigger disk, and that can make sense in
some cases.

The virtual size of the disk is called provisioned_size in the SDK. It must
be the same as the "virtual size" reported by qemu-img info. If you specify
a smaller value, the upload will fail at the end, when oVirt verifies the
uploaded image. Specifying a larger value is allowed (for backward
compatibility) but is not recommended.

When uploading qcow2 images to block storage (e.g. iSCSI/FC) you must
specify the initial_size.
You can find this value using:

qemu-img measure -f qcow2 -O qcow2 /path/to/image.qcow2

You must use the value returned as "required size".
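
In a script, both values can be read from qemu-img's JSON output; a small
sketch (the image path is an example):

import json
import subprocess

image = "disk.qcow2"

# Virtual size -> use as provisioned_size when creating the disk.
info = json.loads(subprocess.check_output(
    ["qemu-img", "info", "--output", "json", image]))
provisioned_size = info["virtual-size"]

# Required size for a qcow2 volume on block storage -> use as initial_size.
measure = json.loads(subprocess.check_output(
    ["qemu-img", "measure", "-O", "qcow2", "--output", "json", image]))
initial_size = measure["required"]

print("provisioned_size:", provisioned_size)
print("initial_size:", initial_size)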

The best tool to upload images is upload_disk.py from the oVirt SDK examples:
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/upload_disk.py

With oVirt 4.4 you can upload any image format to any disk format, and the
tool detects the image format and calculates the correct size without errors.

Here are some examples (omitting the engine connection details).

Upload raw image to qcow2 disk:

$ python3 upload_disk.py ... --disk-format qcow2 --disk-sparse fedora-30.raw
Checking image...
Image format: raw
Disk format: cow
Disk content type: data
Disk provisioned size: 6442450944
Disk initial size: 1236336640
Disk name: fedora-30.qcow2
Connecting...
Creating disk...
Creating transfer session...
Uploading image...
[ 100.00% ] 6.00 GiB, 4.27 seconds, 1.40 GiB/s
Finalizing transfer session...
Upload completed successfully

Upload qcow2 disk to raw disk:

$ python3 upload_disk.py ... --disk-format raw --disk-sparse disk.qcow2
Checking image...
Image format: qcow2
Disk format: raw
Disk content type: data
Disk provisioned size: 6442450944
Disk initial size: 6442450944
Disk name: disk.raw
Connecting...
Creating disk...
Creating transfer session...
Uploading image...
[ 100.00% ] 6.00 GiB, 4.51 seconds, 1.33 GiB/s
Finalizing transfer session...
Upload completed successfully

Run upload_disk.py --help to learn about the available options.

I guess the ansible module should be updated to upload the disk properly
based on the sdk example.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PTVW6I2JTAARPXDZXXH5R6MX7DPH76CM/


[ovirt-users] Re: ISO Upload

2020-01-16 Thread Nir Soffer
On Tue, Jan 7, 2020 at 4:02 PM Chris Adams  wrote:

> Once upon a time, m.skrzetu...@gmail.com  said:
> > I'd give up on the ISO domain. I started like you and then read the docs
> which said that ISO domain is deprecated.
> > I'd upload all files to a data domain.
>
> Note that that only works if your data domain is NFS... iSCSI data
> domains will let you upload ISOs, but connecting them to a VM fails.
>

ISO on iSCSI/FC domains works fine for starting a VM from ISO, which is the
main use case.

There is an engine issue with hotplug not sending the disk details
properly, so vdsm cannot activate the LV.

I hope this issue will be fixed soon.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NDGTQNC5PYJE2HTTBLLUYS4GEQOTZCL3/


[ovirt-users] Re: ISO Upload

2020-01-17 Thread Nir Soffer
On Fri, Jan 17, 2020 at 6:41 AM Strahil Nikolov 
wrote:

> On January 17, 2020 12:10:56 AM GMT+02:00, Chris Adams 
> wrote:
> >Once upon a time, Nir Soffer  said:
> >> On Tue, Jan 7, 2020 at 4:02 PM Chris Adams  wrote:
> >> > Once upon a time, m.skrzetu...@gmail.com 
> >said:
> >> > > I'd give up on the ISO domain. I started like you and then read
> >the docs
> >> > which said that ISO domain is deprecated.
> >> > > I'd upload all files to a data domain.
> >> >
> >> > Note that that only works if your data domain is NFS... iSCSI data
> >> > domains will let you upload ISOs, but connecting them to a VM
> >fails.
> >>
> >> ISO on iSCSI/FC domains works fine for starting a VM from ISO, which
> >is the
> >> main use case.
> >
> >Okay - it didn't the last time I tried it (I just got errors).  Thanks.
>
> I have opened an RFE for ISO checksumming, as currently the uploader can
> silently corrupt your DVD.
>

Can you share the bug number?


> With gluster, I have an option to check the ISO checksum and
> verify/replace the file, but with Block-based storage that will be quite
> difficult.
>

Checksumming is a general feature not related to ISO uploads. But
checksumming tools do not understand sparseness, so you should really use
a tool designed for comparing disk images, like "qemu-img compare".

Here is an example:

1. Create fedora 30 image for testing:

$ virt-builder fedora-30 -o fedora-30.raw
...
$ qemu-img info fedora-30.raw
image: fedora-30.raw
file format: raw
virtual size: 6 GiB (6442450944 bytes)
disk size: 1.15 GiB

2. Create a checksum of the image

$ time shasum fedora-30.raw
991c2efee723e04b7d41d75f70d19bade02b400d  fedora-30.raw

real 0m14.641s
user 0m12.653s
sys 0m1.749s

3. Create compressed qcow2 image with same content

$ qemu-img convert -f raw -O qcow2 -c fedora-30.raw fedora-30.qcow2
...
$ qemu-img info fedora-30.qcow2
image: fedora-30.qcow2
file format: qcow2
virtual size: 6 GiB (6442450944 bytes)
disk size: 490 MiB
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

This is a typical file format used for publishing disk images. The contents
of this image are the same as the raw version from the guest's point of view.

4. Compare image content

$ time qemu-img compare fedora-30.raw fedora-30.qcow2
Images are identical.

real 0m4.680s
user 0m4.273s
sys 0m0.553s

Comparing two images in different formats is 3 times faster than creating
a checksum of a single image.

Now let's see how we can use this to verify uploads.

5. Upload the qcow2 compressed image to a new raw disk (requires oVirt 4.4
alpha3):

$ python3 upload_disk.py --engine-url https://engine/ --username
admin@internal --password-file password \
--cafile ca.pem --sd-name nfs1-export2 --disk-format raw --disk-sparse
fedora-30.qcow2

6. Download image to raw format:

$ python3 download_disk.py --engine-url https://engine/ --username
admin@internal --password-file password \
--cafile ca.pem --format raw f40023a5-ddc4-4fcf-b8e2-af742f372104
fedora-30.download.raw

7. Compare original and downloaded images

$ qemu-img compare fedora-30.qcow2 fedora-30.download.raw
Images are identical.

Back to the topic of ISO uploads to block storage. Block volumes in oVirt
are always aligned to
128 MiB, so when you upload an image which is not aligned to 128 MiB, oVirt
creates a bigger
block device. The contents of the device after the image content are not
defined, unless you
zero this area during upload. The current upload_disk.py example does not
zero the end of
the device since the guest does not care about it, but this makes verifying
uploads harder.

The best way to handle this issue is to truncate the ISO image up to the
next multiple of 128 MiB
before uploading it:

$ ls -l Fedora-Server-dvd-x86_64-30-1.2.iso
-rw-rw-r--. 1 nsoffer nsoffer 3177185280 Nov  8 23:09
Fedora-Server-dvd-x86_64-30-1.2.iso

$ python3 -c 'n = 3177185280 + 128 * 1024**2 - 1; print(n - (n % (128 *
1024**2)))'
3221225472

$ truncate -s 3221225472 Fedora-Server-dvd-x86_64-30-1.2.iso

The contents of the ISO image are now the same as they will be on the block
device after the upload, and uploading this image will zero the end of the
device.
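
The same padding step as a small reusable helper (equivalent to the one-liner
and the truncate command above; the path is an example):

import os

ALIGNMENT = 128 * 1024**2  # oVirt block volumes are aligned to 128 MiB

def round_up(size, alignment=ALIGNMENT):
    return (size + alignment - 1) // alignment * alignment

path = "Fedora-Server-dvd-x86_64-30-1.2.iso"
size = os.path.getsize(path)
# Extends the file with zeros up to the next multiple of 128 MiB.
os.truncate(path, round_up(size))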

If we upload this image, we can check the upload using qemu-img compare.

$ python3 upload_disk.py --engine-url https://engine/ --username
admin@internal --password-file password \
--cafile ca.pem --sd-name iscsi-1 --disk-format raw
Fedora-Server-dvd-x86_64-30-1.2.iso

$ python3 download_disk.py --engine-url https://engine/ --username
admin@internal --password-file password \
--cafile ca.pem --format raw 5f0b5347-bbbc-4521-9ca0-8fc17670bab0
iso.raw

$ qemu-img compare iso.raw Fedora-Server-dvd-x86_64-30-1.2.iso
Images are identical.

This is not e

[ovirt-users] Re: Storage domain in maintenance

2020-04-11 Thread Nir Soffer
On Sat, Apr 11, 2020 at 6:55 AM Strahil Nikolov  wrote:

> I have opened a bug for it (not sure on the correct Product/Component) :
> https://bugzilla.redhat.com/1823033

This is expected behavior. We don't support arbitrary modifications of
storage domain
directory structure.

I explained in the bug how you can do the same modifications safely:
https://bugzilla.redhat.com/1823033#c1

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NXP73OKXPURKIHKQU4TXTRX76TGXG7A3/


[ovirt-users] Re: Storage domain in maintenance

2020-04-11 Thread Nir Soffer
On Sat, Apr 11, 2020, 14:19 Strahil Nikolov  wrote:

> On April 11, 2020 1:14:52 PM GMT+03:00, Nir Soffer 
> wrote:
> >On Sat, Apr 11, 2020 at 6:55 AM Strahil Nikolov 
> >wrote:
> >
> >> I have opened a bug for it (not sure on the correct
> >Product/Component) :
> >> https://bugzilla.redhat.com/1823033
> >
> >This is expected behavior. We don't support arbitrary modifications of
> >storage domain
> >directory structure.
> >
> >I explained in the bug how you can do the same modifications safely:
> >https://bugzilla.redhat.com/1823033#c1
> >
> >Nir
>
> Hey Nir,
>
> Sadly  I didn't  know that , but I couldn't also find anything related in
> the docs.
> Am I missing it or it's just not there ?
>
> A Feature  request from my side - can oVirt create the storage domain
> structure + a file  named 'DO_NOT_PUT_ANYTHING_HERE'  with the  info from
> the bug inside ?
>
> Would the dev consider that ?
> I  think that  the  implementation is  quite  simple.
>

You can file a RFE about that.


> Best Regards,
> Strahil Nikolov
>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CTTXZMVGMF623OZWCW3OHFKF42ZYWYUT/


[ovirt-users] Re: HCI cluster single node error making template

2020-03-25 Thread Nir Soffer
On Wed, Mar 25, 2020 at 2:06 PM Gianluca Cecchi
 wrote:
>
> Hello,
> I'm on 4.3.9
> I have created a VM with 4vcpu, 16gb mem and a nic and a thin provisioned 
> disk of 120Gb
> Installed nothing on it, only defined.
> Now I'm trying to make template from it but I get error.
> I leave the prefille value of raw for format
> Target storage domain has 900 Gb free, almost empty
>
> In events pane:
>
> Creation of Template ocp_node from VM ocp_node_template was initiated by 
> admin@internal-authz.
> 3/25/20 12:42:35 PM
>
> Then I get error in events pane is:
>
> VDSM ovirt.example.local command HSMGetAllTasksStatusesVDS failed: low level 
> Image copy failed: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 
> 'none', '-T', 'none', '-f', 'raw', 
> u'/rhev/data-center/mnt/glusterSD/ovirtst.example.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/61689cb2-fdce-41a5-a6d9-7d06aefeb636/30009efb-83ed-4b0d-b243-3160195ae46e',
>  '-O', 'qcow2', '-o', 'compat=1.1', 
> u'/rhev/data-center/mnt/glusterSD/ovirtst.example.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/5642b52f-d7e8-48a8-adf9-f79022ce4594/982dd5cc-5f8f-41cb-b2e7-3cbdf2a656cf']
>  failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading sector 
> 18620412: No such file or directory\\n')",)

This is the second time I have seen this error - it should be impossible;
reading should never fail with ENOENT.

> Error: Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 
> 'none', '-f', 'raw', 
> u'/rhev/data-center/mnt/glusterSD/ovirtst.example.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/61689cb2-fdce-41a5-a6d9-7d06aefeb636/30009efb-83ed-4b0d-b243-3160195ae46e',
>  '-O', 'raw', 
> u'/rhev/data-center/mnt/glusterSD/ovirtst.example.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/bb167b28-94fa-434c-8fb6-c4bedfc06c62/53d3ab96-e5d1-453a-9989-2f858e6a9e0a']
>  failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading sector 
> 22028283: No data available\n')

This ENODATA error is also strange; preadv() is not documented to
return this error.

This is the second report here about impossible errors with gluster
storage. First report was here:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KBNSTWBZFN7PGWW74AGAGQVPNJ2DIZ6S/

Please file a gluster bug and attach the gluster client logs from /var/log/glusterfs/.


Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UDWCX47CI6W7UBW3QF3Z5ZC5EEZUSIW4/


[ovirt-users] Re: vm console problem

2020-03-25 Thread Nir Soffer
On Wed, Mar 25, 2020 at 12:45 PM David David  wrote:
>
> ovirt 4.3.8.2-1.el7
> gtk-vnc2-1.0.0-1.fc31.x86_64
> remote-viewer version 8.0-3.fc31
>
> can't open vm console by remote-viewer
> vm has vnc console protocol
> when click on console button to connect to a vm, the remote-viewer
> console disappear immediately
>
> remote-viewer debug in attachment

You have an issue with the certificates:

(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.238:
../src/vncconnection.c Set credential 2 libvirt
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.239:
../src/vncconnection.c Searching for certs in /etc/pki
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.239:
../src/vncconnection.c Searching for certs in /root/.pki
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.239:
../src/vncconnection.c Failed to find certificate CA/cacert.pem
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.239:
../src/vncconnection.c No CA certificate provided, using GNUTLS global
trust
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.239:
../src/vncconnection.c Failed to find certificate CA/cacrl.pem
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.239:
../src/vncconnection.c Failed to find certificate
libvirt/private/clientkey.pem
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.239:
../src/vncconnection.c Failed to find certificate
libvirt/clientcert.pem
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.239:
../src/vncconnection.c Waiting for missing credentials
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.239:
../src/vncconnection.c Got all credentials
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.239:
../src/vncconnection.c No CA certificate provided; trying the system
trust store instead
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.240:
../src/vncconnection.c Using the system trust store and CRL
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.240:
../src/vncconnection.c No client cert or key provided
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.240:
../src/vncconnection.c No CA revocation list provided
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.241:
../src/vncconnection.c Handshake was blocking
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.243:
../src/vncconnection.c Handshake was blocking
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.251:
../src/vncconnection.c Handshake was blocking
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.298:
../src/vncconnection.c Handshake done
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.298:
../src/vncconnection.c Validating
(remote-viewer:2721): gtk-vnc-DEBUG: 11:56:25.301:
../src/vncconnection.c Error: The certificate is not trusted

Adding people that may know more about this.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Y4MPGDLGXR5TZR2JUMHM4HBGE7TOMFWO/


[ovirt-users] Re: Shutdown procedure for single host HCI Gluster

2020-03-27 Thread Nir Soffer
On Wed, Mar 25, 2020 at 2:49 AM Gianluca Cecchi
 wrote:
>
> On Wed, Mar 25, 2020 at 1:16 AM Nir Soffer  wrote:
>>
>>
>>
>> OK, found it - this issue is
>> https://bugzilla.redhat.com/1609029
>>
>> Simone provided this to solve the issue:
>> https://github.com/oVirt/ovirt-ansible-shutdown-env/blob/master/README.md
>>
>> Nir
>>
>
> Ok, I will try the role provided by Simone and Sandro with my 4.3.9 single 
> HCI host and report.

Looking at the bug comments, I'm not sure this ansible script addresses
the issues you reported. Please
file a bug if you still see these issues when using the script.

We may need to solve this in vdsm-tool, adding an easy way to stop the
SPM and disconnect from storage cleanly. When we have such a way, the
ansible script can use it.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GTRRP3PHE2CQU2R7BJHWHIY64SMQT76E/


[ovirt-users] Re: oVirt 4.4.0 Beta release is now available for testing

2020-03-27 Thread Nir Soffer
On Fri, Mar 27, 2020 at 5:52 PM Sandro Bonazzola 
wrote:

> oVirt 4.4.0 Beta release is now available for testing
>
> The oVirt Project is excited to announce the availability of the beta
> release of oVirt 4.4.0 for testing, as of March 27th, 2020
>
> This release unleashes an altogether more powerful and flexible open
> source virtualization solution that encompasses hundreds of individual
> changes and a wide range of enhancements across the engine, storage,
> network, user interface, and analytics on top of oVirt 4.3.
>
> Important notes before you try it
>
> Please note this is a Beta release.
>
> The oVirt Project makes no guarantees as to its suitability or usefulness.
>
> This pre-release must not to be used in production.
>
> In particular, please note that upgrades from 4.3 and future upgrades from
> this beta to the final 4.4 release from this version are not supported.
>
> Some of the features included in oVirt 4.4.0 Beta require content that
> will be available in CentOS Linux 8.2 which are currently included in Red
> Hat Enterprise Linux 8.2 beta. If you want to have a better experience you
> can test oVirt 4.4.0 Beta on Red Hat Enterprise Linux 8.2 beta.
>
> Known Issues
>
>-
>
>ovirt-imageio development is still in progress. In this beta you can’t
>upload images to data domains. You can still copy iso images into the
>deprecated ISO domain for installing VMs.
>
>
Correction: upload and download to/from data domains are fully functional via
the REST API and SDK.

For upload and download via the SDK, please see:
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/upload_disk.py
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/download_disk.py
Both scripts are standalone command line tools; try --help for more info.

Upload/download from the UI (via browser) is not supported yet, since the
engine is not completely ported to Python 3.

> Installation instructions
>
> For the engine: either use appliance or:
>
> - Install CentOS Linux 8 minimal from
> http://centos.mirror.garr.it/centos/8.1.1911/isos/x86_64/CentOS-8.1.1911-x86_64-dvd1.iso
>
> - dnf install
> https://resources.ovirt.org/pub/yum-repo/ovirt-release44-pre.rpm
>
> - dnf update (reboot if needed)
>
> - dnf module enable -y javapackages-tools pki-deps 389-ds
>
> - dnf install ovirt-engine
>
> - engine-setup
>
> For the nodes:
>
> Either use oVirt Node ISO or:
>
> - Install CentOS Linux 8 from
> http://centos.mirror.garr.it/centos/8.1.1911/isos/x86_64/CentOS-8.1.1911-x86_64-dvd1.iso
> ; select minimal installation
>
> - dnf install
> https://resources.ovirt.org/pub/yum-repo/ovirt-release44-pre.rpm
>
> - dnf update (reboot if needed)
>
> - Attach the host to engine and let it be deployed.
>
> What’s new in oVirt 4.4.0 Beta?
>
>-
>
>Hypervisors based on CentOS Linux 8 (rebuilt from award winning
>RHEL8), for both oVirt Node and standalone CentOS Linux hosts
>-
>
>Easier network management and configuration flexibility with
>NetworkManager
>-
>
>VMs based on a more modern Q35 chipset with legacy seabios and UEFI
>firmware
>-
>
>Support for direct passthrough of local host disks to VMs
>-
>
>Live migration improvements for High Performance guests.
>-
>
>New Windows Guest tools installer based on WiX framework now moved to
>VirtioWin project
>-
>
>Dropped support for cluster level prior to 4.2
>-
>
>Dropped SDK3 support
>-
>
>4K disks support
>
>
Correction: 4k is supported only for file-based storage. iSCSI/FC storage
does not support 4k disks yet.


>
>-
>
>Exporting a VM to a data domain
>-
>
>Editing of floating disks
>-
>
>Integrating ansible-runner into engine, which allows a more detailed
>monitoring of playbooks executed from engine
>-
>
>Adding/reinstalling hosts are now completely based on Ansible
>-
>
>The OpenStack Neutron Agent cannot be configured by oVirt anymore, it
>should be configured by TripleO instead
>
>
> This release is available now on x86_64 architecture for:
>
> * Red Hat Enterprise Linux 8.1 or newer
>
> * CentOS Linux (or similar) 8.1 or newer
>
> This release supports Hypervisor Hosts on x86_64 and ppc64le architectures
> for:
>
> * Red Hat Enterprise Linux 8.1 or newer
>
> * CentOS Linux (or similar) 8.1 or newer
>
> * oVirt Node 4.4 based on CentOS Linux 8.1 (available for x86_64 only)
>
> See the release notes [1] for installation instructions and a list of new
> features and bugs fixed.
>
> If you manage more than one oVirt instance, OKD or RDO we also recommend
> to try ManageIQ .
>
> In such a case, please be sure  to take the qc2 image and not the ova
> image.
>
> Notes:
>
> - oVirt Appliance is already available for CentOS Linux 8
>
> - oVirt Node NG is already available for CentOS Linux 8
>
> Additional Resources:
>
> * Read more about the oVirt 4.4.0 release highlights:
> http://www.ovirt.org/release/4.4.0/
>

[ovirt-users] Re: Sometimes paused due to unknown storage error on gluster

2020-03-28 Thread Nir Soffer
On Sat, Mar 28, 2020 at 1:59 PM Strahil Nikolov  wrote:
>
> On March 28, 2020 11:03:54 AM GMT+02:00, Gianluca Cecchi 
>  wrote:
> >On Sat, Mar 28, 2020 at 8:39 AM Strahil Nikolov 
> >wrote:
> >
> >> On March 28, 2020 3:21:45 AM GMT+02:00, Gianluca Cecchi <
> >> gianluca.cec...@gmail.com> wrote:
> >>
> >>
> >[snip]
> >
> >>Actually it only happened with empty disk (thin provisioned) and
> >sudden
> >> >high I/O during the initial phase of install of the OS; it didn't
> >> >happened
> >> >then during normal operaton (even with 600MB/s of throughput).
> >>
> >
> >[snip]
> >
> >
> >> Hi Gianluca,
> >>
> >> Is it happening to machines with preallocated disks or on machines
> >with
> >> thin disks ?
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >>
> >
> >thin provisioned. But as I have tro create many VMs with 120Gb of disk
> >size
> >of which probably only a part during time will be allocated, it would
> >be
> >unfeasible to make them all preallocated. I learned that thin is not
> >good
> >for block based storage domains and heavy I/O, but I would hope that it
> >is
> >not the same with file based storage domains...
> >Thanks,
> >Gianluca
>
> This is normal - gluster cannot allocate fast enough the needed shards (due 
> to high IO),  so the qemu pauses  the VM until  storage  is available  again .

I don't know glusterfs internals, but I think this is very unlikely.

For block storage thin provisioning in vdsm, vdsm is responsible for allocating
more space, but vdsm is not in the datapath; it monitors the allocation and
allocates more space when free space reaches a limit. It has no way to block I/O
before more space is available. Gluster is in the datapath and can block I/O
until it can process it.

Can you explain the source of this theory?
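
To make the distinction concrete, here is a purely illustrative sketch of the
out-of-band monitoring pattern described above - this is not vdsm's actual
code, and all names and thresholds are made up:

import time

CHUNK = 1 * 2**30          # extend by 1 GiB at a time
FREE_LIMIT = 512 * 2**20   # extend when less than 512 MiB is free

def monitor(volume):
    # Runs outside the I/O path: it only watches and extends.
    while volume.in_use():                              # hypothetical accessors
        free = volume.current_size() - volume.highest_write_offset()
        if free < FREE_LIMIT:
            # If the guest writes faster than the extend completes, qemu
            # pauses the VM on ENOSPC and it is resumed after the extend.
            volume.extend(volume.current_size() + CHUNK)
        time.sleep(2)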

> You can think about VDO (with deduplication ) as a  PV for the  Thin LVM and 
> this way you can preallocate your VMs , while saving space (deduplication, 
> zero-block elimination  and even compression).
> Of  course, VDO will reduce  performance (unless  you have battery-backed 
> write cache and compression is disabled),  but  tbe benefits will be alot 
> more.
>
> Another approach is to increase the shard size - so gluster will create fewer 
>  shards,  but allocation on disk will be higher.
>
> Best Regards,
> Strahil Nikolov
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/77DYUF7A5D6BIAYGVCBDKRBX2YWWJDJ4/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2LC5HGDMXJPOMVIYABLM77BRWG6LYOZJ/


[ovirt-users] Re: Sometimes paused due to unknown storage error on gluster

2020-03-28 Thread Nir Soffer
On Sat, Mar 28, 2020 at 5:00 AM Gianluca Cecchi
 wrote:
...
> Further information.
> What I see around time frame in gluster brick log file 
> gluster_bricks-vmstore-vmstore.log (timestamp is behind 1 hour in log file)
>
> [2020-03-27 23:30:38.575808] I [MSGID: 101055] 
> [client_t.c:436:gf_client_unref] 0-vmstore-server: Shutting down connection 
> CTX_ID:6e8f70b8-1946-4505-860f-be90e5807cb3-GRAPH_ID:0-PID:223418-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0
> [2020-03-27 23:35:15.281449] E [MSGID: 113072] 
> [posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed: 
> offset 0, [Invalid argument]
> [2020-03-27 23:35:15.281545] E [MSGID: 115067] 
> [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-vmstore-server: 34139378: 
> WRITEV 10 (00d9fe81-8a31-498e-8401-7b9d1477378e), client: 
> CTX_ID:d04437ba-ef98-43df-864f-5e9d3738620a-GRAPH_ID:0-PID:27687-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0,
>  error-xlator: vmstore-posix [Invalid argument]
> [2020-03-27 23:40:15.415794] E [MSGID: 113072] 
> [posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed: 
> offset 0, [Invalid argument]

Invalid arguments are expected when activating a storage domain, and
every 5 minutes when storage domains are refreshed. The writes are
performed to a temporary file at
/rhev/data-center/mnt/server:_path/.prob-random-uuid

These logs do not show the path, so we don't know if the writes are
related to block size probing.

But in vdsm log we see:

2020-03-27 00:40:08,979+0100 INFO  (monitor/665ff83)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)

This call happens when vdsm is refreshing the storage domain. Right before
this log, vdsm tries to detect the underlying storage block size.

So it looks like the gluster logs are related to block size probing
and are not related to the
I/O error that caused the VM to pause.
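
For context, a rough illustration of what such a block-size probe can look
like (this is not vdsm's actual code; the probe path is whatever temporary
file name the caller picks). A 512-byte O_DIRECT write to storage that
requires 4k alignment fails with EINVAL, which the storage side logs as
"Invalid argument":

import errno
import mmap
import os

def probe_block_size(path):
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o644)
    try:
        for size in (512, 4096):
            buf = mmap.mmap(-1, size)  # page-aligned buffer, needed for O_DIRECT
            try:
                os.write(fd, buf)
                return size
            except OSError as e:
                if e.errno != errno.EINVAL:
                    raise
            finally:
                buf.close()
        raise RuntimeError("could not detect block size")
    finally:
        os.close(fd)
        os.unlink(path)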

Looking at both "abnormal vm stop" and storage refresh events:

$ egrep 'Removing remnants of deleted images|abnormal vm stop' vdsm.log.18
2020-03-27 00:20:08,555+0100 INFO  (monitor/665ff83)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:24:38,254+0100 INFO  (monitor/0cb6ade)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:24:47,799+0100 INFO  (monitor/81b9724)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:25:08,660+0100 INFO  (monitor/665ff83)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:29:38,344+0100 INFO  (monitor/0cb6ade)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:29:47,901+0100 INFO  (monitor/81b9724)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:30:08,761+0100 INFO  (monitor/665ff83)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:34:38,436+0100 INFO  (monitor/0cb6ade)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:34:48,004+0100 INFO  (monitor/81b9724)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:35:08,877+0100 INFO  (monitor/665ff83)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:35:23,817+0100 INFO  (libvirt/events) [virt.vm]
(vmId='1abeafb6-72b2-4893-9cc5-41846b737670') abnormal vm stop device
ua-3b753210-09f5-4c40-90fb-ded93b00d19f error eother (vm:5079)

VM disk is on:
/rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/3b753210-09f5-4c40-90fb-ded93b00d19f/5a4aac90-a455-41b4-80dd-cf8c1ed81893

2020-03-27 00:39:38,536+0100 INFO  (monitor/0cb6ade)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:39:48,089+0100 INFO  (monitor/81b9724)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:40:08,979+0100 INFO  (monitor/665ff83)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:44:38,623+0100 INFO  (monitor/0cb6ade)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:44:48,166+0100 INFO  (monitor/81b9724)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:45:09,107+0100 INFO  (monitor/665ff83)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:49:38,719+0100 INFO  (monitor/0cb6ade)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:49:48,246+0100 INFO  (monitor/81b9724)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:50:09,234+0100 INFO  (monitor/665ff83)
[storage.StorageDomain] Removing remnants of deleted images []
(fileSD:726)
2020-03-27 00:53:07,405+0100 INFO  (libvirt/events) [virt.vm]

[ovirt-users] Re: Sometimes paused due to unknown storage error on gluster

2020-03-28 Thread Nir Soffer
On Sat, Mar 28, 2020 at 9:47 PM Strahil Nikolov  wrote:
>
> On March 28, 2020 7:26:33 PM GMT+02:00, Nir Soffer  wrote:
> >On Sat, Mar 28, 2020 at 1:59 PM Strahil Nikolov 
> >wrote:
> >>
> >> On March 28, 2020 11:03:54 AM GMT+02:00, Gianluca Cecchi
> > wrote:
> >> >On Sat, Mar 28, 2020 at 8:39 AM Strahil Nikolov
> >
> >> >wrote:
> >> >
> >> >> On March 28, 2020 3:21:45 AM GMT+02:00, Gianluca Cecchi <
> >> >> gianluca.cec...@gmail.com> wrote:
> >> >>
> >> >>
> >> >[snip]
> >> >
> >> >>Actually it only happened with empty disk (thin provisioned) and
> >> >sudden
> >> >> >high I/O during the initial phase of install of the OS; it didn't
> >> >> >happened
> >> >> >then during normal operaton (even with 600MB/s of throughput).
> >> >>
> >> >
> >> >[snip]
> >> >
> >> >
> >> >> Hi Gianluca,
> >> >>
> >> >> Is it happening to machines with preallocated disks or on machines
> >> >with
> >> >> thin disks ?
> >> >>
> >> >> Best Regards,
> >> >> Strahil Nikolov
> >> >>
> >> >
> >> >thin provisioned. But as I have tro create many VMs with 120Gb of
> >disk
> >> >size
> >> >of which probably only a part during time will be allocated, it
> >would
> >> >be
> >> >unfeasible to make them all preallocated. I learned that thin is not
> >> >good
> >> >for block based storage domains and heavy I/O, but I would hope that
> >it
> >> >is
> >> >not the same with file based storage domains...
> >> >Thanks,
> >> >Gianluca
> >>
> >> This is normal - gluster cannot allocate fast enough the needed
> >shards (due to high IO),  so the qemu pauses  the VM until  storage  is
> >available  again .
> >
> >I don't know glusterfs internals, but I think this is very unlikely.
> >
> >For block storage thin provisioning in vdsm, vdsm is responsible for
> >allocating
> >more space, but vdsm is not in the datapath, it is monitoring the
> >allocation and
> >allocate more data when free space reaches a limit. It has no way to
> >block I/O
> >before more space is available. Gluster is in the datapath and can
> >block I/O until
> >it can process it.
> >
> >Can you explain what is the source for this theory?
> >
> >> You can think about VDO (with deduplication ) as a  PV for the  Thin
> >LVM and this way you can preallocate your VMs , while saving space
> >(deduplication, zero-block elimination  and even compression).
> >> Of  course, VDO will reduce  performance (unless  you have
> >battery-backed write cache and compression is disabled),  but  tbe
> >benefits will be alot more.
> >>
> >> Another approach is to increase the shard size - so gluster will
> >create fewer  shards,  but allocation on disk will be higher.
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >> ___
> >> Users mailing list -- users@ovirt.org
> >> To unsubscribe send an email to users-le...@ovirt.org
> >> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >> oVirt Code of Conduct:
> >https://www.ovirt.org/community/about/community-guidelines/
> >> List Archives:
> >https://lists.ovirt.org/archives/list/users@ovirt.org/message/77DYUF7A5D6BIAYGVCBDKRBX2YWWJDJ4/
> >___
> >Users mailing list -- users@ovirt.org
> >To unsubscribe send an email to users-le...@ovirt.org
> >Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >oVirt Code of Conduct:
> >https://www.ovirt.org/community/about/community-guidelines/
> >List Archives:
> >https://lists.ovirt.org/archives/list/users@ovirt.org/message/2LC5HGDMXJPOMVIYABLM77BRWG6LYOZJ/
>
> Hey Nir,
> You are right ... This is just a theory based on my knowledge and it might 
> not be valid.
> We nees the libvirt logs to confirm or reject  the theory, but I'm convinced 
> that is the reason.
>
> Yet,  it's quite  possible.
> Qemu tries to write to the qcow disk on gluster.
> Gluster is creating shards based of the ofset, as it was not done initially 
> (preallocated  disk  take the full size  on gluster  and all shards are 
> created  immediately). This takes time and requires  to be done 

[ovirt-users] Re: Import storage domain with different storage type?

2020-03-28 Thread Nir Soffer
On Thu, Mar 19, 2020 at 4:30 PM Rik Theys  wrote:
>
> Hi,
>
> We have an oVirt environment with a FC storage domain. Multiple LUNs on
> a SAN are exported to the oVirt nodes and combined in a single FC
> storage domain.
>
> The SAN replicates the disks to another storage box that has iSCSI
> connectivity.
>
> Is it possible to - in case of disaster - import the existing,
> replicated, storage domain as an iSCSI domain and import/run the VM's
> from that domain? Or is import of a storage domain only possible if they
> are the same type? Does it also work if multiple LUNs are needed to form
> the storage domain?

If you detach the original (broken) FC storage domain you will be able
to connect to the iSCSI server, and import the same domains from backup.

You can try to create a test storage domain on one LUN, create a VM on
this domain and wait until this storage domain is replicated to the other
storage.

Then detach the domain, and try to connect and attach the replicated domain
and import the vm from this domain.

> Are there any special actions that should be performed beyond the
> regular import action?

No

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AE6ZJF2KUYZYHHDTC2FKUPODEFKGBWQW/


[ovirt-users] Re: Sometimes paused due to unknown storage error on gluster

2020-03-29 Thread Nir Soffer
On Sun, Mar 29, 2020 at 2:42 AM Gianluca Cecchi
 wrote:
>
> On Sat, Mar 28, 2020 at 7:34 PM Nir Soffer  wrote:
>>
>> On Sat, Mar 28, 2020 at 5:00 AM Gianluca Cecchi
>>  wrote:
>> ...
>> > Further information.
>> > What I see around time frame in gluster brick log file 
>> > gluster_bricks-vmstore-vmstore.log (timestamp is behind 1 hour in log file)
>> >
>> > [2020-03-27 23:30:38.575808] I [MSGID: 101055] 
>> > [client_t.c:436:gf_client_unref] 0-vmstore-server: Shutting down 
>> > connection 
>> > CTX_ID:6e8f70b8-1946-4505-860f-be90e5807cb3-GRAPH_ID:0-PID:223418-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0
>> > [2020-03-27 23:35:15.281449] E [MSGID: 113072] 
>> > [posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed: 
>> > offset 0, [Invalid argument]
>> > [2020-03-27 23:35:15.281545] E [MSGID: 115067] 
>> > [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-vmstore-server: 34139378: 
>> > WRITEV 10 (00d9fe81-8a31-498e-8401-7b9d1477378e), client: 
>> > CTX_ID:d04437ba-ef98-43df-864f-5e9d3738620a-GRAPH_ID:0-PID:27687-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0,
>> >  error-xlator: vmstore-posix [Invalid argument]
>> > [2020-03-27 23:40:15.415794] E [MSGID: 113072] 
>> > [posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed: 
>> > offset 0, [Invalid argument]
>>
>> Invalid arguments are expected when activating a storage domain, and
>> every 5 minutes when
>> storage domain are refreshed. The writes are performed to to a temporary 
>> file at
>> /rhev/data-center/mnt/server:_path/.prob-random-uuid
>>
>> These logs do not show the path, so we don't know if the writes are
>> related to block size probing.
>>
>> But in vdsm log we see:
>>
>> 2020-03-27 00:40:08,979+0100 INFO  (monitor/665ff83)
>> [storage.StorageDomain] Removing remnants of deleted images []
>> (fileSD:726)
>>
>> This call happens when vdsm is refreshing storage domain. Right before
>> this log, vdsm try to detect the underlying
>> storage block size.
>>
>> So it looks like the gluster logs are related to block size probing
>> and are not related to the
>> I/O error that caused the VM to pause.
>
>
> [snip]
>
>>
>> Looking at both "abnormal vm stop" and storage refresh events:
>>
> [snip]
>
>>
>> I don't see any relation between refreshes and the abnormal vm stop events.
>>
>> I think the key to understanding this is to enable more verbose logs
>> in gluster understand what was
>> the failure that caused the vm to stop.
>>
>
>
> Ah, ok. Thanks
> It seems default gluster logs level are INFO and I can have them more verbose 
> for a limited amount of time seeing if more information is provided.
> Can I do it with VMs running and only doing sort of reload of the service or 
> do I have to stop all to do it?

I don't know about gluster logs; you obviously cannot stop the server
or the mount helper on the client side to change the log level, and I
don't know if they support reloading configuration while running. Asking
on the gluster mailing list will help.

>> It would also help if we had detailed error logs in qemu log in
>> /var/log/libvirt/qemu/vm-name.log
>
>
> I will find them. The system is not available to check right now
>
>>
>> Did you enable libvirt logs? We may have more information about the error 
>> there.
>>
>> You can enable logs by modifying these lines in /etc/libvirt/libvirtd.conf:
>>
>> log_filters="1:qemu 1:libvirt 4:object 4:json 4:event 1:util"
>> log_outputs="1:file:/var/log/libvirt/libvirtd.log"
>>
>> And restart libvirt.
>>
>> Note that libvirt log may be huge, so you need to watch it and change
>> the log level or filter after you collect what you need.
>>
>> To log only warning and errors use:
>>
>> log_outputs="3:file:/var/log/libvirt/libvirtd.log"
>>
>> Someone from gluster should help debugging this.
>>
>> Nir
>>
>
> Ok, I could also try this way if enabling more verbose gluster logs is not 
> sufficient.

Enabling warnings in libvirt logs is probably wanted anyway. The
warnings and errors
can add more info about this failure.

See this for changing libvirt log level without restarting libvirt:
https://wiki.libvirt.org/page/DebugLogs#Runtime_setting

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JRCGWHJWY4TSK64KHYEONLIMLFEKXF7C/


[ovirt-users] Re: Hosted Engine stalled and unable to restart

2020-04-01 Thread Nir Soffer
On Wed, Apr 1, 2020 at 10:37 PM Mark Steele  wrote:

> Eric - thank you - we have added this to our Wiki.
>

Looks like content for ovirt site:
https://github.com/ovirt/ovirt-site

To get read-only access you can use:

virsh -r list

Without special configuration.

Using read-write access and modifying VMs managed by oVirt is asking for
trouble. It should be used only as a last resort if there is no way to
control the VM via oVirt.

Nir

Best regards,
>
> ***
> *Mark Steele*
> CIO / VP Technical Operations | TelVue Corporation
> TelVue - We Share Your Vision
> 16000 Horizon Way, Suite 100 | Mt. Laurel, NJ 08054
> 800.885.8886 | mste...@telvue.com | http://www.telvue.com
> twitter: http://twitter.com/telvue | facebook:
> https://www.facebook.com/telvue
>
>
> On Wed, Apr 1, 2020 at 3:18 PM  wrote:
>
>> Sasl-passwd  It will give you access to virsh on nodes to
>> check status of vm’s etc.
>>
>>
>>
>> Eric Evans
>>
>> Digital Data Services LLC.
>>
>> 304.660.9080
>>
>>
>>
>> *From:* Mark Steele 
>> *Sent:* Tuesday, March 31, 2020 9:54 PM
>> *To:* users 
>> *Subject:* [ovirt-users] Hosted Engine stalled and unable to restart
>>
>>
>>
>> Hello,
>>
>>
>>
>> We are on an older version (3.x - cannot be specific as I cannot get my
>> ovirt hosted engine up).
>>
>>
>>
>> We experienced a storage failure earlier this evening - the hosted engine
>> was originally installed with this storage domain although we have moved
>> all VM's and disks off of it.
>>
>>
>>
>> The storage was restored and all the VM's are now running, but the ovirt
>> engine is not pinging and is unreachable.
>>
>>
>>
>> I have attempted to locate it on my HV's using 'virsh list --all' but
>> only one of those is taking my credentials - all the others fail to
>> authenticate.
>>
>>
>>
>> Is there a way to locate what the credentials are on each HV since the
>> default is not working? Additionally, is there any other way to locate the
>> hosted engine and restart it directly from a HV?
>>
>>
>>
>> Thank you for your time and consideration.
>>
>>
>>
>>
>>
>> ***
>>
>> *Mark Steele*
>>
>>
>>
>>
>> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/GRGF65PIVAIFFBRL2SGUOBVZY24KSMB2/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/22HIYZNOQPGCECTFKV7RVMJIFJWCRWP2/

