Re: [ovirt-users] VM has been paused due to no Storage space error.

Nir Soffer Thu, 14 Apr 2016 02:18:59 -0700

On Thu, Apr 14, 2016 at 12:02 PM, Fred Rolland <[email protected]> wrote:
> From the log, we can see that the lvextend command took 18 sec, which is
> quite long.


Fred, can you run repoplot on this log file? It may explain why this lvm
call took 18 seconds.
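
Meanwhile, the timing can be read straight from the excerpts quoted below.
This is only a rough sketch: the timestamps are copied from this thread, the
helper names (parse, elapsed, timeline) are made up for illustration, and it
assumes the clocks of the VM host and the SPM host agree.

```python
from datetime import datetime

# Timestamps copied from the log excerpts quoted in this thread.
# The lvextend lines presumably come from the SPM host and the others from
# the host running the VM, so any clock skew between hosts is ignored here.
EVENTS = [
    ("extension requested", "2016-04-13 10:52:04,182"),
    ("lvextend started", "2016-04-13 10:52:06,759"),
    ("lvextend finished", "2016-04-13 10:52:22,217"),
    ("VM paused (enospc)", "2016-04-13 10:52:29,360"),
    ("VM resumed", "2016-04-13 10:52:54,317"),
]


def parse(ts):
    """Parse a timestamp in the 'YYYY-MM-DD HH:MM:SS,mmm' form used in the logs."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def elapsed(start, end):
    """Seconds between two log timestamps."""
    return (parse(end) - parse(start)).total_seconds()


def timeline(events):
    """Return (name, seconds since the first event) for each event."""
    first = events[0][1]
    return [(name, elapsed(first, ts)) for name, ts in events]


if __name__ == "__main__":
    for name, offset in timeline(EVENTS):
        print("%+8.3fs  %s" % (offset, name))
```

By this arithmetic the two lvextend lines are roughly 15.5 seconds apart, and
about 18 seconds pass between the extension request and the end of lvextend.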

Nir

>
> 60decf0c-6d9a-4c3b-bee6-de9d2ff05e85::DEBUG::2016-04-13
> 10:52:06,759::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
> --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm lvextend --config ' devices {
> preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
> write_cache_state=0 disable_after_error_count=3 filter = [
> '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'', '\''r|.*|'\'' ] }
> global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1
> use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --autobackup
> n --size 6016m
> 5de4a000-a9c4-489c-8eee-10368647c413/721d09bc-60e7-4310-9ba2-522d2a4b03d0
> (cwd None)
> ....
> 60decf0c-6d9a-4c3b-bee6-de9d2ff05e85::DEBUG::2016-04-13
> 10:52:22,217::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '
> WARNING: lvmetad is running but disabled. Restart lvmetad before enabling
> it!\n  WARNING: This metadata update is NOT backed up\n'; <rc> = 0
>
>
> The watermark can be configured by the following value:
>
> 'volume_utilization_percent', '50',
>     'Together with volume_utilization_chunk_mb, set the minimal free '
>     'space before a thin provisioned block volume is extended. Use '
>     'lower values to extend earlier.')
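
To make the effect of this setting more concrete, here is a rough sketch. It
assumes the free-space threshold is chunk_mb * (100 - percent) / 100 and uses
an example chunk size of 1024 MB; neither the formula nor that value is
confirmed in this thread, so check both against the vdsm source and your
vdsm.conf. The function names are made up for illustration.

```python
def watermark_mb(utilization_percent, chunk_mb):
    """Free space (MB) below which an extension would be requested.

    ASSUMPTION: threshold = chunk_mb * (100 - utilization_percent) / 100.
    This is not taken from the thread; verify it against the vdsm source.
    """
    if not 0 <= utilization_percent <= 100:
        raise ValueError("utilization_percent must be between 0 and 100")
    return chunk_mb * (100 - utilization_percent) / 100.0


def seconds_until_full(free_mb, write_rate_mb_s):
    """Time a guest writing at a steady rate needs to use up free_mb."""
    if write_rate_mb_s <= 0:
        raise ValueError("write_rate_mb_s must be positive")
    return free_mb / float(write_rate_mb_s)


if __name__ == "__main__":
    chunk_mb = 1024  # example value, an assumption
    for percent in (50, 25):
        free = watermark_mb(percent, chunk_mb)
        # 100 MB/s is a hypothetical guest write rate, not a measured one.
        print(percent, free, seconds_until_full(free, 100))
```

Under these assumptions, a lower percentage leaves more free space at the
moment the extension is requested, which buys the extension more time to
finish before the guest hits ENOSPC.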
>
> On Thu, Apr 14, 2016 at 11:42 AM, Michal Skrivanek
> <[email protected]> wrote:
>>
>>
>> > On 14 Apr 2016, at 09:57, [email protected] wrote:
>> >
>> > Ok, that makes sense, thanks for the insight both Alex and Fred. I'm
>> > attaching the VDSM log of the SPM node at the time of the pause. I couldn't
>> > find anything that would clearly identify the problem, but maybe you'll be
>> > able to.
>>
>> In extreme conditions it will happen. When your storage is slow to respond
>> to the extension request and your write rate is very high, it may
>> happen, as it is happening to you, that you run out of space before the
>> extension finishes. You can change the watermark value, I guess (right,
>> Fred?), but it would be better to plan a bit more ahead and either use
>> preallocated disks, or create thin disks and allocate the expected size in
>> advance, before the operation that causes it (typically it only happens
>> when untarring gigabytes of data, or during a huge database dump/restore).
>> Even then, the VM should always be automatically resumed once the disk
>> space is allocated.
>>
>> Thanks,
>> michal
>>
>> >
>> > Thanks.
>> >
>> > Regards.
>> >
>> > On 2016-04-13 13:09, Fred Rolland wrote:
>> >> Hi,
>> >> Yes, just as Alex explained, if the disk has been created as thin
>> >> provisioned, vdsm will extend it once a watermark is reached.
>> >> Usually it should not get to the state where the VM is paused.
>> >> From the log, you can see that the request for extension was sent
>> >> before the VM got the No Space error.
>> >> Later, we can see the VM resuming.
>> >> INFO::2016-04-13
>> >> 10:52:04,182::vm::1026::virt.vm::(extendDrivesIfNeeded)
>> >> vmId=`f9cd282e-110a-4896-98d3-6d320662744d`::Requesting extension for
>> >> volume
>> >> ....
>> >> INFO::2016-04-13 10:52:29,360::vm::3728::virt.vm::(onIOError)
>> >> vmId=`f9cd282e-110a-4896-98d3-6d320662744d`::abnormal vm stop device
>> >> virtio-disk0 error enospc
>> >> ....
>> >> INFO::2016-04-13 10:52:54,317::vm::5084::virt.vm::(_logGuestCpuStatus)
>> >> vmId=`f9cd282e-110a-4896-98d3-6d320662744d`::CPU running: onResume
>> >> Note that the extension is done on the SPM host, so it would be
>> >> interesting to see the vdsm log from the host that was in SPM role at
>> >> this timeframe.
>> >> Regards,
>> >> Fred
>> >> On Wed, Apr 13, 2016 at 2:43 PM, Alex Crow <[email protected]>
>> >> wrote:
>> >>> Hi,
>> >>> If you have set up VM disks as Thin Provisioned, the VM has to
>> >>> pause when the disk image needs to expand. You won't see this on VMs
>> >>> with preallocated storage.
>> >>> It's not the SAN that's running out of space, it's the VM image
>> >>> needing to be expanded incrementally each time.
>> >>> Cheers
>> >>> Alex
>> >>> On 13/04/16 12:04, [email protected] wrote:
>> >>> Hi Fred,
>> >>> This is an iSCSI storage. I'm attaching the VDSM logs from the host
>> >>> where this machine has been running. Should you need any further
>> >>> info, don't hesitate to ask.
>> >>> Thanks.
>> >>> Regards.
>> >>> On 2016-04-13 11:54, Fred Rolland wrote:
>> >>> Hi,
>> >>> What kind of storage do you have? (iSCSI, FC, NFS...)
>> >>> Can you provide the vdsm logs from the host where this VM runs?
>> >>> Thanks,
>> >>> Freddy
>> >>> On Wed, Apr 13, 2016 at 1:02 PM, <[email protected]> wrote:
>> >>> Hi,
>> >>> We're running oVirt 3.6.4.1-1. Lately we're seeing a bunch of
>> >>> events like these:
>> >>> 2016-04-13 10:52:30,735 INFO
>> >>> [org.ovirt.engine.core.vdsbroker.VmAnalyzer]
>> >>> (DefaultQuartzScheduler_Worker-86) [60dea18f] VM
>> >>> 'f9cd282e-110a-4896-98d3-6d320662744d'(vm.domain.com) moved from
>> >>> 'Up' --> 'Paused'
>> >>> 2016-04-13 10:52:30,815 INFO
>> >>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> >>> (DefaultQuartzScheduler_Worker-86) [60dea18f] Correlation ID: null,
>> >>> Call Stack: null, Custom Event ID: -1, Message: VM vm.domain.com
>> >>> has been paused.
>> >>> 2016-04-13 10:52:30,898 ERROR
>> >>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> >>> (DefaultQuartzScheduler_Worker-86) [60dea18f] Correlation ID: null,
>> >>> Call Stack: null, Custom Event ID: -1, Message: VM vm.domain.com
>> >>> has been paused due to no Storage space error.
>> >>> 2016-04-13 10:52:52,320 WARN
>> >>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>> >>> (org.ovirt.thread.pool-8-thread-38) [] domain
>> >>> '5de4a000-a9c4-489c-8eee-10368647c413:iscsi01' in problem. vds:
>> >>> 'host6.domain.com'
>> >>> 2016-04-13 10:52:55,183 INFO
>> >>> [org.ovirt.engine.core.vdsbroker.VmAnalyzer]
>> >>> (DefaultQuartzScheduler_Worker-70) [3da0f3d4] VM
>> >>> 'f9cd282e-110a-4896-98d3-6d320662744d'(vm.domain.com) moved from
>> >>> 'Paused' --> 'Up'
>> >>> 2016-04-13 10:52:55,318 INFO
>> >>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> >>> (DefaultQuartzScheduler_Worker-70) [3da0f3d4] Correlation ID: null,
>> >>> Call Stack: null, Custom Event ID: -1, Message: VM vm.domain.com
>> >>> has recovered from paused back to up.
>> >>> The storage domain is far from being full, though (400+ G available
>> >>> right now). Could this be related to this other issue [1]? If not,
>> >>> how could I debug what's going on?
>> >>> Thanks.
>> >>>  [1]: https://www.mail-archive.com/[email protected]/msg32079.html
>> >>> _______________________________________________
>> >>> Users mailing list
>> >>> [email protected]
>> >>> http://lists.ovirt.org/mailman/listinfo/users
>>
>
>
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users
