[ovirt-users] Re: NFS storage was locked for 45 minutes after I attempted a clone operation

2021-09-03 Thread Strahil Nikolov via Users
That's really odd. Maybe you can try to clone it and then experiment on the 
clone itself. Once the reason is found out, you can try with the original.
My first step would be to check all logs on the engine and the SPM for clues.
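
A minimal sketch of that first pass over the logs. It assumes the default log locations (/var/log/ovirt-engine/engine.log on the engine, /var/log/vdsm/vdsm.log on the SPM host) and that each entry starts with a "YYYY-MM-DD HH:MM:SS" timestamp. The function name, the sample lines, and the time window are hypothetical and only illustrate the idea; they are not taken from a real log.

```python
import re

# Assumed default locations; adjust if your installation differs.
ENGINE_LOG = "/var/log/ovirt-engine/engine.log"  # on the engine machine
VDSM_LOG = "/var/log/vdsm/vdsm.log"              # on the SPM host

LEVELS = re.compile(r"\b(ERROR|WARN)\b")


def suspicious_lines(lines, start, end):
    """Return ERROR/WARN lines whose leading timestamp lies in [start, end].

    Timestamps are compared as strings, which works because the assumed
    'YYYY-MM-DD HH:MM:SS' format sorts lexicographically.
    """
    hits = []
    for line in lines:
        stamp = line[:19]
        if start <= stamp <= end and LEVELS.search(line):
            hits.append(line.rstrip("\n"))
    return hits


# Hypothetical sample entries, standing in for a real log file.
sample = [
    "2021-09-03 03:50:01,120-04 INFO  [hypothetical.Component] Export started\n",
    "2021-09-03 03:55:42,007-04 ERROR [hypothetical.Component] Failed to acquire lock\n",
    "2021-09-03 05:10:00,000-04 WARN  [hypothetical.Component] Outside the window\n",
]

for hit in suspicious_lines(sample, "2021-09-03 03:45:00", "2021-09-03 04:45:00"):
    print(hit)
```

To run it against a real file, pass an open file handle instead of the sample list, for example open(ENGINE_LOG, errors="replace"), and set the window to the time the clone was started.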
Best Regards,
Strahil Nikolov
 
  On Fri, Sep 3, 2021 at 11:42, David White via Users wrote:   
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6XHROFQDWZY4Y6Z5LWWORTEJKCDBYIPT/
  
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AHSTIYM54FND422YEJ6GRXPJ4RTGGNAF/


[ovirt-users] Re: NFS storage was locked for 45 minutes after I attempted a clone operation

2021-09-03 Thread David White via Users
The save operation is going at a snail's pace, though.

Using "watch du -skh", I counted about 5-7 seconds per 0.1 GB.
It's a virtual disk, but I'm using over 200GB of it, so at this rate it'll take a 
very long time.
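
As a rough check of those numbers, here is a small sketch that turns "5-7 seconds per 0.1 GB" into a throughput and a total time. It assumes a steady rate, decimal units (1 GB = 1000 MB), and 200 GB of data to copy; the function name is made up for illustration.

```python
def transfer_estimate(chunk_gb, seconds_per_chunk, total_gb):
    """Return (throughput in MB/s, estimated hours) for a steady copy."""
    rate_mb_s = chunk_gb * 1000 / seconds_per_chunk  # decimal: 1 GB = 1000 MB
    hours = total_gb / chunk_gb * seconds_per_chunk / 3600
    return rate_mb_s, hours


for secs in (5, 7):
    rate, hours = transfer_estimate(0.1, secs, 200)
    print(f"{secs} s per 0.1 GB: {rate:.1f} MB/s, about {hours:.1f} h for 200 GB")
```

That works out to roughly 14 to 20 MB/s, or about 2.8 to 3.9 hours for 200 GB.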

I wonder if Pascal is on to something, and the export is happening over the 
frontend 1Gbps network?

I'm going to cancel this operation, as the VM has now been down for close to an 
hour.

Sent with ProtonMail Secure Email.


List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6XHROFQDWZY4Y6Z5LWWORTEJKCDBYIPT/


[ovirt-users] Re: NFS storage was locked for 45 minutes after I attempted a clone operation

2021-09-03 Thread David White via Users
Update: perhaps I have discovered a bug somewhere?

I started another export after hours (it's very early morning hours right now, 
and I can tolerate a little downtime on this VM). I had the same symptoms, but 
this time, I just left it alone. I waited about 45 minutes with no progress.

I then ssh'd to the NFS destination (also on the 10Gbps storage network) and 
ran tcpdump, but I didn't see any traffic coming across the wire.

So I then powered off my VM, and I immediately began to see a new backup image 
appear in my NFS export. 

I wonder if the VM was trying to snapshot the memory and there wasn't enough 
free memory on the host or something? The VM has 16GB of RAM, and there are 
multiple VMs on that host (although the host itself has 64GB of physical RAM, 
so that should have been plenty).


List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PMKJX2YEAFHG574N375H3ASU3N3VR3UW/


[ovirt-users] Re: NFS storage was locked for 45 minutes after I attempted a clone operation

2021-09-03 Thread David White via Users
In this particular case, I have 1 (one) 250GB virtual disk.


List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3CRWWNSNJTSATXRDIG7BHZDOQ3VCKQMT/


[ovirt-users] Re: NFS storage was locked for 45 minutes after I attempted a clone operation

2021-08-31 Thread Pascal D
On my setup I noticed that creating a template from a VM takes a very long time 
(around 35 min for a VM disk of around 35 GB) even though my NFS storage network 
is 10G with an MTU of 9000. 

I am wondering if creating a template is done over ovirtmgmt instead of the 
storage network, since ovirtmgmt is only 1G. Even so, the numbers don't add up 
compared with what I measure when I use iperf3 to test my network speeds.

Any idea why it is taking so long to create a template?
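
To make the mismatch concrete, here is a small sketch of the arithmetic. It assumes "35G" means 35 GB in decimal units and that the copy ran at a steady rate; the function name is made up for illustration.

```python
def effective_rate(size_gb, minutes):
    """Return (MB/s, Mbit/s) for copying size_gb in the given number of minutes."""
    mb_per_s = size_gb * 1000 / (minutes * 60)  # decimal: 1 GB = 1000 MB
    mbit_per_s = mb_per_s * 8
    return mb_per_s, mbit_per_s


mb_s, mbit_s = effective_rate(35, 35)
print(f"effective rate: {mb_s:.1f} MB/s ({mbit_s:.0f} Mbit/s)")
for link_gbit in (1, 10):
    share = mbit_s / (link_gbit * 1000) * 100
    print(f"  {share:.1f}% of a {link_gbit}G link")
```

At roughly 17 MB/s (about 133 Mbit/s), the copy uses only around 13% of even a 1G link, so the choice of network alone would not account for the slow template creation.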
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UJVCKMBN6FHUPLAHJG7MU5GM5O4FHER2/


[ovirt-users] Re: NFS storage was locked for 45 minutes after I attempted a clone operation

2021-08-31 Thread Strahil Nikolov via Users
Hi David,

How big are your VM disks?
I suppose you have several very large ones.

Best Regards,
Strahil Nikolov

Sent from Yahoo Mail on Android 
 
  On Thu, Aug 26, 2021 at 3:27, David White via Users wrote:   
I have an HCI cluster running on Gluster storage. I exposed an NFS share into 
oVirt as a storage domain so that I could clone all of my VMs (I'm preparing to 
move physically to a new datacenter). I got 3-4 VMs cloned perfectly fine 
yesterday. But then this evening, I tried to clone a big VM, and it caused the 
disk to lock up. The VM went totally unresponsive, and I didn't see a way to 
cancel the clone. Nagios NRPE (on the client VM) was reporting a server load of 
over 65, but I was never able to establish an SSH connection. 

Eventually, I tried restarting the ovirt-engine, per 
https://access.redhat.com/solutions/396753. When that didn't work, I powered 
down the VM completely. But the disks were still locked. So I then tried to put 
the storage domain into maintenance mode, but that wound up putting the entire 
domain into a "locked" state. Finally, the disks unlocked, and I 
was able to power the VM back online.

From start to finish, my VM was down for about 45 minutes, including the time 
when NRPE was still sending data to Nagios.

What logs should I look at, how can I troubleshoot what went wrong here, 
and how can I keep this from happening again?

List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ASEENELT4TRTXQ7MF4FKB6L75D3H75AN/
  
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZT2NBSN2H44T2HD7RTJO7NC73ND37Y65/