Re: [RFC]Bypass Libvirt storage pool for NFS

2014-04-11 Thread Wido den Hollander



On 03/20/2014 07:51 PM, Nux! wrote:

On 20.03.2014 18:48, Wido den Hollander wrote:


And it just went upstream! How great is that?


Pretty great. That's how open source works. :)




Quick question: there is no problem if, instead of using NFS directly, we
use the shared mount point option, is there?



Probably not, since we would then be mounting the filesystem manually
instead of having libvirt do it.


Ok, so worst case scenario people can just fall back on this option.



I'd say get this patch down into EL6 and I'll try to get it into
Ubuntu 14.04.


Done! I just got an e-mail from Bugzilla: it's fixed in libvirt-0.10.2-32.el6.

Wido



Yeah, fingers crossed on that!

Lucian



RE: [RFC]Bypass Libvirt storage pool for NFS

2014-03-20 Thread Nux!

On 19.03.2014 22:48, Edison Su wrote:

-----Original Message-----
From: Nux! [mailto:n...@li.nux.ro]
Sent: Wednesday, March 19, 2014 3:34 PM
To: dev@cloudstack.apache.org
Subject: RE: [RFC]Bypass Libvirt storage pool for NFS

On 19.03.2014 22:28, Edison Su wrote:
Edison, if - with the workarounds in place now - the current version
of KVM works OK, then why wouldn't a newer version work just as fine?

Just trying to understand this.


That's a long story, there is a bug in Libvirt, which is introduced in
a newer version(0.9.10), which can make the storage pool disappear.


Edison, that I understand, but what is the technical reason that prevents
using newer KVM?
It looks like current KVM works fine on CentOS 6.5 for example which has
libvirt 0.10.2.


Yes, at first glance, the newer libvirt versions (> 0.9.10) just work
fine. But under stress testing, libvirt will complain that the NFS
storage pool is missing, and the pool can't be added back unless you shut
down all the VMs that are using it. That's what the bug
(https://bugzilla.redhat.com/show_bug.cgi?id=977706) is all about.

In the ACS 4.2/4.3 releases, we only recommend libvirt <= 0.9.10 if
primary storage is NFS.


Ok, I'm trying to make some noise in that bz entry; hopefully someone
gets annoyed enough to do something about it.


Quick question: there is no problem if, instead of using NFS directly, we
use the shared mount point option, is there?


Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro


Re: [RFC]Bypass Libvirt storage pool for NFS

2014-03-20 Thread Wido den Hollander



On 03/20/2014 05:38 PM, Nux! wrote:

On 19.03.2014 22:48, Edison Su wrote:

-----Original Message-----
From: Nux! [mailto:n...@li.nux.ro]
Sent: Wednesday, March 19, 2014 3:34 PM
To: dev@cloudstack.apache.org
Subject: RE: [RFC]Bypass Libvirt storage pool for NFS

On 19.03.2014 22:28, Edison Su wrote:

Edison, if - with the workarounds in place now - the current version
of KVM works OK, then why wouldn't a newer version work just as fine?
Just trying to understand this.


That's a long story, there is a bug in Libvirt, which is introduced in
a newer version(0.9.10), which can make the storage pool disappear.


Edison, that I understand, but what is the technical reason that
prevents
using newer KVM?
It looks like current KVM works fine on CentOS 6.5 for example which has
libvirt 0.10.2.


Yes, at first glance, the newer libvirt versions (> 0.9.10) just work
fine. But under stress testing, libvirt will complain that the NFS
storage pool is missing, and the pool can't be added back unless you shut
down all the VMs that are using it. That's what the bug
(https://bugzilla.redhat.com/show_bug.cgi?id=977706) is all about.

In the ACS 4.2/4.3 releases, we only recommend libvirt <= 0.9.10 if
primary storage is NFS.


Ok, I'm trying to make some noise in that bz entry; hopefully someone
gets annoyed enough to do something about it.



And it just went upstream! How great is that?


Quick question: there is no problem if, instead of using NFS directly, we
use the shared mount point option, is there?



Probably not, since we would then be mounting the filesystem manually 
instead of having libvirt do it.
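
To illustrate the shared mount point case: when the filesystem is mounted by hand, the host only has to confirm that the path really is a mount point before using it. A minimal Python sketch under that assumption; the function name and the example path are hypothetical and not taken from CloudStack.

```python
import os


def is_usable_shared_mount(path):
    """Return True if `path` is an existing, writable mount point.

    Illustrative helper only (hypothetical name). It assumes an
    administrator mounted the shared filesystem manually, so libvirt
    is never asked to mount or unmount anything.
    """
    return (
        os.path.isdir(path)
        and os.path.ismount(path)
        and os.access(path, os.W_OK)
    )


if __name__ == "__main__":
    # "/mnt/primary" is an assumed example path.
    print(is_usable_shared_mount("/mnt/primary"))
```

A directory that exists but is not mounted is rejected, which is the case to guard against when the mount is managed outside libvirt.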


I'd say get this patch down into EL6 and I'll try to get it into Ubuntu 
14.04.


Wido


Lucian



Re: [RFC]Bypass Libvirt storage pool for NFS

2014-03-20 Thread Nux!

On 20.03.2014 18:48, Wido den Hollander wrote:


And it just went upstream! How great is that?


Pretty great. That's how open source works. :)



Quick question: there is no problem if, instead of using NFS directly,
we use the shared mount point option, is there?



Probably not, since we would then be mounting the filesystem manually
instead of having libvirt do it.


Ok, so worst case scenario people can just fall back on this option.



I'd say get this patch down into EL6 and I'll try to get it into 
Ubuntu 14.04.


Yeah, fingers crossed on that!

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro


RE: [RFC]Bypass Libvirt storage pool for NFS

2014-03-20 Thread Edison Su


 -----Original Message-----
 From: Nux! [mailto:n...@li.nux.ro]
 Sent: Thursday, March 20, 2014 11:51 AM
 To: dev@cloudstack.apache.org
 Subject: Re: [RFC]Bypass Libvirt storage pool for NFS
 
 On 20.03.2014 18:48, Wido den Hollander wrote:
 
  And it just went upstream! How great is that?
 
 Pretty great. That's how open source works. :)
Thanks, guys, for pushing it :)
But fundamentally, I don't think libvirt is doing the right thing here. 
Libvirt should not need to care about the integrity of the storage pool: the 
pool is shared by multiple hypervisor hosts, and any one libvirt instance is 
only one of them, so it's useless and error-prone for it to check the files on 
the storage pool.
If the integrity of the storage pool is broken, the user should complain to 
CloudStack or whatever other upper-layer cloud orchestration software is in use.

 
 


[RFC]Bypass Libvirt storage pool for NFS

2014-03-19 Thread Edison Su

In QA's testing environment I have frequently found the libvirt storage pool 
(created on NFS) missing on the KVM host, for no apparent reason. It may be 
related to bug https://bugzilla.redhat.com/show_bug.cgi?id=977706.
To fix this issue, and bug CLOUDSTACK-2729, we added a lot of workarounds to 
fight with libvirt: for example, if the storage pool can't be found, create the 
same pool again. As the storage pool can be lost on the KVM host at any time, 
it causes a lot of operation errors, such as being unable to start a VM or 
delete a volume.
I want to bypass the libvirt storage pool for NFS. Java itself already has all 
the capabilities that libvirt provides here, such as creating a file, deleting 
a file, and listing a directory, so there is no need to add another layer of 
crap here. That way, the libvirt bug 
(https://bugzilla.redhat.com/show_bug.cgi?id=977706) won't block us from 
supporting newer versions of KVM.
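
To make the proposal a bit more concrete, here is a rough sketch of a pool that uses only ordinary filesystem calls on an already-mounted NFS path. It is written in Python for brevity rather than Java, and the class and method names are hypothetical; nothing here is CloudStack or libvirt code.

```python
import os
import tempfile


class DirectNfsPool:
    """Hypothetical stand-in for a storage pool that bypasses libvirt.

    It assumes the NFS export is already mounted at `mount_path` and
    uses plain filesystem calls for every operation.
    """

    def __init__(self, mount_path):
        if not os.path.isdir(mount_path):
            raise FileNotFoundError(f"pool path not found: {mount_path}")
        self.mount_path = mount_path

    def _volume_path(self, name):
        # Reject names that would escape the pool directory.
        if os.path.basename(name) != name or name in ("", ".", ".."):
            raise ValueError(f"invalid volume name: {name!r}")
        return os.path.join(self.mount_path, name)

    def create_volume(self, name, size_bytes):
        """Create a raw file of the requested size (sparse where supported)."""
        path = self._volume_path(name)
        with open(path, "xb") as f:  # "x" mode fails if the file exists
            f.truncate(size_bytes)
        return path

    def delete_volume(self, name):
        os.remove(self._volume_path(name))

    def list_volumes(self):
        return sorted(
            entry.name
            for entry in os.scandir(self.mount_path)
            if entry.is_file()
        )


if __name__ == "__main__":
    # A temporary directory stands in for the mounted NFS export.
    with tempfile.TemporaryDirectory() as mount:
        pool = DirectNfsPool(mount)
        pool.create_volume("vm-1.raw", 1024 * 1024)
        print(pool.list_volumes())
        pool.delete_volume("vm-1.raw")
        print(pool.list_volumes())
```

Since there is no pool object held by a daemon, nothing can "disappear" after a restart; the trade-off is that capacity reporting, format handling, and locking would have to be reimplemented.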


Re: [RFC]Bypass Libvirt storage pool for NFS

2014-03-19 Thread Wido den Hollander



On 03/19/2014 07:54 PM, Edison Su wrote:

In QA's testing environment I have frequently found the libvirt storage pool 
(created on NFS) missing on the KVM host, for no apparent reason. It may be 
related to bug https://bugzilla.redhat.com/show_bug.cgi?id=977706.
To fix this issue, and bug CLOUDSTACK-2729, we added a lot of workarounds to 
fight with libvirt: for example, if the storage pool can't be found, create the 
same pool again. As the storage pool can be lost on the KVM host at any time, 
it causes a lot of operation errors, such as being unable to start a VM or 
delete a volume.
I want to bypass the libvirt storage pool for NFS. Java itself already has all 
the capabilities that libvirt provides here, such as creating a file, deleting 
a file, and listing a directory, so there is no need to add another layer of 
crap here. That way, the libvirt bug 
(https://bugzilla.redhat.com/show_bug.cgi?id=977706) won't block us from 
supporting newer versions of KVM.



-1

I understand the issues we see here, but imho the way forward is 
to fix this in libvirt instead of simply going around it.


We should not try to re-invent the wheel here, but fix the root cause.

Yes, Java can do a lot, but I think libvirt can do this better.

For the RBD code I also had a couple of changes go into libvirt recently, 
and this NFS issue can also be fixed.


Losing NFS pools in libvirt is most of the time due to a restart of 
libvirt; they don't magically disappear from libvirt.


I agree that we should be able to start the pool again even while it's 
mounted, but that's something we should fix in libvirt.


Wido


Re: [RFC]Bypass Libvirt storage pool for NFS

2014-03-19 Thread Nux!

On 19.03.2014 19:01, Wido den Hollander wrote:

On 03/19/2014 07:54 PM, Edison Su wrote:
In QA's testing environment I have frequently found the libvirt storage 
pool (created on NFS) missing on the KVM host, for no apparent reason. 
It may be related to bug 
https://bugzilla.redhat.com/show_bug.cgi?id=977706.
To fix this issue, and bug CLOUDSTACK-2729, we added a lot of 
workarounds to fight with libvirt: for example, if the storage pool 
can't be found, create the same pool again. As the storage pool can be 
lost on the KVM host at any time, it causes a lot of operation errors, 
such as being unable to start a VM or delete a volume.
I want to bypass the libvirt storage pool for NFS. Java itself already 
has all the capabilities that libvirt provides here, such as creating a 
file, deleting a file, and listing a directory, so there is no need to 
add another layer of crap here. That way, the libvirt bug 
(https://bugzilla.redhat.com/show_bug.cgi?id=977706) won't block us 
from supporting newer versions of KVM.




-1

I understand the issues we see here, but imho the way forward
is to fix this in libvirt instead of simply going around it.

We should not try to re-invent the wheel here, but fix the root cause.

Yes, Java can do a lot, but I think libvirt can do this better.

For the RBD code I also had a couple of changes go into libvirt
recently, and this NFS issue can also be fixed.

Losing NFS pools in libvirt is most of the time due to a restart of
libvirt; they don't magically disappear from libvirt.

I agree that we should be able to start the pool again even while
it's mounted, but that's something we should fix in libvirt.

Wido


-1 and 100% with Wido. If libvirt gets fixed, it would save loads 
of code in the future and bring other benefits (think support for Xen 
Project via libvirt etc.).

Let's push for a libvirt fix instead.

My 2 cents,
Lucian


--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro


RE: [RFC]Bypass Libvirt storage pool for NFS

2014-03-19 Thread Edison Su

It's hard to find the root cause and fix something in libvirt. Even if we find 
the root cause, it's hard to push the fix into libvirt upstream, not to mention 
into downstream distributions such as RHEL 6. For example, we have had a fix 
for bug https://bugzilla.redhat.com/show_bug.cgi?id=977706 for a few months 
now, and the issue is still unresolved. Without the fix, we are simply blocked 
from supporting newer versions of KVM.

So if the community doesn't like what I proposed, how about another way: 
I will write a new implementation of the KVMStoragePool interface, backed by 
Java/Python/shell scripts, which won't be enabled by default. It's a simple 
thing; I don't understand why libvirt makes it so complicated and introduces so 
much pain.
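
As a sketch of what "not enabled by default" could look like: the libvirt-backed pool stays the default and the alternative is chosen only through an explicit setting. Every name below is invented for illustration, including the `nfs.pool.backend` key; the real KVMStoragePool interface is not reproduced here.

```python
from abc import ABC, abstractmethod


class StoragePoolBackend(ABC):
    """Hypothetical interface standing in for a storage pool backend."""

    @abstractmethod
    def describe(self):
        """Return a short human-readable description of the backend."""


class LibvirtBackend(StoragePoolBackend):
    def describe(self):
        return "pool managed through libvirt"


class ScriptBackend(StoragePoolBackend):
    def describe(self):
        return "pool managed by plain file operations or scripts"


def select_backend(settings):
    """Pick a backend; libvirt remains the default unless overridden.

    `settings` is a plain dict and "nfs.pool.backend" is an invented key.
    """
    choice = settings.get("nfs.pool.backend", "libvirt")
    backends = {"libvirt": LibvirtBackend, "script": ScriptBackend}
    if choice not in backends:
        raise ValueError(f"unknown storage pool backend: {choice}")
    return backends[choice]()


if __name__ == "__main__":
    print(select_backend({}).describe())
    print(select_backend({"nfs.pool.backend": "script"}).describe())
```

With this shape, existing deployments keep their current behaviour, and only operators who hit the libvirt bug need to opt in.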



Re: [RFC]Bypass Libvirt storage pool for NFS

2014-03-19 Thread Ahmad Emneina

I'm all for adding a bit of additional smarts to CloudStack so it can
work around the current KVM limitations. Waiting for anything to get fixed
upstream is a bit utopian, and the problem is affecting deployments NOW.
CloudStack seems to be the lower barrier to entry for getting these
scenarios addressed.




RE: [RFC]Bypass Libvirt storage pool for NFS

2014-03-19 Thread Nux!

On 19.03.2014 20:29, Edison Su wrote:

It's hard to find the root cause and fix something in libvirt. Even if
we find the root cause, it's hard to push the fix into libvirt upstream,
not to mention into downstream distributions such as RHEL 6. For example,
we have had a fix for bug
https://bugzilla.redhat.com/show_bug.cgi?id=977706 for a few months
now, and the issue is still unresolved. Without the fix, we are
simply blocked from supporting newer versions of KVM.


Edison, if - with the workarounds in place now - the current version of 
KVM works OK, then why wouldn't a newer version work just as fine?

Just trying to understand this.

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro


RE: [RFC]Bypass Libvirt storage pool for NFS

2014-03-19 Thread Edison Su


 -----Original Message-----
 From: Nux! [mailto:n...@li.nux.ro]
 Sent: Wednesday, March 19, 2014 3:07 PM
 To: dev@cloudstack.apache.org
 Subject: RE: [RFC]Bypass Libvirt storage pool for NFS
 
 On 19.03.2014 20:29, Edison Su wrote:
  It's hard to find the root cause and fix something in libvirt. Even if
  we find the root cause, it's hard to push the fix into libvirt upstream,
  not to mention into downstream distributions such as RHEL 6. For example,
  we have had a fix for bug
  https://bugzilla.redhat.com/show_bug.cgi?id=977706 for a few months
  now, and the issue is still unresolved. Without the fix, we are
  simply blocked from supporting newer versions of KVM.
 
 Edison, if - with the workarounds in place now - the current version of KVM
 works OK, then why wouldn't a newer version work just as fine?
 Just trying to understand this.

That's a long story. There is a bug in libvirt, introduced in a newer 
version (0.9.10), which can make the storage pool disappear.
Wei made a patch to fix it more than half a year ago: 
https://www.redhat.com/archives/libvir-list/2013-July/msg00635.html. 
Unfortunately, the patch doesn't seem to have made it upstream yet.
So in order to move forward and support newer versions of KVM, we have to do 
something in the 4.4 release.

 


RE: [RFC]Bypass Libvirt storage pool for NFS

2014-03-19 Thread Nux!

On 19.03.2014 22:28, Edison Su wrote:
Edison, if - with the workarounds in place now - the current version
of KVM works OK, then why wouldn't a newer version work just as fine?
Just trying to understand this.


That's a long story, there is a bug in Libvirt, which is introduced
in a newer version(0.9.10), which can make the storage pool
disappear.


Edison, that I understand, but what is the technical reason that 
prevents using newer KVM?
It looks like current KVM works fine on CentOS 6.5 for example which 
has libvirt 0.10.2.


Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro


RE: [RFC]Bypass Libvirt storage pool for NFS

2014-03-19 Thread Edison Su


 -----Original Message-----
 From: Nux! [mailto:n...@li.nux.ro]
 Sent: Wednesday, March 19, 2014 3:34 PM
 To: dev@cloudstack.apache.org
 Subject: RE: [RFC]Bypass Libvirt storage pool for NFS
 
 On 19.03.2014 22:28, Edison Su wrote:
  Edison, if - with the workarounds in place now - the current version
  of KVM works OK, then why wouldn't a newer version work just as fine?
  Just trying to understand this.
 
  That's a long story, there is a bug in Libvirt, which is introduced in
  a newer version(0.9.10), which can make the storage pool disappear.
 
 Edison, that I understand, but what is the technical reason that prevents
 using newer KVM?
 It looks like current KVM works fine on CentOS 6.5 for example which has
 libvirt 0.10.2.

Yes, at first glance, the newer libvirt versions (> 0.9.10) just work fine. 
But under stress testing, libvirt will complain that the NFS storage pool is 
missing, and the pool can't be added back unless you shut down all the VMs 
that are using it. That's what the bug 
(https://bugzilla.redhat.com/show_bug.cgi?id=977706) is all about.

In the ACS 4.2/4.3 releases, we only recommend libvirt <= 0.9.10 if primary 
storage is NFS.
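
A small sketch of how such a version recommendation could be checked on a host, assuming it means "0.9.10 or older". The function names are hypothetical, and only plain dotted versions are handled, not full package release strings.

```python
def parse_version(text):
    """Turn a dotted version such as "0.10.2" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Assumed reading of the recommendation: 0.9.10 or older for NFS.
MAX_RECOMMENDED_FOR_NFS = parse_version("0.9.10")


def recommended_for_nfs(libvirt_version):
    """Return True if the version falls within the assumed recommendation."""
    return parse_version(libvirt_version) <= MAX_RECOMMENDED_FOR_NFS


if __name__ == "__main__":
    for version in ("0.9.4", "0.9.10", "0.10.2"):
        print(version, recommended_for_nfs(version))
```

Comparing integer tuples matters here: as plain strings, "0.10.2" would sort before "0.9.10" and wrongly pass the check.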

 