from:"Serge E. Hallyn"

Re: [libvirt] [PATCH 7/8] appmor, libvirt-qemu: Add 9p support

2017-05-18 Thread Serge E. Hallyn

Quoting Guido Günther (a...@sigxcpu.org):
> On Thu, May 18, 2017 at 11:21:54AM -0500, Serge E. Hallyn wrote:
> > Mind you I'm not crazy about this.  If this could be toggled with a
> > default-off config option that would seem better than always giving
> > these caps to libvirt-qemu.
> 
> virt-aa-helper could add these if it detects a 9pfs file system. That
> would be better than always adding it.

Agreed

> Cheers,
>  -- Guido
> 
> > 
> > Quoting Stefan Bader (stefan.ba...@canonical.com):
> > > From: Serge Hallyn <serge.hal...@ubuntu.com>
> > > 
> > > Add fowner and fsetid to libvirt-qemu profile.
> > > 
> > > Bug-Ubuntu: https://bugs.launchpad.net/bugs/1378434
> > > 
> > > Signed-off-by: Christian Ehrhardt <christian.ehrha...@canonical.com>
> > > Signed-off-by: Stefan Bader <stefan.ba...@canonical.com>
> > > ---
> > >  examples/apparmor/libvirt-qemu | 4 
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/examples/apparmor/libvirt-qemu 
> > > b/examples/apparmor/libvirt-qemu
> > > index 89466c9..f04ce04 100644
> > > --- a/examples/apparmor/libvirt-qemu
> > > +++ b/examples/apparmor/libvirt-qemu
> > > @@ -13,6 +13,10 @@
> > >capability setgid,
> > >capability setuid,
> > >  
> > > +  # for 9p
> > > +  capability fsetid,
> > > +  capability fowner,
> > > +
> > >network inet stream,
> > >network inet6 stream,
> > >  
> > > -- 
> > > 2.7.4
> > 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 7/8] appmor, libvirt-qemu: Add 9p support

2017-05-18 Thread Serge E. Hallyn

Mind you I'm not crazy about this.  If this could be toggled with a
default-off config option that would seem better than always giving
these caps to libvirt-qemu.

Quoting Stefan Bader (stefan.ba...@canonical.com):
> From: Serge Hallyn 
> 
> Add fowner and fsetid to libvirt-qemu profile.
> 
> Bug-Ubuntu: https://bugs.launchpad.net/bugs/1378434
> 
> Signed-off-by: Christian Ehrhardt 
> Signed-off-by: Stefan Bader 
> ---
>  examples/apparmor/libvirt-qemu | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/examples/apparmor/libvirt-qemu b/examples/apparmor/libvirt-qemu
> index 89466c9..f04ce04 100644
> --- a/examples/apparmor/libvirt-qemu
> +++ b/examples/apparmor/libvirt-qemu
> @@ -13,6 +13,10 @@
>capability setgid,
>capability setuid,
>  
> +  # for 9p
> +  capability fsetid,
> +  capability fowner,
> +
>network inet stream,
>network inet6 stream,
>  
> -- 
> 2.7.4

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] Various apparmor related changes (part 1), version 2

2017-05-18 Thread Serge E. Hallyn

Quoting Stefan Bader (stefan.ba...@canonical.com):
> > Over the years there have been a bunch of changes to the
> > apparmor profiles and/or virt-aa-helper which have been
> > carried in Debian/Ubuntu but never made it upstream.
> > 
> > In an attempt to clean this up and generally improve the
> > apparmor based environments, we (Christian and I) went
> > over the changes, cleaned out cruft as much as possible 
> > and would be sending out hunks of changes to this list
> > for upstream inclusion.
> > 
> > I hope doing multiple but smaller rounds of submissions
> > will make it simpler to get those reviewed and hopefully
> > accepted.
> 
> For the second version I added acks, merged the patches
> related to explicit device denials and local apparmor
> profiles, and split the 9p support one (holding back the
> part allowing link access for later or to be replaced by
> a safer solution).
> I also tried to improve the explanation in the description
> of patch #1 (virt-aa-helper: Ask for no deny rule for readonly
> disk elements).
> 
> Thanks,
> Stefan

Thanks,

Acked-by: Serge Hallyn 

I don't like the added capabilities in the one patch, but I'm not
nacking it on that account.  Still a toggle would be comforting.
Make people who want 9p consciously sign in to the added privs.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [Qemu-devel] [PATCH v4] Add machine parameter qemu-kvm-migration for live migrate compatibility with qemu-kvm

2014-09-29 Thread Serge E. Hallyn

Quoting Alex Bligh (a...@alex.org.uk):
 
 On 29 Sep 2014, at 11:08, Michael S. Tsirkin m...@redhat.com wrote:
 
  On Sun, Sep 28, 2014 at 09:33:08PM +0100, Alex Bligh wrote:
  Hang on a second! v2 of this patch DID use a new virtual machine,
  called exactly that. I thought you were objecting to that and
  wanting a machine parameter instead! It's far easier with a new
  machine type, and I'd far prefer a new machine type.
  
  If you were just objecting to the fact that pc-1.0 was made to
  be an alias of either one or the other at compile time, simply
  drop the second patch of the v2 patchset.
  
  I think same applies to v3 that I reviewed right?
  Absolutely, I'm fine with just a new machine type.
  This means that management tools will need to learn to
  add -qemu-kvm suffix to the machine name if user
  requested compatibility with qemu-kvm.
  I think there were some implementation issues with patch 1/2
  though.
  
  If we have a new machine type, I don't /think/ I need the early_init
  thing at all (I may be wrong about that).
  
  Good.
 
 OK, I will respin this when I get a chance with the new machine
 type back and hopefully address some of the other issues you
 brought up.

Thanks, Alex.  I've got working packages for 12.04, 14,04 and 14.10
at ppa:serge-hallyn/qemu-migration2 and have opened three bugs
(https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1374622,
https://bugs.launchpad.net/ubuntu/+bug/1374617, and
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1374612) to
get them into the archive.  However if the machine type patches
may end up upstream, then obviously I'd prefer to wait for the
final upstream set.  As we're approaching final freeze for 14.10,
I suspect this will mean getting the patches into 15.04 on the
first day of the cycle (no FFEs needed there) and then a quick
SRU into 12.04 and 14.04.

-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [Qemu-devel] [PATCH v2 0/2] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm

2014-08-08 Thread Serge E. Hallyn

Quoting Alex Bligh (a...@alex.org.uk):
 
 On 7 Aug 2014, at 20:26, Serge E. Hallyn se...@hallyn.com wrote:
 
  A-ha, acpi wasn't a problem.  I actually had a general migration
  problem even when coming from other utopic hosts.  With that fixed,
  I've got successful migration from qemu-kvm 1.0 in precise to
  a utopic host.
 
 That's good news. You might try the 2 patches here:
 
 https://github.com/abligh/qemu/tree/livemigrate-qemu-kvm-to-qemu-2.0.0-test-2
 
 and see if you can get Precise to Trusty migration working (as well
 as Precise to Utopic)

These have built at 
https://launchpad.net/~serge-hallyn/+archive/ubuntu/qemu-p-migration

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [Qemu-devel] [PATCH v2 0/2] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm

2014-08-07 Thread Serge E. Hallyn

Quoting Alex Bligh (a...@alex.org.uk):
 Serge,
 
 On 7 Aug 2014, at 03:50, Serge Hallyn serge.hal...@ubuntu.com wrote:
 
  This worked for me when migrating by hand.  I'm trying to make it work
  through libvirt, using the following patch.  (So whether to have
  pc-1.0 be treated as qemu's or qemu-kvm's pc-1.0 is specifed using a
  boolean in /etc/libvirt/qemu.conf)  Qemu starts with decent
  looking args, but for some reason the the migration is failing -
  still looking through the logfile to figure out why.
 
 Are you using exactly the same arguments by hand and with libvirt?
 
 Also, on reflection, given one of the changes between 1.0 and 2.0
 is ACPI, I should probably have done some testing with an ACPI
 enabled image, rather than just cirros (which not ACPI enabled);
 any chance this is ACPI related?

Turning off acpi (well, commenting it out in the xml, which I'm assuming
dtrt) doesn't help:

===
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -name cirros -S -global 
virtio-net-pci.romfile=pxe-virtio.rom.12.04 -machine 
pc-1.0-qemu-kvm,accel=kvm,usb=off -m 512 -realtime mlock=off -smp 
1,sockets=1,cores=1,threads=1 -uuid 2542c328-6842-33ef-d30e-866c3f3189a8 
-no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/cirros.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-no-acpi -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 
-drive 
file=/var/lib/libvirt/images/cirros.img,if=none,id=drive-ide0-0-0,format=raw 
-device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 
-netdev tap,fd=26,id=hostnet0 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:be:d8:99,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-vnc 127.0.0.1:0 -device cirrus-vga,id=!
 video0,bus=pci.0,addr=0x2 -device AC97,id=sound0,bus=pci.0,addr=0x4 -incoming 
fd:23 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg 
timestamp=on
2014-08-07 12:51:02.400+: 1539: debug : virFileClose:99 : Closed fd 25
2014-08-07 12:51:02.401+: 1539: debug : virFileClose:99 : Closed fd 31
2014-08-07 12:51:02.401+: 1539: debug : virFileClose:99 : Closed fd 3
2014-08-07 12:51:02.401+: 1540: debug : virExec:616 : Run hook 
0x7f25cb17bca0 0x7f25d3aedf20
2014-08-07 12:51:02.401+: 1540: debug : qemuProcessHook:2719 : Obtaining 
domain lock
2014-08-07 12:51:02.401+: 1540: debug : virDomainLockProcessStart:175 : 
plugin=0x7f25c4170290 dom=0x7f25c4186510 paused=1 fd=0x7f25d3aedb44
2014-08-07 12:51:02.401+: 1540: debug : virDomainLockManagerNew:133 : 
plugin=0x7f25c4170290 dom=0x7f25c4186510 withResources=1
2014-08-07 12:51:02.401+: 1540: debug : virLockManagerPluginGetDriver:281 : 
plugin=0x7f25c4170290
2014-08-07 12:51:02.401+: 1540: debug : virLockManagerNew:305 : 
driver=0x7f25da723580 type=0 nparams=5 params=0x7f25d3aeda30 flags=0
2014-08-07 12:51:02.401+: 1540: debug : virLockManagerLogParams:98 :   
key=uuid type=uuid value=2542c328-6842-33ef-d30e-866c3f3189a8
2014-08-07 12:51:02.401+: 1540: debug : virLockManagerLogParams:91 :   
key=name type=string value=cirros
2014-08-07 12:51:02.401+: 1540: debug : virLockManagerLogParams:79 :   
key=id type=uint value=2
2014-08-07 12:51:02.401+: 1540: debug : virLockManagerLogParams:79 :   
key=pid type=uint value=1540
2014-08-07 12:51:02.401+: 1540: debug : virLockManagerLogParams:94 :   
key=uri type=cstring value=qemu:///system
2014-08-07 12:51:02.401+: 1540: debug : virDomainLockManagerNew:145 : 
Adding leases
2014-08-07 12:51:02.401+: 1540: debug : virDomainLockManagerNew:150 : 
Adding disks
2014-08-07 12:51:02.401+: 1540: debug : virDomainLockManagerAddDisk:91 : 
Add disk /var/lib/libvirt/images/cirros.img
2014-08-07 12:51:02.401+: 1540: debug : virLockManagerAddResource:332 : 
lock=0x7f25c417b080 type=0 name=/var/lib/libvirt/images/cirros.img nparams=0 
params=(nil) flags=0
2014-08-07 12:51:02.401+: 1540: debug : virLockManagerAcquire:350 : 
lock=0x7f25c417b080 state='null' flags=3 action=0 fd=0x7f25d3aedb44
2014-08-07 12:51:02.401+: 1540: debug : virLockManagerFree:387 : 
lock=0x7f25c417b080
2014-08-07 12:51:02.401+: 1540: debug : virObjectUnref:259 : OBJECT_UNREF: 
obj=0x7f25c415e620
2014-08-07 12:51:02.401+: 1540: debug : qemuProcessHook:2746 : Hook 
complete ret=0
2014-08-07 12:51:02.401+: 1540: debug : virExec:618 : Done hook 0
2014-08-07 12:51:02.401+: 1540: debug : virExec:638 : Setting child 
AppArmor profile to libvirt-2542c328-6842-33ef-d30e-866c3f3189a8
2014-08-07 12:51:02.402+: 1540: debug : virExec:655 : Setting child uid:gid 
to 107:113 with caps 0
2014-08-07 12:51:02.402+: 1540: debug : virCommandHandshakeChild:358 : 
Notifying parent for handshake start on 28
2014-08-07 12:51:02.402+: 1540: debug

Re: [libvirt] [Qemu-devel] [PATCH v2 0/2] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm

2014-08-07 Thread Serge E. Hallyn

Quoting Alex Bligh (a...@alex.org.uk):
 Serge,
 
 On 7 Aug 2014, at 03:50, Serge Hallyn serge.hal...@ubuntu.com wrote:
 
  This worked for me when migrating by hand.  I'm trying to make it work
  through libvirt, using the following patch.  (So whether to have
  pc-1.0 be treated as qemu's or qemu-kvm's pc-1.0 is specifed using a
  boolean in /etc/libvirt/qemu.conf)  Qemu starts with decent
  looking args, but for some reason the the migration is failing -
  still looking through the logfile to figure out why.
 
 Are you using exactly the same arguments by hand and with libvirt?
 
 Also, on reflection, given one of the changes between 1.0 and 2.0
 is ACPI, I should probably have done some testing with an ACPI
 enabled image, rather than just cirros (which not ACPI enabled);
 any chance this is ACPI related?

A-ha, acpi wasn't a problem.  I actually had a general migration
problem even when coming from other utopic hosts.  With that fixed,
I've got successful migration from qemu-kvm 1.0 in precise to
a utopic host.

-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm

2014-07-29 Thread Serge E. Hallyn

Quoting Alex Bligh (a...@alex.org.uk):
 Serge,
 
  I don't think that is in any way a problem.  Is migrating to older
  versions ever actually expected to work?  In either case I don't
  think for this particular case it's a problem.
 
 Good; no; and good - respectively.
 
  (The how to handle this in libvirt question is more interesting)
 
 I've been giving this some thought. However rococo we make this,
 I think the caller is often going to need to modify the command
 line anyway, i.e. is going to need to be aware of the migration
 problem.
 
 For instance, taking Ubuntu as an example, 12.04 shipped with
 qemu-kvm-1.0, and a pxe-virtio.rom of just under 64k, giving
 a 64k ROM slot. 14.04 ships with qemu-2.0 and a pxe-virtio.rom
 of over 80k, giving a 128k ROM slot. So however we fix the
 machine types, when migrating in a 12.04 initiated VM, qemu
 will need
  -global virtio-net-pci.romfile=/path/to/pxe-virtio.rom.12.04
 on the command line (or, if you don't much care about PXE
 working on a soft reboot, a blank file of the same size),
 whereas when migrating in a 14.04 initiated VM, that must
 not be on the command line.
 
 Fixing this properly would entail requiring that the ROMs are
 (effectively) distributed with qemu or at least that all
 ROM sizes become part of the machine type standard. This
 would have the advantage that loading a larger ROM than
 the machine type specifies would fail unless the ROM
 size was explicitly configured on the command line. But
 this is a subject wider than this patch.
 
 So the long and the short of it is that libvirt (sadly) like
 anything else starting qemu machines needs to know a bit about
 different versions of qemu, and be able to replace a machine
 type option with a machine type option and more on the
 command line.
 
 My previous suggestion doesn't help much because qemu will
 still need to be passed something on the command line.
 
 I think the best way to go with this patch would be something
 like:
 
 * Add pc-1.0-qemu-kvm as a machine type (done)
 
 * Rename pc-1.0 to pc-1.0-qemu-git
 
 * Add an alias for pc-1.0 to either pc-1.0-qemu-git or
   pc-1.0-qemu-kvm, configurable at build time with
   a ./configure option. The distro can then set this
   appropriately. This would default to pc-1.0-qemu-git
   (i.e. the current behaviour).
 
 On distros that only every used one of the above,
 ./configure will sort things out, +/- self-inflicted
 injuries relating to ROM size changes. If life is
 more complicated, libvirt (and other callers) will
 need to be aware. pc-1.0-qemu-git and pc-1.0-qemu-kvm
 can be used to unambiguously mean the relevant machine
 type, which will fix things going forward for that
 machine type.
 
 WDYT?

That sounds good.

And from there I think the thing to do will be to introduce a transient
alternate package that has the pc-1.0 alias pointing ot pc-1.0-qemu-kvm and
depends on the legacy pxe rom.  And maybe users can then choose that package for
migration purposes if needed, without having to make any changes to libvirt at
all, while others are completely unaffected.

-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 1/1] lxc: use our own hand-rolled code in place of unlockpt and grantpt (v2)

2011-10-19 Thread Serge E. Hallyn

Quoting Eric Blake (ebl...@redhat.com):
 +VIR_FORCE_CLOSE(*ttymaster);
 +VIR_FREE(*ttyName)
 
 How did this ever pass compile-testing without that semicolon?

It didn't.  So I fixed it.  Then apparently did not do a new
git format-patch before sending.  Grr.

...

 ACK to the intent and to incorporating previous review comments.
 Here's what I squashed in before pushing:

Great, looks good, thanks.

-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 1/1] lxc: use our own hand-rolled code in place of unlockpt and grantpt (v2)

2011-10-18 Thread Serge E. Hallyn

Quoting Eric Blake (ebl...@redhat.com):
 [but we still have to fix the hard-coding of
 gid=5 in the mount() option].
 
 I missed something - why do we have to fix that?
 
 We don't have to fix it now, but we should fix it someday.  There's
 nothing that says a distro has to map 'tty' to gid 5, and while most
 distros do that, we should instead be portable to compilation on a
 distro where 'tty' is gid 6 (or some other unusual number).

But that gid should be set according to what the guest expects, right?
So we actually can't do it at compilation?

 Instead, I'd just do this as:
 
 virAsprintf(ttyName, /dev/pts/%d, ptyno);
 
 Where virAsprintf will allocate when ttyName starts as null?
 
 Yep - the whole point of virAsprintf is to allocate the string on
 your behalf, and to gracefully ensure that ttyName is NULL on
 allocation failure - much more compact than using snprintf yourself,
 and avoids the waste of a PATH_MAX allocation.
 
 And, while you are at it, what about also fixing src/util/util.c to
 remove virFileOpenTtyAt (now that no one calls that), by moving its
 body into virFileOpenTty except for using posix_openpt instead of
 open(/dev/ptmx).

I was thinking that should be a separate patch for ease of review, but
I'll roll it into my next patch.

thanks,
-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 1/1] lxc: use our own hand-rolled code in place of unlockpt and grantpt (v2)

2011-10-18 Thread Serge E. Hallyn

New version, compile-tested only tonight.  I followed the suggestion
about using posix_openpt(), though its manpage worries me - does libvirt
need to compile on any platforms that don't have that fn?  (In which case
we can add the trivial define if we need to, but...)

Subject: [PATCH 1/1] lxc: use our own hand-rolled code in place of unlockpt
 and grantpt

The glibc ones cannot handle ptys opened in a devpts not mounted at
/dev/pts.

Changelog:
  Oct 17: Per Eli Qiao, use VIR_ALLOC_N when appropriate, make
  sure to check return values, and follow coding style
  convention.
  Change lxcGetTtyGid() to return -1 on error, otherwise
  return gid in passed-in argument.
  Oct 18: Per Eric Blake, consolidate virFileOpenTtyAt() into
  virFileOpenTty(), and use posix_openpt().
  Per Eric Blake, don't set gid on opened pty since we
  ask the kernel to do so with 'gid=5' mount option.

Signed-off-by: Serge Hallyn serge.hal...@canonical.com
---
 src/lxc/lxc_controller.c |   67 +++---
 src/util/util.c  |   24 +
 2 files changed, 64 insertions(+), 27 deletions(-)

diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index 89ce7f5..464bfe8 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -41,6 +41,8 @@
 #include locale.h
 #include linux/loop.h
 #include dirent.h
+#include grp.h
+#include sys/stat.h
 
 #if HAVE_CAPNG
 # include cap-ng.h
@@ -780,6 +782,62 @@ static int lxcSetPersonality(virDomainDefPtr def)
 # define MS_SLAVE  (119)
 #endif
 
+#define ACCESSPERMS (S_IRWXU|S_IRWXG|S_IRWXO) /* 0777 */
+
+/* heavily borrowed from glibc, but don't assume devpts == /dev/pts */
+static int
+lxcCreateTty(char *ptmx, int *ttymaster, char **ttyName)
+{
+int rc = -1;
+int ret;
+int ptyno;
+uid_t uid;
+struct stat st;
+int unlock = 0;
+
+if ((*ttymaster = open(ptmx, O_RDWR|O_NOCTTY|O_NONBLOCK))  0)
+goto cleanup;
+
+if (ioctl(*ttymaster, TIOCSPTLCK, unlock)  0)
+goto cleanup;
+
+if (ioctl(*ttymaster, TIOCGPTN, ptyno)  0)
+goto cleanup;
+
+if (fstat(*ttymaster, st)  0)
+goto cleanup;
+
+uid = getuid();
+if (st.st_uid != uid)
+if (fchown(*ttymaster, uid, -1)  0)
+goto cleanup;
+
+if ((st.st_mode  ACCESSPERMS) != (S_IRUSR|S_IWUSR|S_IWGRP))
+if (fchmod(*ttymaster, S_IRUSR|S_IWUSR|S_IWGRP)  0)
+goto cleanup;
+
+/*
+ * ptyno shouldn't currently be anything other than 0, but let's
+ * play it safe
+ */
+if (virAsprintf(ttyName, /dev/pts/%d, ptyno)  0) {
+virReportOOMError();
+errno = ENOMEM;
+goto cleanup;
+}
+
+
+rc = 0;
+
+cleanup:
+if (rc != 0) {
+VIR_FORCE_CLOSE(*ttymaster);
+VIR_FREE(*ttyName)
+}
+
+return rc;
+}
+
 static int
 lxcControllerRun(virDomainDefPtr def,
  unsigned int nveths,
@@ -877,6 +935,10 @@ lxcControllerRun(virDomainDefPtr def,
 goto cleanup;
 }
 
+   /*
+* todo - should we support gid=X for X!=5 for distros which
+* use a different gid for tty?
+*/
 VIR_DEBUG(Mounting 'devpts' on %s, devpts);
 if (mount(devpts, devpts, devpts, 0,
   newinstance,ptmxmode=0666,mode=0620,gid=5)  0) {
@@ -894,10 +956,7 @@ lxcControllerRun(virDomainDefPtr def,
 
 if (devptmx) {
 VIR_DEBUG(Opening tty on private %s, devptmx);
-if (virFileOpenTtyAt(devptmx,
- containerPty,
- containerPtyPath,
- 0)  0) {
+if (lxcCreateTty(devptmx, containerPty, containerPtyPath)  0) {
 virReportSystemError(errno, %s,
  _(Failed to allocate tty));
 goto cleanup;
diff --git a/src/util/util.c b/src/util/util.c
index dac616b..1ec36f1 100644
--- a/src/util/util.c
+++ b/src/util/util.c
@@ -1104,21 +1104,9 @@ int virFileOpenTty(int *ttymaster,
char **ttyName,
int rawmode)
 {
-return virFileOpenTtyAt(/dev/ptmx,
-ttymaster,
-ttyName,
-rawmode);
-}
-
-#ifdef __linux__
-int virFileOpenTtyAt(const char *ptmx,
- int *ttymaster,
- char **ttyName,
- int rawmode)
-{
 int rc = -1;
 
-if ((*ttymaster = open(ptmx, O_RDWR|O_NOCTTY|O_NONBLOCK))  0)
+if ((*ttymaster = posix_openpt(O_RDWR|O_NOCTTY|O_NONBLOCK))  0)
 goto cleanup;
 
 if (unlockpt(*ttymaster)  0)
@@ -1159,16 +1147,6 @@ cleanup:
 return rc;
 
 }
-#else
-int virFileOpenTtyAt(const char *ptmx ATTRIBUTE_UNUSED,
- int *ttymaster ATTRIBUTE_UNUSED,
- char **ttyName ATTRIBUTE_UNUSED,
- int rawmode ATTRIBUTE_UNUSED)
-{
-

Re: [libvirt] [PATCH 1/1] lxc: use our own hand-rolled code in place of unlockpt and grantpt

2011-10-17 Thread Serge E. Hallyn

Quoting Eli Qiao (ta...@linux.vnet.ibm.com):
 hi Serge :

Thanks for taking a look.

 I checked the code ,  only in lxc_controller.c call virFileOpenTtyAt().
 I didn't test the path, but my suggestion is that modify the
 utility function in /src/util/util.c instead of adding a new
 function.

But virFileOpenTtyAt() is called by virFileOpenTty() in the same
file.  So we really do want a new function using its own versions of
grantpt+unlockpt, because I think that, when we can, we want to continue
using the glibc versions.

So the safe approach seemed to me to be: put the container-specific
code into src/lxc/lxc_controller.c, then (in a separate patch) just fold
virFileOpenTtyAt(), straight into virFileOpenTty().

If you agree, I'll post a new version incorporating your other feedback,
especially the garish use of alloca :)

(If you disagree, please feel free to send your own patch - I'm in no
way attached to having my version included, I mainly wanted to point out
the bug)

thanks,
-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH 1/1] lxc: use our own hand-rolled code in place of unlockpt and grantpt (v2)

2011-10-17 Thread Serge E. Hallyn

The glibc ones cannot handle ptys opened in a devpts not mounted at
/dev/pts.

Changelog:
  Oct 17: Per Eli Qiao, use VIR_ALLOC_N when appropriate, make
  sure to check return values, and follow coding style
  convention.
  Change lxcGetTtyGid() to return -1 on error, otherwise
  return gid in passed-in argument.

Signed-off-by: Serge Hallyn serge.hal...@canonical.com
---
 src/lxc/lxc_controller.c |   89 +++--
 1 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index 89ce7f5..8c9caee 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -41,6 +41,8 @@
 #include locale.h
 #include linux/loop.h
 #include dirent.h
+#include grp.h
+#include sys/stat.h
 
 #if HAVE_CAPNG
 # include cap-ng.h
@@ -781,6 +783,88 @@ static int lxcSetPersonality(virDomainDefPtr def)
 #endif
 
 static int
+lxcGetTtyGid(int *gid) {
+char *grtmpbuf;
+struct group grbuf;
+size_t grbuflen = sysconf(_SC_GETGR_R_SIZE_MAX);
+struct group *p;
+int ret;
+gid_t tty_gid = -1;
+
+/* Get the group ID of the special `tty' group.  */
+if (grbuflen == (size_t) -1L)
+/* `sysconf' does not support _SC_GETGR_R_SIZE_MAX.
+   Try a moderate value.  */
+grbuflen = 1024;
+
+if ((VIR_ALLOC_N(grtmpbuf, grbuflen))  0)
+   return -1;
+
+ret = getgrnam_r(tty, grbuf, grtmpbuf, grbuflen, p);
+if (ret || p != NULL)
+tty_gid = p-gr_gid;
+
+VIR_FREE(grtmpbuf);
+
+*gid = tty_gid == -1 ? getgid() : tty_gid;
+return 0;
+}
+
+#define ACCESSPERMS (S_IRWXU|S_IRWXG|S_IRWXO) /* 0777 */
+
+/* heavily borrowed from glibc, but don't assume devpts == /dev/pts */
+static int
+lxcCreateTty(char *ptmx, int *ttymaster, char **ttyName)
+{
+int rc = -1;
+int ret;
+int ptyno;
+uid_t uid;
+gid_t gid;
+struct stat st;
+int unlock = 0;
+
+if ((*ttymaster = open(ptmx, O_RDWR|O_NOCTTY|O_NONBLOCK))  0)
+goto cleanup;
+
+if (ioctl(*ttymaster, TIOCSPTLCK, unlock)  0)
+goto cleanup;
+
+if (ioctl(*ttymaster, TIOCGPTN, ptyno)  0)
+goto cleanup;
+
+if (fstat(*ttymaster, st)  0)
+goto cleanup;
+
+if (lxcGetTtyGid(gid)  0)
+goto cleanup;
+
+uid = getuid();
+if (st.st_uid != uid || st.st_gid != gid)
+if (fchown(*ttymaster, uid, gid)  0)
+goto cleanup;
+
+if ((st.st_mode  ACCESSPERMS) != (S_IRUSR|S_IWUSR|S_IWGRP))
+if (fchmod(*ttymaster, S_IRUSR|S_IWUSR|S_IWGRP)  0)
+goto cleanup;
+
+if (VIR_ALLOC_N(*ttyName, PATH_MAX)  0) {
+errno = ENOMEM;
+goto cleanup;
+}
+
+snprintf(*ttyName, PATH_MAX, /dev/pts/%d, ptyno);
+
+rc = 0;
+
+cleanup:
+if (rc != 0)
+VIR_FORCE_CLOSE(*ttymaster);
+
+return rc;
+}
+
+static int
 lxcControllerRun(virDomainDefPtr def,
  unsigned int nveths,
  char **veths,
@@ -894,10 +978,7 @@ lxcControllerRun(virDomainDefPtr def,
 
 if (devptmx) {
 VIR_DEBUG(Opening tty on private %s, devptmx);
-if (virFileOpenTtyAt(devptmx,
- containerPty,
- containerPtyPath,
- 0)  0) {
+if (lxcCreateTty(devptmx, containerPty, containerPtyPath)  0) {
 virReportSystemError(errno, %s,
  _(Failed to allocate tty));
 goto cleanup;
-- 
1.7.5.4

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 1/1] lxc: use our own hand-rolled code in place of unlockpt and grantpt (v2)

2011-10-17 Thread Serge E. Hallyn

Quoting Eric Blake (ebl...@redhat.com):
 On 10/17/2011 01:04 PM, Serge E. Hallyn wrote:
 The glibc ones cannot handle ptys opened in a devpts not mounted at
 /dev/pts.
 
 Changelog:
Oct 17: Per Eli Qiao, use VIR_ALLOC_N when appropriate, make
sure to check return values, and follow coding style
convention.
Change lxcGetTtyGid() to return -1 on error, otherwise
return gid in passed-in argument.
 
 Signed-off-by: Serge Hallynserge.hal...@canonical.com
 ---
   src/lxc/lxc_controller.c |   89 
  +++--
   1 files changed, 85 insertions(+), 4 deletions(-)
 
 
 @@ -781,6 +783,88 @@ static int lxcSetPersonality(virDomainDefPtr def)
   #endif
 
   static int
 +lxcGetTtyGid(int *gid) {
 +char *grtmpbuf;
 +struct group grbuf;
 +size_t grbuflen = sysconf(_SC_GETGR_R_SIZE_MAX);
 +struct group *p;
 +int ret;
 +gid_t tty_gid = -1;
 
 Hmm.  This gets called once per lxc guest, instead of once per
 libvirtd process, even though the gid won't change in the meantime.
 
 Furthermore, we have _already_ hardcoded this to 5, based on the
 options we gave to mount(devpts) - either we need to fix that
 mount call (better), or we can skip this function altogether
 (assuming that our hard-coding of 5 is correct, there's no need to
 requery it, although that does sound like a worse solution).  For
 that matter, the whole point of the mount(devpts,...,gid=5)
 designation is that the new pty will be owned by gid 5, without
 needing to fchown() it.  Are there kernels that are new enough to
 support newinstance mounting, yet old enough to not honor gid=5?

No.  Mode and gid precede multiple devpts instances.

 That would be the only case where we would have to chown in the
 first place.  But if all kernels new enough to support newinstance
 mounting correctly honor the gid=5 option, then we don't even have
 to do chown() calls [but we still have to fix the hard-coding of
 gid=5 in the mount() option].

I missed something - why do we have to fix that?

It seems to me you're right - this is a linux-specific fn and
we are setting gid to 5 and the mode, we can skip this whole lxcGetTtyGid
function and the corresponding fchown and fchmod calls.  Nice!

 +if (fstat(*ttymaster,st)  0)
 +goto cleanup;
 +
 +if (lxcGetTtyGid(gid)  0)
 +goto cleanup;
 +
 +uid = getuid();
 +if (st.st_uid != uid || st.st_gid != gid)
 +if (fchown(*ttymaster, uid, gid)  0)
 +goto cleanup;
 +
 +if ((st.st_mode  ACCESSPERMS) != (S_IRUSR|S_IWUSR|S_IWGRP))
 +if (fchmod(*ttymaster, S_IRUSR|S_IWUSR|S_IWGRP)  0)
 +goto cleanup;
 
 Likewise, I think this fchmod() is useless; the mode=0620 in the
 mount option should have already set this up.

Yup.

 +
 +if (VIR_ALLOC_N(*ttyName, PATH_MAX)  0) {
 +errno = ENOMEM;
 +goto cleanup;
 +}
 
 Wasteful.  PATH_MAX is MUCH bigger than we need.

I thought so, but it was being done that way before, and didn't
seem all that important.

 +
 +snprintf(*ttyName, PATH_MAX, /dev/pts/%d, ptyno);
 
 Instead, I'd just do this as:
 
 virAsprintf(ttyName, /dev/pts/%d, ptyno);

Where virAsprintf will allocate when ttyName starts as null?

 Also, do we want this to be the name of the pty, _as seen from the
 guest after the fs pivot is complete_, or do we want this to be the
 name of the pty, as seen from the host prior to the pivot, in which
 case it would be some derivative of %s/dev/pts/%d, root-src,
 ptyno?

This gets passed into lxc_container which then prepends the chroot
location.  Don't know if there is any reason why we can't just
set it to the full name here, haven't looked further.

-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 1/2] Fix occasional container creation failure due to misuse of grantpt

2011-10-14 Thread Serge E. Hallyn

Quoting Eric Blake (ebl...@redhat.com):
...
 of /dev/pts, then passing that fd back to the parent; the
 alternative solution would be to ditch glibc interfaces and do the
 raw ioctl calls on the master pty ourselves.  Since lxc is already
 Linux-specific, I think that I would favor this approach as being
 simpler (that is, completely ditch virFileOpenTtyAt, and teach lxc
 to just make the raw ioctl calls instead of using grantpt and
 unlockpt, all without forking or passing fds across a socket).
 
 Serge, is this something you can attempt, or should I look into it more?

Hey Eric,

yup, if it's acceptable to you I think that's the better, simpler
approach.  I'll send out a new patch.

thanks,
-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH 1/2] Fix occasional container creation failure due to misuse of grantpt

2011-10-12 Thread Serge E. Hallyn

glibc's grantpt and ptsname cannot be used on a fd for a pty not in
/dev/pts.  The lxc controller tries to do just that.  So if you try to
start a container on a system where /dev/pts/0 is not available, it
will fail.  You can make this happen by opening a terminal on
/dev/pts/0, and doing 'sleep 2h  disown; exit'.  To fix this, I call
the virFileOpenTtyAt() from a forked task in a new mount ns, and first
mount the container's /dev/pts onto /dev/pts.  (Then the opened fd must
be passed back to the lxc driver).  Another solution would be to just
do it all by hand without grantpt and ptsname.

Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/863629

Signed-off-by: Serge Hallyn serge.hal...@canonical.com
---
 src/lxc/lxc_controller.c |  117 --
 1 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index 51488e7..1a56e0c 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -780,6 +780,113 @@ static int lxcSetPersonality(virDomainDefPtr def)
 # define MS_SLAVE  (119)
 #endif
 
+static int send_pty(int sock, int *pty)
+{
+struct iovec vector;
+struct msghdr msg;
+struct cmsghdr * cmsg;
+int ret;
+
+vector.iov_base = PTY;
+vector.iov_len = 3;
+
+msg.msg_name = NULL;
+msg.msg_namelen = 0;
+msg.msg_iov = vector;
+msg.msg_iovlen = 1;
+
+cmsg = alloca(sizeof(struct cmsghdr) + sizeof(*pty));
+cmsg-cmsg_len = sizeof(struct cmsghdr) + sizeof(*pty);
+cmsg-cmsg_level = SOL_SOCKET;
+cmsg-cmsg_type = SCM_RIGHTS;
+  
+memcpy(CMSG_DATA(cmsg), pty, sizeof(*pty));
+
+msg.msg_control = cmsg;
+msg.msg_controllen = cmsg-cmsg_len;
+
+ret = sendmsg(sock, msg, 0);
+if (ret  0)
+return -1;
+return 0;
+}
+
+static int recv_pty(int sock, int *pty, char **path, char *devpts)
+{
+char buf[50];
+struct iovec vector;
+struct msghdr msg;
+struct cmsghdr * cmsg;
+int ret;
+
+vector.iov_base = buf;
+vector.iov_len = 50;
+
+msg.msg_name = NULL;
+msg.msg_namelen = 0;
+msg.msg_iov = vector;
+msg.msg_iovlen = 1;
+
+cmsg = alloca(sizeof(struct cmsghdr) + sizeof(*pty));
+cmsg-cmsg_len = sizeof(struct cmsghdr) + sizeof(*pty);
+msg.msg_control = cmsg;
+msg.msg_controllen = cmsg-cmsg_len;
+
+ret = recvmsg(sock, msg, 0);
+if (ret  0)
+return ret;
+
+memcpy(pty, CMSG_DATA(cmsg), sizeof(*pty));
+
+if (VIR_ALLOC_N(*path, PATH_MAX)  0) {
+virReportSystemError(errno, %s,
+_(Failed to allocate space for ptyname));
+return -ENOMEM;
+}
+//snprintf(*path, PATH_MAX, %s/0, devpts);
+snprintf(*path, PATH_MAX, /dev/pts/0);
+return 0;
+}
+
+static int private_open_tty_at(char *devpts, char *devptmx,
+int *containerPty,
+char **containerPtyPath, int rawmode)
+{
+int pid;
+int ret;
+int status;
+int s[2];
+
+ret = socketpair(PF_UNIX, SOCK_DGRAM, 0, s);
+if (ret  0)
+   return ret;
+
+pid = fork();
+if (pid  0)
+exit(pid);
+if (pid == 0) {
+close(s[1]);
+ret = unshare(CLONE_NEWNS);
+if (ret  0)
+exit(ret);
+ret = mount(devpts, /dev/pts, none, MS_BIND, NULL);
+if (ret  0)
+exit(ret);
+ret = virFileOpenTtyAt(devptmx, containerPty, containerPtyPath, 
rawmode);
+if (ret  0)
+exit(ret);
+send_pty(s[0], containerPty);
+exit(ret);
+}
+close(s[0]);
+ret = recv_pty(s[1], containerPty, containerPtyPath, devpts);
+close(s[1]);
+if (ret)
+return ret;
+waitpid(pid, status, 0);
+return WEXITSTATUS(status);
+}
+
 static int
 lxcControllerRun(virDomainDefPtr def,
  unsigned int nveths,
@@ -894,12 +1001,12 @@ lxcControllerRun(virDomainDefPtr def,
 
 if (devptmx) {
 VIR_DEBUG(Opening tty on private %s, devptmx);
-if (virFileOpenTtyAt(devptmx,
- containerPty,
- containerPtyPath,
- 0)  0) {
+if (private_open_tty_at(devpts, devptmx,
+containerPty,
+containerPtyPath,
+0)  0) {
 virReportSystemError(errno, %s,
- _(Failed to allocate tty));
+_(Failed to allocate tty));
 goto cleanup;
 }
 } else {
-- 
1.7.5.4

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH 2/2] Fix type in lxc_controller

2011-10-12 Thread Serge E. Hallyn

s/Mouting/Mounting.

Signed-off-by: Serge Hallyn serge.hal...@canonical.com
---
 src/lxc/lxc_controller.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index 1a56e0c..6557c07 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -984,7 +984,7 @@ lxcControllerRun(virDomainDefPtr def,
 goto cleanup;
 }
 
-VIR_DEBUG(Mouting 'devpts' on %s, devpts);
+VIR_DEBUG(Mounting 'devpts' on %s, devpts);
 if (mount(devpts, devpts, devpts, 0,
   newinstance,ptmxmode=0666,mode=0620,gid=5)  0) {
 virReportSystemError(errno,
-- 
1.7.5.4

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] virsh bash completion file

2011-10-06 Thread Serge E. Hallyn

Quoting Daniel P. Berrange (berra...@redhat.com):
 On Wed, Oct 05, 2011 at 03:17:47PM -0500, Serge E. Hallyn wrote:
  Hi,
  
  I've been trying out a bash autocompletion file by Geoff Low (slight hack
  by me, don't blame him for my hack), and it's working pretty nicely.
  I'm not sure where to put it in the git tree, but it seems like it'd be
  nice to have upstream?
 
 David Lutterkort previously suggested this
 
https://www.redhat.com/archives/libvir-list/2007-October/msg00231.html
 
 And I didn't ever notice that and so wrote one myself
 
https://www.redhat.com/archives/libvir-list/2008-July/msg00175.html
https://www.redhat.com/archives/libvir-list/2008-July/msg00177.html
 
 Neither ever got committed. There were some things I didn't much like
 about mine, but I can't really remember now.
 
 Third time lucky perhaps :-)

So the things you didn't much like must not have been *too* bad?  :)
Are you going to make some changes and resubmit, or would you like
me to test (and port as needed) the above patches first?

thanks,
-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] virsh bash completion file

2011-10-05 Thread Serge E. Hallyn

Quoting Eric Blake (ebl...@redhat.com):
 While I'd love to see better bash completion support, I think that
 we should be going about it by fixing virsh to make it easier to
 query what completions make sense, so I'm not going to spend much
 time further reviewing this.  Of course, others are free to use this

Thanks, Eric, makes sense.

-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC] security_dac: don't chown iso file

2011-10-04 Thread Serge E. Hallyn

Quoting Serge E. Hallyn (serge.hal...@canonical.com):
 isos are read-only, so libvirt doesn't need to chown them.  In one of
 our testing setups, libvirt uses mirrorred isos.  Since libvirt chowns
 the files, (and especially does not chown them back) the mirror refuses
 to update the iso.
 
 This patch prevents libvirt from chowning files.
 
 Does this seem reasonable?

Hi,

any feedback on this?  Does it seem ok?

thanks,
-serge

 Signed-off-by: Serge Hallyn serge.hal...@canonical.com
 ---
  src/security/security_dac.c |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)
 
 diff --git a/src/security/security_dac.c b/src/security/security_dac.c
 index af02236..e7db324 100644
 --- a/src/security/security_dac.c
 +++ b/src/security/security_dac.c
 @@ -555,6 +555,8 @@ virSecurityDACSetSecurityAllLabel(virSecurityManagerPtr 
 mgr,
  /* XXX fixme - we need to recursively label the entire tree :-( */
  if (vm-def-disks[i]-type == VIR_DOMAIN_DISK_TYPE_DIR)
  continue;
 + if (vm-def-disks[i]-device == VIR_DOMAIN_DISK_DEVICE_CDROM)
 + continue;
  if (virSecurityDACSetSecurityImageLabel(mgr,
  vm,
  vm-def-disks[i])  0)
 -- 
 1.7.5.4
 
 --
 libvir-list mailing list
 libvir-list@redhat.com
 https://www.redhat.com/mailman/listinfo/libvir-list

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC PATCH] lxc: don't return error on GetInfo when cgroups not yet set up

2011-10-03 Thread Serge E. Hallyn

Quoting Daniel P. Berrange (berra...@redhat.com):
 The LXC controller 'main' method received the handshake FD and invokes
 lxcControllerRun(). This method does various setup tasks, in particular
 the following:
 
 
 
 if (lxcSetContainerResources(def)  0)
 goto cleanup;
 ...
...
 if (lxcContainerSendContinue(handshakefd)  0) {
 virReportSystemError(errno, %s,
  _(error sending continue signal to parent));
 goto cleanup;
 }
 VIR_FORCE_CLOSE(handshakefd);

Thanks, Daniel.  You're right!  This is fixed in git, by the patch 'lxc:
controller: Improve container error reporting' (which does much more
than it says :).  The following patch is how I had just fixed 0.9.2 this
morning.  It'll be nicer if I can get the git commit cherrypicked.  I
can't wait till I can upgrade!

thanks,
-serge

Description: Make lxc driver hold sem until controller is far enough
 The lxc driver currently does not wait until the container has set
 up its cgroups before dropping the driver mutex.
 First, move the controller's accept of the monitor socket wait
 until it has set up the cgroups.
 Second, because a connect does not actually wait for an accept to
 happen, force the driver to wait with a silly two-way read/write
 handshake.  Since the monitor socket is also used elsewhere, make
 this handshake happen everywhere.
Author: Serge Hallyn serge.hal...@canonical.com
Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/nova/+bug/842845
Forwarded: not-needed

Index: libvirt-0.9.2/src/lxc/lxc_controller.c
===
--- libvirt-0.9.2.orig/src/lxc/lxc_controller.c 2011-10-03 13:10:31.098934902 
-0500
+++ libvirt-0.9.2/src/lxc/lxc_controller.c  2011-10-03 13:10:53.823679619 
-0500
@@ -432,6 +432,8 @@
 numEvents = epoll_wait(epollFd, epollEvent, 1, timeout);
 if (numEvents  0) {
 if (epollEvent.data.fd == monitor) {
+int ret;
+char go[4];
 int fd = accept(monitor, NULL, 0);
 if (fd  0) {
 /* First reflex may be simply to declare accept failure
@@ -457,6 +459,17 @@
  _(epoll_ctl(client) failed));
 goto cleanup;
 }
+ret = read(client, go, 3);
+if (ret  0) {
+virReportSystemError(errno, %s,
+ _(Failed to read 'go' from driver));
+goto cleanup;
+}
+ret = write(client, go, 3);
+if (ret  0) {
+virReportSystemError(errno, %s, _(Failed to write 'go' 
to container));
+goto cleanup;
+}
 } else if (client != -1  epollEvent.data.fd == client) {
 if (0  epoll_ctl(epollFd, EPOLL_CTL_DEL, client, 
epollEvent)) {
 virReportSystemError(errno, %s,
@@ -611,10 +624,9 @@
  unsigned int nveths,
  char **veths,
  int monitor,
- int client,
  int appPty)
 {
-int rc = -1;
+int ret, rc = -1;
 int control[2] = { -1, -1};
 int containerPty = -1;
 char *containerPtyPath = NULL;
@@ -622,6 +634,8 @@
 virDomainFSDefPtr root;
 char *devpts = NULL;
 char *devptmx = NULL;
+char go[4];
+int client;
 
 if (socketpair(PF_UNIX, SOCK_STREAM, 0, control)  0) {
 virReportSystemError(errno, %s,
@@ -631,8 +645,29 @@
 
 root = virDomainGetRootFilesystem(def);
 
+VIR_DEBUG(About to set resources and cgroups\n);
 if (lxcSetContainerResources(def)  0)
 goto cleanup;
+VIR_DEBUG(Done setting resources and cgroups\n);
+
+/* Accept initial client which is the libvirtd daemon */
+if ((client = accept(monitor, NULL, 0))  0) {
+virReportSystemError(errno, %s,
+ _(Failed to accept a connection from driver));
+goto cleanup;
+}
+VIR_DEBUG(Accepted monitor fd from driver\n);
+ret = read(client, go, 3);
+if (ret  0) {
+virReportSystemError(errno, %s,
+ _(Failed to read 'go' from driver));
+goto cleanup;
+}
+ret = write(client, go, 3);
+if (ret  0) {
+virReportSystemError(errno, %s, _(Failed to write 'go' to 
container));
+goto cleanup;
+}
 
 /*
  * If doing a chroot style setup, we need to prepare
@@ -765,7 +800,6 @@
 {
 pid_t pid;
 int rc = 1;
-int client;
 char *name = NULL;
 int nveths = 0;
 char **veths = NULL;
@@ -922,14 +956,7 @@
 /* Initialize logging */
 virLogSetFromEnv();
 
-/* Accept initial client which is the libvirtd daemon */
-if ((client = accept(monitor, NULL, 0))  0) {
-virReportSystemError(errno, %s,
- _(Failed to accept

Re: [libvirt] [Openstack] [RFC PATCH] lxc: don't return error on GetInfo when cgroups not yet set up

2011-10-02 Thread Serge E. Hallyn

Haven't tested this, but I think the following patch should fix the
race, by forcing lxc_driver to hang on lxcMonitorClient() until
after the lxc_controller has set up the cgroups, ensuring that that
happens before the driver is unlocked.

(I'll test tomorrow)

Index: libvirt-0.9.2/src/lxc/lxc_controller.c
===
--- libvirt-0.9.2.orig/src/lxc/lxc_controller.c 2011-10-02 20:30:23.988539174 
-0500
+++ libvirt-0.9.2/src/lxc/lxc_controller.c  2011-10-02 20:30:34.392538998 
-0500
@@ -611,7 +611,6 @@
  unsigned int nveths,
  char **veths,
  int monitor,
- int client,
  int appPty)
 {
 int rc = -1;
@@ -622,6 +621,7 @@
 virDomainFSDefPtr root;
 char *devpts = NULL;
 char *devptmx = NULL;
+int client;
 
 if (socketpair(PF_UNIX, SOCK_STREAM, 0, control)  0) {
 virReportSystemError(errno, %s,
@@ -634,6 +634,13 @@
 if (lxcSetContainerResources(def)  0)
 goto cleanup;
 
+/* Accept initial client which is the libvirtd daemon */
+if ((client = accept(monitor, NULL, 0))  0) {
+virReportSystemError(errno, %s,
+ _(Failed to accept a connection from driver));
+goto cleanup;
+}
+
 /*
  * If doing a chroot style setup, we need to prepare
  * a private /dev/pts for the child now, which they
@@ -922,14 +929,7 @@
 /* Initialize logging */
 virLogSetFromEnv();
 
-/* Accept initial client which is the libvirtd daemon */
-if ((client = accept(monitor, NULL, 0))  0) {
-virReportSystemError(errno, %s,
- _(Failed to accept a connection from driver));
-goto cleanup;
-}
-
-rc = lxcControllerRun(def, nveths, veths, monitor, client, appPty);
+rc = lxcControllerRun(def, nveths, veths, monitor, appPty);
 
 
 cleanup:

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC PATCH] lxc: don't return error on GetInfo when cgroups not yet set up

2011-09-29 Thread Serge E. Hallyn

Quoting Daniel P. Berrange (berra...@redhat.com):
 On Wed, Sep 28, 2011 at 02:14:52PM -0500, Serge E. Hallyn wrote:
  Nova (openstack) calls libvirt to create a container, then
  periodically checks using GetInfo to see whether the container
  is up.  If it does this too quickly, then libvirt returns an
  error, which in libvirt.py causes an exception to be raised,
  the same type as if the container was bad.
 
 lxcDomainGetInfo(), holds a mutex on 'dom' for the duration of
 its execution. It checks for virDomainObjIsActive() before
 trying to use the cgroups.
 
 lxcDomainStart(), holds the mutex on 'dom' for the duration of
 its execution, and does not return until the container is running
 and cgroups are present.

Yup, now that you mention it, I do see that.  So this shouldn't be
happening.  Can't explain it, but copious fprintf debugging still
suggests it is :)

Is it possible that vm-def-id is not being set to -1 when it is
first defined, and I'm catching it between define and start?  I
would think that would show up as much more broken, though I'm
not seeing where vm-def-id gets set to -1 during domain definition.

Well, I'll keep digging then.  Thanks for setting me straight on
the mutex!

 Similarly when we delete the cgroups, we again hold the lock
 on 'dom'.
 
 Thus any time viDomainObjIsActive() returns true, AFAICT, we have
 guaranteed that the cgroup does in fact exist.
 
 So can't see how control gets to the 'else' part of this
 condition if the cgroups don't exist like you describe.
 
 if (!virDomainObjIsActive(vm) || driver-cgroup == NULL) {
 info-cpuTime = 0;
 info-memory = vm-def-mem.cur_balloon;
 } else {
 if (virCgroupForDomain(driver-cgroup, vm-def-name, cgroup, 0) != 
 0) {
 lxcError(VIR_ERR_INTERNAL_ERROR,
  _(Unable to get cgroup for %s), vm-def-name);
 goto cleanup;
 }
 
 
 What libvirt version were you seeing this behaviour with ?

0.9.2

thanks,
-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC PATCH] lxc: don't return error on GetInfo when cgroups not yet set up

2011-09-29 Thread Serge E. Hallyn

Quoting Daniel P. Berrange (berra...@redhat.com):
 On Wed, Sep 28, 2011 at 02:14:52PM -0500, Serge E. Hallyn wrote:
  Nova (openstack) calls libvirt to create a container, then
  periodically checks using GetInfo to see whether the container
  is up.  If it does this too quickly, then libvirt returns an
  error, which in libvirt.py causes an exception to be raised,
  the same type as if the container was bad.
 lxcDomainGetInfo(), holds a mutex on 'dom' for the duration of
 its execution. It checks for virDomainObjIsActive() before
 trying to use the cgroups.

Yes, it does, but

 lxcDomainStart(), holds the mutex on 'dom' for the duration of
 its execution, and does not return until the container is running
 and cgroups are present.

No.  It calls the lxc_controller with --background.  The controller
main task in turn exits before the cgroups have been set up.  There
is the race.

-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [RFC PATCH] lxc: don't return error on GetInfo when cgroups not yet set up

2011-09-28 Thread Serge E. Hallyn

Nova (openstack) calls libvirt to create a container, then
periodically checks using GetInfo to see whether the container
is up.  If it does this too quickly, then libvirt returns an
error, which in libvirt.py causes an exception to be raised,
the same type as if the container was bad.

This may not be the best way to handle it, but with this
patch, we assume that a -ENOENT return from virCgroupForDomain
means the cgroups are not yet set up, and so we return the
same values for cpu and memory usage as if the domain was not
active.

Signed-off-by: Serge Hallyn serge.hal...@canonical.com
---
 src/lxc/lxc_driver.c |   37 +
 1 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c
index 4b62600..a68b8e7 100644
--- a/src/lxc/lxc_driver.c
+++ b/src/lxc/lxc_driver.c
@@ -542,26 +542,31 @@ static int lxcDomainGetInfo(virDomainPtr dom,
 info-cpuTime = 0;
 info-memory = vm-def-mem.cur_balloon;
 } else {
-if (virCgroupForDomain(driver-cgroup, vm-def-name, cgroup, 0) != 
0) {
+int ret = virCgroupForDomain(driver-cgroup, vm-def-name, cgroup, 
0);
+if (ret == -ENOENT) {
+/* cgroups are not set up yet */
+info-cpuTime = 0;
+info-memory = vm-def-mem.cur_balloon;
+} else if (ret != 0) {
 lxcError(VIR_ERR_INTERNAL_ERROR,
  _(Unable to get cgroup for %s), vm-def-name);
 goto cleanup;
-}
-
-if (virCgroupGetCpuacctUsage(cgroup, (info-cpuTime))  0) {
-lxcError(VIR_ERR_OPERATION_FAILED,
- %s, _(Cannot read cputime for domain));
-goto cleanup;
-}
-if ((rc = virCgroupGetMemoryUsage(cgroup, (info-memory)))  0) {
-lxcError(VIR_ERR_OPERATION_FAILED,
- %s, _(Cannot read memory usage for domain));
-if (rc == -ENOENT) {
-/* Don't fail if we can't read memory usage due to a lack of
- * kernel support */
-info-memory = 0;
-} else
+} else {
+if (virCgroupGetCpuacctUsage(cgroup, (info-cpuTime))  0) {
+lxcError(VIR_ERR_OPERATION_FAILED,
+ %s, _(Cannot read cputime for domain));
 goto cleanup;
+}
+if ((rc = virCgroupGetMemoryUsage(cgroup, (info-memory)))  0) {
+lxcError(VIR_ERR_OPERATION_FAILED,
+ %s, _(Cannot read memory usage for domain));
+if (rc == -ENOENT) {
+/* Don't fail if we can't read memory usage due to a lack 
of
+ * kernel support */
+info-memory = 0;
+} else
+goto cleanup;
+}
 }
 }
 
-- 
1.7.5.4

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH 1/1] lvm storage backend: handle command_names=1 in lvm.conf (v2)

2011-09-28 Thread Serge E. Hallyn

[ Thanks for the feedback, Eric.  Hopefully I correctly incorporated it
in this version.  This version still tests fine with and without
lvm.conf command_names=1 ]

If the regexes supported (?:pvs)?, then we could handle this by
optionally matching but not returning the initial command name.  But it
doesn't.  So add a new char* argument to
virStorageBackendRunProgRegex().  If that argument is NULL then we act
as usual.  Otherwise, if the string at that argument is found at the
start of a returned line, we drop that before running the regex.

With this patch, virt-manager shows me lvs with command_names 1 or 0.

The definitions of PVS_BASE etc may want to be moved into the configure
scripts (though given how PVS is found, IIUC that could only happen if
pvs was a link to pvs_real), but in any case no sense dealing with that
until we're sure this is an ok way to handle it.

Changelog:
  Sep 28: Use STRSKIP and make cmd_to_ignore arg const, as per Eric's
  feedback.

Signed-off-by: Serge Hallyn serge.hal...@canonical.com
---
 src/storage/storage_backend.c |   15 +++
 src/storage/storage_backend.h |2 +-
 src/storage/storage_backend_fs.c  |2 +-
 src/storage/storage_backend_iscsi.c   |4 ++--
 src/storage/storage_backend_logical.c |9 ++---
 5 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c
index d125504..dd7e36b 100644
--- a/src/storage/storage_backend.c
+++ b/src/storage/storage_backend.c
@@ -1400,7 +1400,7 @@ virStorageBackendRunProgRegex(virStoragePoolObjPtr pool,
   const char **regex,
   int *nvars,
   virStorageBackendListVolRegexFunc func,
-  void *data)
+  void *data, const char *cmd_to_ignore)
 {
 int fd = -1, err, ret = -1;
 FILE *list = NULL;
@@ -1460,13 +1460,20 @@ virStorageBackendRunProgRegex(virStoragePoolObjPtr pool,
 }
 
 while (fgets(line, sizeof(line), list) != NULL) {
+char *p = NULL;
 /* Strip trailing newline */
 int len = strlen(line);
 if (len  line[len-1] == '\n')
 line[len-1] = '\0';
 
+/* if cmd_to_ignore is specified, then ignore it */
+if (cmd_to_ignore)
+p = STRSKIP(line, cmd_to_ignore);
+if (!p)
+p = line;
+
 for (i = 0 ; i = maxReg  i  nregex ; i++) {
-if (regexec(reg[i], line, nvars[i]+1, vars, 0) == 0) {
+if (regexec(reg[i], p, nvars[i]+1, vars, 0) == 0) {
 maxReg++;
 
 if (i == 0)
@@ -1475,9 +1482,9 @@ virStorageBackendRunProgRegex(virStoragePoolObjPtr pool,
 /* NULL terminate each captured group in the line */
 for (j = 0 ; j  nvars[i] ; j++) {
 /* NB vars[0] is the full pattern, so we offset j by 1 */
-line[vars[j+1].rm_eo] = '\0';
+p[vars[j+1].rm_eo] = '\0';
 if ((groups[ngroup++] =
- strdup(line + vars[j+1].rm_so)) == NULL) {
+ strdup(p + vars[j+1].rm_so)) == NULL) {
 virReportOOMError();
 goto cleanup;
 }
diff --git a/src/storage/storage_backend.h b/src/storage/storage_backend.h
index 67ac32c..75ed676 100644
--- a/src/storage/storage_backend.h
+++ b/src/storage/storage_backend.h
@@ -140,7 +140,7 @@ int virStorageBackendRunProgRegex(virStoragePoolObjPtr pool,
   const char **regex,
   int *nvars,
   virStorageBackendListVolRegexFunc func,
-  void *data);
+  void *data, const char *cmd_to_ignore);
 
 int virStorageBackendRunProgNul(virStoragePoolObjPtr pool,
 const char **prog,
diff --git a/src/storage/storage_backend_fs.c b/src/storage/storage_backend_fs.c
index da98f87..d678625 100644
--- a/src/storage/storage_backend_fs.c
+++ b/src/storage/storage_backend_fs.c
@@ -266,7 +266,7 @@ virStorageBackendFileSystemNetFindPoolSources(virConnectPtr 
conn ATTRIBUTE_UNUSE
 
 if (virStorageBackendRunProgRegex(NULL, prog, 1, regexes, vars,
 virStorageBackendFileSystemNetFindPoolSourcesFunc,
-state)  0)
+state, NULL)  0)
 goto cleanup;
 
 retval = virStoragePoolSourceListFormat(state.list);
diff --git a/src/storage/storage_backend_iscsi.c 
b/src/storage/storage_backend_iscsi.c
index 346e698..99e69c9 100644
--- a/src/storage/storage_backend_iscsi.c
+++ b/src/storage/storage_backend_iscsi.c
@@ -160,7 +160,7 @@ virStorageBackendISCSISession(virStoragePoolObjPtr pool,
   regexes,

[libvirt] [PATCH 1/1] [RFC] lvm storage backend: handle command_names=1 in lvm.conf

2011-09-15 Thread Serge E. Hallyn

If the regexes supported (?:pvs)?, then we could handle this by
optionally matching but not returning the initial command name.  But it
doesn't.  So add a new char* argument to
virStorageBackendRunProgRegex().  If that argument is NULL then we act
as usual.  Otherwise, if the string at that argument is found at the
start of a returned line, we drop that before running the regex.

With this patch, virt-manager shows me lvs with command_names 1 or 0.

The definitions of PVS_BASE etc may want to be moved into the configure
scripts (though given how PVS is found, IIUC that could only happen if
pvs was a link to pvs_real), but in any case no sense dealing with that
until we're sure this is an ok way to handle it.

Signed-off-by: Serge Hallyn serge.hal...@canonical.com
---
 src/storage/storage_backend.c |   14 ++
 src/storage/storage_backend.h |2 +-
 src/storage/storage_backend_fs.c  |2 +-
 src/storage/storage_backend_iscsi.c   |4 ++--
 src/storage/storage_backend_logical.c |9 ++---
 5 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c
index d125504..7a352da 100644
--- a/src/storage/storage_backend.c
+++ b/src/storage/storage_backend.c
@@ -1400,7 +1400,7 @@ virStorageBackendRunProgRegex(virStoragePoolObjPtr pool,
   const char **regex,
   int *nvars,
   virStorageBackendListVolRegexFunc func,
-  void *data)
+  void *data, char *cmd_to_ignore)
 {
 int fd = -1, err, ret = -1;
 FILE *list = NULL;
@@ -1460,13 +1460,19 @@ virStorageBackendRunProgRegex(virStoragePoolObjPtr pool,
 }
 
 while (fgets(line, sizeof(line), list) != NULL) {
+char *p;
 /* Strip trailing newline */
 int len = strlen(line);
 if (len  line[len-1] == '\n')
 line[len-1] = '\0';
 
+p = line;
+/* if cmd_to_ignore is specified, then ignore it */
+if (strncmp(line, cmd_to_ignore, strlen(cmd_to_ignore)) == 0)
+p += strlen(cmd_to_ignore);
+
 for (i = 0 ; i = maxReg  i  nregex ; i++) {
-if (regexec(reg[i], line, nvars[i]+1, vars, 0) == 0) {
+if (regexec(reg[i], p, nvars[i]+1, vars, 0) == 0) {
 maxReg++;
 
 if (i == 0)
@@ -1475,9 +1481,9 @@ virStorageBackendRunProgRegex(virStoragePoolObjPtr pool,
 /* NULL terminate each captured group in the line */
 for (j = 0 ; j  nvars[i] ; j++) {
 /* NB vars[0] is the full pattern, so we offset j by 1 */
-line[vars[j+1].rm_eo] = '\0';
+p[vars[j+1].rm_eo] = '\0';
 if ((groups[ngroup++] =
- strdup(line + vars[j+1].rm_so)) == NULL) {
+ strdup(p + vars[j+1].rm_so)) == NULL) {
 virReportOOMError();
 goto cleanup;
 }
diff --git a/src/storage/storage_backend.h b/src/storage/storage_backend.h
index 67ac32c..aa467f5 100644
--- a/src/storage/storage_backend.h
+++ b/src/storage/storage_backend.h
@@ -140,7 +140,7 @@ int virStorageBackendRunProgRegex(virStoragePoolObjPtr pool,
   const char **regex,
   int *nvars,
   virStorageBackendListVolRegexFunc func,
-  void *data);
+  void *data, char *cmd_to_ignore);
 
 int virStorageBackendRunProgNul(virStoragePoolObjPtr pool,
 const char **prog,
diff --git a/src/storage/storage_backend_fs.c b/src/storage/storage_backend_fs.c
index 02c4c17..a98db4b 100644
--- a/src/storage/storage_backend_fs.c
+++ b/src/storage/storage_backend_fs.c
@@ -266,7 +266,7 @@ virStorageBackendFileSystemNetFindPoolSources(virConnectPtr 
conn ATTRIBUTE_UNUSE
 
 if (virStorageBackendRunProgRegex(NULL, prog, 1, regexes, vars,
 virStorageBackendFileSystemNetFindPoolSourcesFunc,
-state)  0)
+state, NULL)  0)
 goto cleanup;
 
 retval = virStoragePoolSourceListFormat(state.list);
diff --git a/src/storage/storage_backend_iscsi.c 
b/src/storage/storage_backend_iscsi.c
index 346e698..99e69c9 100644
--- a/src/storage/storage_backend_iscsi.c
+++ b/src/storage/storage_backend_iscsi.c
@@ -160,7 +160,7 @@ virStorageBackendISCSISession(virStoragePoolObjPtr pool,
   regexes,
   vars,
   virStorageBackendISCSIExtractSession,
-  session)  0)
+  session, NULL)  0)
 return NULL;
 
 if (session == NULL 
@@ -517,7

[libvirt] [RFC] security_dac: don't chown iso file

2011-09-13 Thread Serge E. Hallyn

isos are read-only, so libvirt doesn't need to chown them.  In one of
our testing setups, libvirt uses mirrorred isos.  Since libvirt chowns
the files, (and especially does not chown them back) the mirror refuses
to update the iso.

This patch prevents libvirt from chowning files.

Does this seem reasonable?

Signed-off-by: Serge Hallyn serge.hal...@canonical.com
---
 src/security/security_dac.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/src/security/security_dac.c b/src/security/security_dac.c
index af02236..e7db324 100644
--- a/src/security/security_dac.c
+++ b/src/security/security_dac.c
@@ -555,6 +555,8 @@ virSecurityDACSetSecurityAllLabel(virSecurityManagerPtr mgr,
 /* XXX fixme - we need to recursively label the entire tree :-( */
 if (vm-def-disks[i]-type == VIR_DOMAIN_DISK_TYPE_DIR)
 continue;
+   if (vm-def-disks[i]-device == VIR_DOMAIN_DISK_DEVICE_CDROM)
+   continue;
 if (virSecurityDACSetSecurityImageLabel(mgr,
 vm,
 vm-def-disks[i])  0)
-- 
1.7.5.4

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] lvm backed storage

2011-09-12 Thread Serge E. Hallyn

Quoting Daniel P. Berrange (berra...@redhat.com):
 On Thu, Sep 08, 2011 at 11:00:07AM -0500, Serge Hallyn wrote:
  Hi,
  
  When lvm.conf has 'command_names = 1', then all results are prefixed with
  the command name.  This confuses libvirt which does not ignore those.  I
  thought fixing that would be a simple case of detecting those conditions
  at virStorageBackendLogicalMakeVol() and friends, but I was wrong - those
  functions are not getting called at all when command_names=1.  I'll keep
  nosing around, but it seemed prudent to ping the list and ask what you
  think would be the cleanest way to handle this case?
 
 Urgh, what a frickin' horrible feature to have in a config file, with
 no command line override.

Agreed!

 That said, it should be possible to fix via a regex tweak. There are
 several places where we call virStorageBackendRunProgRegex in
 storage_backend_logical.c. In all of those cases you need to tweak
 the 'regexes' definition to add a bit of magic to optionally detect
 and ignore the prefix.

I was trying that in the backend_logical.c file, but when command_names=1
that code never even gets called, so there is some other place that is
calling a pvscan or somesuch that I haven't found yet.

Will take another look though.

thanks,
-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] Notes from the KVM Forum relevant to libvirt

2011-08-25 Thread Serge E. Hallyn

Quoting Stefan Hajnoczi (stefa...@gmail.com):
 On Thu, Aug 25, 2011 at 11:03 AM, Daniel P. Berrange
 berra...@redhat.com wrote:
  On Thu, Aug 25, 2011 at 10:10:27AM +0100, Stefan Hajnoczi wrote:
  On Wed, Aug 24, 2011 at 3:46 PM, Daniel P. Berrange berra...@redhat.com 
  wrote:
   On Wed, Aug 24, 2011 at 03:20:57PM +0100, Stefan Hajnoczi wrote:
   On Tue, Aug 23, 2011 at 4:31 PM, Daniel P. Berrange 
   berra...@redhat.com wrote:
On Tue, Aug 23, 2011 at 04:24:46PM +0100, Stefan Hajnoczi wrote:
On Tue, Aug 23, 2011 at 12:15 PM, Daniel P. Berrange
berra...@redhat.com wrote:
 I was at the KVM Forum / LinuxCon last week and there were many
 interesting things discussed which are relevant to ongoing libvirt
 development. Here was the list that caught my attention. If I have
 missed any, fill in the gaps

  - Sandbox/container KVM.  The Solaris port of KVM puts QEMU inside
   a zone so that an exploit of QEMU can't escape into the full OS.
   Containers are Linux's parallel of Zones, and while not nearly as
   secure yet, it would still be worth using more containers support
   to confine QEMU.
   
Can you elaborate on why Linux containers are not nearly as secure
[as Solaris Zones]?
   
Mostly because the Linux namespace functionality is far from complete,
notably lacking proper UID/GID/capability separation, and UID/GID
virtualization wrt filesystems. The longer answer is here:
   
  https://wiki.ubuntu.com/UserNamespace
   
So at this time you can't build a secure container on Linux, relying
just on DAC alone. You have to add in a MAC layer ontop of the 
container
to get full security benefits, which obviously defeats the point of
using the container as a backup for failure in the MAC layer.
  
   Thanks, that is interesting.  I still don't understand why that is a
   problem.  Linux containers (lxc) uses a different pid namespace (no
   ptrace worries), file system root (restricted to a subdirectory tree),
   forbids most device nodes, etc.  Why does the user namespace matter
   for security in this case?
  
   A number of reasons really...
  
   If user ID '0' on the host starts a container, and a process inside
   the container does 'setuid(500)', then any user outside the container
   with UID 500 will be able to kill that process. Only user ID '0' should
   have been allowed todo that.
  
   It will also let non-root user IDs on the host OS, start containers
   and have root uid=0 inside the container.
  
   Finally, any files created inside the container with, say, uid 500
   will be accessible by any other process with UID 500, in either the
   host or any other container
 
  These points mean that the host can peek inside containers and has
  access to their processes/files.  But from the point of a libvirt
  running inside a container there is no security problem.
 
  This is kind of like saying that root on the host can modify KVM guest
  disk images.  That is true but I don't see it as a security problem
  because the root on the host is the trusted part of the system.
 
   I think it matters when giving multiple containers access to the same
   file system.  Is that what you'd like to do for libvirt?
  
   Each container would have to share a (readonly) view onto the host
   filesystem so it can see the QEMU emulator install / libraries. There
   would also have to be some writable areas per QEMU container.  QEMU
   inside the container would be set to run as some non-root UID (from
   the container's POV). So both problem 1  3 above would impact the
   security of this confinement.
 
  But is there a way to escape confinement?  If not, then this is secure.
 
  The filesystem UID/GID ownership is the most likely way you can escape
  the confinement. You would have to be very careful to ensure that each
  container's view of the filesystem did not include any directories
  with files that are assigned to another container, since the UID
  separation would not prevent access to another container's resources.
 
  This is rather tedious but could be just about doable, but it gets
  harder when you throw in things like sysfs and PCI device assignment.
  eg a guest with PCI device assigned gets given ownership of the files
  in /sys/bus/pci/devices/:00:XX:XX/ and since there is no UID
  namespacing, this will be accessible to any other container with the
  same UID. To hack around this when starting up a container you would
  probably have to bind mount a empty tmpfs over the top of all the
  PCI device paths you wanted to block in sysfs.

Which of course is easily undoable by root in the container :)

 Ah, I hadn't thought of /sys/bus/pci or /sys/bus/usb!
 
 Thanks for the explanation and it does seem like the design would get messy.

And plenty more, i.e.  http://blog.bofh.it/debian/id_413

See http://sourceforge.net/mailarchive/message.php?msg_id=27878921 for
someone actively using Smack to help

[libvirt] enumerating disk devices

2011-07-19 Thread Serge E. Hallyn

Hi,

Some people appear to have autostart VMs residing on slow storage.  If
libvirtd starts too early, it'll simply fail trying to start those VMs.
It'd be nice to know when all the storage on which autostart containers
depend becomes available, so as to safely start libvirt.

The python script appended below is one approach.  It could be called as
something like

for f in /etc/libvirt/qemu/autostart/* /etc/libvirt/lxc/autostart/*; do
for d in `libvirt_list_storage.py $f`; do
if [ ! -r $d ]; then
echo storage not yet ready
exit 1
fi
done
done
echo starting libvirtd
libvirtd

(Realistically I would have this running in an upstart job which is
'start on mounted' and, if all storage available, emits a
'libvirt-storage-up' event).

Are there better ways to go about this?

thanks,
-serge

#!/usr/bin/python

import sys
from xml.dom.minidom import parse

dom = parse(sys.argv[1])

def getText(nodelist):
rc = []
for node in nodelist:
if node.nodeType == node.TEXT_NODE:
rc.append(node.data)
return ''.join(rc)

def print_domname(dom):
#return
print Domain %s\n % getText(dom.childNodes)

def liststorages(dom):
print_domname(dom.getElementsByTagName(name)[0])
disks = dom.getElementsByTagName(disk)
list_disks(disks)
filesystems = dom.getElementsByTagName('filesystem')
list_filesystems(filesystems)

def list_filesystems(fslist):
for fs in fslist:
if not fs.hasAttribute('type'):
continue
if fs.getAttribute('type') != 'mount':
continue
dirs = fs.getElementsByTagName(source)
if len(dirs) == 0:
continue
if not dirs[0].getAttribute('dir'):
continue
print dir: %s\n % dirs[0].getAttribute('dir')

def list_disks(disks):
for disk in disks:
if not disk.hasAttribute('type'):
continue
if disk.getAttribute('type') == 'file':
list_disk_type_file(disk)
elif disk.getAttribute('type') == 'block':
list_disk_type_block(disk)
else:
# not supported
continue

def list_disk_type_file(disk):
disksrcs = disk.getElementsByTagName(source)
if len(disksrcs) == 0:
return
if not disksrcs[0].hasAttribute('file'):
return
print file: %s\n % disksrcs[0].getAttribute('file')

def list_disk_type_block(disk):
devsrscs = disk.getElementsByTagName(source)
if len(devsrscs) == 0:
return
if not devsrscs[0].hasAttribute('dev'):
return
print dev: %s\n % devsrscs[0].getAttribute('dev')

liststorages(dom)

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] RFC: experimental libvirtd upstart job

2011-02-23 Thread Serge E. Hallyn

Quoting Daniel P. Berrange (berra...@redhat.com):
 On Fri, Feb 18, 2011 at 03:48:29PM -0600, Serge E. Hallyn wrote:
  Quoting ape...@gmail.com (ape...@gmail.com):
   From: Alan Pevec ape...@redhat.com
  =
  # libvirt-cgred-wait
  start on starting libvirt-bin
  stop on started cgred or stopped cgred
  
  task
  normal exit 2
  script
status cgred | grep -q start/running  exit 0
start cgred || true
sleep 3600
  end script
  =
 
 FYI, libvirt has no need for the  cgred to be present / running, and

Absolutely.  And so the above job ships with the libcgroup package,
not with libvirt.

 indeed if you're not careful with the rules you create for cgred it
 could really mess things up by moving VM proceses to a different
 cgroup behind libvirt's back.

Yes.  And with the default rules, if libcgroup is installed *and*
if libvirt does not wait until libcgroup is configured at boot, then
VMs won't start up.

So when just libvirt is installed, the above job is not needed (and
not installed on the machine).  When just libcgroup is installed, the
above job will be installed, but since it starts on 'starting libvirt',
it won't ever run if libvirt is not installed.

 For cgroups integration, libvirt merely requires that something has
 mounted the various cgroups controllers. The can be mounted in any
 location, and with any combination of controllers. libvirtd will
 detect what cgroups it is located in at startup, and create its
 own cgroups below these points in the hierarchy.

Yup, I definately don't want to suggest that there is anything wrong
with what libvirt does.

 On Fedora we do this with the 'cgconfig' init script which more or
 less just runs  'cgconfigparse -l /etc/cgconfig.conf' to mount the
 bits

Right, and the above job just makes sure that the above has finished
before libvirt starts.

thanks,
-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] RFC: experimental libvirtd upstart job

2011-02-18 Thread Serge E. Hallyn

Quoting ape...@gmail.com (ape...@gmail.com):
 From: Alan Pevec ape...@redhat.com
 
 To install it, disable libvirtd sysv initscript:
 chkconfig libvirtd off
 service libvirtd stop
 
 and enable libvirtd upstart job:
 cp  /usr/share/doc/libvirt-*/libvirtd.upstart \
 /etc/init/libvirtd.conf
 initctl reload-configuration
 initctl start libvirtd
 
 Test:
 initctl status libvirtd
 libvirtd start/running, process 3929
 killall -9 libvirtd
 initctl status libvirtd
 libvirtd start/running, process 4047
 ---
  daemon/Makefile.am  |1 +
  daemon/libvirtd.upstart |   47 
 +++
  libvirt.spec.in |1 +
  3 files changed, 49 insertions(+), 0 deletions(-)
  create mode 100644 daemon/libvirtd.upstart
 
 diff --git a/daemon/Makefile.am b/daemon/Makefile.am
 index cdf0f75..a4ebcf8 100644
 --- a/daemon/Makefile.am
 +++ b/daemon/Makefile.am
 @@ -27,6 +27,7 @@ EXTRA_DIST =
 \
   remote_generate_stubs.pl\
   libvirtd.conf   \
   libvirtd.init.in\
 + libvirtd.upstart\
   libvirtd.policy-0   \
   libvirtd.policy-1   \
   libvirtd.sasl   \
 diff --git a/daemon/libvirtd.upstart b/daemon/libvirtd.upstart
 new file mode 100644
 index 000..40d5fa3
 --- /dev/null
 +++ b/daemon/libvirtd.upstart
 @@ -0,0 +1,47 @@
 +# libvirtd upstart job
 +#
 +# XXX wait for rc to get all dependent initscripts started
 +# from sysv libvirtd initscript: Required-Start: $network messagebus
 +start on stopped rc RUNLEVEL=[345]
 +stop on runlevel [016]
 +respawn
 +# default rate limit is 10x/5s
 +#respawn limit 10 5
 +
 +# DAEMON_COREFILE_LIMIT in /etc/sysconfig/libvirtd doesn't have effect
 +# must set resource limits here
 +#limit core unlimited unlimited
 +
 +# documented http://upstart.ubuntu.com/wiki/Stanzas#pid
 +# but not accepted by upstart-0.6.5
 +#pid file /var/run/libvirtd.pid
 +
 +env PIDFILE=/var/run/libvirtd.pid
 +
 +script
 + LIBVIRTD_CONFIG=
 + LIBVIRTD_ARGS=
 + KRB5_KTNAME=/etc/libvirt/krb5.tab
 + 
 + test -f /etc/sysconfig/libvirtd  . /etc/sysconfig/libvirtd
 + 
 + export QEMU_AUDIO_DRV
 + export SDL_AUDIODRIVER
 + 
 + LIBVIRTD_CONFIG_ARGS=
 + if [ -n $LIBVIRTD_CONFIG ]
 + then
 + LIBVIRTD_CONFIG_ARGS=--config $LIBVIRTD_CONFIG
 + else
 + true
 + fi
 + 
 + mkdir -p /var/cache/libvirt
 + rm -rf /var/cache/libvirt/*
 + KRB5_KTNAME=$KRB5_KTNAME /usr/sbin/libvirtd $LIBVIRTD_CONFIG_ARGS 
 $LIBVIRTD_ARGS
 +end script
 +
 +post-stop script
 + rm -f $PIDFILE
 + rm -rf /var/cache/libvirt/*
 +end script

Hi,

If you're using upstart, then why use your own pidfile? FWIW, attached is
the upstart script used in Ubuntu:

=
description libvirt daemon
author Dustin Kirkland kirkl...@canonical.com

start on runlevel [2345]
stop on runlevel [!2345]

expect daemon
respawn

env libvirtd_opts=-d
env start_libvirtd=yes
pre-start script
[ -r /etc/default/libvirt-bin ]  . /etc/default/libvirt-bin
[ ! x$start_libvirtd = xyes ]  { stop; exit 0; }
mkdir -p /var/run/libvirt
# Clean up a pidfile that might be left around
rm -f /var/run/libvirtd.pid
end script

# If you used to set $libvirtd_opts in /etc/default/libvirt-bin,
# change the 'exec' line here instead.
exec /usr/sbin/libvirtd $libvirtd_opts
=

In order to play nicely with libcgroup, we're also testing
the following job to ship with libcgroup:

=
# libvirt-cgred-wait
start on starting libvirt-bin
stop on started cgred or stopped cgred

task
normal exit 2
script
  status cgred | grep -q start/running  exit 0
  start cgred || true
  sleep 3600
end script
=

thanks,
-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] libvirt 0.8.7 tests failure on arm and ppc

2011-02-15 Thread Serge E. Hallyn

Quoting Matthias Bolte (matthias.bo...@googlemail.com):
 2011/2/15 Serge E. Hallyn serge.hal...@canonical.com:
  Hi, as per the message after the tests fail, I'm reporting this on
  the list.  Hopefully someone has seen this before.  I've not yet
  tried this with the latest git snapshot.  With 0.8.7, I get:
 
  TEST: qemuxml2argvtest
   40
   80
  .!.!     116 FAIL
 
  -serge
 
 
 I don't have ARM or PPC at hand to test, so I can't really tell why
 these tests fail.
 
 Could you rerun this test with more verbose output like this
 
 cd /path/to/libvirt
 cd tests
 LIBVIRT_LOG_OUTPUTS=3:stderr VIR_TEST_DEBUG=2 \
 ./qemuxml2argvtest  qemuxml2argvtest.log 21
 
 and attach the qemuxml2argvtest.log.
 
 Matthias

Thanks, Matthias.  Here is the output (with 'OK' lines trimmed):

TEST: qemuxml2argvtest
QEMU driver capabilities:
capabilities

  host
cpu
  archarmv7l/arch
  modelcore2duo/model
  vendorIntel/vendor
  topology sockets='1' cores='2' threads='1'/
  feature name='lahf_lm'/
  feature name='xtpr'/
  feature name='cx16'/
  feature name='tm2'/
  feature name='est'/
  feature name='vmx'/
  feature name='ds_cpl'/
  feature name='pbe'/
  feature name='tm'/
  feature name='ht'/
  feature name='ss'/
  feature name='acpi'/
  feature name='ds'/
/cpu
  /host

  guest
os_typehvm/os_type
arch name='i686'
  wordsize32/wordsize
  emulator/usr/bin/qemu/emulator
  machinepc/machine
  machineisapc/machine
  domain type='qemu'
  /domain
/arch
  /guest

  guest
os_typehvm/os_type
arch name='x86_64'
  wordsize64/wordsize
  emulator/usr/bin/qemu-system-x86_64/emulator
  machinepc-0.11/machine
  machine canonical='pc-0.11'pc/machine
  machinepc-0.10/machine
  machineisapc/machine
  domain type='qemu'
  /domain
  domain type='kvm'
emulator/usr/bin/kvm/emulator
machinepc/machine
machineisapc/machine
  /domain
/arch
  /guest

  guest
os_typexen/os_type
arch name='x86_64'
  wordsize64/wordsize
  emulator/usr/bin/xenner/emulator
  machinexenner/machine
  domain type='kvm'
emulator/usr/bin/kvm/emulator
  /domain
/arch
  /guest

/capabilities
 3) QEMU XML-2-ARGV machine-aliases2  ... 
13:35:12.002: 8839: error : qemuBuildCommandLine:2550 : unsupported 
configuration: the QEMU binary /usr/bin/kvm does not support kvm
OK
 9) QEMU XML-2-ARGV bootloader... 
13:35:12.065: 8839: error : qemuBuildCommandLine:2550 : unsupported 
configuration: the QEMU binary /usr/bin/xenner does not support kvm
OK
52) QEMU XML-2-ARGV input-xen ... 
13:35:12.304: 8839: error : qemuBuildCommandLine:2550 : unsupported 
configuration: the QEMU binary /usr/bin/xenner does not support kvm
OK
110) QEMU XML-2-ARGV cpu-topology2 ... 
13:35:12.620: 8839: error : qemuBuildCpuArgStr:2300 : unsupported 
configuration: CPU specification not supported by hypervisor
libvir: QEMU error : unsupported configuration: CPU specification not supported 
by hypervisor
FAILED
111) QEMU XML-2-ARGV cpu-topology3 ... OK
112) QEMU XML-2-ARGV cpu-minimum1  ... 
13:35:12.642: 8839: error : qemuBuildCpuArgStr:2300 : unsupported 
configuration: CPU specification not supported by hypervisor
libvir: QEMU error : unsupported configuration: CPU specification not supported 
by hypervisor
FAILED
113) QEMU XML-2-ARGV cpu-minimum2  ... 
13:35:12.653: 8839: error : qemuBuildCpuArgStr:2300 : unsupported 
configuration: CPU specification not supported by hypervisor
libvir: QEMU error : unsupported configuration: CPU specification not supported 
by hypervisor
FAILED
114) QEMU XML-2-ARGV cpu-exact1... 
13:35:12.665: 8839: error : qemuBuildCpuArgStr:2300 : unsupported 
configuration: CPU specification not supported by hypervisor
libvir: QEMU error : unsupported configuration: CPU specification not supported 
by hypervisor
FAILED
115) QEMU XML-2-ARGV cpu-exact2... 
13:35:12.677: 8839: error : qemuBuildCpuArgStr:2300 : unsupported 
configuration: CPU specification not supported by hypervisor
libvir: QEMU error : unsupported configuration: CPU specification not supported 
by hypervisor
FAILED
before cpu-strict1
116) QEMU XML-2-ARGV cpu-strict1   ... 
13:35:12.689: 8839: error : qemuBuildCpuArgStr:2300 : unsupported 
configuration: CPU specification not supported by hypervisor
libvir: QEMU error : unsupported configuration: CPU specification not supported 
by hypervisor
FAILED
after cpu-strict1

--
libvir-list

Re: [libvirt] libvirt 0.8.7 tests failure on arm and ppc

2011-02-15 Thread Serge E. Hallyn

Quoting Jiri Denemark (jdene...@redhat.com):
 On Tue, Feb 15, 2011 at 07:37:37 -0600, Serge E. Hallyn wrote:
  TEST: qemuxml2argvtest
  QEMU driver capabilities:
  capabilities
  
host
  cpu
archarmv7l/arch
 
 This is the problem. The code does not properly dealing with the case where
 host CPU architecture as described by host capabilities does not match the
 architecture of the real host CPU. This case can only happen in tests where we
 fake host CPU.
 
 I have a fix for that already. I'll make a proper patch from it and send soon.
 
 Jirka

Thanks!

I'll wait for that and slurp it in.  (Presumably the same thing is going
on with my ppc failure.)

thanks,
-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 1/2] tests: Fake host capabilities properly

2011-02-15 Thread Serge E. Hallyn

Quoting Jiri Denemark (jdene...@redhat.com):
 Since we fake host CPU we should also fake host arch instead of taking
 the real architecture tests are running on.
 ---
  tests/testutilsqemu.c |8 +---
  1 files changed, 5 insertions(+), 3 deletions(-)

Tested-by: Serge Hallyn serge.hal...@canonical.com

Thanks, Jiri!

 
 diff --git a/tests/testutilsqemu.c b/tests/testutilsqemu.c
 index bfac307..fbbb6c1 100644
 --- a/tests/testutilsqemu.c
 +++ b/tests/testutilsqemu.c
 @@ -95,7 +95,7 @@ virCapsPtr testQemuCapsInit(void) {
  };
  
  uname (utsname);
 -if ((caps = virCapabilitiesNew(utsname.machine,
 +if ((caps = virCapabilitiesNew(host_cpu.arch,
 0, 0)) == NULL)
  return NULL;
  
 @@ -107,7 +107,8 @@ virCapsPtr testQemuCapsInit(void) {
  
  if ((guest = virCapabilitiesAddGuest(caps, hvm, i686, 32,
   /usr/bin/qemu, NULL,
 - nmachines, machines)) == NULL)
 + nmachines, machines)) == NULL ||
 +!virCapabilitiesAddGuestFeature(guest, cpuselection, 1, 0))
  goto cleanup;
  machines = NULL;
  
 @@ -124,7 +125,8 @@ virCapsPtr testQemuCapsInit(void) {
  
  if ((guest = virCapabilitiesAddGuest(caps, hvm, x86_64, 64,
   /usr/bin/qemu-system-x86_64, NULL,
 - nmachines, machines)) == NULL)
 + nmachines, machines)) == NULL ||
 +!virCapabilitiesAddGuestFeature(guest, cpuselection, 1, 0))
  goto cleanup;
  machines = NULL;
  
 -- 
 1.7.4.1
 
 --
 libvir-list mailing list
 libvir-list@redhat.com
 https://www.redhat.com/mailman/listinfo/libvir-list

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 2/2] qemu: Fix command line generation with faked host CPU

2011-02-15 Thread Serge E. Hallyn

Quoting Jiri Denemark (jdene...@redhat.com):
 The code expected that host CPU architecture matches the architecture on
 which libvirt runs. This is normally true but not in tests, where host
 CPU is faked to produce consistent results.
 ---
  src/qemu/qemu_command.c |8 +---
  1 files changed, 5 insertions(+), 3 deletions(-)

Tested-by: Serge Hallyn serge.hal...@canonical.com

 
 diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
 index 1687203..efd120c 100644
 --- a/src/qemu/qemu_command.c
 +++ b/src/qemu/qemu_command.c
 @@ -2440,7 +2440,8 @@ qemuBuildCpuArgStr(const struct qemud_driver *driver,
  *hasHwVirt = false;
  
  if (def-cpu  def-cpu-model) {
 -if (qemuCapsProbeCPUModels(emulator, qemuCmdFlags, ut-machine,
 +if (host 
 +qemuCapsProbeCPUModels(emulator, qemuCmdFlags, host-arch,
 ncpus, cpus)  0)
  goto cleanup;
  
 @@ -2469,7 +2470,7 @@ qemuBuildCpuArgStr(const struct qemud_driver *driver,
  break;
  }
  
 -if (VIR_ALLOC(guest)  0 || !(guest-arch = strdup(ut-machine)))
 +if (VIR_ALLOC(guest)  0 || !(guest-arch = strdup(host-arch)))
  goto no_memory;
  
  if (def-cpu-match == VIR_CPU_MATCH_MINIMUM)
 @@ -2528,8 +2529,9 @@ qemuBuildCpuArgStr(const struct qemud_driver *driver,
  ret = 0;
  
  cleanup:
 +if (guest)
 +cpuDataFree(guest-arch, data);
  virCPUDefFree(guest);
 -cpuDataFree(ut-machine, data);
  
  if (cpus) {
  for (i = 0; i  ncpus; i++)
 -- 
 1.7.4.1
 
 --
 libvir-list mailing list
 libvir-list@redhat.com
 https://www.redhat.com/mailman/listinfo/libvir-list

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] libvirt 0.8.7 tests failure on arm and ppc

2011-02-14 Thread Serge E. Hallyn

Hi, as per the message after the tests fail, I'm reporting this on
the list.  Hopefully someone has seen this before.  I've not yet
tried this with the latest git snapshot.  With 0.8.7, I get:

TEST: qemuxml2argvtest  
  
 40 

 80 

.!.! 116 FAIL   


-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH RFC] Don't use CLONE_NEWUSER for now

2011-02-08 Thread Serge E. Hallyn

Until now, user namespaces have not done much, but (for that
reason) have been innocuous to glob in with other CLONE_
flags.  Upcoming userns development, however, will make tasks
cloned with CLONE_NEWUSER far more restricted.  In particular,
for some time they will be unable to access files with anything
other than the world access perms.

This patch assumes that noone really needs the user namespaces
to be enabled.  If that is wrong, then we can try a more
baroque patch where we create a file owned by a test userid with
700 perms and, if we can't access it after setuid'ing to that
userid, then return 0.  Otherwise, assume we are using an
older, 'harmless' user namespace implementation.

Comments appreciated.  Is it ok to do this?

Signed-off-by: Serge Hallyn serge.hal...@canonical.com
---
 src/lxc/lxc_container.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 876bc62..a735eb7 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -810,7 +810,14 @@ static int lxcContainerChild( void *data )
 
 static int userns_supported(void)
 {
+#if 1
+/*
+ * put off using userns until uid mapping is implemented
+ */
+return 0;
+#else
 return lxcContainerAvailable(LXC_CONTAINER_FEATURE_USER) == 0;
+#endif
 }
 
 /**
-- 
1.7.2.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] if you use user namespaces

2011-02-07 Thread Serge E. Hallyn

Please let me know.  lxc does not use them right now.  Libvirt uses them
for lxc containers f they are available, but I hope we can essentially
have it stop for awhile.  In addition, there's tons of software out
there that I don't know about, and fear of breaking their use of current
user namespaces has been keeping me from pushing further userns patches.

I've outlined how I see user namespaces developing at
https://wiki.ubuntu.com/UserNamespace .  Note there is nothing new
in there - some of it goes a year back, much of it more than two
years.  Nothing actually new.

Currently user namespaces are not very useful, but they do provide
separate uid accounting, and simply tossing CLONE_NEWUSER in with
CLONE_NEWNS and friends has until now been safe to do.  As you can
see, that is going to change.  So if that would cause you pain that
you can't work around, please get back to me.  Otherwise, I'd like
to get serious soon about expanding upon, and pushing upstream, the
patches to make CLONE_NEWUSER more useful for sandboxing.

thanks,
-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] build: avoid corrupted gnulib/tests/Makefile

2011-01-25 Thread Serge E. Hallyn

Quoting Eric Blake (ebl...@redhat.com):
 On 01/24/2011 08:27 PM, Daniel Veillard wrote:
  This may be an upstream gnulib bug, where a more elegant
  solution will present itself in the future:
  http://thread.gmane.org/gmane.comp.lib.gnulib.bugs/24898
 
  But in the meantime, I was able to reproduce both the issue,
  and this solution to work around it.
 
  * bootstrap.conf (bootstrap_epilogue): Ensure that no stray
  ../../.. components remain in gnulib/tests/Makefile.in.
  Reported by Serge Hallyn.
 
ACK,
 
 Thanks; pushed.

Thanks, guys.

-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] Update to libvirt-lxc driver doc page

2010-06-16 Thread Serge E. Hallyn

Quoting Eric Blake (ebl...@redhat.com):
 On 06/07/2010 08:55 AM, Serge Hallyn wrote:
  Here is a new drvlxc.html.in file to make the first example work.  I'll
  play with the second example next.
 
 Thanks for the sample text; it looked good to me on a first read.  Are
 you willing to finish out this patch by also including the patch to
 Makefile.am to generate drvlxc.html and distribute the new file?

Hi Eric,

docs/Makefile has

dot_html_in = $(wildcard *.html.in)

and 'make distdir' does compile drvlxc.html

So when I get the second part of drvlxc.html.in updated, I'll send a
git diff for the whole thing.

thanks,
-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] Update to libvirt-lxc driver doc page

2010-06-15 Thread Serge E. Hallyn

Quoting Eric Blake (ebl...@redhat.com):
 On 06/07/2010 08:55 AM, Serge Hallyn wrote:
  Here is a new drvlxc.html.in file to make the first example work.  I'll
  play with the second example next.
 
 Thanks for the sample text; it looked good to me on a first read.  Are
 you willing to finish out this patch by also including the patch to
 Makefile.am to generate drvlxc.html and distribute the new file?

Uh, sure.  As I got yelled at last time for not sending the .in file
I was sure it was in there.  But I see it, will send an update.

 However, when I tried the example, I got:
 
 $ virsh --connect lxc:/// start vm1
 error: Failed to start domain vm1
 error: internal error Container 'LIBVIR_LXC_CMD' unexpectedly shutdown
 during startup
 
 when using the virsh in Fedora 13 (libvirt-0.7.7-4.fc13.x86_64).  What
 setup did you test with, to help me in updating my system to have a
 similar setup?

I used ubuntu lucid.  I'll try to reproduce in an uptodate fedora 13.
However LIBVIR_LXC_CMD shouldn't be in the xml at all, rather the
result of

find /usr -name libvirt_lxc

should take its place.

(I haven't yet attacked the second example, so if you prefer to wait
for a single complete update, I'll try to get to that soon-ish)

thanks,
-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] Update to libvirt-lxc driver doc page

2010-06-15 Thread Serge E. Hallyn

Quoting Eric Blake (ebl...@redhat.com):
 http://libvirt.org/git/?p=libvirt.git;a=blob;f=README-hacking

Doh - I had no idea the web pages were just sitting in libvirt.git/docs!
Now it all makes sense.  Thanks.

-serge

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 2/3] New utility functions virFileCreate and virDirCreate

2010-01-13 Thread Serge E. Hallyn

Quoting Laine Stump (la...@laine.org):
 These functions create a new file or directory with the given
 uid/gid. If the flag VIR_FILE_CREATE_AS_UID is given, they do this by
 forking a new process, calling setuid/setgid in the new process, and
 then creating the file. This is better than simply calling open then
 fchown, because in the latter case, a root-squashing nfs server would
 create the new file as user nobody, then refuse to allow fchown.
 
 If VIR_FILE_CREATE_AS_UID is not specified, the simpler tactic of
 creating the file/dir, then chowning is is used. This gives better
 results in cases where the parent directory isn't on a root-squashing
 NFS server, but doesn't give permission for the specified uid/gid to
 create files. (Note that if the fork/setuid method fails to create the
 file due to access privileges, the parent process will make a second
 attempt using this simpler method.)
 
 Return from both of these functions is 0 on success, or the value of
 errno if there was a failure.
 ---
  src/util/util.c |  247 
 +++
  src/util/util.h |9 ++
  2 files changed, 256 insertions(+), 0 deletions(-)
 
 diff --git a/src/util/util.c b/src/util/util.c
 index 1d493de..1cb29f4 100644
 --- a/src/util/util.c
 +++ b/src/util/util.c
 @@ -1126,6 +1126,253 @@ int virFileExists(const char *path)
  return(0);
  }
 
 +
 +static int virFileCreateSimple(const char *path, mode_t mode, uid_t uid, 
 gid_t gid) {
 +int fd = -1;
 +int ret = 0;
 +
 +if ((fd = open(path, O_RDWR | O_CREAT | O_EXCL, mode))  0) {
 +ret = errno;
 +virReportSystemError(NULL, errno, _(failed to create file '%s'),
 + path);
 +goto error;
 +}
 +if ((getuid() == 0)  ((uid != 0) || (gid != 0))) {

How about checking for CAP_CHOWN instead of getuid()==0?  Otherwise
if I try installing this certain ways, virFileCreateSimple() will refuse
to try the chown even though it could succeed.

-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 2/3] New utility functions virFileCreate and virDirCreate

2010-01-13 Thread Serge E. Hallyn

Quoting Serge E. Hallyn (se...@us.ibm.com):
 Quoting Laine Stump (la...@laine.org):
  These functions create a new file or directory with the given
  uid/gid. If the flag VIR_FILE_CREATE_AS_UID is given, they do this by
  forking a new process, calling setuid/setgid in the new process, and
  then creating the file. This is better than simply calling open then
  fchown, because in the latter case, a root-squashing nfs server would
  create the new file as user nobody, then refuse to allow fchown.
  
  If VIR_FILE_CREATE_AS_UID is not specified, the simpler tactic of
  creating the file/dir, then chowning is is used. This gives better
  results in cases where the parent directory isn't on a root-squashing
  NFS server, but doesn't give permission for the specified uid/gid to
  create files. (Note that if the fork/setuid method fails to create the
  file due to access privileges, the parent process will make a second
  attempt using this simpler method.)
  
  Return from both of these functions is 0 on success, or the value of
  errno if there was a failure.
  ---
   src/util/util.c |  247 
  +++
   src/util/util.h |9 ++
   2 files changed, 256 insertions(+), 0 deletions(-)
  
  diff --git a/src/util/util.c b/src/util/util.c
  index 1d493de..1cb29f4 100644
  --- a/src/util/util.c
  +++ b/src/util/util.c
  @@ -1126,6 +1126,253 @@ int virFileExists(const char *path)
   return(0);
   }
  
  +
  +static int virFileCreateSimple(const char *path, mode_t mode, uid_t uid, 
  gid_t gid) {
  +int fd = -1;
  +int ret = 0;
  +
  +if ((fd = open(path, O_RDWR | O_CREAT | O_EXCL, mode))  0) {
  +ret = errno;
  +virReportSystemError(NULL, errno, _(failed to create file '%s'),
  + path);
  +goto error;
  +}
  +if ((getuid() == 0)  ((uid != 0) || (gid != 0))) {
 
 How about checking for CAP_CHOWN instead of getuid()==0?  Otherwise
 if I try installing this certain ways, virFileCreateSimple() will refuse
 to try the chown even though it could succeed.

I guess for certain netfs's the uid might matter more, so checking for
either condition - or in fact just trying the fchown, then doing a stat,
then making sure st.st_{ug}id are correct after the fact - seems useful.

-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] Re: kernel summit topic - 'containers end-game'

2009-07-08 Thread Serge E. Hallyn

Quoting Daniel Lezcano (dlezc...@fr.ibm.com):
 Do you plan to do send the minutes of the ksummit ?

Absolutely.  Of course it's not until October.  I'll be sending out
a copy of the notes I take with me (including the info from this
thread) beforehand.

thanks,
-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] Cannot get network to work with LXC

2009-07-07 Thread Serge E. Hallyn

Quoting James Yu (cyu...@gmail.com):
 Hi all,
 
 I tried to use the first xml config in http://libvirt.org/drvlxc.html to

Which version of libvirt were you using?  Was it the one which came with
your distro, and, if so, which distro?  You might try installing the
version from
git clone git://libvirt.org/libvirt.git

 setup my first light weight container.
 However, I was unable to get the container's network to work.
 When I tried to give the interface an IP, I got the following error message:
 
 sh-4.0# /sbin/ifconfig veth1 192.168.0.1/24
 SIOCSIFADDR: Permission denied
 SIOCSIFNETMASK: Permission denied
 SIOCGIFADDR: Cannot assign requested address
 SIOCSIFBROADCAST: Permission denied
 SIOCSIFFLAGS: Permission denied
 sh-4.0#
 Any advise ?
 
 
 This is a UTF-8 formatted mail
 ---
 James C.-C.Yu

 --
 Libvir-list mailing list
 Libvir-list@redhat.com
 https://www.redhat.com/mailman/listinfo/libvir-list

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] Re: kernel summit topic - 'containers end-game'

2009-07-06 Thread Serge E. Hallyn

Quoting Oren Laadan (or...@cs.columbia.edu):
 
 
 Serge E. Hallyn wrote:
  A topic on ksummit agenda is 'containers end-game and how do we
  get there'.
  
  So for starters, looking just at application (and system) containers, what 
  do
  the libvirt and liblxc projects want to see in kernel support that is 
  currently
  missing?  Are there specific things that should be done soon to make 
  containers
  more useful and usable?
  
  More generally, the topic raises the question... what 'end-games' are there?
  A few I can think of off-hand include:
  
  1. resource control
  2. lightweight virtual servers
  3. (or 2.5) unprivileged containers/jail-on-steroids
  (lightweight virtual servers in which you might, just
  maybe, almost, be able to give away a root account, at
  least as much as you could do so with a kvm/qemu/xen
  partition)
  4. checkpoint, restart, and migration
  
  For each end-game, what kernel pieces do we think are missing?  For 
  instance,
  people seem agreed that resource control needs io control :)  Containers imo
  need a user namespace.  I think there are quite a few network namespace
  exploiters who require sysfs directory tagging (or some equivalent) to
  allow us to migrate physical devices into network namespaces.  And
  checkpoint/restart needs... checkpoint/restart.
 
 Heh ... it does need ... checkpoint/restart; and a few issues
 which we should think about sometime --

Yup, these are all things we need to discuss.  For some of them we might
just need to flail about and code a few approaches until we figure out an
answer, but then I think that everyone has thought about a few of these
in some detail, so there probably is much we could gain from talking.

...  Does this mean we should try to have a mini-summit in the next 6
months or so?  I'd recommend having one right before kernel summit so
we can get our act together, but getting everyone to tokyo to chat seems
uneconomical :)  It'd be good to chat about at least the first two items
before the summit, though.

Maybe after we finish v17, we pick a few of these and try a focused push
to get answers?

 * Encapsulation of machine/OS config capabilities
- how to detect (versioning, capabilities) ?
- how to deal with mismatches ?  (bail ? emulate ? hope for the best ?)
- what happens if, e.g. VDSO page changes, or how to detect FPU changes...
 
 * Conversion of checkpoint image between kernel version (and automation)
 
 * Network namespaces, mnt namespaces - what's the best approach ?
 
 * Security assessment and brainstorming
 
 * Appealing use-cases for everyday use:
- for hybernation
- to reboot to new kernel without losing your session
- to time travel back to before you lost in bejewled
 
 * Userspace tools - mainly for inspection of checkpoint images
 
 * Testing frameworks
 
 * Distributed c/r ?
 
 * Optimizations: low downtime, pre-copy, post-copy, cow, parallelization
 
 
 Now I really go hide :p
 
 Oren.

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] Re: kernel summit topic - 'containers end-game'

2009-07-06 Thread Serge E. Hallyn

Quoting Daniel Lezcano (dlezc...@fr.ibm.com):
 Serge E. Hallyn wrote:
...
 Checkpoint:
   - The initiator of the checkpoint initialize the barrier and send a  
 signal SIGCKPT to all the checkpointable tasks and these ones will jump  
 on the handler and block on the barrier.

   - When all these tasks reach this barrier, the initiator of the
 checkpoint dumps the system wide resources (memory, sysv ipc, struct  
 files, etc ...).

   - When this is done, the tasks are released and they store their  
 process wide resources (semundo, file descriptor, etc ...) to a  
 current-ckpt_restart buffer and then set the status of the operation  
 and block on the barrier.

   - The initiator of the checkpoint then collects all these informations  
 and dump them.

Do you envision all of the dumping being done in kernel or by userspace?

...

   - Finally the initiator of the checkpoint release the tasks.


 Restart:
   - The user executes the statefile, that spawns the process tree and all 
  
 the processes are blocked in the barrier.

   - The initiator of the restart restore the system wide resources
 and fill the restarted processes' current-ckpt_restart buffer.

Same question about restore...

   - The initiator sends a SIGRESTART to all the tasks and unblock the 
 tasks

   - all the tasks restore their process wide resources regarding the  
 current-ckpt_restart buffer.

   - all the tasks write their status and block on the barrier

   - the initiator of the restart release the tasks which will return to  
 their execution context when they were checkpointed.

 This approach is different of you are doing but I am pretty sure most of  
 the code is re-usable. I see different advantages of this approach:

  - because the process resources are checkpointed / restarted from  
 current, it would be easy to reuse some syscalls code (from the kernel  
 POV) and that would reduce the code duplication and maintenance overhead.

  - the approach is more fine grained as we can implement piece by piece  
 the checkpoint / restart.

  - as the statefile is in the elf format, gdb could be used to debug a  
 statefile as a core file

Note btw that Dave has found that a checkpoint is faster than a core-dump
at the moment :)  That's not entirely an aside - I need to reread your
email a few times and really process your suggestion, but given that some
users want to dump hundreds of gigabytes of memory, not slowing down the
checkpoint is a big consideration.

  - as each process checkpoint / restart themselves, most of the  
 execution context is stored in the stack which is CR with the memory, so  
 when returning from the signal handler, the process returns to the right  
 context. That is less complicated and more generic than externally  
 checkpoint the execution context of a frozen task which would be  
 potentially different for the restart.


 I hope Serge you can present this approach as an alternative of the  
 current patchset __if__ this one is not acceptable.

I'll try to understand it better than I do right now - I don't think
it's for discussing at ksummit, but definately if we have a mini-summit
or during the next round of discussions (during or immediately after
the ckpt-v17 publish).

thanks,
-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] Re: kernel summit topic - 'containers end-game'

2009-07-06 Thread Serge E. Hallyn

Quoting Oren Laadan (or...@cs.columbia.edu):
 
 
 Serge E. Hallyn wrote:
  Quoting Oren Laadan (or...@cs.columbia.edu):
 
  Serge E. Hallyn wrote:
  A topic on ksummit agenda is 'containers end-game and how do we
  get there'.
 
  So for starters, looking just at application (and system) containers, 
  what do
  the libvirt and liblxc projects want to see in kernel support that is 
  currently
  missing?  Are there specific things that should be done soon to make 
  containers
  more useful and usable?
 
  More generally, the topic raises the question... what 'end-games' are 
  there?
  A few I can think of off-hand include:
 
1. resource control
2. lightweight virtual servers
3. (or 2.5) unprivileged containers/jail-on-steroids
(lightweight virtual servers in which you might, just
maybe, almost, be able to give away a root account, at
least as much as you could do so with a kvm/qemu/xen
partition)
4. checkpoint, restart, and migration
 
  For each end-game, what kernel pieces do we think are missing?  For 
  instance,
  people seem agreed that resource control needs io control :)  Containers 
  imo
  need a user namespace.  I think there are quite a few network namespace
  exploiters who require sysfs directory tagging (or some equivalent) to
  allow us to migrate physical devices into network namespaces.  And
  checkpoint/restart needs... checkpoint/restart.
  Heh ... it does need ... checkpoint/restart; and a few issues
  which we should think about sometime --
  
  Yup, these are all things we need to discuss.  For some of them we might
  just need to flail about and code a few approaches until we figure out an
  answer, but then I think that everyone has thought about a few of these
  in some detail, so there probably is much we could gain from talking.
  
  ...  Does this mean we should try to have a mini-summit in the next 6
  months or so?  I'd recommend having one right before kernel summit so
  we can get our act together, but getting everyone to tokyo to chat seems
  uneconomical :)  It'd be good to chat about at least the first two items
  before the summit, though.
  
 
 How about linux plumbers ?

Well it seems like an appropriate place for it.  Alas there is almost no chance
of my being there, but let's hear a roll call - how many people (interested in
checkpoint/restart) will be or can be at plumber's?

I'm pretty sure Suka and Dave will be there.

-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] kernel summit topic - 'containers end-game'

2009-06-30 Thread Serge E. Hallyn

Quoting Balbir Singh (bal...@linux.vnet.ibm.com):
 On Tue, Jun 23, 2009 at 8:26 PM, Serge E. Hallynse...@us.ibm.com wrote:
  A topic on ksummit agenda is 'containers end-game and how do we
  get there'.
 
  So for starters, looking just at application (and system) containers, what 
  do
  the libvirt and liblxc projects want to see in kernel support that is 
  currently
  missing?  Are there specific things that should be done soon to make 
  containers
  more useful and usable?
 
  More generally, the topic raises the question... what 'end-games' are there?
  A few I can think of off-hand include:
 
         1. resource control
 
 We intend to hold a io-controller minisummit before KS, we should have
 updates on that front. We also need to discuss CPU hard limits and
 Memory soft limits. We need control for memory large page, mlock, OOM
 notification support, shared page accounting, etc. Eventually on the
 libvirt front, we want to isolate cgroup and lxc support into
 individual components (long term)

Thanks, Balbir.  By the last sentence, are you talking about having
cgroup in its own libcgroup, or do you mean something else?

On the topic of cgroups, does anyone not agree that we should try
to get rid of the ns cgroup, at least once user namespaces can
prevent root in a container from escaping their cgroup?

thanks,
-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 4/3] Control LXC capabilities

2009-06-23 Thread Serge E. Hallyn

Quoting Daniel P. Berrange (berra...@redhat.com):
 This patch updates the LXC driver to make use of libcap-ng for managing
 process capabilities. Previously Ryota Ozaki had provided code to remove
 the CAP_BOOT  capabilities inside the container, preventing host reboots.
 In addition to that one, I believe we should be removing ability to
 load kernel modules, change the system clock and changing audit/MAC.
 So this patch also clears the following:
 
  CAP_SYS_MODULE, /* No kernel module loading */
  CAP_SYS_TIME, /* No changing the clock */
  CAP_AUDIT_CONTROL, /* No messing with auditing */
  CAP_AUDIT_WRITE, /* No messing with auditing */
  CAP_MAC_ADMIN, /* No messing with LSM */
  CAP_MAC_OVERRIDE, /* No messing with LSM */

Thanks, Daniel, this does look good.  I wonder whether there is a more
appropriate list to email caps-related patches (including libcap-ng
itself) to.  Not only does the code itself warrant discussion (for
instance, should capng_lock() also set CAP_NOSUID_FIXUP?), but such
discussions, in one place, about converting several programs to dropping
capabilities would help others to do the same with this code.

I can't think of anything other than the LSM list, so I'm cc:ing it
here.

 We use libcap-ng's capng_updatev/apply functions to remove these from 
 the permitted, inheritable, effective and bounding sets. Then we use
 capng_lock to set NOROOT and NOROOT_LOCKED in the process securebits
 to prevent them ever being re-acquired.
 
 The other thing I realized is that the 'libvirt_lxc' controller process
 does not need to keep any capabilities at all once it has spawned the 
 container process, since all its doing is forwarding I/O between 2 open
 file descripts. So I also clear all capabilities from that. We should
 probably make it chuid/gid to a non-root user in future too. 
 
  lxc_container.c  |   66 
 +--
  lxc_controller.c |   28 +++
  2 files changed, 73 insertions(+), 21 deletions(-)
 
 
 Regards,
 Daniel
 
 diff -r 7e766489c4a2 src/lxc_container.c
 --- a/src/lxc_container.c Tue Jun 23 11:13:45 2009 +0100
 +++ b/src/lxc_container.c Tue Jun 23 11:54:10 2009 +0100
 @@ -41,8 +41,9 @@
  /* For MS_MOVE */
  #include linux/fs.h
 
 -#include sys/prctl.h
 -#include linux/capability.h
 +#if HAVE_CAPNG
 +#include cap-ng.h
 +#endif
 
  #include virterror_internal.h
  #include logging.h
 @@ -642,27 +643,50 @@ static int lxcContainerSetupMounts(virDo
  return lxcContainerSetupExtraMounts(vmDef);
  }
 
 -static int lxcContainerDropCapabilities(virDomainDefPtr vmDef 
 ATTRIBUTE_UNUSED)
 +
 +/*
 + * This is running as the 'init' process insid the container.
 + * It removes some capabilities that could be dangerous to
 + * host system, since they are not currently containerized
 + */
 +static int lxcContainerDropCapabilities(void)
  {
 -#ifdef PR_CAPBSET_DROP
 -int i;
 -const struct {
 -int id;
 -const char *name;
 -} caps[] = {
 -#define ID_STRING(name) name, #name
 -{ ID_STRING(CAP_SYS_BOOT) },
 -};
 +#if HAVE_CAPNG
 +int ret;
 
 -for (i = 0 ; i  ARRAY_CARDINALITY(caps) ; i++) {
 -if (prctl(PR_CAPBSET_DROP, caps[i].id, 0, 0, 0)) {
 -lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
 - _(failed to drop %s), caps[i].name);
 -return -1;
 -}
 +capng_get_caps_process();
 +
 +if ((ret = capng_updatev(CAPNG_DROP,
 + CAPNG_EFFECTIVE | CAPNG_PERMITTED |
 + CAPNG_INHERITABLE | CAPNG_BOUNDING_SET,
 + CAP_SYS_BOOT, /* No use of reboot */
 + CAP_SYS_MODULE, /* No kernel module loading */
 + CAP_SYS_TIME, /* No changing the clock */
 + CAP_AUDIT_CONTROL, /* No messing with auditing 
 */
 + CAP_AUDIT_WRITE, /* No messing with auditing */
 + CAP_MAC_ADMIN, /* No messing with LSM */
 + CAP_MAC_OVERRIDE, /* No messing with LSM */
 + -1 /* sentinal */))  0) {
 +lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
 + _(failed to remove capabilities %d), ret);
 +return -1;
  }
 -#else /* ! PR_CAPBSET_DROP */
 -VIR_WARN0(_(failed to drop capabilities PR_CAPBSET_DROP undefined));
 +
 +if ((ret = capng_apply(CAPNG_SELECT_BOTH))  0) {
 +lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
 + _(failed to apply capabilities: %d), ret);
 +return -1;
 +}

The only problem offhand with this idiom is that you need CAP_SETPCAP to
set securebits and drop caps from bounding set, but I think a lot of
applications could stand to drop CAP_SETPCAP otherwise.  So I don't know
if it would help to do the capng_lock() before capng_apply().

(To be clear, not bc you need to

Re: [libvirt] PATCH: Remove all getuid==0 checks from code

2009-06-02 Thread Serge E. Hallyn

Quoting Daniel P. Berrange (berra...@redhat.com):
 This patch is preparing the way for future work on allowing the libvirtd
 daemon to run as a less-privileged user ID. The idea is that we will 
 switch from 'root' to 'libvirtd', but use Linux capabilties to keep the
 handful of higher privileges we need for our work. Thus any code which
 does a check of 'getuid() == 0' is guarenteed to break [1].
 
 The way this patch approaches this problem, is to change the driver
 initialization function virStateInitialize() to have it be passed in a
 'int privileged' flag from the libvirtd daemon. Each driver is updated
 to record this flag, and use it for checks where needed. The only real
 exception is the Xen driver, where we simply check access(2) against
 the file we need to open.

Hi Daniel,

just a few questions:

...

 diff -r 5e3b5d1f91c2 qemud/qemud.c
...
 @@ -2871,7 +2870,7 @@ int main(int argc, char **argv) {
  sigaction(SIGPIPE, sig_action, NULL);
 
  /* Ensure the rundir exists (on tmpfs on some systems) */
 -if (geteuid () == 0) {
 +if (getuid() == 0) {

Why this change?

...

 diff -r 5e3b5d1f91c2 src/qemu_driver.c
 --- a/src/qemu_driver.c   Thu May 21 16:21:20 2009 +0100
 +++ b/src/qemu_driver.c   Thu May 21 16:27:16 2009 +0100
 @@ -130,24 +130,26 @@ static struct qemud_driver *qemu_driver 
 
 
  static int
 -qemudLogFD(virConnectPtr conn, const char* logDir, const char* name)
 +qemudLogFD(virConnectPtr conn, struct qemud_driver *driver, const char* name)
  {
  char logfile[PATH_MAX];
  mode_t logmode;
 -uid_t uid = geteuid();
  int ret, fd = -1;
 
 -if ((ret = snprintf(logfile, sizeof(logfile), %s/%s.log, logDir, name))
 +if ((ret = snprintf(logfile, sizeof(logfile), %s/%s.log,
 +driver-logDir, name))
   0 || ret = sizeof(logfile)) {
  virReportOOMError(conn);
  return -1;
  }
 
  logmode = O_CREAT | O_WRONLY;
 -if (uid != 0)
 +/* Only logrotate files in /var/log, so only append if running 
 privileged */
 +if (driver-privileged)
 +logmode |= O_APPEND;
 +else
  logmode |= O_TRUNC;
 -else
 -logmode |= O_APPEND;

Hmm, so if I run as unpriv user my logfiles will always be truncated?

thanks,
-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] Fix a compilation problem with LXC drop capabilities

2009-05-31 Thread Serge E. Hallyn

Quoting Daniel Veillard (veill...@redhat.com):
 On Fri, May 29, 2009 at 04:42:54PM -0500, Serge E. Hallyn wrote:
  Quoting Ryota Ozaki (ozaki.ry...@gmail.com):
   On Fri, May 29, 2009 at 9:20 PM, Daniel Veillard veill...@redhat.com 
   wrote:
 The lxcContainerDropCapabilities() function requires PR_CAPBSET_DROP
to be defined in order to compile, but it may not be defined in older
kernels. So I made the compilation of the core of the function
conditional, raise an error but still return 0 to not make the
container initialization fail. But I'm unsure, should we just fail and
return -1 if we can't drop capabilities instead ?
   
   I think it depends on applications. AFAIK, libvirt intends to support
   two types of applications; application workload isolation and
   virtual private servers. In the latter case, we MUST drop the capability
   and if it fails we have to fail booting a container as well. OTOH, in
   the former case, we might not need to fail booting.
   
   Nonetheless, I agree with the patch because old kernels that don't
   support PR_CAPBSET_DROP (they would be 2.6.24 or earlier) don't
   have enough facilities to support VPSs (e.g., they lacks sysfs, devpts, 
   etc.).
   Therefore, with the old kernels we don't need to care much about the
   dropping-failed-but-booting-success case.
  
  Hmm, yeah but note that often userspace is out of date with respect to
  recent new kernel-related defines.  I do a lot of testing on a rhel
  5.3 partition with spanking-new kernels, so rare is the time that I
  don't have to do
  
  #ifndef PR_CAPBSET_DROP
  #define PR_CAPBSET_DROP 24
  #endif
  
  and same for clone flags (CLONE_NEWIPC), securebits, capabilities,
  etc.
  
  So if the prctl(PR_CAPBSET_DROP) returns -ENOSYS then absolutely I
  agree, but the patch just does
  
  #ifdef PR_CAPBSET_DROP
  
  which seems wrong.
 
   Well, if you have a better patch to suggest, fire :-)

No, I'm not sure how it should be handled.  Some ugly autoconf thing?

Maybe PR_CAPBSET_DROP should get hand-defined, and the code at runtime
handles its failure?  I'll take a look next week.  Of course in the
meantime this patch is better than nothing  :)

thanks,
-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] Fix a compilation problem with LXC drop capabilities

2009-05-29 Thread Serge E. Hallyn

Quoting Ryota Ozaki (ozaki.ry...@gmail.com):
 On Fri, May 29, 2009 at 9:20 PM, Daniel Veillard veill...@redhat.com wrote:
   The lxcContainerDropCapabilities() function requires PR_CAPBSET_DROP
  to be defined in order to compile, but it may not be defined in older
  kernels. So I made the compilation of the core of the function
  conditional, raise an error but still return 0 to not make the
  container initialization fail. But I'm unsure, should we just fail and
  return -1 if we can't drop capabilities instead ?
 
 I think it depends on applications. AFAIK, libvirt intends to support
 two types of applications; application workload isolation and
 virtual private servers. In the latter case, we MUST drop the capability
 and if it fails we have to fail booting a container as well. OTOH, in
 the former case, we might not need to fail booting.
 
 Nonetheless, I agree with the patch because old kernels that don't
 support PR_CAPBSET_DROP (they would be 2.6.24 or earlier) don't
 have enough facilities to support VPSs (e.g., they lacks sysfs, devpts, etc.).
 Therefore, with the old kernels we don't need to care much about the
 dropping-failed-but-booting-success case.

Hmm, yeah but note that often userspace is out of date with respect to
recent new kernel-related defines.  I do a lot of testing on a rhel
5.3 partition with spanking-new kernels, so rare is the time that I
don't have to do

#ifndef PR_CAPBSET_DROP
#define PR_CAPBSET_DROP 24
#endif

and same for clone flags (CLONE_NEWIPC), securebits, capabilities,
etc.

So if the prctl(PR_CAPBSET_DROP) returns -ENOSYS then absolutely I
agree, but the patch just does

#ifdef PR_CAPBSET_DROP

which seems wrong.

-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC][PATCH] lxc: fix for ns cgroups subsystem

2009-05-16 Thread Serge E. Hallyn

Quoting Ryota Ozaki (ozaki.ry...@gmail.com):
 I've updated the patch. The change includes support for multiple mount
 points of cgroups that I didn't cope with in the previous patch.
 
 Through the work, I found a bit messy problem. Current lxc controller writes
 pid in a 'tasks' file multiple times if one mount point has multiple 
 subsystems.
 It is bad because the first write changes the cgroups path of a controller, 
 and
 then the second write points a missing file like
 $CGROUPS_MOUNTPOINT/path_to_domain/path_to_domain/tasks where
 the correct file is $CGROUPS_MOUNTPOINT/path_to_domain/tasks.
 I did workaround this problem with a tricky way by truncating the
 duplicated path.
 We probably need a more feasible solution.

Seems like the loop in virCgroupAddTask() should just keep track
of which hierarchies it has already written to.  Just a small
temporary hash table or for that matter a small array that you
walk linearly...

-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC][PATCH] lxc: fix for ns cgroups subsystem

2009-05-08 Thread Serge E. Hallyn

Quoting Ryota Ozaki (ozaki.ry...@gmail.com):
 Hi Serge,
 
 On Fri, May 8, 2009 at 11:48 AM, Serge E. Hallyn se...@us.ibm.com wrote:
  IIUC, the real problem is that src/cgroup.c assumes that the
  cgroup name should be $CGROUP_MOUNTPOINT/groupname.  But of
  course if the ns cgroup is enabled, then the unshare(CLONE_NEWNS)
  to create a new namespace in which to mount the new devpts
  locks the driver under $CGROUP_MOUNTPOINT/pid_of_driver/
  or somesuch.
 
  If this fixes the problem I have no objections, but it seems
  more fragile than perhaps trying to teach src/cgroup.c to
  consider it's current cgroup as a starting point.
 
 hmm, I don't know why the assumption is bad and how the approach
 you are suggesting helps the ns problem.

To be clear, the asssumption is that the driver starts in the
root cgroup, i.e. it's pid is listed in $CGROUP_MOUNTPOINT/tasks.
And that it can create $CGROUP_MOUNTPOINT/groupname and move
itself into $CGROUP_MOUNTPOINT/groupname/tasks.

So, the assumption is bad because when the driver does a
unshare(CLONE_NEWNS), it gets moved into $CGROUP_MOUNTPOINT/X,
and after that can only move itself into
$CGROUP_MOUNTPOINT/X/groupname.

Even with your patch, it's possible for the lxc driver to have
been started under say $CGROUP_MOUNTPOINT/libvir or
$CGROUP_MOUNTPOINT/username through libcgroup/PAM for instance,
in which case your patch would be insufficient.

thanks,
-serge

PS
The point of the ns cgroup is to prevent even privileged tasks in a
resource group from escaping that resource group.  FWIW this can
currently also be done using selinux/smack, and eventually should
be accomplished using user namespaces.  At that point we should
seriously consider removing the movement restriction.

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC][PATCH] lxc: fix for ns cgroups subsystem

2009-05-08 Thread Serge E. Hallyn

Quoting Daniel P. Berrange (berra...@redhat.com):
 On Fri, May 08, 2009 at 08:34:12AM -0500, Serge E. Hallyn wrote:
  Quoting Ryota Ozaki (ozaki.ry...@gmail.com):
   Hi Serge,
   
   On Fri, May 8, 2009 at 11:48 AM, Serge E. Hallyn se...@us.ibm.com wrote:
IIUC, the real problem is that src/cgroup.c assumes that the
cgroup name should be $CGROUP_MOUNTPOINT/groupname.  But of
course if the ns cgroup is enabled, then the unshare(CLONE_NEWNS)
to create a new namespace in which to mount the new devpts
locks the driver under $CGROUP_MOUNTPOINT/pid_of_driver/
or somesuch.
   
If this fixes the problem I have no objections, but it seems
more fragile than perhaps trying to teach src/cgroup.c to
consider it's current cgroup as a starting point.
   
   hmm, I don't know why the assumption is bad and how the approach
   you are suggesting helps the ns problem.
  
  To be clear, the asssumption is that the driver starts in the
  root cgroup, i.e. it's pid is listed in $CGROUP_MOUNTPOINT/tasks.
  And that it can create $CGROUP_MOUNTPOINT/groupname and move
  itself into $CGROUP_MOUNTPOINT/groupname/tasks.
  
  So, the assumption is bad because when the driver does a
  unshare(CLONE_NEWNS), it gets moved into $CGROUP_MOUNTPOINT/X,
  and after that can only move itself into
  $CGROUP_MOUNTPOINT/X/groupname.
  
  Even with your patch, it's possible for the lxc driver to have
  been started under say $CGROUP_MOUNTPOINT/libvir or
  $CGROUP_MOUNTPOINT/username through libcgroup/PAM for instance,
  in which case your patch would be insufficient.
 
 Indeed, we can't assume we're in the root group at any time. Our
 general plan is that libvirt itself uses 3 levels of cgroup
 starting at the cgroup that libvirtd was placed in by the admin
 of the host, then a per-driver group, then per-guest groups
 
   $LIBVIRTD_ROOT_CGROUP
  |
  +- lxc
  ||
  |+- LXC-GUEST-1
  |+- LXC-GUEST-2
  |+- LXC-GUEST-3
  |+- ...
  |
  +- qemu
   |
   +- QEMU-GUEST-1
   +- QEMU-GUEST-2
   +- QEMU-GUEST-3
   +- ...
 
 $LIBVIRTD_ROOT_CGROUP, may be the actaul root mount point for
 the cgroup controller in question, or it may be a sub-directory
 that the admin chose to put it in. Also, if running libvirtd
 as a normal user, the admin may have created per-user account
 cgroups and so libvirtd would end up in there. So we have to
 be prepared for anything wrt initial libvirtd cgroup root.
 
 NB The code for putting QEMU in a cgroup isn't merged yet.

That sounds good.  I just don't think the code currently does
it.  To do the right thing, IIUC, virCgroupPathOfGroup() should
parse the /proc/pid/cgroup contents of the 'controller' process,
and insert that between the virCgroupGetMount(controller)
result and the group name.

Or something...

(right?)

thanks,
-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC][PATCH] lxc: drop CAP_SYS_BOOT capability to preventrebooting from inside containers

2009-05-07 Thread Serge E. Hallyn

Quoting Ryota Ozaki (ozaki.ry...@gmail.com):
 Hi,
 
 Current lxc driver unexpectedly allows users inside containers to reboot
 host physical machine. This patch prevents this by dropping CAP_SYS_BOOT
 capability in the bounding set of the init processes in every containers.
 
 Note that the patch intends to make it easy to add further capabilities
 to drop if needed, although I'm not sure which capabilities should be
 dropped. (We might need to drop CAP_SETFCAP as well to be strict...)
 
 Thanks,
   ozaki-r
 
 Signed-off-by: Ryota Ozaki ozaki.ry...@gmail.com
 
 From 0e7a7622bc6411bbe76c05c63c6e6e61d379d97b Mon Sep 17 00:00:00 2001
 From: Ryota Ozaki ozaki.ry...@gmail.com
 Date: Fri, 8 May 2009 04:29:24 +0900
 Subject: [PATCH] lxc: drop CAP_SYS_BOOT capability to prevent
 rebooting from inside containers
 
 Current lxc driver unexpectedly allows users inside containers to reboot
 host physical machine. This patch prevents this by dropping CAP_SYS_BOOT
 capability in the bounding set of the init processes in every containers.
 ---
  src/lxc_container.c |   30 ++
  1 files changed, 30 insertions(+), 0 deletions(-)
 
 diff --git a/src/lxc_container.c b/src/lxc_container.c
 index 3946b84..37ab216 100644
 --- a/src/lxc_container.c
 +++ b/src/lxc_container.c
 @@ -32,6 +32,8 @@
  #include sys/ioctl.h
  #include sys/mount.h
  #include sys/wait.h
 +#include sys/prctl.h
 +#include sys/capability.h
  #include unistd.h
  #include mntent.h
 
 @@ -639,6 +641,30 @@ static int lxcContainerSetupMounts(virDomainDefPtr vmDef,
  return lxcContainerSetupExtraMounts(vmDef);
  }
 
 +
 +static int lxcContainerDropCapabilities( virDomainDefPtr vmDef )
 +{
 +int i;
 +const struct {
 +int id;
 +const char *name;
 +} caps[] = {
 +#define ID_STRING(name) name, #name
 +{ ID_STRING(CAP_SYS_BOOT) },
 +};
 +
 +for (i = 0 ; i  ARRAY_CARDINALITY(caps) ; i++) {
 +if (prctl(PR_CAPBSET_DROP, caps[i].id, 0, 0, 0)) {
 +lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
 + %s, _(failed to drop %s), caps[i].name);
 +return -1;

Ideally you should also drop it from pI.

 +}
 +}
 +
 +return 0;
 +}
 +
 +
  /**
   * lxcChild:
   * @argv: Pointer to container arguments
 @@ -705,6 +731,10 @@ static int lxcContainerChild( void *data )
  if (lxcContainerEnableInterfaces(argv-nveths, argv-veths)  0)
  return -1;
 
 +/* drop a set of root capabilities */
 +if (lxcContainerDropCapabilities(vmDef)  0)
 +return -1;
 +
  /* this function will only return if an error occured */
  return lxcContainerExecInit(vmDef);
  }
 -- 
 1.6.0.6
 
 --
 Libvir-list mailing list
 Libvir-list@redhat.com
 https://www.redhat.com/mailman/listinfo/libvir-list

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC][PATCH] lxc: drop CAP_SYS_BOOT capability to preventrebooting from inside containers

2009-05-07 Thread Serge E. Hallyn

Quoting Ryota Ozaki (ozaki.ry...@gmail.com):
 Hi Serge,
 
 On Fri, May 8, 2009 at 9:12 AM, Serge E. Hallyn se...@us.ibm.com wrote:
  Quoting Ryota Ozaki (ozaki.ry...@gmail.com):
  Hi,

...

  +    for (i = 0 ; i  ARRAY_CARDINALITY(caps) ; i++) {
  +        if (prctl(PR_CAPBSET_DROP, caps[i].id, 0, 0, 0)) {
  +            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
  +                     %s, _(failed to drop %s), caps[i].name);
  +            return -1;
 
  Ideally you should also drop it from pI.
 
 If not drop it, a user in a container could set CAP_SYS_BOOT fI bit of
 /bin/reboot on and then the user could gain CAP_SYS_BOOT back through
 the fI. Is this understanding right?

Yup.

Of course most tasks run with pI empty, so it seems unlikely that
it would be a problem, but unless the libcap dependecy becomes a
problem, it seems worth making sure that doesn't happen.

thanks,
-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] change permissions of directories in cgroups

2009-05-07 Thread Serge E. Hallyn

Quoting Ryota Ozaki (ozaki.ry...@gmail.com):
 Hi,
 
 This patch creates a directory in cgroups with an ordinary
 permission 0755 (rwxr-xr-x) instead of 0655 (rw-r-xr-x).
 
 I guess 0655 is not expected and just a mistake, or is
 there a special reason?

Haha, that sure seems like a mistake.  Good catch.

 Thanks,
   ozaki-r
 
 Signed-off-by: Ryota Ozaki ozaki.ry...@gmail.com

Acked-by: Serge Hallyn se...@us.ibm.com

thanks,
-serge

 From 4829855bc0baa3c75806f106bc0e54eb1da75eea Mon Sep 17 00:00:00 2001
 From: Ryota Ozaki ozaki.ry...@gmail.com
 Date: Fri, 8 May 2009 03:23:50 +0900
 Subject: [PATCH] change permissions of directories in cgroups
 
 This patch creates a directory in cgroups with an ordinary
 permission 0755 (rwxr-xr-x) instead of 0655 (rw-r-xr-x).
 ---
  src/cgroup.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/src/cgroup.c b/src/cgroup.c
 index d1d44a2..50517e2 100644
 --- a/src/cgroup.c
 +++ b/src/cgroup.c
 @@ -436,7 +436,7 @@ static int virCgroupMakeGroup(const char *name)
  virCgroupFree(root);
 
  if (access(path, F_OK) != 0) {
 -if (mkdir(path, 0655)  0) {
 +if (mkdir(path, 0755)  0) {
  rc = -errno;
  VIR_FREE(path);
  break;
 -- 
 1.6.0.6
 
 --
 Libvir-list mailing list
 Libvir-list@redhat.com
 https://www.redhat.com/mailman/listinfo/libvir-list

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC][PATCH] lxc: fix for ns cgroups subsystem

2009-05-07 Thread Serge E. Hallyn

IIUC, the real problem is that src/cgroup.c assumes that the
cgroup name should be $CGROUP_MOUNTPOINT/groupname.  But of
course if the ns cgroup is enabled, then the unshare(CLONE_NEWNS)
to create a new namespace in which to mount the new devpts
locks the driver under $CGROUP_MOUNTPOINT/pid_of_driver/
or somesuch.

If this fixes the problem I have no objections, but it seems
more fragile than perhaps trying to teach src/cgroup.c to
consider it's current cgroup as a starting point.

-serge

Quoting Ryota Ozaki (ozaki.ry...@gmail.com):
 From 46531182708dc3eb132b14ce2f23fbc639430176 Mon Sep 17 00:00:00 2001
 From: Ryota Ozaki ozaki.ry...@gmail.com
 Date: Fri, 8 May 2009 05:31:03 +0900
 Subject: [PATCH] lxc: fix for ns cgroups subsystem
 
 lxc does not work if ns cgroups subsystem is enabled because
 of two factors; one is that ns has a special rule to create
 a group[*] unlike other subsystems and the other is lxc
 controller creates a new namespace for /dev/pts prior to
 create a new group for a domain. Unfortunately the new
 namespace breaks the rule of ns and that prevents a lxc
 controller from creating a new group.
 
 This patch addresses the problem by creating a new group
 before creating a new namespace (i.e. call unshare syscall).
 
 Note that this patch is only for the case ns is enabled and
 current code works well if it disabled. However, I think
 this patch makes sense because not just a few users know
 much about cgroups and likely to enable all of subsystems
 without notions (i.e. mount cgroups without any options).
 
 [*] 
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=kernel/ns_cgroup.c;hb=HEAD
 ---
  src/lxc_controller.c |6 +++---
  1 files changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/src/lxc_controller.c b/src/lxc_controller.c
 index e0fb05d..1231817 100644
 --- a/src/lxc_controller.c
 +++ b/src/lxc_controller.c
 @@ -458,6 +458,9 @@ lxcControllerRun(virDomainDefPtr def,
  goto cleanup;
  }
 
 +if (lxcSetContainerResources(def)  0)
 +goto cleanup;
 +
  root = virDomainGetRootFilesystem(def);
 
  /*
 @@ -543,9 +546,6 @@ lxcControllerRun(virDomainDefPtr def,
  }
 
 
 -if (lxcSetContainerResources(def)  0)
 -goto cleanup;
 -
  if ((container = lxcContainerStart(def,
 nveths,
 veths,
 -- 
 1.6.0.6
 
 --
 Libvir-list mailing list
 Libvir-list@redhat.com
 https://www.redhat.com/mailman/listinfo/libvir-list

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] PATCH: Allow LXC to use private /dev/pts instance

2009-04-20 Thread Serge E. Hallyn

Quoting Daniel P. Berrange (berra...@redhat.com):
 This patch attached now just makes it MS_SLAVE.  There's no need for the
 extra SHARED flag, since the only process libvirt_lxc spawns is the 'init'
 process inside the container and that immediately makes its own root
 private.

Thanks, this looks good.

Acked-by: Serge Hallyn se...@us.ibm.com

-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH 1/1] lxc: only do CLONE_NEWUSER when kernel supports it

2009-04-17 Thread Serge E. Hallyn

I was trying to get the lxc driver to work on ubuntu jaunty.  This
patch gets me further than I was getting before.  Like I say below,
it's probably not the right way though.

-serge

From 2513f8a7e0654e84570fe0ef2204dabe276b9e4e Mon Sep 17 00:00:00 2001
From: root r...@jaunty.(none)
Date: Fri, 17 Apr 2009 16:41:01 -0500
Subject: [PATCH 1/1] lxc: only do CLONE_NEWUSER when kernel supports it

The ubuntu jaunty kernel is not compiled with USER_NS.  Since
libvirt-lxc always does clone(CLONE_NEWUSER) it gets -EINVAL
and mysteriously claims to be unable to contact hypervisor.

This patch isn't the right thing to do, but I'm not sure what
is.  User namespaces do (since recently) isolate the in-kernel
keyring.  So the right thing might be to add a flag to the
xml definition file to specify whether to use a user namespace.
This patch doesn't do that, rather it always does CLONE_NEWUSER
if the kernel supports it, and never if not.

Signed-off-by: Serge Hallyn se...@us.ibm.com
---
 src/lxc_container.c |   15 +--
 src/lxc_container.h |1 +
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/lxc_container.c b/src/lxc_container.c
index 67c66bd..8069af7 100644
--- a/src/lxc_container.c
+++ b/src/lxc_container.c
@@ -666,6 +666,11 @@ static int lxcContainerChild( void *data )
 return lxcContainerExecInit(vmDef);
 }
 
+int userns_supported(void)
+{
+   return lxcContainerAvailable(LXC_CONTAINER_FEATURE_USER) == 0;
+}
+
 /**
  * lxcContainerStart:
  * @driver: pointer to driver structure
@@ -694,7 +699,10 @@ int lxcContainerStart(virDomainDefPtr def,
 }
 stacktop = stack + stacksize;
 
-flags = 
CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWUSER|CLONE_NEWIPC|SIGCHLD;
+flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|SIGCHLD;
+
+if (userns_supported())
+flags |= CLONE_NEWUSER;
 
 if (def-nets != NULL)
 flags |= CLONE_NEWNET;
@@ -719,13 +727,16 @@ static int lxcContainerDummyChild(void *argv 
ATTRIBUTE_UNUSED)
 
 int lxcContainerAvailable(int features)
 {
-int flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWUSER|
+int flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|
 CLONE_NEWIPC|SIGCHLD;
 int cpid;
 char *childStack;
 char *stack;
 int childStatus;
 
+if (features  LXC_CONTAINER_FEATURE_USER)
+flags |= CLONE_NEWUSER;
+
 if (features  LXC_CONTAINER_FEATURE_NET)
 flags |= CLONE_NEWNET;
 
diff --git a/src/lxc_container.h b/src/lxc_container.h
index 5d037b0..b99e83e 100644
--- a/src/lxc_container.h
+++ b/src/lxc_container.h
@@ -28,6 +28,7 @@
 
 enum {
 LXC_CONTAINER_FEATURE_NET = (1  0),
+LXC_CONTAINER_FEATURE_USER = (1  1),
 };
 
 #define LXC_DEV_MAJ_MEMORY  1
-- 
1.6.0.4

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] PATCH: Allow LXC to use private /dev/pts instance

2009-04-15 Thread Serge E. Hallyn

Quoting Daniel P. Berrange (berra...@redhat.com):
 This change seemed to fix that problem with no ill-effects.
 
 -if (chroot(oldroot)  0) {
 -virReportSystemError(NULL, errno, %s,
 - _(failed to chroot into tmpfs));
 -goto err;
 -}
 -
 -if (chdir(/new)  0) {
 -virReportSystemError(NULL, errno, %s,
 - _(failed to chdir into /new on tmpfs));
 +if (chdir(newroot)  0) {
 +virReportSystemError(NULL, errno,
 + _(failed to chroot into %s), newroot);

Yes, good.  We can probably pare it down later, but I'll look at that
once other stuff settles down.

 So I'm removing this chunk:
 
  if (chdir(/)  0)
  goto err;
 
 -if (umount2(.oldroot, MNT_DETACH)  0) {
 -virReportSystemError(NULL, errno, %s,
 - _(failed to lazily unmount old root));
 -goto err;
 -}
 -

Yeah as I added that I actually was wondering whether that would happen
- whether libvirt would try to make later bind mounts out of the old
fs which I'd umonted.  But I couldn't find where else it was umounted.
Glad you solved it :)

...
 Index: src/lxc_container.c
 ===
...

This all looks good, though I haven't tested it yet.

 +/*
 + * If doing a chroot style setup, we need to prepare
 + * a private /dev/pts for the child now, which they
 + * will later move into position.
 + *
 + * This is complex because 'virsh console' needs to
 + * use /dev/pts from the host OS, and the guest OS
 + * needs to use /dev/pts from the guest.
 + *
 + * This means that we (libvirt_lxc) need to see and
 + * use both /dev/pts instances. We're running in the
 + * host OS context though and don't want to expose
 + * the guest OS /dev/pts there.
 + *
 + * Thus we call unshare(CLONE_NS) so that we can see
 + * the guest's new /dev/pts, without it becoming
 + * visible to the host OS.
 + */

Calling unshare(CLONE_NEWNS) will not prevent the host OS from
seeing the new /dev/pts if / was MS_SHARED.  That isn't taken
care of anywhere else for this process's namespace, is it?

I assume the reason you want the new devpts not visible in the
host OS is so that it will be auto-umounted when the container is
released?

Thanks for doing this, the patch looks really good (minus MS_SHARED
bit).

-serge

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH] lxc: fix veth off by 1 error

2009-04-07 Thread Serge E. Hallyn

When not specifying a target for veth device, veth.c:getFreeVethName()
is supposed to scan for unused veth devices in /sys/class/net.
However, when it finds one, it bumps the index by one before
returning it.

So, if you have one container running, veth0 is passed into
the container, veth1 is taken and still sitting in /sys/class/net.
When you now start a second container, getFreeVethName() finds
veth0 is unused, but returns 1.  Now container creation dies
becuase /sys/class/net/veth1 exists.

Signed-off-by: Serge Hallyn se...@us.ibm.com
---
 src/veth.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/veth.c b/src/veth.c
index 93173e4..90c1dcb 100644
--- a/src/veth.c
+++ b/src/veth.c
@@ -35,12 +35,12 @@
 static int getFreeVethName(char *veth, int maxLen, int startDev)
 {
 int rc = -1;
-int devNum = startDev;
+int devNum = startDev-1;
 char path[PATH_MAX];
 
 do {
-snprintf(path, PATH_MAX, /sys/class/net/veth%d/, devNum);
 ++devNum;
+snprintf(path, PATH_MAX, /sys/class/net/veth%d/, devNum);
 } while (virFileExists(path));
 
 snprintf(veth, maxLen, veth%d, devNum);
@@ -97,6 +97,7 @@ int vethCreate(char* veth1, int veth1MaxLen,
 
 while ((1  strlen(veth2)) || STREQ(veth1, veth2)) {
 vethDev = getFreeVethName(veth2, veth2MaxLen, vethDev);
+++vethDev;
 DEBUG(assigned veth2: %s, veth2);
 }
 
-- 
1.6.2

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH] lxc: make the pivot_root more robust.

2009-04-06 Thread Serge E. Hallyn

libvirt/lxc is broken on F11.  The pivot_root() call returns
-EINVAL.  The below is one way we can fix it.  I'm also sending
another patch which takes the simpler approach of using chroot.
However, chroot is trivially escapable (see for instance
http://www.linuxsecurity.com/content/view/117632/49/).  I don't
know whether the typical libvirt user would care.  If so, then
the below patch is probably the way to go.

From 26cac415771a2d9712af0e1ce60a0bcb41b44665 Mon Sep 17 00:00:00 2001
From: root r...@localhost.localdomain
Date: Sat, 4 Apr 2009 22:49:20 -0400
Subject: [PATCH 1/1] lxc: make the pivot_root more robust.

The libvirt lxc driver uses pivot_root instead of chroot, because
the latter is trivially escapable.  However, the pivot_root(2)
system call can fail for several subtle reasons.  Depending upon
your distro init sequence you may get lucky and have the old
recipe work, but on a Fedora 11 standard install, for instance,
it will fail.

Do a few more steps to make pivot_root hopefully always
succeed.  We mark / as MS_PRIVATE, create an empty tmpfs,
and bind-mount the container root onto /new in that fs.
In this way, we ensure two reasons for pivot_root to fail -
namely old_root-parent being MS_SHARED and old_root and
new_root being on the same fs - won't happen.

Signed-off-by: Serge Hallyn se...@us.ibm.com
---
 src/lxc_container.c |  108 ---
 1 files changed, 85 insertions(+), 23 deletions(-)

diff --git a/src/lxc_container.c b/src/lxc_container.c
index 3f17b8d..d3959f6 100644
--- a/src/lxc_container.c
+++ b/src/lxc_container.c
@@ -264,50 +264,113 @@ static int lxcContainerChildMountSort(const void *a, 
const void *b)
   return strcmp(*sb, *sa);
 }
 
+#ifndef MS_REC
+#define MS_REC  16384
+#endif
+
+#ifndef MNT_DETACH
+#define MNT_DETACH  0x0002
+#endif
+
+#ifndef MS_PRIVATE
+#define MS_PRIVATE  118
+#endif
+
 static int lxcContainerPivotRoot(virDomainFSDefPtr root)
 {
 int rc;
-char *oldroot;
+char *oldroot = NULL, *newroot = NULL;
 
-/* First step is to ensure the new root itself is
-   a mount point */
-if (mount(root-src, root-src, NULL, MS_BIND, NULL)  0) {
-virReportSystemError(NULL, errno,
- _(failed to bind new root %s),
- root-src);
-return -1;
+/* root-parent must be private, so make / private. */
+if (mount(, /, NULL, MS_PRIVATE|MS_REC, NULL)  0) {
+virReportSystemError(NULL, errno, %s,
+ _(failed to make root private));
+goto err;
 }
 
 if (virAsprintf(oldroot, %s/.oldroot, root-src)  0) {
 virReportOOMError(NULL);
-return -1;
+goto err;
 }
 
 if ((rc = virFileMakePath(oldroot))  0) {
 virReportSystemError(NULL, rc,
  _(failed to create %s),
  oldroot);
-VIR_FREE(oldroot);
-return -1;
+goto err;
+}
+
+/* Create a tmpfs root since old and new roots must be
+ * on separate filesystems */
+if (mount(, oldroot, tmpfs, 0, NULL)  0) {
+virReportSystemError(NULL, errno,
+ _(failed to mount empty tmpfs at %s),
+ oldroot);
+goto err;
+}
+   
+/* Create a directory called 'new' in tmpfs */
+if (virAsprintf(newroot, %s/new, oldroot)  0) {
+virReportOOMError(NULL);
+goto err;
+}
+
+if ((rc = virFileMakePath(newroot))  0) {
+virReportSystemError(NULL, rc,
+ _(failed to create %s),
+ newroot);
+goto err;
+}
+
+/* ... and mount our root onto it */
+if (mount(root-src, newroot, NULL, MS_BIND|MS_REC, NULL)  0) {
+virReportSystemError(NULL, errno,
+ _(failed to bind new root %s into tmpfs),
+ root-src);
+goto err;
+}
+
+/* Now we chroot into the tmpfs, then pivot into the
+ * root-src bind-mounted onto '/new' */
+if (chroot(oldroot)  0) {
+virReportSystemError(NULL, errno, %s,
+ _(failed to chroot into tmpfs));
+goto err;
+}
+
+if (chdir(/new)  0) {
+virReportSystemError(NULL, errno, %s,
+ _(failed to chdir into /new on tmpfs));
+goto err;
 }
 
 /* The old root directory will live at /.oldroot after
  * this and will soon be unmounted completely */
-if (pivot_root(root-src, oldroot)  0) {
-virReportSystemError(NULL, errno,
- _(failed to pivot root %s to %s),
- oldroot, root-src);
-VIR_FREE(oldroot);
-return -1;
+if (pivot_root(., .oldroot)  0) {
+virReportSystemError(NULL, errno, %s,
+ _(failed to pivot root));
+goto err;
 }

[libvirt] [PATCH] lxc: use chroot instead of pivot_root

2009-04-06 Thread Serge E. Hallyn

This is an alternative to the pivot_root patch which I just
sent.  It has the advantage of being much simpler.  It also
won't have a problem with the container's / being a read-only
mount.  It has the disadvantage, of course, of being escapable.

From a91bca7f60f27e8fbdb4e3bacf3232a6cbb630d3 Mon Sep 17 00:00:00 2001
From: root r...@localhost.localdomain
Date: Fri, 3 Apr 2009 23:35:24 -0400
Subject: [PATCH 1/1] lxc: use chroot instead of pivot_root

pivot_root is too fragile.  For instance, if the container's
/ is read-only, we can't create .oldroot.  Maybe we're happy
telling users that if they can't create .oldroot at container
create time, then they must make sure it exists ahead of time.

Or, maybe we're ok with chroot being escapable, and should
just go this simple route.

Signed-off-by: Serge Hallyn se...@us.ibm.com
---
 src/lxc_container.c |   75 +--
 1 files changed, 19 insertions(+), 56 deletions(-)

diff --git a/src/lxc_container.c b/src/lxc_container.c
index 3f17b8d..142ed4d 100644
--- a/src/lxc_container.c
+++ b/src/lxc_container.c
@@ -264,52 +264,6 @@ static int lxcContainerChildMountSort(const void *a, const 
void *b)
   return strcmp(*sb, *sa);
 }
 
-static int lxcContainerPivotRoot(virDomainFSDefPtr root)
-{
-int rc;
-char *oldroot;
-
-/* First step is to ensure the new root itself is
-   a mount point */
-if (mount(root-src, root-src, NULL, MS_BIND, NULL)  0) {
-virReportSystemError(NULL, errno,
- _(failed to bind new root %s),
- root-src);
-return -1;
-}
-
-if (virAsprintf(oldroot, %s/.oldroot, root-src)  0) {
-virReportOOMError(NULL);
-return -1;
-}
-
-if ((rc = virFileMakePath(oldroot))  0) {
-virReportSystemError(NULL, rc,
- _(failed to create %s),
- oldroot);
-VIR_FREE(oldroot);
-return -1;
-}
-
-/* The old root directory will live at /.oldroot after
- * this and will soon be unmounted completely */
-if (pivot_root(root-src, oldroot)  0) {
-virReportSystemError(NULL, errno,
- _(failed to pivot root %s to %s),
- oldroot, root-src);
-VIR_FREE(oldroot);
-return -1;
-}
-VIR_FREE(oldroot);
-
-/* CWD is undefined after pivot_root, so go to / */
-if (chdir(/)  0) {
-return -1;
-}
-
-return 0;
-}
-
 static int lxcContainerPopulateDevices(void)
 {
 int i;
@@ -349,10 +303,9 @@ static int lxcContainerPopulateDevices(void)
  _(cannot create /dev/pts));
 return -1;
 }
-if (mount(/.oldroot/dev/pts, /dev/pts, NULL,
-  MS_MOVE, NULL)  0) {
+if (mount(devpts, /dev/pts, devpts, 0, NULL)  0) {
 virReportSystemError(NULL, errno, %s,
- _(failed to move /dev/pts into container));
+ _(failed to mount /dev/pts in container));
 return -1;
 }
 
@@ -461,15 +414,25 @@ static int lxcContainerUnmountOldFS(void)
 }
 
 
-/* Got a FS mapped to /, we're going the pivot_root
- * approach to do a better-chroot-than-chroot
- * this is based on this thread http://lkml.org/lkml/2008/3/5/29
- */
-static int lxcContainerSetupPivotRoot(virDomainDefPtr vmDef,
+/* Got a FS mapped to /, now chroot.
+ * pivot_root would work, but requires too much hand-holding
+ * (especially, old_root-parent must not be shared) */
+static int lxcContainerSetupChroot(virDomainDefPtr vmDef,
   virDomainFSDefPtr root)
 {
-if (lxcContainerPivotRoot(root)  0)
+if (chdir(root-src)  0) {
+virReportSystemError(NULL, errno,
+ _(failed to chdir to %s),
+ root-src);
+return -1;
+}
+
+if (chroot(root-src)  0) {
+virReportSystemError(NULL, errno,
+ _(failed to chroot to %s),
+ root-src);
 return -1;
+}
 
 if (virFileMakePath(/proc)  0 ||
 mount(none, /proc, proc, 0, NULL)  0) {
@@ -537,7 +500,7 @@ static int lxcContainerSetupMounts(virDomainDefPtr vmDef)
 }
 
 if (root)
-return lxcContainerSetupPivotRoot(vmDef, root);
+return lxcContainerSetupChroot(vmDef, root);
 else
 return lxcContainerSetupExtraMounts(vmDef);
 }
-- 
1.6.2

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

69 matches

Mail list logo