Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-31 Thread Christian Brauner
On Tue, Dec 12, 2017 at 11:00:01PM -0600, Serge Hallyn wrote:
> On Tue, Dec 05, 2017 at 05:20:32PM +0100, Dirk Geschke wrote:
> > Hi Serge,
> > 
> > > > I am a little bit clueless, I have several systems running with
> > > > Debian and unprivileged LXC. But newer systems won't start new
> > > > containers.
> > > > 
> > > > Actually I have a Debian stretch, installed the normal way but
> > > > with lxc-2.0.9 and cgmanager-0.41 installed from sources.
> > > > 
> > > > I can setup cgmanager, can do a cgm movepid and it is no problem
> > > > to download a template. But starting the container does not work,
> > > > it simply hangs at:
> > > > 
> > > >    $ lxc-start -n lxc-test -l trace -o wheezy -F
> > > 
> > > I see no bad errors in the log.  When this hangs, can you
> > > from another terminal see whether 'lxc-ls -f' shows it
> > > running, and what 'lxc-attach -n lxc-test' does?
> > 
> > that's the funny part: Nothing. There is not one process from 
> > the subuid range running. It simply hangs before it tries to 
> > start the container at all. And I have no idea, why.
> > But with lxc-2.0.8 it works. 
> > 
> > I just installed and started debian wheezy, upgraded it to jessie
> > and finally to stretch. It works fine.
> > 
> > I now installed lxc-2.0.9 again, tried to start the container again
> > and nothing happens:
> > 
> >$ lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > 
> > That's all. lxc-ls -f and lxc-attach -n lxc-test hang, too.
> > 
> > I also see three lxc-start processes:
> > 
> >    $ ps waux | grep lxc-start
> >    lxc-test 24478  0.0  0.1  51740  4232 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> >    lxc-test 24487  0.0  0.0  51740   504 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> >    lxc-test 24492  0.0  0.0  51740   508 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> 
> When you gdb-attach to these (which you have to do as root for two
> of them) you find that you're hung on:
> 
> (gdb) where
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x7f2a94f68b95 in __GI___pthread_mutex_lock 
> (mutex=mutex@entry=0x7f2a95868a60 )
> at ../nptl/pthread_mutex_lock.c:80
> #2  0x7f2a95638c4d in lock_mutex (l=0x7f2a95868a60 ) at 
> cgroups/cgmanager.c:80
> #3  cgm_lock () at cgroups/cgmanager.c:98
> #4  0x7f2a94a722f5 in __libc_fork () at ../sysdeps/nptl/fork.c:96
> #5  0x7f2a95604ee7 in run_command (buf=buf@entry=0x7fff3b5a20e0 "",
> buf_size=buf_size@entry=4096,
> child_fn=child_fn@entry=0x7f2a95606a30 ,
> args=args@entry=0x7fff3b5a40e0) at utils.c:2262
> #6  0x7f2a9560b01e in lxc_map_ids (idmap=idmap@entry=0x55b48a62c1c0, 
> pid=pid@entry=15389)
> at conf.c:2652
> #7  0x7f2a9560f1e5 in userns_exec_1 (conf=conf@entry=0x55b48a625b90,
> fn=fn@entry=0x7f2a95639a20 , 
> data=data@entry=0x7fff3b5a5210,
> fn_name=fn_name@entry=0x7f2a9563fadc "chown_cgroup_wrapper") at 
> conf.c:3822
> #8  0x7f2a9563a1a9 in chown_cgroup (conf=0x55b48a625b90, 
> cgroup_path=)
> at cgroups/cgmanager.c:500
> #9  cgm_chown (hdata=0x55b48a62b1d0, conf=0x55b48a625b90) at 
> cgroups/cgmanager.c:1555
> #10 0x7f2a955fa397 in lxc_spawn (handler=0x55b48a624e50) at start.c:1363
> 
> and
> 
> #0  0x7f2a94f6f1f0 in __read_nocancel () at 
> ../sysdeps/unix/syscall-template.S:84
> #1  0x7f2a95607872 in run_userns_fn (data=0x7fff3b5a51b0) at conf.c:3570
> #2  0x7f2a94aa2aff in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> 
> So it seems to be hanging on cgm_lock().
> 
> I'm a bit too tired to think it through clearly enough, but I'm thinking
> this might have to do with the introduction of run_command().  It introduces
> an extra fork() between the clone(CLONE_NEWUSER)'d thread and the task which
> actually does the work.  Perhaps that is messing with our lock state?

I mean, one *could* attribute it to the pthread atfork handler... Let me
take a look at run_command().
___
lxc-users mailing list
lxc-users@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-users

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-31 Thread Dirk Geschke
Hi Serge,

just forgot to mention: I'm using sysv-init on the host, not systemd...

Best regards

Dirk

> > > If you build without cgmanager, and your system has the cgroups
> > > individually mounted under /sys/fs/cgroup, then cgfsng will be
> > > automatically used.
> 
> I just tested it by compiling and installing lxcfs. If I add to 
> /etc/pam.d/common-session and /etc/pam.d/common-session-noninteractive 
> this line
> 
>    session optional  /usr/local/lib/security/pam_cgfs.so -c freezer,memory,name=systemd
> 
> and mount /sys/fs/cgroup manually (I use the cgroup-mounts script from
> Ubuntu's cgroup-lite-1.9 package) LXC works completely unprivileged.
> There seems to be no need for cgmanager any longer. I just tested LXC
> version 2.0.9, probably 2.1.x will work, too.
> 
> Maybe one should adjust the documentation?
> 
>https://linuxcontainers.org/lxc/getting-started/
>
> Best regards and many thanks for your help!
> 
> Dirk
> 

-- 
+--+
| Dr. Dirk Geschke   / Plankensteinweg 61/ 85435 Erding|
| Telefon: 08122-559448  / Mobil: 0176-96906350 / Fax: 08122-9818106   |
| d...@geschke-online.de / d...@lug-erding.de  / kont...@lug-erding.de |
+--+

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-31 Thread Dirk Geschke
Hi Serge,

> > If you build without cgmanager, and your system has the cgroups
> > individually mounted under /sys/fs/cgroup, then cgfsng will be
> > automatically used.

I just tested it by compiling and installing lxcfs. If I add to 
/etc/pam.d/common-session and /etc/pam.d/common-session-noninteractive 
this line

   session optional  /usr/local/lib/security/pam_cgfs.so -c freezer,memory,name=systemd

and mount /sys/fs/cgroup manually (I use the cgroup-mounts script from
Ubuntu's cgroup-lite-1.9 package) LXC works completely unprivileged.
There seems to be no need for cgmanager any longer. I just tested LXC
version 2.0.9, probably 2.1.x will work, too.

Maybe one should adjust the documentation?

   https://linuxcontainers.org/lxc/getting-started/
   
Best regards and many thanks for your help!

Dirk


Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-17 Thread Dirk Geschke
Hi Serge,

> > no, lxc-2.1.1 shows a similar problem. It hangs, too, but it tries
> > to send a command in one thread and to receive it in another (afair).
> > 
> > But what is cgfsng? How can I find and use this?
> 
> If you build without cgmanager, and your system has the cgroups
> individually mounted under /sys/fs/cgroup, then cgfsng will be
> automatically used.

hmm, strange. I have built lxc-2.0.9 this way and tried all
variants. The only way I got it up and running was by installing
libpam-cgfs. But this has dependencies on systemd and cgmanager.

Although I still use sysv-init, systemd got installed and starting
the container works. But I now have cgmanager installed and
running, too.

> > I think, this kind of setup is the most secure to deal with LXC,
> > especially if you are not interested in migrating containers
> > between hosts...
> 
> The 'not much used any more' isn't referring to unprivileged
> containers, but to use cgmanager, which is deprecated (until
> we decide we need it again :)

But how do I get it up and running without cgmanager? I think
I need a process to set up the cgroups accordingly...

The pam modules libpam-cgm and libpam-cgfs require cgmanager
to run, too. And is there a way to avoid using systemd?

Best regards

Dirk


Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-15 Thread Serge E. Hallyn
Quoting Dirk Geschke (d...@lug-erding.de):
> Hi Serge,
> 
> > > just for the record, lxc-2.0.8 is still working this way, but it
> > > stops starting with lxc-2.0.9 and the whole lxc-2.1.x branch.
> > > 
> > > I have no idea, what happened to break it nor do I have any clue
> > > to fix it. But since I like to use unprivileged containers, it
> > > would be nice to get it running again.
> > 
> > > You can see whether lxc-2.1.1 fixes it for you, or
> > > you can run with cgfsng instead of cgmanager, as your
> > > problem is just with the cgm_lock.
> > 
> > no, lxc-2.1.1 shows a similar problem. It hangs, too, but it tries
> > to send a command in one thread and to receive it in another (afair).
> > 
> > But what is cgfsng? How can I find and use this?

If you build without cgmanager, and your system has the cgroups
individually mounted under /sys/fs/cgroup, then cgfsng will be
automatically used.

> > > Can I help in any way?
> > 
> > If you were feeling bored and/or industrious, you could
> > grab the lxc git tree and git bisect to the commit that
> > breaks it :)  I'm 99% sure it'll point to the commit that
> > introduces run_command(), but actually it's possible that
> > I am actually wrong about that, so confirmation would be
> > useful.
> > 
> > Or instead of a bisect, you could just revert ea3a694fe
> > in the 2.0.9 tree and see if that fixes it.  Though it
> > may not revert cleanly.
> 
> Hmm, that looks like it causes a lot of files to be modified,
> especially network.c. This seems to have been rewritten in large
> parts...
> 
> > But, you've been enormously helpful in finding this.  While
> > it currently only affects a configuration which isn't much
> > used any more, if we're right about the cause then there is
> > a more general underlying problem which can strike elsewhere
> > too.  So thanks!
> 
> I think, this kind of setup is the most secure to deal with LXC,
> especially if you are not interested in migrating containers
> between hosts...

The 'not much used any more' isn't referring to unprivileged
containers, but to the use of cgmanager, which is deprecated (until
we decide we need it again :)

-serge

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-14 Thread Dirk Geschke
Hi Serge,

> > just for the record, lxc-2.0.8 is still working this way, but it
> > stops starting with lxc-2.0.9 and the whole lxc-2.1.x branch.
> > 
> > I have no idea, what happened to break it nor do I have any clue
> > to fix it. But since I like to use unprivileged containers, it
> > would be nice to get it running again.
> 
> You can see whether lxc-2.1.1 fixes it for you, or
> you can run with cgfsng instead of cgmanager, as your
> problem is just with the cgm_lock.

no, lxc-2.1.1 shows a similar problem. It hangs, too, but it tries
to send a command in one thread and to receive it in another (afair).

But what is cgfsng? How can I find and use this?

> > Can I help in any way?
> 
> If you were feeling bored and/or industrious, you could
> grab the lxc git tree and git bisect to the commit that
> breaks it :)  I'm 99% sure it'll point to the commit that
> introduces run_command(), but actually it's possible that
> I am actually wrong about that, so confirmation would be
> useful.
> 
> Or instead of a bisect, you could just revert ea3a694fe
> in the 2.0.9 tree and see if that fixes it.  Though it
> may not revert cleanly.

Hmm, that looks like it causes a lot of files to be modified,
especially network.c. This seems to have been rewritten in large
parts...

> But, you've been enormously helpful in finding this.  While
> it currently only affects a configuration which isn't much
> used any more, if we're right about the cause then there is
> a more general underlying problem which can strike elsewhere
> too.  So thanks!

I think, this kind of setup is the most secure to deal with LXC,
especially if you are not interested in migrating containers
between hosts...

Best regards

Dirk


Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-13 Thread Christian Brauner
> You can see whether lxc-2.1.1 fixes it for you, or

It won't.

> you can run with cgfsng instead of cgmanager, as your
> problem is just with the cgm_lock.
> 
> > Can I help in any way?

I've appended a patch to this mail which I think solves your problem. If
you could apply it and test, that would be amazing since I don't have
cgmanager here.

>
> If you were feeling bored and/or industrious, you could
> grab the lxc git tree and git bisect to the commit that
> breaks it :)  I'm 99% sure it'll point to the commit that
> introduces run_command(), but actually it's possible that
> I am actually wrong about that, so confirmation would be
> useful.

Yes, run_command() is the cause. It is caused by pthread_atfork()
handlers.

> 
> Or instead of a bisect, you could just revert ea3a694fe
> in the 2.0.9 tree and see if that fixes it.  Though it
> may not revert cleanly.
> 
> But, you've been enormously helpful in finding this.  While
> it currently only affects a configuration which isn't much
> used any more, if we're right about the cause then there is
> a more general underlying problem which can strike elsewhere
> too.  So thanks!

Essentially, run_command() is called in contexts where threads
explicitly hold a lock while fork()ing. Currently, this just affects the
legacy cgmanager cgroup driver.  Here's what's happening when we use
fork():

1. cgm_chown() calls cgm_dbus_connect()
2. cgm_dbus_connect() calls cgm_lock():
   Now the thread holds an explicit lock on the mutex
3. cgm_chown() calls chown_cgroup()
4. chown_cgroup() calls userns_exec_1()
5. userns_exec_1() forks with an explicit lock on the mutex being held
6. pthread_atfork() handlers get run including the prepare() handler:

#ifdef HAVE_PTHREAD_ATFORK
__attribute__((constructor))
static void process_lock_setup_atfork(void)
{
	pthread_atfork(cgm_lock, cgm_unlock, cgm_unlock);
}
#endif

   thus trying to acquire the mutex that is being explicitly held in the
   parent. If we were using recursive locks then the parent would now
   hold two locks but since I don't see us using them I guess we're
   simply getting undefined behavior.
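
The sequence above can be reproduced in a small stand-alone sketch. This
is not LXC code: demo_mutex, demo_setup() and fork_and_report() are names
invented for the illustration. It also assumes an error-checking mutex
(PTHREAD_MUTEX_ERRORCHECK) so that the second lock attempt in the
prepare() handler returns EDEADLK instead of blocking; with a default
mutex, as in the backtrace, the same sequence would be expected to hang.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_mutex; /* stands in for the cgmanager mutex */
static int prepare_rc;             /* result of the lock attempt in prepare() */

/* prepare() handler: runs in the forking thread before fork() proceeds. */
static void demo_prepare(void)
{
	prepare_rc = pthread_mutex_lock(&demo_mutex);
}

/* parent() handler: undo prepare(), but only if it really took the lock. */
static void demo_parent(void)
{
	if (prepare_rc == 0)
		pthread_mutex_unlock(&demo_mutex);
}

void demo_setup(void)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	assert(ret == 0);
	/* Error-checking type: relocking by the owner fails with EDEADLK. */
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(ret == 0);
	ret = pthread_mutex_init(&demo_mutex, &attr);
	assert(ret == 0);
	pthread_mutexattr_destroy(&attr);

	/* No child() handler: the child in this sketch exits immediately. */
	ret = pthread_atfork(demo_prepare, demo_parent, NULL);
	assert(ret == 0);
}

/* fork() with or without holding the lock; return what prepare() saw. */
int fork_and_report(int hold_lock)
{
	pid_t pid;
	int rc;

	if (hold_lock)
		pthread_mutex_lock(&demo_mutex);

	pid = fork();
	if (pid == 0)
		_exit(0);
	rc = prepare_rc;
	if (pid > 0)
		waitpid(pid, NULL, 0);

	if (hold_lock)
		pthread_mutex_unlock(&demo_mutex);
	return pid < 0 ? -1 : rc;
}
```

After demo_setup(), fork_and_report(0) should return 0 because the
handler can take the lock, while fork_and_report(1) should return
EDEADLK because the forking thread already owns it.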

There are multiple ways to solve this problem. None of them is very nice. One
solution is to use an interposition wrapper for pthread_atfork(), but that is
rather tricky since we need wrappers for the pthread_atfork() callbacks
and need to identify our caller so that we can decide whether to
execute the callback or not. If this were a generic problem I'd say we
go for this solution, but as this only affects the legacy cgmanager driver we
don't really care, and I'd much rather enforce that any future code does not
hold an explicit lock across a fork(). That sounds like a bad idea in the first
place. So simply switch run_command() from fork() to clone(), which does not run
pthread_atfork() handlers. If push comes to shove we might just do
the clone() syscall directly via syscall(SYS_clone, ...).
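
The difference between fork() and clone() with respect to
pthread_atfork() handlers can be checked with a small sketch. This is
not the patch from the pull request; the function names are invented for
the illustration, and it assumes glibc on Linux, where the clone()
wrapper from <sched.h> is available. A prepare() handler counts how often
it runs, and the two helpers report how many runs each call caused.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int prepare_calls; /* how often the prepare() handler has run */

static void count_prepare(void)
{
	prepare_calls++;
}

/* Register the counting handler once before using the helpers below. */
void install_counting_handler(void)
{
	int ret = pthread_atfork(count_prepare, NULL, NULL);

	assert(ret == 0);
}

/* Child body for clone(): do nothing and exit with status 0. */
static int child_fn(void *arg)
{
	(void)arg;
	return 0;
}

/* Number of prepare() runs caused by one fork(), or -1 on error. */
int prepare_runs_for_fork(void)
{
	int before = prepare_calls;
	pid_t pid = fork();

	if (pid == 0)
		_exit(0);
	if (pid < 0)
		return -1;
	waitpid(pid, NULL, 0);
	return prepare_calls - before;
}

/* Number of prepare() runs caused by one clone(), or -1 on error. */
int prepare_runs_for_clone(void)
{
	enum { STACK_SIZE = 64 * 1024 };
	char *stack = malloc(STACK_SIZE);
	int before = prepare_calls;
	pid_t pid;

	if (!stack)
		return -1;
	/* Pass the top of the buffer, assuming a downward-growing stack;
	 * SIGCHLD lets the parent reap the child with waitpid(). */
	pid = clone(child_fn, stack + STACK_SIZE, SIGCHLD, NULL);
	if (pid < 0) {
		free(stack);
		return -1;
	}
	waitpid(pid, NULL, 0);
	free(stack);
	return prepare_calls - before;
}
```

After install_counting_handler(), prepare_runs_for_fork() should report
one handler run and prepare_runs_for_clone() none, which is the property
the switch to clone() relies on.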

Serge, please take a look at https://github.com/lxc/lxc/pull/2034 and
see whether that is acceptable to you. :)

Christian
From 3b52c88ce5ba62013dd079e28003703028a9965f Mon Sep 17 00:00:00 2001
From: Christian Brauner 
Date: Thu, 14 Dec 2017 02:37:04 +0100
Subject: [PATCH] utils: use clone() in run_command()

run_command() is called in contexts where threads explicitly hold a lock while
fork()ing. Currently, this just affects the legacy cgmanager cgroup driver.
Here's what's happening when we use fork():

1. cgm_chown() calls cgm_dbus_connect()
2. cgm_dbus_connect() calls cgm_lock():
   Now the thread holds an explicit lock on the mutex
3. cgm_chown() calls chown_cgroup()
4. chown_cgroup() calls userns_exec_1()
5. userns_exec_1() forks with an explicit lock on the mutex being held
6. pthread_atfork() handlers get run including the prepare() handler:

#ifdef HAVE_PTHREAD_ATFORK
__attribute__((constructor))
static void process_lock_setup_atfork(void)
{
	pthread_atfork(cgm_lock, cgm_unlock, cgm_unlock);
}
#endif

   thus trying to acquire the mutex that is being explicitly held in the
   parent. If we were using recursive locks then the parent would now
   hold two locks but since I don't see us using them I guess we're
   simply getting undefined behavior.

There are multiple ways to solve this problem. They are all not very nice. One
solution is to use interposition wrapper for pthread_atfork() but that is
rather tricky since we need to have wrappers for the pthread_atfork() callbacks
and need to identify our caller so that we can make a decision whether we
should execute the callback or not. If this were a generic problem I'd say we
go for this solution but as this only affects the legacy cgmanager driver we
don't really care and I'd much rather enforce that any future code does not
take an explicit lock during a fork(). That sounds like a bad idea in the first
place. So 

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-13 Thread Serge E. Hallyn
Quoting Dirk Geschke (d...@lug-erding.de):
> Hi Christian,
> 
> > > > Older liblxc version used system() instead of run_command(). For
> > > > system() POSIX leaves it unspecified whether pthread_atfork() handlers
> > > > are called but glibc's implementation of system() guarantees that they
> > > > are not. But there's no requirement. So this might be why we have been
> > > > fine - by chance - all of the time.
> > > 
> > > I don't think so.  The previous system did not use system(), it just
> > > did a clone() followed by calling the fn directly.
> > 
> > This commit is present at least in 1.0.11 until at least 2.0.4 and it
> > has lxc_map_ids() call system() when new{g,u}idmap is used:
> > 
> > commit cf3ef16dc479c102433a82b8ddbb4265d3818cce
> > Author: Serge Hallyn 
> > Date:   Wed Oct 23 01:02:57 2013 +
> 
> just for the record, lxc-2.0.8 is still working this way, but it
> stops starting with lxc-2.0.9 and the whole lxc-2.1.x branch.
> 
> I have no idea, what happened to break it nor do I have any clue
> to fix it. But since I like to use unprivileged containers, it
> would be nice to get it running again.

You can see whether lxc-2.1.1 fixes it for you, or
you can run with cgfsng instead of cgmanager, as your
problem is just with the cgm_lock.

> Can I help in any way?

If you were feeling bored and/or industrious, you could
grab the lxc git tree and git bisect to the commit that
breaks it :)  I'm 99% sure it'll point to the commit that
introduces run_command(), but actually it's possible that
I am actually wrong about that, so confirmation would be
useful.

Or instead of a bisect, you could just revert ea3a694fe
in the 2.0.9 tree and see if that fixes it.  Though it
may not revert cleanly.

But, you've been enormously helpful in finding this.  While
it currently only affects a configuration which isn't much
used any more, if we're right about the cause then there is
a more general underlying problem which can strike elsewhere
too.  So thanks!

-serge

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-13 Thread Dirk Geschke
Hi Christian,

> > > Older liblxc version used system() instead of run_command(). For
> > > system() POSIX leaves it unspecified whether pthread_atfork() handlers
> > > are called but glibc's implementation of system() guarantees that they
> > > are not. But there's no requirement. So this might be why we have been
> > > fine - by chance - all of the time.
> > 
> > I don't think so.  The previous system did not use system(), it just
> > did a clone() followed by calling the fn directly.
> 
> This commit is present at least in 1.0.11 until at least 2.0.4 and it
> has lxc_map_ids() call system() when new{g,u}idmap is used:
> 
> commit cf3ef16dc479c102433a82b8ddbb4265d3818cce
> Author: Serge Hallyn 
> Date:   Wed Oct 23 01:02:57 2013 +

just for the record, lxc-2.0.8 is still working this way, but it
stops starting with lxc-2.0.9 and the whole lxc-2.1.x branch.

I have no idea, what happened to break it nor do I have any clue
to fix it. But since I like to use unprivileged containers, it
would be nice to get it running again.

Can I help in any way?

Best regards

Dirk


Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-13 Thread Christian Brauner
On Wed, Dec 13, 2017 at 09:22:11AM -0600, Serge Hallyn wrote:
> Quoting Christian Brauner (christian.brau...@mailbox.org):
> > On Tue, Dec 12, 2017 at 11:00:01PM -0600, Serge Hallyn wrote:
> > > On Tue, Dec 05, 2017 at 05:20:32PM +0100, Dirk Geschke wrote:
> > > > Hi Serge,
> > > > 
> > > > > > I am a little bit clueless, I have several systems running with
> > > > > > Debian and unprivileged LXC. But newer systems won't start new
> > > > > > containers.
> > > > > > 
> > > > > > Actually I have a Debian stretch, installed the normal way but
> > > > > > with lxc-2.0.9 and cgmanager-0.41 installed from sources.
> > > > > > 
> > > > > > I can setup cgmanager, can do a cgm movepid and it is no problem
> > > > > > to download a template. But starting the container does not work,
> > > > > > it simply hangs at:
> > > > > > 
> > > > > >    $ lxc-start -n lxc-test -l trace -o wheezy -F
> > > > > 
> > > > > I see no bad errors in the log.  When this hangs, can you
> > > > > from another terminal see whether 'lxc-ls -f' shows it
> > > > > running, and what 'lxc-attach -n lxc-test' does?
> > > > 
> > > > that's the funny part: Nothing. There is not one process from 
> > > > the subuid range running. It simply hangs before it tries to 
> > > > start the container at all. And I have no idea, why.
> > > > But with lxc-2.0.8 it works. 
> > > > 
> > > > I just installed and started debian wheezy, upgraded it to jessie
> > > > and finally to stretch. It works fine.
> > > > 
> > > > I now installed lxc-2.0.9 again, tried to start the container again
> > > > and nothing happens:
> > > > 
> > > >$ lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > > 
> > > > That's all. lxc-ls -f and lxc-attach -n lxc-test hang, too.
> > > > 
> > > > I also see three lxc-start processes:
> > > > 
> > > >    $ ps waux | grep lxc-start
> > > >    lxc-test 24478  0.0  0.1  51740  4232 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > >    lxc-test 24487  0.0  0.0  51740   504 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > >    lxc-test 24492  0.0  0.0  51740   508 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > 
> > > When you gdb-attach to these (which you have to do as root for two
> > > of them) you find that you're hung on:
> > > 
> > > (gdb) where
> > > #0  __lll_lock_wait () at 
> > > ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > > #1  0x7f2a94f68b95 in __GI___pthread_mutex_lock 
> > > (mutex=mutex@entry=0x7f2a95868a60 )
> > > at ../nptl/pthread_mutex_lock.c:80
> > > #2  0x7f2a95638c4d in lock_mutex (l=0x7f2a95868a60 ) at 
> > > cgroups/cgmanager.c:80
> > > #3  cgm_lock () at cgroups/cgmanager.c:98
> > > #4  0x7f2a94a722f5 in __libc_fork () at ../sysdeps/nptl/fork.c:96
> > > #5  0x7f2a95604ee7 in run_command (buf=buf@entry=0x7fff3b5a20e0 "",
> > > buf_size=buf_size@entry=4096,
> > > child_fn=child_fn@entry=0x7f2a95606a30 ,
> > > args=args@entry=0x7fff3b5a40e0) at utils.c:2262
> > > #6  0x7f2a9560b01e in lxc_map_ids (idmap=idmap@entry=0x55b48a62c1c0, 
> > > pid=pid@entry=15389)
> > > at conf.c:2652
> > > #7  0x7f2a9560f1e5 in userns_exec_1 (conf=conf@entry=0x55b48a625b90,
> > > fn=fn@entry=0x7f2a95639a20 , 
> > > data=data@entry=0x7fff3b5a5210,
> > > fn_name=fn_name@entry=0x7f2a9563fadc "chown_cgroup_wrapper") at 
> > > conf.c:3822
> > > #8  0x7f2a9563a1a9 in chown_cgroup (conf=0x55b48a625b90, 
> > > cgroup_path=)
> > > at cgroups/cgmanager.c:500
> > > #9  cgm_chown (hdata=0x55b48a62b1d0, conf=0x55b48a625b90) at 
> > > cgroups/cgmanager.c:1555
> > > #10 0x7f2a955fa397 in lxc_spawn (handler=0x55b48a624e50) at 
> > > start.c:1363
> > > 
> > > and
> > > 
> > > #0  0x7f2a94f6f1f0 in __read_nocancel () at 
> > > ../sysdeps/unix/syscall-template.S:84
> > > #1  0x7f2a95607872 in run_userns_fn (data=0x7fff3b5a51b0) at 
> > > conf.c:3570
> > > #2  0x7f2a94aa2aff in clone () at 
> > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> > > 
> > > So it seems to be hanging on cgm_lock().
> > > 
> > > I'm a bit too tired to think it through clearly enough, but I'm thinking
> > > this might have to do with the introduction of run_command().  It 
> > > introduces
> > > an extra fork() between the clone(CLONE_NEWUSER)'d thread and the task 
> > > which
> > > actually does the work.  Perhaps that is messing with our lock state?
> > 
> > Right, so as I said, this could be related to pthread_atfork() handlers.
> > (I suspect that cgmanager is multi-threaded or calls to libnh or dbus
> > which is, Serge?)
> 
> Right - thanks for jogging my memory.  So yes we need to drop the
> cgm_lock before we fork.
> 
> > Older liblxc version used system() instead of run_command(). For
> > system() POSIX leaves it unspecified whether pthread_atfork() handlers
> > are called but glibc's implementation of system() guarantees 

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-13 Thread Christian Brauner
On Wed, Dec 13, 2017 at 01:35:01PM +0100, Christian Brauner wrote:
> On Wed, Dec 13, 2017 at 12:54:04PM +0100, Christian Brauner wrote:
> > On Tue, Dec 12, 2017 at 11:00:01PM -0600, Serge Hallyn wrote:
> > > On Tue, Dec 05, 2017 at 05:20:32PM +0100, Dirk Geschke wrote:
> > > > Hi Serge,
> > > > 
> > > > > > I am a little bit clueless, I have several systems running with
> > > > > > Debian and unprivileged LXC. But newer systems won't start new
> > > > > > containers.
> > > > > > 
> > > > > > Actually I have a Debian stretch, installed the normal way but
> > > > > > with lxc-2.0.9 and cgmanager-0.41 installed from sources.
> > > > > > 
> > > > > > I can setup cgmanager, can do a cgm movepid and it is no problem
> > > > > > to download a template. But starting the container does not work,
> > > > > > it simply hangs at:
> > > > > > 
> > > > > >    $ lxc-start -n lxc-test -l trace -o wheezy -F
> > > > > 
> > > > > I see no bad errors in the log.  When this hangs, can you
> > > > > from another terminal see whether 'lxc-ls -f' shows it
> > > > > running, and what 'lxc-attach -n lxc-test' does?
> > > > 
> > > > that's the funny part: Nothing. There is not one process from 
> > > > the subuid range running. It simply hangs before it tries to 
> > > > start the container at all. And I have no idea, why.
> > > > But with lxc-2.0.8 it works. 
> > > > 
> > > > I just installed and started debian wheezy, upgraded it to jessie
> > > > and finally to stretch. It works fine.
> > > > 
> > > > I now installed lxc-2.0.9 again, tried to start the container again
> > > > and nothing happens:
> > > > 
> > > >$ lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > > 
> > > > That's all. lxc-ls -f and lxc-attach -n lxc-test hang, too.
> > > > 
> > > > I also see three lxc-start processes:
> > > > 
> > > >    $ ps waux | grep lxc-start
> > > >    lxc-test 24478  0.0  0.1  51740  4232 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > >    lxc-test 24487  0.0  0.0  51740   504 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > >    lxc-test 24492  0.0  0.0  51740   508 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > 
> > > When you gdb-attach to these (which you have to do as root for two
> > > of them) you find that you're hung on:
> > > 
> > > (gdb) where
> > > #0  __lll_lock_wait () at 
> > > ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > > #1  0x7f2a94f68b95 in __GI___pthread_mutex_lock 
> > > (mutex=mutex@entry=0x7f2a95868a60 )
> > > at ../nptl/pthread_mutex_lock.c:80
> > > #2  0x7f2a95638c4d in lock_mutex (l=0x7f2a95868a60 ) at 
> > > cgroups/cgmanager.c:80
> > > #3  cgm_lock () at cgroups/cgmanager.c:98
> > > #4  0x7f2a94a722f5 in __libc_fork () at ../sysdeps/nptl/fork.c:96
> > > #5  0x7f2a95604ee7 in run_command (buf=buf@entry=0x7fff3b5a20e0 "",
> > > buf_size=buf_size@entry=4096,
> > > child_fn=child_fn@entry=0x7f2a95606a30 ,
> > > args=args@entry=0x7fff3b5a40e0) at utils.c:2262
> > > #6  0x7f2a9560b01e in lxc_map_ids (idmap=idmap@entry=0x55b48a62c1c0, 
> > > pid=pid@entry=15389)
> > > at conf.c:2652
> > > #7  0x7f2a9560f1e5 in userns_exec_1 (conf=conf@entry=0x55b48a625b90,
> > > fn=fn@entry=0x7f2a95639a20 , 
> > > data=data@entry=0x7fff3b5a5210,
> > > fn_name=fn_name@entry=0x7f2a9563fadc "chown_cgroup_wrapper") at 
> > > conf.c:3822
> > > #8  0x7f2a9563a1a9 in chown_cgroup (conf=0x55b48a625b90, 
> > > cgroup_path=)
> > > at cgroups/cgmanager.c:500
> > > #9  cgm_chown (hdata=0x55b48a62b1d0, conf=0x55b48a625b90) at 
> > > cgroups/cgmanager.c:1555
> > > #10 0x7f2a955fa397 in lxc_spawn (handler=0x55b48a624e50) at 
> > > start.c:1363
> > > 
> > > and
> > > 
> > > #0  0x7f2a94f6f1f0 in __read_nocancel () at 
> > > ../sysdeps/unix/syscall-template.S:84
> > > #1  0x7f2a95607872 in run_userns_fn (data=0x7fff3b5a51b0) at 
> > > conf.c:3570
> > > #2  0x7f2a94aa2aff in clone () at 
> > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> > > 
> > > So it seems to be hanging on cgm_lock().
> > > 
> > > I'm a bit too tired to think it through clearly enough, but I'm thinking
> > > this might have to do with the introduction of run_command().  It 
> > > introduces
> > > an extra fork() between the clone(CLONE_NEWUSER)'d thread and the task 
> > > which
> > > actually does the work.  Perhaps that is messing with our lock state?
> > 
> > Right, so as I said, this could be related to pthread_atfork() handlers.
> > (I suspect that cgmanager is multi-threaded or calls to libnh or dbus
> > which is, Serge?)
> > Older liblxc version used system() instead of run_command(). For
> > system() POSIX leaves it unspecified whether pthread_atfork() handlers
> > are called but glibc's implementation of system() guarantees that they
> > are not. But there's no requirement. So this might be why we have been
> > 

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-13 Thread Christian Brauner
On Wed, Dec 13, 2017 at 12:54:04PM +0100, Christian Brauner wrote:
> On Tue, Dec 12, 2017 at 11:00:01PM -0600, Serge Hallyn wrote:
> > On Tue, Dec 05, 2017 at 05:20:32PM +0100, Dirk Geschke wrote:
> > > Hi Serge,
> > > 
> > > > > I am a little bit clueless, I have several systems running with
> > > > > Debian and unprivileged LXC. But newer systems won't start new
> > > > > containers.
> > > > > 
> > > > > Actually I have a Debian stretch, installed the normal way but
> > > > > with lxc-2.0.9 and cgmanager-0.41 installed from sources.
> > > > > 
> > > > > I can setup cgmanager, can do a cgm movepid and it is no problem
> > > > > to download a template. But starting the container does not work,
> > > > > it simply hungs at:
> > > > > 
> > > > >$ lxc-start -n lxc-test -l trace -o wheezy -F
> > > > 
> > > > I see no bad errors in the log.  When this hangs, can you
> > > > from another terminal see whether 'lxc-ls -f' shows it
> > > > running, and what 'lxc-attach -n lxc-test' does?
> > > 
> > > that's the funny part: Nothing. There is not one process from 
> > > the subuid range running. It simply hangs before it tries to 
> > > start the container at all. And I have no idea, why.
> > > But with lxc-2.0.8 it works. 
> > > 
> > > I just installed and started debian wheezy, upgraded it to jessie
> > > and finally to stretch. It works fine.
> > > 
> > > I now installed lxc-2.0.9 again, tried to start the container again
> > > and nothing happens:
> > > 
> > >$ lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > 
> > > That's all. lxc-ls -f and lxc-attach-n lxc-test hangs, too.
> > > 
> > > I see also three processes of lxc-start:
> > > 
> > >$ ps waux |grep lxc-start
> > >lxc-test 24478  0.0  0.1  51740  4232 pts/0S+   17:16   0:00
> > >lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > >lxc-test 24487  0.0  0.0  51740   504 pts/0S+   17:16   0:00
> > >lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > >lxc-test 24492  0.0  0.0  51740   508 pts/0S+   17:16   0:00
> > >lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > 
> > When you gdb-attach to these (which you have to do as root for two
> > of them) you find that you're hung on:
> > 
> > (gdb) where
> > #0  __lll_lock_wait () at 
> > ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > #1  0x7f2a94f68b95 in __GI___pthread_mutex_lock 
> > (mutex=mutex@entry=0x7f2a95868a60 )
> > at ../nptl/pthread_mutex_lock.c:80
> > #2  0x7f2a95638c4d in lock_mutex (l=0x7f2a95868a60 ) at 
> > cgroups/cgmanager.c:80
> > #3  cgm_lock () at cgroups/cgmanager.c:98
> > #4  0x7f2a94a722f5 in __libc_fork () at ../sysdeps/nptl/fork.c:96
> > #5  0x7f2a95604ee7 in run_command (buf=buf@entry=0x7fff3b5a20e0 "",
> > buf_size=buf_size@entry=4096,
> > child_fn=child_fn@entry=0x7f2a95606a30 ,
> > args=args@entry=0x7fff3b5a40e0) at utils.c:2262
> > #6  0x7f2a9560b01e in lxc_map_ids (idmap=idmap@entry=0x55b48a62c1c0, 
> > pid=pid@entry=15389)
> > at conf.c:2652
> > #7  0x7f2a9560f1e5 in userns_exec_1 (conf=conf@entry=0x55b48a625b90,
> > fn=fn@entry=0x7f2a95639a20 , 
> > data=data@entry=0x7fff3b5a5210,
> > fn_name=fn_name@entry=0x7f2a9563fadc "chown_cgroup_wrapper") at 
> > conf.c:3822
> > #8  0x7f2a9563a1a9 in chown_cgroup (conf=0x55b48a625b90, 
> > cgroup_path=)
> > at cgroups/cgmanager.c:500
> > #9  cgm_chown (hdata=0x55b48a62b1d0, conf=0x55b48a625b90) at 
> > cgroups/cgmanager.c:1555
> > #10 0x7f2a955fa397 in lxc_spawn (handler=0x55b48a624e50) at start.c:1363
> > 
> > and
> > 
> > #0  0x7f2a94f6f1f0 in __read_nocancel () at 
> > ../sysdeps/unix/syscall-template.S:84
> > #1  0x7f2a95607872 in run_userns_fn (data=0x7fff3b5a51b0) at conf.c:3570
> > #2  0x7f2a94aa2aff in clone () at 
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> > 
> > So it seems to be hanging on cgm_lock().
> > 
> > I'm a bit too tired to think it through clearly enough, but I'm thinking
> > this might have to do with the introduction of run_command().  It introduces
> > an extra fork() between the clone(CLONE_NEWUSER)'d thread and the task which
> > actually does the work.  Perhaps that is messing with our lock state?
> 
> Right, so as I said, this could be related to pthread_atfork() handlers.
> (I suspect that cgmanager is multi-threaded or calls to libnh or dbus
> which is, Serge?)
> Older liblxc version used system() instead of run_command(). For
> system() POSIX leaves it unspecified whether pthread_atfork() handlers
> are called but glibc's implementation of system() guarantees that they
> are not. But there's no requirement. So this might be why we have been
> fine - by chance - all of the time. The obvious solution is to switch
> back to system() instead of run_command() but let me think about this
> for a second.

Right, so I think that this is indeed pthread_atfork() and cgmanager:
1. cgm_chown() calls cgm_dbus_connect()
2. cgm_dbus_connect() calls 

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-13 Thread Christian Brauner
On Tue, Dec 12, 2017 at 11:00:01PM -0600, Serge Hallyn wrote:
> On Tue, Dec 05, 2017 at 05:20:32PM +0100, Dirk Geschke wrote:
> > Hi Serge,
> > 
> > > > I am a little bit clueless, I have several systems running with
> > > > Debian and unprivileged LXC. But newer systems won't start new
> > > > containers.
> > > > 
> > > > Actually I have a Debian stretch, installed the normal way but
> > > > with lxc-2.0.9 and cgmanager-0.41 installed from sources.
> > > > 
> > > > I can setup cgmanager, can do a cgm movepid and it is no problem
> > > > to download a template. But starting the container does not work,
> > > > it simply hungs at:
> > > > 
> > > >$ lxc-start -n lxc-test -l trace -o wheezy -F
> > > 
> > > I see no bad errors in the log.  When this hangs, can you
> > > from another terminal see whether 'lxc-ls -f' shows it
> > > running, and what 'lxc-attach -n lxc-test' does?
> > 
> > that's the funny part: Nothing. There is not one process from 
> > the subuid range running. It simply hangs before it tries to 
> > start the container at all. And I have no idea, why.
> > But with lxc-2.0.8 it works. 
> > 
> > I just installed and started debian wheezy, upgraded it to jessie
> > and finally to stretch. It works fine.
> > 
> > I now installed lxc-2.0.9 again, tried to start the container again
> > and nothing happens:
> > 
> >$ lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > 
> > That's all. lxc-ls -f and lxc-attach-n lxc-test hangs, too.
> > 
> > I see also three processes of lxc-start:
> > 
> >$ ps waux |grep lxc-start
> >lxc-test 24478  0.0  0.1  51740  4232 pts/0S+   17:16   0:00
> >lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> >lxc-test 24487  0.0  0.0  51740   504 pts/0S+   17:16   0:00
> >lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> >lxc-test 24492  0.0  0.0  51740   508 pts/0S+   17:16   0:00
> >lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> 
> When you gdb-attach to these (which you have to do as root for two
> of them) you find that you're hung on:
> 
> (gdb) where
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x7f2a94f68b95 in __GI___pthread_mutex_lock 
> (mutex=mutex@entry=0x7f2a95868a60 )
> at ../nptl/pthread_mutex_lock.c:80
> #2  0x7f2a95638c4d in lock_mutex (l=0x7f2a95868a60 ) at 
> cgroups/cgmanager.c:80
> #3  cgm_lock () at cgroups/cgmanager.c:98
> #4  0x7f2a94a722f5 in __libc_fork () at ../sysdeps/nptl/fork.c:96
> #5  0x7f2a95604ee7 in run_command (buf=buf@entry=0x7fff3b5a20e0 "",
> buf_size=buf_size@entry=4096,
> child_fn=child_fn@entry=0x7f2a95606a30 ,
> args=args@entry=0x7fff3b5a40e0) at utils.c:2262
> #6  0x7f2a9560b01e in lxc_map_ids (idmap=idmap@entry=0x55b48a62c1c0, 
> pid=pid@entry=15389)
> at conf.c:2652
> #7  0x7f2a9560f1e5 in userns_exec_1 (conf=conf@entry=0x55b48a625b90,
> fn=fn@entry=0x7f2a95639a20 , 
> data=data@entry=0x7fff3b5a5210,
> fn_name=fn_name@entry=0x7f2a9563fadc "chown_cgroup_wrapper") at 
> conf.c:3822
> #8  0x7f2a9563a1a9 in chown_cgroup (conf=0x55b48a625b90, 
> cgroup_path=)
> at cgroups/cgmanager.c:500
> #9  cgm_chown (hdata=0x55b48a62b1d0, conf=0x55b48a625b90) at 
> cgroups/cgmanager.c:1555
> #10 0x7f2a955fa397 in lxc_spawn (handler=0x55b48a624e50) at start.c:1363
> 
> and
> 
> #0  0x7f2a94f6f1f0 in __read_nocancel () at 
> ../sysdeps/unix/syscall-template.S:84
> #1  0x7f2a95607872 in run_userns_fn (data=0x7fff3b5a51b0) at conf.c:3570
> #2  0x7f2a94aa2aff in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> 
> So it seems to be hanging on cgm_lock().
> 
> I'm a bit too tired to think it through clearly enough, but I'm thinking
> this might have to do with the introduction of run_command().  It introduces
> an extra fork() between the clone(CLONE_NEWUSER)'d thread and the task which
> actually does the work.  Perhaps that is messing with our lock state?

Right, so as I said, this could be related to pthread_atfork() handlers.
(I suspect that cgmanager is multi-threaded or calls into libnih or dbus,
which is. Serge?)
Older liblxc versions used system() instead of run_command(). For
system(), POSIX leaves it unspecified whether pthread_atfork() handlers
are called, but glibc's implementation of system() guarantees that they
are not. But there's no requirement. So this might be why we have been
fine - by chance - all of the time. The obvious solution is to switch
back to system() instead of run_command(), but let me think about this
for a second.
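
For illustration, here is a rough C sketch of the system() versus fork()
difference described above. It is not liblxc code. All names in it
(prepare_calls, count_prepare, prepare_calls_for_fork,
prepare_calls_for_system) are made up, and the behaviour of system() is
an assumption about the libc in use, since POSIX leaves it unspecified.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counter: how often the prepare handler has run. */
static int prepare_calls;
static pthread_once_t count_once = PTHREAD_ONCE_INIT;

/* pthread_atfork() prepare handler: only counts its invocations. */
static void count_prepare(void)
{
	prepare_calls++;
}

/* Register the handler exactly once per process. */
static void count_setup(void)
{
	pthread_atfork(count_prepare, NULL, NULL);
}

/* Number of prepare-handler runs caused by one plain fork(). */
int prepare_calls_for_fork(void)
{
	int before;
	pid_t pid;

	pthread_once(&count_once, count_setup);
	before = prepare_calls;

	pid = fork();
	if (pid == 0)
		_exit(0);		/* child: nothing to do */
	if (pid > 0)
		waitpid(pid, NULL, 0);	/* parent: reap the child */

	return prepare_calls - before;
}

/* Number of prepare-handler runs caused by one system() call.
 * Assumption: on glibc this stays at 0; other libcs may differ. */
int prepare_calls_for_system(void)
{
	int before;

	pthread_once(&count_once, count_setup);
	before = prepare_calls;

	(void)system("exit 0");

	return prepare_calls - before;
}
```

Calling prepare_calls_for_fork() and prepare_calls_for_system() from a
small main() and printing both results shows whether a given libc runs
the handlers for system(). A handler that takes a lock, rather than one
that only counts, is where the difference would start to matter.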
___
lxc-users mailing list
lxc-users@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-users

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-12 Thread Serge E. Hallyn
On Tue, Dec 05, 2017 at 05:20:32PM +0100, Dirk Geschke wrote:
> Hi Serge,
> 
> > > I am a little bit clueless, I have several systems running with
> > > Debian and unprivileged LXC. But newer systems won't start new
> > > containers.
> > > 
> > > Actually I have a Debian stretch, installed the normal way but
> > > with lxc-2.0.9 and cgmanager-0.41 installed from sources.
> > > 
> > > I can setup cgmanager, can do a cgm movepid and it is no problem
> > > to download a template. But starting the container does not work,
> > > it simply hungs at:
> > > 
> > >$ lxc-start -n lxc-test -l trace -o wheezy -F
> > 
> > I see no bad errors in the log.  When this hangs, can you
> > from another terminal see whether 'lxc-ls -f' shows it
> > running, and what 'lxc-attach -n lxc-test' does?
> 
> that's the funny part: Nothing. There is not one process from 
> the subuid range running. It simply hangs before it tries to 
> start the container at all. And I have no idea, why.
> But with lxc-2.0.8 it works. 
> 
> I just installed and started debian wheezy, upgraded it to jessie
> and finally to stretch. It works fine.
> 
> I now installed lxc-2.0.9 again, tried to start the container again
> and nothing happens:
> 
>$ lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> 
> That's all. lxc-ls -f and lxc-attach-n lxc-test hangs, too.
> 
> I see also three processes of lxc-start:
> 
>$ ps waux |grep lxc-start
>lxc-test 24478  0.0  0.1  51740  4232 pts/0S+   17:16   0:00
>lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
>lxc-test 24487  0.0  0.0  51740   504 pts/0S+   17:16   0:00
>lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
>lxc-test 24492  0.0  0.0  51740   508 pts/0S+   17:16   0:00
>lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F

When you gdb-attach to these (which you have to do as root for two
of them) you find that you're hung on:

(gdb) where
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x7f2a94f68b95 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f2a95868a60 )
    at ../nptl/pthread_mutex_lock.c:80
#2  0x7f2a95638c4d in lock_mutex (l=0x7f2a95868a60 ) at cgroups/cgmanager.c:80
#3  cgm_lock () at cgroups/cgmanager.c:98
#4  0x7f2a94a722f5 in __libc_fork () at ../sysdeps/nptl/fork.c:96
#5  0x7f2a95604ee7 in run_command (buf=buf@entry=0x7fff3b5a20e0 "",
    buf_size=buf_size@entry=4096,
    child_fn=child_fn@entry=0x7f2a95606a30 ,
    args=args@entry=0x7fff3b5a40e0) at utils.c:2262
#6  0x7f2a9560b01e in lxc_map_ids (idmap=idmap@entry=0x55b48a62c1c0, pid=pid@entry=15389)
    at conf.c:2652
#7  0x7f2a9560f1e5 in userns_exec_1 (conf=conf@entry=0x55b48a625b90,
    fn=fn@entry=0x7f2a95639a20 ,
    data=data@entry=0x7fff3b5a5210,
    fn_name=fn_name@entry=0x7f2a9563fadc "chown_cgroup_wrapper") at conf.c:3822
#8  0x7f2a9563a1a9 in chown_cgroup (conf=0x55b48a625b90, cgroup_path=)
    at cgroups/cgmanager.c:500
#9  cgm_chown (hdata=0x55b48a62b1d0, conf=0x55b48a625b90) at cgroups/cgmanager.c:1555
#10 0x7f2a955fa397 in lxc_spawn (handler=0x55b48a624e50) at start.c:1363

and

#0  0x7f2a94f6f1f0 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x7f2a95607872 in run_userns_fn (data=0x7fff3b5a51b0) at conf.c:3570
#2  0x7f2a94aa2aff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

So it seems to be hanging on cgm_lock().

I'm a bit too tired to think it through clearly enough, but I'm thinking
this might have to do with the introduction of run_command().  It introduces
an extra fork() between the clone(CLONE_NEWUSER)'d thread and the task which
actually does the work.  Perhaps that is messing with our lock state?

-serge

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-05 Thread Serge E. Hallyn
Quoting Dirk Geschke (d...@lug-erding.de):
> Hi Serge,
> 
> > > I am a little bit clueless, I have several systems running with
> > > Debian and unprivileged LXC. But newer systems won't start new
> > > containers.
> > > 
> > > Actually I have a Debian stretch, installed the normal way but
> > > with lxc-2.0.9 and cgmanager-0.41 installed from sources.
> > > 
> > > I can setup cgmanager, can do a cgm movepid and it is no problem
> > > to download a template. But starting the container does not work,
> > > it simply hungs at:
> > > 
> > >$ lxc-start -n lxc-test -l trace -o wheezy -F
> > 
> > I see no bad errors in the log.  When this hangs, can you
> > from another terminal see whether 'lxc-ls -f' shows it
> > running, and what 'lxc-attach -n lxc-test' does?
> 
> that's the funny part: Nothing. There is not one process from 
> the subuid range running. It simply hangs before it tries to 
> start the container at all. And I have no idea, why.
> But with lxc-2.0.8 it works. 
> 
> I just installed and started debian wheezy, upgraded it to jessie
> and finally to stretch. It works fine.
> 
> I now installed lxc-2.0.9 again, tried to start the container again
> and nothing happens:
> 
>$ lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> 
> That's all. lxc-ls -f and lxc-attach-n lxc-test hangs, too.
> 
> I see also three processes of lxc-start:
> 
>$ ps waux |grep lxc-start
>lxc-test 24478  0.0  0.1  51740  4232 pts/0S+   17:16   0:00
>lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
>lxc-test 24487  0.0  0.0  51740   504 pts/0S+   17:16   0:00
>lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
>lxc-test 24492  0.0  0.0  51740   508 pts/0S+   17:16   0:00
>lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> 
> That's really strange...

Can you (install dbgsym pkg if you need to and) gdb attach to
the lxc-start process, and figure out where it's sitting?

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-05 Thread Dirk Geschke
Hi Serge,

> > I am a little bit clueless, I have several systems running with
> > Debian and unprivileged LXC. But newer systems won't start new
> > containers.
> > 
> > Actually I have a Debian stretch, installed the normal way but
> > with lxc-2.0.9 and cgmanager-0.41 installed from sources.
> > 
> > I can setup cgmanager, can do a cgm movepid and it is no problem
> > to download a template. But starting the container does not work,
> > it simply hungs at:
> > 
> >$ lxc-start -n lxc-test -l trace -o wheezy -F
> 
> I see no bad errors in the log.  When this hangs, can you
> from another terminal see whether 'lxc-ls -f' shows it
> running, and what 'lxc-attach -n lxc-test' does?

That's the funny part: nothing. There is not one process from
the subuid range running. It simply hangs before it tries to
start the container at all, and I have no idea why.
But with lxc-2.0.8 it works.

I just installed and started debian wheezy, upgraded it to jessie
and finally to stretch. It works fine.

I now installed lxc-2.0.9 again, tried to start the container again
and nothing happens:

   $ lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F

That's all. lxc-ls -f and lxc-attach -n lxc-test hang, too.

I also see three lxc-start processes:

   $ ps waux |grep lxc-start
   lxc-test 24478  0.0  0.1  51740  4232 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
   lxc-test 24487  0.0  0.0  51740   504 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
   lxc-test 24492  0.0  0.0  51740   508 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F

That's really strange...

Best regards

Dirk

-- 
+--+
| Dr. Dirk Geschke   / Plankensteinweg 61/ 85435 Erding|
| Telefon: 08122-559448  / Mobil: 0176-96906350 / Fax: 08122-9818106   |
| d...@geschke-online.de / d...@lug-erding.de  / kont...@lug-erding.de |
+--+

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-05 Thread Serge E. Hallyn
Quoting Dirk Geschke (d...@lug-erding.de):
> Hi all,
> 
> I am a little bit clueless, I have several systems running with
> Debian and unprivileged LXC. But newer systems won't start new
> containers.
> 
> Actually I have a Debian stretch, installed the normal way but
> with lxc-2.0.9 and cgmanager-0.41 installed from sources.
> 
> I can setup cgmanager, can do a cgm movepid and it is no problem
> to download a template. But starting the container does not work,
> it simply hangs at:
> 
>$ lxc-start -n lxc-test -l trace -o wheezy -F

I see no bad errors in the log.  When this hangs, can you
from another terminal see whether 'lxc-ls -f' shows it
running, and what 'lxc-attach -n lxc-test' does?

Re: [lxc-users] Debian and unprivileged LXC not working...

2017-12-05 Thread Dirk Geschke
Hi all,
> 
> I am a little bit clueless, I have several systems running with
> Debian and unprivileged LXC. But newer systems won't start new
> containers.
> 
> Actually I have a Debian stretch, installed the normal way but
> with lxc-2.0.9 and cgmanager-0.41 installed from sources.

Hmm, strange: I have now compiled and installed lxc-2.0.8 and everything works...

Does this help to find what is missing in 2.0.9?

Best regards

Dirk
