[lxc-devel] Share the loopback with network namespaces

2011-03-31 Thread Marian Marinov
Hello,
I want to implement network namespaces in my software. The problem I'm seeing 
is that each namespace has its own loopback interface.
Is there any way I can brindge, forward or tunnel the traffic of one loopback 
interface (from some namespace) to the real loopback interface on the host 
machine?

Any iptables patches available? Or some trics that are not documented ?
I want all users to have their own network namespaces with their own IPs, but 
I want them all to share the host loopback interface for connections to mysql, 
pgsql, smtp, imap and so on.

-- 
Best regards,
Marian Marinov


signature.asc
Description: This is a digitally signed message part.
--
Create and publish websites with WebMatrix
Use the most popular FREE web apps or write code yourself; 
WebMatrix provides all the features you need to develop and 
publish your website. http://p.sf.net/sfu/ms-webmatrix-sf
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


Re: [lxc-devel] Share the loopback with network namespaces

2011-04-01 Thread Marian Marinov
On Friday 01 April 2011 06:12:34 Eric Brower wrote:
 Does it really need to be done on loopback?  How about creating a
 bridge on the host, adding veth devices for each namespace/container
 and the host, and adding them to the bridge-- this would allow the
 host and each container to access this private, bridged network, but
 would not provide external access unless explicitly configured.
 
 E

I'm thinking a solution like yours and the traffic from that bridge will be 
then 
forwarded to the loopback device. But it is not very clean :)

I really need the access to the Loopback on the host. The problem is that I 
can not change the scripts of my users to connect to mysql on different IPs.

Marian

 
 On Thu, Mar 31, 2011 at 6:59 PM, Marian Marinov m...@yuhu.biz wrote:
  Hello,
  I want to implement network namespaces in my software. The problem I'm
  seeing is that each namespace has its own loopback interface.
  Is there any way I can brindge, forward or tunnel the traffic of one
  loopback interface (from some namespace) to the real loopback interface
  on the host machine?
  
  Any iptables patches available? Or some trics that are not documented ?
  I want all users to have their own network namespaces with their own IPs,
  but I want them all to share the host loopback interface for connections
  to mysql, pgsql, smtp, imap and so on.
  
  --
  Best regards,
  Marian Marinov
  
  -
  - Create and publish websites with WebMatrix
  Use the most popular FREE web apps or write code yourself;
  WebMatrix provides all the features you need to develop and
  publish your website. http://p.sf.net/sfu/ms-webmatrix-sf
  
  ___
  Lxc-devel mailing list
  Lxc-devel@lists.sourceforge.net
  https://lists.sourceforge.net/lists/listinfo/lxc-devel

-- 
Best regards,
Marian Marinov


signature.asc
Description: This is a digitally signed message part.
--
Create and publish websites with WebMatrix
Use the most popular FREE web apps or write code yourself; 
WebMatrix provides all the features you need to develop and 
publish your website. http://p.sf.net/sfu/ms-webmatrix-sf
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


Re: [lxc-devel] cgroup cpuacct enhancement

2011-10-04 Thread Marian Marinov
On Wednesday 05 October 2011 01:25:15 martin.pe...@bull.com wrote:
 Hello.
 
 I apologize if this is not the right list for this question, but it was
 the best match I could find. I'm working on a project to estimate the
 power consumption of jobs running on a Linux HPC cluster.  We've been
 looking into the possibility of producing this estimate using cgroups with
 the cpuacct subsystem.  cpuacct currently collects only cpu time.  If it
 also collected cpu cycles we could use these two values to calculate the
 cpu frequency and estimate the job's power consumption.  So my questions
 are:
 Is it feasible to enhance cgroup cpuacct to provide this additional data
 (number of cpu cycles)?
 Is there anyone actively working on cgroups who would be willing to make
 this change?
 
 Regards,
 Martin Perry
 Bull Information Systems

Hi Martin,

I think that you would want to ask here:
Paul Menage menage {} google.com
Li Zefan lizf {} cn.fujitsu.com
containers {} lists.linux-foundation.org

These are the maintainers of cgroups within the kernel.

I personally think that this is possible.

However I haven't done any research on that.

Regards,
Marian


signature.asc
Description: This is a digitally signed message part.
--
All the data continuously generated in your IT infrastructure contains a
definitive record of customers, application performance, security
threats, fraudulent activity and more. Splunk takes this data and makes
sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2dcopy1___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


[lxc-devel] /proc/cpuinfo per cgroup

2013-11-25 Thread Marian Marinov
Hi guys,
I'm using LXC containers for some of my teaching and I want to have 
/proc/cpuinfo and /proc/memory based on the cgroup 
limits that I have set.

The idea is that if one container is limited to a cpuset of 0-1 it should see 
only the first two cores and not all the 
cores on the machine.

The same thing is needed for the memory.

I simply want my students see the actual resources that they have.

Does any of you have any suggestions?

I'm planning to patch the kernel. As far as I can see it, I need to patch the 
following files:
./tile/kernel/proc.c
./sh/kernel/cpu/proc.c
./x86/kernel/cpu/proc.c
./mips/kernel/proc.c

Actually the c_start function.

Marian

--
Shape the Mobile Experience: Free Subscription
Software experts and developers: Be at the forefront of tech innovation.
Intel(R) Software Adrenaline delivers strategic insight and game-changing 
conversations that shape the rapidly evolving mobile landscape. Sign up now. 
http://pubads.g.doubleclick.net/gampad/clk?id=63431311iu=/4140/ostg.clktrk
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


Re: [lxc-devel] /proc/cpuinfo per cgroup

2013-11-25 Thread Marian Marinov
On 11/25/2013 05:12 PM, Daniel P. Berrange wrote:
 On Mon, Nov 25, 2013 at 09:09:40AM -0600, Serge Hallyn wrote:
 Quoting Marian Marinov (m...@yuhu.biz):
 Hi guys,
 I'm using LXC containers for some of my teaching and I want to have 
 /proc/cpuinfo and /proc/memory based on the cgroup
 limits that I have set.

 The idea is that if one container is limited to a cpuset of 0-1 it should 
 see only the first two cores and not all the
 cores on the machine.

 The same thing is needed for the memory.

 I simply want my students see the actual resources that they have.

 Does any of you have any suggestions?

 I'm planning to patch the kernel. As far as I can see it, I need to patch 
 the following files:
 ./tile/kernel/proc.c
 ./sh/kernel/cpu/proc.c
 ./x86/kernel/cpu/proc.c
 ./mips/kernel/proc.c

 Actually the c_start function.

 Hi,

 patching the kernel would be a good exercise.  Historically that hasn't
 been acceptable upstream - but then tastes and politics change pretty
 frequently, and what was nacked one year can be enthusiastically
 accepted two years later...

 now the alternative is to use fuse to have userspace change what is
 shown in those files.  Daniel Lezcano years ago had one working.  The
 code for that is up at https://github.com/hallyn/procfs, however it
 won't work or even compile as is.  But if you can whip that into a
 working shape we could hopefully figure out how to ship it with lxc.

 In libvirt we went the FUSE route for /proc/meminfo, given the
 kernel guys resistance to changing kernel code for this use case.

Thank you for the good pointers. In my case I think it will be better to 
support a kernel patch for the kernel.
But I would also try the procfs.

It is always a good idea to try to convince the kernel devs that something is 
useful :)


 Regards,
 Daniel



--
Shape the Mobile Experience: Free Subscription
Software experts and developers: Be at the forefront of tech innovation.
Intel(R) Software Adrenaline delivers strategic insight and game-changing 
conversations that shape the rapidly evolving mobile landscape. Sign up now. 
http://pubads.g.doubleclick.net/gampad/clk?id=63431311iu=/4140/ostg.clktrk
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


Re: [lxc-devel] cgroup management daemon

2013-11-25 Thread Marian Marinov
On 11/26/2013 12:43 AM, Serge E. Hallyn wrote:
 Hi,

 as i've mentioned several times, I want to write a standalone cgroup
 management daemon.  Basic requirements are that it be a standalone
 program; that a single instance running on the host be usable from
 containers nested at any depth; that it not allow escaping ones
 assigned limits; that it not allow subjegating tasks which do not
 belong to you; and that, within your limits, you be able to parcel
 those limits to your tasks as you like.

 Additionally, Tejun has specified that we do not want users to be
 too closely tied to the cgroupfs implementation.  Therefore
 commands will be just a hair more general than specifying cgroupfs
 filenames and values.  I may go so far as to avoid specifying
 specific controllers, as AFAIK there should be no redundancy in
 features.  On the other hand, I don't want to get too general.
 So I'm basing the API loosely on the lmctfy command line API.

 One of the driving goals is to enable nested lxc as simply and safely as
 possible.  If this project is a success, then a large chunk of code can
 be removed from lxc.  I'm considering this project a part of the larger
 lxc project, but given how central it is to systems management that
 doesn't mean that I'll consider anyone else's needs as less important
 than our own.

 This document consists of two parts.  The first describes how I
 intend the daemon (cgmanager) to be structured and how it will
 enforce the safety requirements.  The second describes the commands
 which clients will be able to send to the manager.  The list of
 controller keys which can be set is very incomplete at this point,
 serving mainly to show the approach I was thinking of taking.

 Summary

 Each 'host' (identified by a separate instance of the linux kernel) will
 have exactly one running daemon to manage control groups.  This daemon
 will answer cgroup management requests over a dbus socket, located at
 /sys/fs/cgroup/manager.  This socket can be bind-mounted into various
 containers, so that one daemon can support the whole system.

 Programs will be able to make cgroup requests using dbus calls, or
 indirectly by linking against lmctfy which will be modified to use the
 dbus calls if available.

 Outline:
. A single manager, cgmanager, is started on the host, very early
  during boot.  It has very few dependencies, and requires only
  /proc, /run, and /sys to be mounted, with /etc ro.  It will mount
  the cgroup hierarchies in a private namespace and set defaults
  (clone_children, use_hierarchy, sane_behavior, release_agent?) It
  will open a socket at /sys/fs/cgroup/cgmanager (in a small tmpfs).
. A client (requestor 'r') can make cgroup requests over
  /sys/fs/cgroup/manager using dbus calls.  Detailed privilege
  requirements for r are listed below.
. The client request will pertain an existing or new cgroup A.  r's
  privilege over the cgroup must be checked.  r is said to have
  privilege over A if A is owned by r's uid, or if A's owner is mapped
  into r's user namespace, and r is root in that user namespace.
. The client request may pertain a victim task v, which may be moved
  to a new cgroup.  In that case r's privilege over both the cgroup
  and v must be checked.  r is said to have privilege over v if v
  is mapped in r's pid namespace, v's uid is mapped into r's user ns,
  and r is root in its userns.  Or if r and v have the same uid
  and v is mapped in r's pid namespace.
. r's credentials will be taken from socket's peercred, ensuring that
  pid and uid are translated.
. r passes PID(v) as a SCM_CREDENTIAL, so that cgmanager receives the
  translated global pid.  It will then read UID(v) from 
 /proc/PID(v)/status,
  which is the global uid, and check /proc/PID(r)/uid_map to see whether
  UID is mapped there.
. dbus-send can be enhanced to send a pid as SCM_CREDENTIAL to have
  the kernel translate it for the reader.  Only 'move task v to cgroup
  A' will require a SCM_CREDENTIAL to be sent.

 Privilege requirements by action:
  * Requestor of an action (r) over a socket may only make
changes to cgroups over which it has privilege.
  * Requestors may be limited to a certain #/depth of cgroups
(to limit memory usage) - DEFER?
  * Cgroup hierarchy is responsible for resource limits
  * A requestor must either be uid 0 in its userns with victim mapped
ito its userns, or the same uid and in same/ancestor pidns as the
victim
  * If r requests creation of cgroup '/x', /x will be interpreted
as relative to r's cgroup.  r cannot make changes to cgroups not
under its own current cgroup.
  * If r is not in the initial user_ns, then it may not change settings
in its own cgroup, only descendants.  (Not strictly necessary -
we could require the use of extra cgroups when wanted, as lxc does
 

Re: [lxc-devel] cgroup management daemon

2013-11-25 Thread Marian Marinov
On 11/26/2013 02:11 AM, Stéphane Graber wrote:
 On Tue, Nov 26, 2013 at 02:03:16AM +0200, Marian Marinov wrote:
 On 11/26/2013 12:43 AM, Serge E. Hallyn wrote:
 Hi,

 as i've mentioned several times, I want to write a standalone cgroup
 management daemon.  Basic requirements are that it be a standalone
 program; that a single instance running on the host be usable from
 containers nested at any depth; that it not allow escaping ones
 assigned limits; that it not allow subjegating tasks which do not
 belong to you; and that, within your limits, you be able to parcel
 those limits to your tasks as you like.

 Additionally, Tejun has specified that we do not want users to be
 too closely tied to the cgroupfs implementation.  Therefore
 commands will be just a hair more general than specifying cgroupfs
 filenames and values.  I may go so far as to avoid specifying
 specific controllers, as AFAIK there should be no redundancy in
 features.  On the other hand, I don't want to get too general.
 So I'm basing the API loosely on the lmctfy command line API.

 One of the driving goals is to enable nested lxc as simply and safely as
 possible.  If this project is a success, then a large chunk of code can
 be removed from lxc.  I'm considering this project a part of the larger
 lxc project, but given how central it is to systems management that
 doesn't mean that I'll consider anyone else's needs as less important
 than our own.

 This document consists of two parts.  The first describes how I
 intend the daemon (cgmanager) to be structured and how it will
 enforce the safety requirements.  The second describes the commands
 which clients will be able to send to the manager.  The list of
 controller keys which can be set is very incomplete at this point,
 serving mainly to show the approach I was thinking of taking.

 Summary

 Each 'host' (identified by a separate instance of the linux kernel) will
 have exactly one running daemon to manage control groups.  This daemon
 will answer cgroup management requests over a dbus socket, located at
 /sys/fs/cgroup/manager.  This socket can be bind-mounted into various
 containers, so that one daemon can support the whole system.

 Programs will be able to make cgroup requests using dbus calls, or
 indirectly by linking against lmctfy which will be modified to use the
 dbus calls if available.

 Outline:
 . A single manager, cgmanager, is started on the host, very early
   during boot.  It has very few dependencies, and requires only
   /proc, /run, and /sys to be mounted, with /etc ro.  It will mount
   the cgroup hierarchies in a private namespace and set defaults
   (clone_children, use_hierarchy, sane_behavior, release_agent?) It
   will open a socket at /sys/fs/cgroup/cgmanager (in a small tmpfs).
 . A client (requestor 'r') can make cgroup requests over
   /sys/fs/cgroup/manager using dbus calls.  Detailed privilege
   requirements for r are listed below.
 . The client request will pertain an existing or new cgroup A.  r's
   privilege over the cgroup must be checked.  r is said to have
   privilege over A if A is owned by r's uid, or if A's owner is mapped
   into r's user namespace, and r is root in that user namespace.
 . The client request may pertain a victim task v, which may be moved
   to a new cgroup.  In that case r's privilege over both the cgroup
   and v must be checked.  r is said to have privilege over v if v
   is mapped in r's pid namespace, v's uid is mapped into r's user ns,
   and r is root in its userns.  Or if r and v have the same uid
   and v is mapped in r's pid namespace.
 . r's credentials will be taken from socket's peercred, ensuring that
   pid and uid are translated.
 . r passes PID(v) as a SCM_CREDENTIAL, so that cgmanager receives the
   translated global pid.  It will then read UID(v) from 
 /proc/PID(v)/status,
   which is the global uid, and check /proc/PID(r)/uid_map to see whether
   UID is mapped there.
 . dbus-send can be enhanced to send a pid as SCM_CREDENTIAL to have
   the kernel translate it for the reader.  Only 'move task v to cgroup
   A' will require a SCM_CREDENTIAL to be sent.

 Privilege requirements by action:
   * Requestor of an action (r) over a socket may only make
 changes to cgroups over which it has privilege.
   * Requestors may be limited to a certain #/depth of cgroups
 (to limit memory usage) - DEFER?
   * Cgroup hierarchy is responsible for resource limits
   * A requestor must either be uid 0 in its userns with victim mapped
 ito its userns, or the same uid and in same/ancestor pidns as the
 victim
   * If r requests creation of cgroup '/x', /x will be interpreted
 as relative to r's cgroup.  r cannot make changes to cgroups not
 under its own current cgroup.
   * If r is not in the initial user_ns, then it may not change

[lxc-devel] LXC live migrate

2013-11-25 Thread Marian Marinov
Hey guys,
I just read on LWN about the checkpoint/restore tool:
   http://lwn.net/Articles/574917/

With this, it seams possible to freeze and restore a whole container from one 
node to another.

I'll give it a try this week to give more details on how it actually works.

Marian

--
Shape the Mobile Experience: Free Subscription
Software experts and developers: Be at the forefront of tech innovation.
Intel(R) Software Adrenaline delivers strategic insight and game-changing 
conversations that shape the rapidly evolving mobile landscape. Sign up now. 
http://pubads.g.doubleclick.net/gampad/clk?id=63431311iu=/4140/ostg.clktrk
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


Re: [lxc-devel] LXC live migrate

2013-11-25 Thread Marian Marinov
On 11/26/2013 04:58 AM, Stéphane Graber wrote:
 On Tue, Nov 26, 2013 at 04:04:36AM +0200, Marian Marinov wrote:
 Hey guys,
 I just read on LWN about the checkpoint/restore tool:
 http://lwn.net/Articles/574917/

 With this, it seams possible to freeze and restore a whole container from 
 one node to another.

 I'll give it a try this week to give more details on how it actually works.

 Marian

 I think I last tried it with CRIU 0.8 without much success but I took an
 action item during Ubuntu's planning even last week to try with a newer
 release and get in touch with Pavel if I'm still having issues.

  From what we discussed at Linux Plumbers, CRIU should indeed let you
 dump a full container and restore it on the same machine or on another
 so long as the filesystem and any other external dependency of the
 container matches.

 If I can get this working and they've resolved a few of the known issues
 (specifically the fact that it'd only build on x86_64), then the plan is
 to add API calls to LXC's API that'll implement the checkpoint/restore
 feature using CRIU.

I'm going to test this today on CentOS 6 with kernel 3.12. So if you want, you 
can wait for my results :)

Marian

--
Shape the Mobile Experience: Free Subscription
Software experts and developers: Be at the forefront of tech innovation.
Intel(R) Software Adrenaline delivers strategic insight and game-changing 
conversations that shape the rapidly evolving mobile landscape. Sign up now. 
http://pubads.g.doubleclick.net/gampad/clk?id=63431311iu=/4140/ostg.clktrk
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


Re: [lxc-devel] LXC live migrate

2013-11-26 Thread Marian Marinov
On 11/26/2013 05:29 PM, Dwight Engen wrote:
 On Mon, 25 Nov 2013 21:58:13 -0500
 Stéphane Graber stgra...@ubuntu.com wrote:

 On Tue, Nov 26, 2013 at 04:04:36AM +0200, Marian Marinov wrote:
 Hey guys,
 I just read on LWN about the checkpoint/restore tool:
 http://lwn.net/Articles/574917/

 With this, it seams possible to freeze and restore a whole
 container from one node to another.

 I'll give it a try this week to give more details on how it
 actually works.

 Marian

 I think I last tried it with CRIU 0.8 without much success but I took
 an action item during Ubuntu's planning even last week to try with a
 newer release and get in touch with Pavel if I'm still having issues.

 Hi all,

 I also started looking into this (just trying to dump a simple busybox
 container) and the first thing I ran into is that criu can't dump
 init's fd 0 - /dev/zero. I believe this is because that inode is
 outside the container (ie. its the hosts' /dev/zero). I'm looking into
 having lxc_start open std[in,out,err] in do_start after it has cloned
 into the namespace. This means the container would have to have
 a /dev/zero and /dev/null.

On my test setup it works for processes like apache, dovecot and mysql.

However it does not work with containers:

root@s321:~# criu dump -D deb1 -t 19332 --file-locks
(00.004962) Error (namespaces.c:155): Can't dump nested pid namespace for 28352
(00.004985) Error (namespaces.c:321): Can't make pidns id
(00.005327) Error (cr-dump.c:1811): Dumping FAILED.
root@s321:~#
When I try to dump the init process(which I believe I should not do), here is 
what I see:
   http://pastebin.com/DFC0ADpp

(00.291294) Error (tty.c:222): tty: Unexpected format on path /dev/tty1
(00.291315) Error (cr-dump.c:1491): Dump files (pid: 29702) failed with -1
(00.291892) Error (cr-dump.c:1811): Dumping FAILED.

This is my setup:
19332 ?Ss 0:00 lxc-start -n deb1 -d
28352 ?Ss 0:00  \_ init [3]
28393 ?Ss 0:00  \_ /usr/sbin/apache2 -k start
28419 ?S  0:00  |   \_ /usr/sbin/apache2 -k start
28422 ?Sl 0:00  |   \_ /usr/sbin/apache2 -k start
28423 ?Sl 0:00  |   \_ /usr/sbin/apache2 -k start
28489 ?S  0:00  \_ /bin/sh /usr/bin/mysqld_safe
28620 ?Sl 0:00  |   \_ /usr/sbin/mysqld --basedir=/usr 
--datadir=/var/lib/mysql --user=mysql 
--pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock 
--port
28621 ?S  0:00  |   \_ logger -t mysqld -p daemon.error
28598 ?Ss 0:00  \_ /usr/sbin/sshd
29702 pts/0Ss+0:00  \_ /sbin/getty 38400 tty1 linux

I rebooted the container without getty on tty1 and then I got this:

(00.260757) Error (mount.c:255): 86:/dev/tty4 doesn't have a proper root mount
(00.261007) Error (namespaces.c:445): Namespaces dumping finished with error 
65280
(00.261454) Error (cr-dump.c:1811): Dumping FAILED.

This ithe relevant container config
## Device config
lxc.cgroup.devices.deny = a
# /dev/null and zero
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
# consoles
lxc.cgroup.devices.allow = c 5:1 rwm
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 4:0 rwm
lxc.cgroup.devices.allow = c 4:1 rwm
# /dev/{,u}random
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 5:2 rwm
# rtc
lxc.cgroup.devices.allow = c 254:0 rm

# mounts point
lxc.mount.entry = devpts dev/pts devpts gid=5,mode=620 0 0
lxc.mount.auto = proc:mixed sys:ro


Am I doing something wrong?

Marian


  From what we discussed at Linux Plumbers, CRIU should indeed let you
 dump a full container and restore it on the same machine or on another
 so long as the filesystem and any other external dependency of the
 container matches.

 If I can get this working and they've resolved a few of the known
 issues (specifically the fact that it'd only build on x86_64), then
 the plan is to add API calls to LXC's API that'll implement the
 checkpoint/restore feature using CRIU.

 Assuming we can get it to work, I think we'd rather link to some sort
 of libcriu than to system() out to criu? If that is the case I think
 we'll need to do a bit of packaging work to make such a lib in crtools.




--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349351iu=/4140/ostg.clktrk
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel


Re: [lxc-devel] [CRIU] LXC live migrate

2013-11-27 Thread Marian Marinov
On 11/27/2013 10:54 AM, Pavel Emelyanov wrote:
 On 11/27/2013 06:19 AM, Qiang Huang wrote:
 On 2013/11/27 0:19, Marian Marinov wrote:

 On my test setup it works for processes like apache, dovecot and mysql.

 However it does not work with containers:

 root@s321:~# criu dump -D deb1 -t 19332 --file-locks
 (00.004962) Error (namespaces.c:155): Can't dump nested pid namespace for 
 28352
 (00.004985) Error (namespaces.c:321): Can't make pidns id
 (00.005327) Error (cr-dump.c:1811): Dumping FAILED.
 root@s321:~#
 When I try to dump the init process(which I believe I should not do), here 
 is what I see:
 http://pastebin.com/DFC0ADpp

 (00.291294) Error (tty.c:222): tty: Unexpected format on path /dev/tty1
 (00.291315) Error (cr-dump.c:1491): Dump files (pid: 29702) failed with -1
 (00.291892) Error (cr-dump.c:1811): Dumping FAILED.

 This is my setup:
 19332 ?Ss 0:00 lxc-start -n deb1 -d
 28352 ?Ss 0:00  \_ init [3]
 28393 ?Ss 0:00  \_ /usr/sbin/apache2 -k start
 28419 ?S  0:00  |   \_ /usr/sbin/apache2 -k start
 28422 ?Sl 0:00  |   \_ /usr/sbin/apache2 -k start
 28423 ?Sl 0:00  |   \_ /usr/sbin/apache2 -k start
 28489 ?S  0:00  \_ /bin/sh /usr/bin/mysqld_safe
 28620 ?Sl 0:00  |   \_ /usr/sbin/mysqld --basedir=/usr 
 --datadir=/var/lib/mysql --user=mysql
 --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock 
 --port
 28621 ?S  0:00  |   \_ logger -t mysqld -p daemon.error
 28598 ?Ss 0:00  \_ /usr/sbin/sshd
 29702 pts/0Ss+0:00  \_ /sbin/getty 38400 tty1 linux

 I rebooted the container without getty on tty1 and then I got this:

 (00.260757) Error (mount.c:255): 86:/dev/tty4 doesn't have a proper root 
 mount

 This is the reason. That's container's console which is a bind mounted tty 
 from
 the host. And since this is an external connection, CRIU doesn't dump one.

 There are two ways to resolve this. The first is disable container's console. 
 It's
 fast, but ugly. The second way is supporting one, but it would require criu 
 hacking.
 We should detect, that this is an external tty, decide, that we're OK to 
 disconnect
 it after dump and on restore -- connect it back.

The ugly fix does not work either. Because even if you comment the lxc.tty 
option criu is complaining:

(00.243390) Error (mount.c:255): 82:/dev/console doesn't have a proper root 
mount
(00.243626) Error (namespaces.c:445): Namespaces dumping finished with error 
65280
(00.244029) Error (cr-dump.c:1811): Dumping FAILED.


 (00.261007) Error (namespaces.c:445): Namespaces dumping finished with 
 error 65280
 (00.261454) Error (cr-dump.c:1811): Dumping FAILED.

 This ithe relevant container config
 ## Device config
 lxc.cgroup.devices.deny = a
 # /dev/null and zero
 lxc.cgroup.devices.allow = c 1:3 rwm
 lxc.cgroup.devices.allow = c 1:5 rwm
 # consoles
 lxc.cgroup.devices.allow = c 5:1 rwm
 lxc.cgroup.devices.allow = c 5:0 rwm
 lxc.cgroup.devices.allow = c 4:0 rwm
 lxc.cgroup.devices.allow = c 4:1 rwm
 # /dev/{,u}random
 lxc.cgroup.devices.allow = c 1:9 rwm
 lxc.cgroup.devices.allow = c 1:8 rwm
 lxc.cgroup.devices.allow = c 136:* rwm
 lxc.cgroup.devices.allow = c 5:2 rwm
 # rtc
 lxc.cgroup.devices.allow = c 254:0 rm

 # mounts point
 lxc.mount.entry = devpts dev/pts devpts gid=5,mode=620 0 0
 lxc.mount.auto = proc:mixed sys:ro


 Am I doing something wrong?

 According to the criu TODO list: http://criu.org/Todo
 cgroups in container is not supported yet, so I doubt it would work for
 normal containers.

Pavel, can you give me some pointers for this? I would be interested in helping 
out with this part.


 AFAIK cgroups are used _inside_ containers only with recent guest templates.
 In OpenVZ we use more old ones (and more stable) so haven't meet this yet.
 And yes, cgroups are in plans for the nearest future :)

 I'm interested in this too, so let's cc CRIU list and find out what is wrong 
 :)


 Marian


 Thanks,
 Pavel





--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349351iu=/4140/ostg.clktrk
___
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel