Re: [systemd-devel] getty : how to run getty on every ttyX

2013-12-15 Thread Gao feng
On 12/14/2013 12:20 AM, Lennart Poettering wrote:
 On Fri, 13.12.13 16:15, Lennart Poettering (lenn...@poettering.net) wrote:
 
 We had discussed this back at Linux Plumbers last year, and at the time
 you had suggested that rather than create /dev/ttyN symlinks we should
 instead do something like  /dev/containerttyN instead, and set a
 'container_tty' variable containing a list of all those device names
 so that systemd can discover them sensibly. We never got around to
 doing this from the libvirt side, and AFAIK systemd hasn't done anything
 on its side either. So is this still a suitable way forward ?

 Yeah, I am pretty sure that's what we should do. I figure I should hack
 that up. I'll work on it now.
 
 Committed. systemd-getty-generator will now look for $container_ttys
 set as an environment variable for PID 1. If that is set it will split
 the string up on whitespaces and start a getty on all ptys
 referenced. Note that this only supports ptys, not any other ttys. 
 
 Example:
 
 container_ttys=pts/5 pts/8 pts/15
 
 when pass to PID 1 will spawn three additional gettys on ptys 5, 8 and
 15.
 
 Note that this *really* only supports ptys, not any other kinds of ttys,
 sinc for those we require propery device enumeration and notification
 and we don't have those in containers... I still chose to name this
 $container_ttys rather than $container_ptys, so that maybe one day we
 can extend it should devices like this ever get virtualized.
 
 This will be in systemd 209.
 

So quickly! thanks you guys!


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] getty : how to run getty on every ttyX

2013-12-13 Thread Gao feng
Hi,

As we know, systemd only forks getty on ttyX when we press ctrl + alt + FX.
I whould like to let systemd forks server gettys on all of tty deivces by 
default.
this is very useful in container environment, since we can't use ctrl+alt+FX
to trigger getty in container.

If there are some configurations can achieve this,it whould be great!

Thanks!
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] getty : how to run getty on every ttyX

2013-12-13 Thread Gao feng
Hello Mantas,

On 12/13/2013 04:17 PM, Mantas MikulÄ—nas wrote:
 
 On Dec 13, 2013 10:09 AM, Gao feng gaof...@cn.fujitsu.com 
 mailto:gaof...@cn.fujitsu.com wrote:

 Hi,

 As we know, systemd only forks getty on ttyX when we press ctrl + alt + FX.
 I whould like to let systemd forks server gettys on all of tty deivces by 
 default.
 this is very useful in container environment, since we can't use ctrl+alt+FX
 to trigger getty in container.
 
 ...do containers even have such devices?
 

pts device ;)

 Anyway, just enable more instances of getty@.service for all devices you 
 need, just like getty@tty1.service is started by default.
 
 The autostart that you mention is part of logind and all it does is just 
 start the same services via systemd, no magic.
 

getty@tty1.service under /etc/systemd/system/getty.target.wants/ is linked to 
/usr/lib/systemd/system/getty@.service,
so I create getty@tty2.service which links to 
/usr/lib/systemd/system/getty@.service too. is this right?

In libvirt lxc, the ttyX actually is pts devices.

[root@localhost getty.target.wants]# ll
total 0
lrwxrwxrwx 1 root root 38 Dec 13 02:49 getty@tty1.service - 
/usr/lib/systemd/system/getty@.service
lrwxrwxrwx 1 root root 38 Dec 13 03:22 getty@tty2.service - 
/usr/lib/systemd/system/getty@.service

seems like in my container, agetty listens on /dev/console, not tty1 or tty2
/sbin/agetty --noclear --keep-baud console 115200 38400 9600

it seems getty-generator does the extra job.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH v2 1/4] cgroup: add the setting memory.use_hierarchy support for systemd

2013-09-25 Thread Gao feng
On 09/25/2013 08:23 AM, Lennart Poettering wrote:
 On Tue, 24.09.13 08:36, Gao feng (gaof...@cn.fujitsu.com) wrote:
 
 Some programs need to set the memory.use_hierarchy(such as libvirt),
 Add this feature.

 As mentioned already: this really shouldn't be configurable but simply
 the unconditional configuration that systemd requires. I have hence now
 commited a different patch that strictly enables this option at
 boot-time in the root memory hierarchy and does not turn this into yet
 another user config switch.

 So administrator has no rights to set use_hierarchy to false?
 
 Yes. use_hierarchy=0 is a bad idea, and exists only for legacy
 reasons. Since we kinda break compat with the systemd-based cgroup
 rework this should not exist in systemd ever, especially since Tejun
 already clearly said that this option will go away in the kernel too,
 and will be implicitly on.
 
 This patchset enables use_hierarchy for unit by default. but give
 users the rights to disable this.
 
 We should never blindly add more options for users, unless the interface
 is known to be useful and nicely designed, and if we know the setting
 has a future. This option is both badly designed on the kernel side and
 has no future on the kernel side hence it really doesn't belong in
 systemd's interfaces.

Thanks you guys.

Seems I need to read the future works of cgroup. :)

Thanks
Gao
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH v2 2/4] cgroup: show MemoryHierarchy in cgroup_context_dump

2013-09-23 Thread Gao feng
Show MemoryHierarchy in cgroup_context_dump.
---
 src/core/cgroup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index 68615c3..362b074 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -92,6 +92,7 @@ void cgroup_context_dump(CGroupContext *c, FILE* f, const 
char *prefix) {
 %sCPUAccounting=%s\n
 %sBlockIOAccounting=%s\n
 %sMemoryAccounting=%s\n
+%sMemoryHierarchy=%s\n
 %sCPUShares=%lu\n
 %sBlockIOWeight=%lu\n
 %sMemoryLimit=% PRIu64 \n
@@ -99,6 +100,7 @@ void cgroup_context_dump(CGroupContext *c, FILE* f, const 
char *prefix) {
 prefix, yes_no(c-cpu_accounting),
 prefix, yes_no(c-blockio_accounting),
 prefix, yes_no(c-memory_accounting),
+prefix, yes_no(c-memory_hierarchy),
 prefix, c-cpu_shares,
 prefix, c-blockio_weight,
 prefix, c-memory_limit,
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH v2 1/4] cgroup: add the setting memory.use_hierarchy support for systemd

2013-09-23 Thread Gao feng
Some programs need to set the memory.use_hierarchy(such as libvirt),
Add this feature.
---
 src/core/cgroup.c |  6 ++
 src/core/cgroup.h |  1 +
 src/core/dbus-cgroup.c| 16 
 src/core/load-fragment-gperf.gperf.m4 |  1 +
 4 files changed, 24 insertions(+)

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index d10f205..68615c3 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -32,6 +32,7 @@ void cgroup_context_init(CGroupContext *c) {
 /* Initialize everything to the kernel defaults, assuming the
  * structure is preinitialized to 0 */
 
+c-memory_hierarchy = true;
 c-cpu_shares = 1024;
 c-memory_limit = (uint64_t) -1;
 c-blockio_weight = 1000;
@@ -263,6 +264,11 @@ void cgroup_context_apply(CGroupContext *c, 
CGroupControllerMask mask, const cha
 
 if (r  0)
 log_error(Failed to set memory.limit_in_bytes on %s: 
%s, path, strerror(-r));
+
+r = cg_set_attribute(memory, path, memory.use_hierarchy, 
c-memory_hierarchy ? 1 : 0);
+
+if (r  0)
+log_error(Failed to set memory.use_hierarchy on %s: 
%s, path, strerror(-r));
 }
 
 if (mask  CGROUP_DEVICE) {
diff --git a/src/core/cgroup.h b/src/core/cgroup.h
index 0a079e9..f87c16a 100644
--- a/src/core/cgroup.h
+++ b/src/core/cgroup.h
@@ -69,6 +69,7 @@ struct CGroupContext {
 bool cpu_accounting;
 bool blockio_accounting;
 bool memory_accounting;
+bool memory_hierarchy;
 
 unsigned long cpu_shares;
 
diff --git a/src/core/dbus-cgroup.c b/src/core/dbus-cgroup.c
index 9ebcad9..a8a5f81 100644
--- a/src/core/dbus-cgroup.c
+++ b/src/core/dbus-cgroup.c
@@ -132,6 +132,7 @@ const BusProperty bus_cgroup_context_properties[] = {
 { BlockIOReadBandwidth,bus_cgroup_append_device_bandwidths, 
a(st), 0   },
 { BlockIOWriteBandwidth,   bus_cgroup_append_device_bandwidths, 
a(st), 0   },
 { MemoryAccounting,bus_property_append_bool,b, 
offsetof(CGroupContext, memory_accounting)  },
+{ MemoryHierarchy, bus_property_append_bool,b, 
offsetof(CGroupContext, memory_hierarchy)   },
 { MemoryLimit, bus_property_append_uint64,  t, 
offsetof(CGroupContext, memory_limit)   },
 { DevicePolicy,bus_cgroup_append_device_policy, s, 
offsetof(CGroupContext, device_policy)  },
 { DeviceAllow, bus_cgroup_append_device_allow,  
a(ss), 0   },
@@ -417,6 +418,21 @@ int bus_cgroup_set_property(
 
 return 1;
 
+} else if (streq(name, MemoryHierarchy)) {
+
+if (dbus_message_iter_get_arg_type(i) != DBUS_TYPE_BOOLEAN)
+return -EINVAL;
+
+if (mode != UNIT_CHECK) {
+dbus_bool_t b;
+dbus_message_iter_get_basic(i, b);
+
+c-memory_hierarchy = b;
+unit_write_drop_in_private(u, mode, name, b ? 
MemoryHierarchy=yes : MemoryHierarchy=no);
+}
+
+return 1;
+
 } else if (streq(name, MemoryLimit)) {
 
 if (dbus_message_iter_get_arg_type(i) != DBUS_TYPE_UINT64)
diff --git a/src/core/load-fragment-gperf.gperf.m4 
b/src/core/load-fragment-gperf.gperf.m4
index 25bd3aa..b43ca05 100644
--- a/src/core/load-fragment-gperf.gperf.m4
+++ b/src/core/load-fragment-gperf.gperf.m4
@@ -88,6 +88,7 @@ m4_define(`CGROUP_CONTEXT_CONFIG_ITEMS',
 $1.CPUAccounting,config_parse_bool,  0,
 offsetof($1, cgroup_context.cpu_accounting)
 $1.CPUShares,config_parse_cpu_shares,0,
 offsetof($1, cgroup_context)
 $1.MemoryAccounting, config_parse_bool,  0,
 offsetof($1, cgroup_context.memory_accounting)
+$1.MemoryHierarchy,  config_parse_bool,  0,
 offsetof($1, cgroup_context.memory_hierarchy)
 $1.MemoryLimit,  config_parse_memory_limit,  0,
 offsetof($1, cgroup_context)
 $1.DeviceAllow,  config_parse_device_allow,  0,
 offsetof($1, cgroup_context)
 $1.DevicePolicy, config_parse_device_policy, 0,
 offsetof($1, cgroup_context.device_policy)
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH v2 3/4] systemctl: support setting MemoryHierarchy through systemctl

2013-09-23 Thread Gao feng
support setting MemoryHierarchy of unit through systemctl.
---
 src/systemctl/systemctl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/systemctl/systemctl.c b/src/systemctl/systemctl.c
index 8b9183d..7d251a0 100644
--- a/src/systemctl/systemctl.c
+++ b/src/systemctl/systemctl.c
@@ -3644,6 +3644,7 @@ static int append_assignment(DBusMessageIter *iter, const 
char *assignment) {
 
 if (streq(field, CPUAccounting) ||
 streq(field, MemoryAccounting) ||
+streq(field, MemoryHierarchy) ||
 streq(field, BlockIOAccounting)) {
 dbus_bool_t b;
 
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH v2 4/4] man: add manpage for MemoryHierarchy

2013-09-23 Thread Gao feng
---
 man/systemd.cgroup.xml | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/man/systemd.cgroup.xml b/man/systemd.cgroup.xml
index ac58962..f347f1b 100644
--- a/man/systemd.cgroup.xml
+++ b/man/systemd.cgroup.xml
@@ -135,6 +135,17 @@ along with systemd; If not, see 
http://www.gnu.org/licenses/.
   /varlistentry
 
   varlistentry
+termvarnameMemoryHierarchy=/varname/term
+
+listitem
+  paraTurn on hierarchical accounting and reclaim for this unit.
+  Takes a boolean argument. Note that turning on hierarchical 
accounting
+  for on unit will also implicitly turn it on for all units contained
+  therein. This is enabled by default./para
+/listitem
+  /varlistentry
+
+  varlistentry
 
termvarnameMemoryLimit=replaceablebytes/replaceable/varname/term
 
 listitem
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH v2 1/4] cgroup: add the setting memory.use_hierarchy support for systemd

2013-09-23 Thread Gao feng
On 09/24/2013 05:06 AM, Lennart Poettering wrote:
 On Mon, 23.09.13 14:30, Gao feng (gaof...@cn.fujitsu.com) wrote:
 
 Some programs need to set the memory.use_hierarchy(such as libvirt),
 Add this feature.
 
 As mentioned already: this really shouldn't be configurable but simply
 the unconditional configuration that systemd requires. I have hence now
 commited a different patch that strictly enables this option at
 boot-time in the root memory hierarchy and does not turn this into yet
 another user config switch.

So administrator has no rights to set use_hierarchy to false?
This patchset enables use_hierarchy for unit by default. but give
users the rights to disable this.

 
 THis is what Tejun suggested to do and is also what makes the most sense
 to me. cgroups after all are organized logically in a tree and it is
 simply contradictory ignore this all and pretend everything was flat...
 
 ---
  src/core/cgroup.c |  6 ++
  src/core/cgroup.h |  1 +
  src/core/dbus-cgroup.c| 16 
  src/core/load-fragment-gperf.gperf.m4 |  1 +
  4 files changed, 24 insertions(+)

 diff --git a/src/core/cgroup.c b/src/core/cgroup.c
 index d10f205..68615c3 100644
 --- a/src/core/cgroup.c
 +++ b/src/core/cgroup.c
 @@ -32,6 +32,7 @@ void cgroup_context_init(CGroupContext *c) {
  /* Initialize everything to the kernel defaults, assuming the
   * structure is preinitialized to 0 */
  
 +c-memory_hierarchy = true;
  c-cpu_shares = 1024;
  c-memory_limit = (uint64_t) -1;
  c-blockio_weight = 1000;
 @@ -263,6 +264,11 @@ void cgroup_context_apply(CGroupContext *c, 
 CGroupControllerMask mask, const cha
  
  if (r  0)
  log_error(Failed to set memory.limit_in_bytes on 
 %s: %s, path, strerror(-r));
 +
 +r = cg_set_attribute(memory, path, 
 memory.use_hierarchy, c-memory_hierarchy ? 1 : 0);
 +
 +if (r  0)
 +log_error(Failed to set memory.use_hierarchy on 
 %s: %s, path, strerror(-r));
  }
  
  if (mask  CGROUP_DEVICE) {
 diff --git a/src/core/cgroup.h b/src/core/cgroup.h
 index 0a079e9..f87c16a 100644
 --- a/src/core/cgroup.h
 +++ b/src/core/cgroup.h
 @@ -69,6 +69,7 @@ struct CGroupContext {
  bool cpu_accounting;
  bool blockio_accounting;
  bool memory_accounting;
 +bool memory_hierarchy;
  
  unsigned long cpu_shares;
  
 diff --git a/src/core/dbus-cgroup.c b/src/core/dbus-cgroup.c
 index 9ebcad9..a8a5f81 100644
 --- a/src/core/dbus-cgroup.c
 +++ b/src/core/dbus-cgroup.c
 @@ -132,6 +132,7 @@ const BusProperty bus_cgroup_context_properties[] = {
  { BlockIOReadBandwidth,bus_cgroup_append_device_bandwidths, 
 a(st), 0   },
  { BlockIOWriteBandwidth,   bus_cgroup_append_device_bandwidths, 
 a(st), 0   },
  { MemoryAccounting,bus_property_append_bool,
 b, offsetof(CGroupContext, memory_accounting)  },
 +{ MemoryHierarchy, bus_property_append_bool,
 b, offsetof(CGroupContext, memory_hierarchy)   },
  { MemoryLimit, bus_property_append_uint64,  
 t, offsetof(CGroupContext, memory_limit)   },
  { DevicePolicy,bus_cgroup_append_device_policy, 
 s, offsetof(CGroupContext, device_policy)  },
  { DeviceAllow, bus_cgroup_append_device_allow,  
 a(ss), 0   },
 @@ -417,6 +418,21 @@ int bus_cgroup_set_property(
  
  return 1;
  
 +} else if (streq(name, MemoryHierarchy)) {
 +
 +if (dbus_message_iter_get_arg_type(i) != DBUS_TYPE_BOOLEAN)
 +return -EINVAL;
 +
 +if (mode != UNIT_CHECK) {
 +dbus_bool_t b;
 +dbus_message_iter_get_basic(i, b);
 +
 +c-memory_hierarchy = b;
 +unit_write_drop_in_private(u, mode, name, b ? 
 MemoryHierarchy=yes : MemoryHierarchy=no);
 +}
 +
 +return 1;
 +
  } else if (streq(name, MemoryLimit)) {
  
  if (dbus_message_iter_get_arg_type(i) != DBUS_TYPE_UINT64)
 diff --git a/src/core/load-fragment-gperf.gperf.m4 
 b/src/core/load-fragment-gperf.gperf.m4
 index 25bd3aa..b43ca05 100644
 --- a/src/core/load-fragment-gperf.gperf.m4
 +++ b/src/core/load-fragment-gperf.gperf.m4
 @@ -88,6 +88,7 @@ m4_define(`CGROUP_CONTEXT_CONFIG_ITEMS',
  $1.CPUAccounting,config_parse_bool,  0, 
 offsetof($1, cgroup_context.cpu_accounting)
  $1.CPUShares,config_parse_cpu_shares,0, 
 offsetof($1, cgroup_context)
  $1.MemoryAccounting

Re: [systemd-devel] [PATCH v2 1/4] cgroup: add the setting memory.use_hierarchy support for systemd

2013-09-23 Thread Gao feng
On 09/24/2013 08:36 AM, Gao feng wrote:
 On 09/24/2013 05:06 AM, Lennart Poettering wrote:
 On Mon, 23.09.13 14:30, Gao feng (gaof...@cn.fujitsu.com) wrote:

 Some programs need to set the memory.use_hierarchy(such as libvirt),
 Add this feature.

 As mentioned already: this really shouldn't be configurable but simply
 the unconditional configuration that systemd requires. I have hence now
 commited a different patch that strictly enables this option at
 boot-time in the root memory hierarchy and does not turn this into yet
 another user config switch.
 
 So administrator has no rights to set use_hierarchy to false?

If so, there is no need for kernel to export this file to user space.

 This patchset enables use_hierarchy for unit by default. but give
 users the rights to disable this.
 

 THis is what Tejun suggested to do and is also what makes the most sense
 to me. cgroups after all are organized logically in a tree and it is
 simply contradictory ignore this all and pretend everything was flat...

 ---
  src/core/cgroup.c |  6 ++
  src/core/cgroup.h |  1 +
  src/core/dbus-cgroup.c| 16 
  src/core/load-fragment-gperf.gperf.m4 |  1 +
  4 files changed, 24 insertions(+)

 diff --git a/src/core/cgroup.c b/src/core/cgroup.c
 index d10f205..68615c3 100644
 --- a/src/core/cgroup.c
 +++ b/src/core/cgroup.c
 @@ -32,6 +32,7 @@ void cgroup_context_init(CGroupContext *c) {
  /* Initialize everything to the kernel defaults, assuming the
   * structure is preinitialized to 0 */
  
 +c-memory_hierarchy = true;
  c-cpu_shares = 1024;
  c-memory_limit = (uint64_t) -1;
  c-blockio_weight = 1000;
 @@ -263,6 +264,11 @@ void cgroup_context_apply(CGroupContext *c, 
 CGroupControllerMask mask, const cha
  
  if (r  0)
  log_error(Failed to set memory.limit_in_bytes on 
 %s: %s, path, strerror(-r));
 +
 +r = cg_set_attribute(memory, path, 
 memory.use_hierarchy, c-memory_hierarchy ? 1 : 0);
 +
 +if (r  0)
 +log_error(Failed to set memory.use_hierarchy on 
 %s: %s, path, strerror(-r));
  }
  
  if (mask  CGROUP_DEVICE) {
 diff --git a/src/core/cgroup.h b/src/core/cgroup.h
 index 0a079e9..f87c16a 100644
 --- a/src/core/cgroup.h
 +++ b/src/core/cgroup.h
 @@ -69,6 +69,7 @@ struct CGroupContext {
  bool cpu_accounting;
  bool blockio_accounting;
  bool memory_accounting;
 +bool memory_hierarchy;
  
  unsigned long cpu_shares;
  
 diff --git a/src/core/dbus-cgroup.c b/src/core/dbus-cgroup.c
 index 9ebcad9..a8a5f81 100644
 --- a/src/core/dbus-cgroup.c
 +++ b/src/core/dbus-cgroup.c
 @@ -132,6 +132,7 @@ const BusProperty bus_cgroup_context_properties[] = {
  { BlockIOReadBandwidth,bus_cgroup_append_device_bandwidths, 
 a(st), 0   },
  { BlockIOWriteBandwidth,   bus_cgroup_append_device_bandwidths, 
 a(st), 0   },
  { MemoryAccounting,bus_property_append_bool,
 b, offsetof(CGroupContext, memory_accounting)  },
 +{ MemoryHierarchy, bus_property_append_bool,
 b, offsetof(CGroupContext, memory_hierarchy)   },
  { MemoryLimit, bus_property_append_uint64,  
 t, offsetof(CGroupContext, memory_limit)   },
  { DevicePolicy,bus_cgroup_append_device_policy, 
 s, offsetof(CGroupContext, device_policy)  },
  { DeviceAllow, bus_cgroup_append_device_allow,  
 a(ss), 0   },
 @@ -417,6 +418,21 @@ int bus_cgroup_set_property(
  
  return 1;
  
 +} else if (streq(name, MemoryHierarchy)) {
 +
 +if (dbus_message_iter_get_arg_type(i) != DBUS_TYPE_BOOLEAN)
 +return -EINVAL;
 +
 +if (mode != UNIT_CHECK) {
 +dbus_bool_t b;
 +dbus_message_iter_get_basic(i, b);
 +
 +c-memory_hierarchy = b;
 +unit_write_drop_in_private(u, mode, name, b ? 
 MemoryHierarchy=yes : MemoryHierarchy=no);
 +}
 +
 +return 1;
 +
  } else if (streq(name, MemoryLimit)) {
  
  if (dbus_message_iter_get_arg_type(i) != DBUS_TYPE_UINT64)
 diff --git a/src/core/load-fragment-gperf.gperf.m4 
 b/src/core/load-fragment-gperf.gperf.m4
 index 25bd3aa..b43ca05 100644
 --- a/src/core/load-fragment-gperf.gperf.m4
 +++ b/src/core/load-fragment-gperf.gperf.m4
 @@ -88,6 +88,7 @@ m4_define(`CGROUP_CONTEXT_CONFIG_ITEMS',
  $1.CPUAccounting,config_parse_bool,  0,
  offsetof($1, cgroup_context.cpu_accounting)
  $1.CPUShares

Re: [systemd-devel] Question: who should set up the cpuset cgroup

2013-09-22 Thread Gao feng
On 09/18/2013 12:56 AM, Lennart Poettering wrote:
 On Tue, 17.09.13 17:45, Lennart Poettering (lenn...@poettering.net) wrote:
 
 1, set up when systemd creates cpuset cgroup for machine.slice?
 2, the programs which create the machines?
 3, other ideas? kernel?

 I have the suspicion that we need to propagate this down from the root
 of the hierarchy as soon as a leaf sets a cpu set.

 Let's say the root has enabled all cpus (which is the default after
 all), and you enable only a subset for a leaf 2 levels down, but do not
 specify anything for the slice that is in the middle, then the cpuset of
 the root should be propagated down.
   
 Consider this hierarchy:

 -.slice 
/   \
   / \
 machine.slicesystemd.slice
  |
  |
 machine-customer1.slice
  |
  |
  a.scope

 If a.scope gets a cpuset assigned, and neither machine.slice nor
 machine-customer1.slice have any, then the cpuset of -.slice should be
 inherited into machine.slice and then further into
 machine-customer1.slice, if you follow what I mean?
 
 So I talked to Tejun about this too, and he said that cpuset is too
 broken in the kernel for us to wrap it right now in systemd. As soon as
 the kernel is fixed it will provide a proper propagation logic in the
 kernel natively anyway, so we shouldn't bother with this in userspace.
 

Glad to hear this. :)
Thanks!

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 1/5] cgroup: add the setting memory.use_hierarchy support for systemd

2013-09-15 Thread Gao feng
On 09/13/2013 08:26 PM, Lennart Poettering wrote:
 On Fri, 13.09.13 17:49, Gao feng (gaof...@cn.fujitsu.com) wrote:
 
 Some programs need to set the memory.use_hierarchy(such as libvirt),
 Add this feature.
 
 Hmm, should this really be an option? Shouldn't we much rather turn this
 on unconditionally if the memory controller is used for a unit?
 
 This appears like an option that would go a way in Tejun's new sane
 cgroup tree logic anyway, no?
 

So, systemd has no job to deal with memory hierarchy? the kernel will
support memory hierarchy unconditionally? right?
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] cgroup: add missing equals for BlockIOWeight

2013-09-15 Thread Gao feng
---
 src/core/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index 1f41efc..9277dd6 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -92,7 +92,7 @@ void cgroup_context_dump(CGroupContext *c, FILE* f, const 
char *prefix) {
 %sBlockIOAccounting=%s\n
 %sMemoryAccounting=%s\n
 %sCPUShares=%lu\n
-%sBlockIOWeight%lu\n
+%sBlockIOWeight=%lu\n
 %sMemoryLimit=% PRIu64 \n
 %sMemorySoftLimit=% PRIu64 \n
 %sDevicePolicy=%s\n,
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Question: who should set up the cpuset cgroup

2013-09-15 Thread Gao feng
Hi

I'm working on adding set cpuset cgroup support for systemd.
this work almost finished. but I faced a problem.

As we know, if we create a machine,the cgroup for this machine
will be created as well. and the cgroup of this machine is a
sub-directory of machine.slice(or other slices..). And if the
cpuset of machine.slice is unset. the setting of cpuset for this
machine will failed. since the cpuset.mems and cpuset.cpus of
machine.slice is null.

The question is who should set up the cpuset cgroup of machine.slice?

1, set up when systemd creates cpuset cgroup for machine.slice?
2, the programs which create the machines?
3, other ideas? kernel?

Thanks!
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] cgroup: fix incorrectly setting memory cgroup

2013-09-13 Thread Gao feng
If the memory_limit of unit is -1, we should write -1
to the file memory.limit_in_bytes. not the (unit64_t) -1.

otherwise the memory.limit_in_bytes will be set to zero.
---
 src/core/cgroup.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index aee93ba..244baff 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -257,14 +257,21 @@ void cgroup_context_apply(CGroupContext *c, 
CGroupControllerMask mask, const cha
 
 if (mask  CGROUP_MEMORY) {
 char buf[DECIMAL_STR_MAX(uint64_t) + 1];
+if (c-memory_limit != (uint64_t) -1) {
+sprintf(buf, % PRIu64 \n, c-memory_limit);
+r = cg_set_attribute(memory, path, 
memory.limit_in_bytes, buf);
+} else
+r = cg_set_attribute(memory, path, 
memory.limit_in_bytes, -1);
 
-sprintf(buf, % PRIu64 \n, c-memory_limit);
-r = cg_set_attribute(memory, path, memory.limit_in_bytes, 
buf);
 if (r  0)
 log_error(Failed to set memory.limit_in_bytes on %s: 
%s, path, strerror(-r));
 
-sprintf(buf, % PRIu64 \n, c-memory_soft_limit);
-r = cg_set_attribute(memory, path, 
memory.soft_limit_in_bytes, buf);
+if (c-memory_soft_limit != (uint64_t) -1) {
+sprintf(buf, % PRIu64 \n, c-memory_soft_limit);
+r = cg_set_attribute(memory, path, 
memory.soft_limit_in_bytes, buf);
+} else
+r = cg_set_attribute(memory, path, 
memory.soft_limit_in_bytes, -1);
+
 if (r  0)
 log_error(Failed to set memory.soft_limit_in_bytes on 
%s: %s, path, strerror(-r));
 }
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 1/5] cgroup: add the setting memory.use_hierarchy support for systemd

2013-09-13 Thread Gao feng
Some programs need to set the memory.use_hierarchy(such as libvirt),
Add this feature.
---
 src/core/cgroup.c |  9 +
 src/core/cgroup.h |  1 +
 src/core/dbus-cgroup.c| 16 
 src/core/load-fragment-gperf.gperf.m4 |  1 +
 4 files changed, 27 insertions(+)

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index 244baff..336d394 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -274,6 +274,14 @@ void cgroup_context_apply(CGroupContext *c, 
CGroupControllerMask mask, const cha
 
 if (r  0)
 log_error(Failed to set memory.soft_limit_in_bytes on 
%s: %s, path, strerror(-r));
+
+if (c-memory_hierarchy)
+r = cg_set_attribute(memory, path, 
memory.use_hierarchy, 1);
+else
+r = cg_set_attribute(memory, path, 
memory.use_hierarchy, 0);
+
+if (r  0)
+log_error(Failed to set memory.use_hierarchy on %s: 
%s, path, strerror(-r));
 }
 
 if (mask  CGROUP_DEVICE) {
@@ -336,6 +344,7 @@ CGroupControllerMask cgroup_context_get_mask(CGroupContext 
*c) {
 mask |= CGROUP_BLKIO;
 
 if (c-memory_accounting ||
+c-memory_hierarchy ||
 c-memory_limit != (uint64_t) -1 ||
 c-memory_soft_limit != (uint64_t) -1)
 mask |= CGROUP_MEMORY;
diff --git a/src/core/cgroup.h b/src/core/cgroup.h
index 786bd71..c35eea5 100644
--- a/src/core/cgroup.h
+++ b/src/core/cgroup.h
@@ -69,6 +69,7 @@ struct CGroupContext {
 bool cpu_accounting;
 bool blockio_accounting;
 bool memory_accounting;
+bool memory_hierarchy;
 
 unsigned long cpu_shares;
 
diff --git a/src/core/dbus-cgroup.c b/src/core/dbus-cgroup.c
index 1f2a396..20a913c 100644
--- a/src/core/dbus-cgroup.c
+++ b/src/core/dbus-cgroup.c
@@ -132,6 +132,7 @@ const BusProperty bus_cgroup_context_properties[] = {
 { BlockIOReadBandwidth,bus_cgroup_append_device_bandwidths, 
a(st), 0   },
 { BlockIOWriteBandwidth,   bus_cgroup_append_device_bandwidths, 
a(st), 0   },
 { MemoryAccounting,bus_property_append_bool,b, 
offsetof(CGroupContext, memory_accounting)  },
+{ MemoryHierarchy, bus_property_append_bool,b, 
offsetof(CGroupContext, memory_hierarchy)   },
 { MemoryLimit, bus_property_append_uint64,  t, 
offsetof(CGroupContext, memory_limit)   },
 { MemorySoftLimit, bus_property_append_uint64,  t, 
offsetof(CGroupContext, memory_soft_limit)  },
 { DevicePolicy,bus_cgroup_append_device_policy, s, 
offsetof(CGroupContext, device_policy)  },
@@ -438,6 +439,21 @@ int bus_cgroup_set_property(
 
 return 1;
 
+} else if (streq(name, MemoryHierarchy)) {
+
+if (dbus_message_iter_get_arg_type(i) != DBUS_TYPE_BOOLEAN)
+return -EINVAL;
+
+if (mode != UNIT_CHECK) {
+dbus_bool_t b;
+dbus_message_iter_get_basic(i, b);
+
+c-memory_hierarchy = b;
+unit_write_drop_in_private(u, mode, name, b ? 
MemoryHierarchy=yes : MemoryHierarchy=no);
+}
+
+return 1;
+
 } else if (streq(name, DevicePolicy)) {
 const char *policy;
 CGroupDevicePolicy p;
diff --git a/src/core/load-fragment-gperf.gperf.m4 
b/src/core/load-fragment-gperf.gperf.m4
index 33c6880..bc57780 100644
--- a/src/core/load-fragment-gperf.gperf.m4
+++ b/src/core/load-fragment-gperf.gperf.m4
@@ -88,6 +88,7 @@ m4_define(`CGROUP_CONTEXT_CONFIG_ITEMS',
 $1.CPUAccounting,config_parse_bool,  0,
 offsetof($1, cgroup_context.cpu_accounting)
 $1.CPUShares,config_parse_cpu_shares,0,
 offsetof($1, cgroup_context)
 $1.MemoryAccounting, config_parse_bool,  0,
 offsetof($1, cgroup_context.memory_accounting)
+$1.MemoryHierarchy,  config_parse_bool,  0,
 offsetof($1, cgroup_context.memory_hierarchy)
 $1.MemoryLimit,  config_parse_memory_limit,  0,
 offsetof($1, cgroup_context)
 $1.MemorySoftLimit,  config_parse_memory_limit,  0,
 offsetof($1, cgroup_context)
 $1.DeviceAllow,  config_parse_device_allow,  0,
 offsetof($1, cgroup_context)
-- 
1.8.3.1

___
systemd-devel mailing list

[systemd-devel] [PATCH 2/5] cgroup: show MemoryHierarchy in cgroup_context_dump

2013-09-13 Thread Gao feng
Show MemoryHierarchy in cgroup_context_dump.
---
 src/core/cgroup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index 336d394..2c8aa22 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -91,6 +91,7 @@ void cgroup_context_dump(CGroupContext *c, FILE* f, const 
char *prefix) {
 %sCPUAccounting=%s\n
 %sBlockIOAccounting=%s\n
 %sMemoryAccounting=%s\n
+%sMemoryHierarchy=%s\n
 %sCPUShares=%lu\n
 %sBlockIOWeight%lu\n
 %sMemoryLimit=% PRIu64 \n
@@ -99,6 +100,7 @@ void cgroup_context_dump(CGroupContext *c, FILE* f, const 
char *prefix) {
 prefix, yes_no(c-cpu_accounting),
 prefix, yes_no(c-blockio_accounting),
 prefix, yes_no(c-memory_accounting),
+prefix, yes_no(c-memory_hierarchy),
 prefix, c-cpu_shares,
 prefix, c-blockio_weight,
 prefix, c-memory_limit,
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 3/5] systemctl: support setting MemoryHierarchy through systemctl

2013-09-13 Thread Gao feng
support setting MemoryHierarchy of unit through systemctl.
---
 src/systemctl/systemctl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/systemctl/systemctl.c b/src/systemctl/systemctl.c
index 57e5bb9..87388bb 100644
--- a/src/systemctl/systemctl.c
+++ b/src/systemctl/systemctl.c
@@ -3643,6 +3643,7 @@ static int append_assignment(DBusMessageIter *iter, const 
char *assignment) {
 
 if (streq(field, CPUAccounting) ||
 streq(field, MemoryAccounting) ||
+streq(field, MemoryHierarchy) ||
 streq(field, BlockIOAccounting)) {
 dbus_bool_t b;
 
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 5/5] machine.slice: enable MemoryHierarchy by default

2013-09-13 Thread Gao feng
We can simply limit the total memory usage of machines
through setting the memory usage limit of machine.slice.
---
 units/machine.slice | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/units/machine.slice b/units/machine.slice
index 3d40dfd..c959598 100644
--- a/units/machine.slice
+++ b/units/machine.slice
@@ -9,3 +9,6 @@
 Description=Virtual Machine and Container Slice
 Documentation=man:systemd.special(7)
 Before=slices.target
+
+[Slice]
+MemoryHierarchy=yes
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 4/5] man: add manpage for MemoryHierarchy

2013-09-13 Thread Gao feng
add the manpage for MemoryHierarchy.
---
 man/systemd.cgroup.xml | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/man/systemd.cgroup.xml b/man/systemd.cgroup.xml
index cc0eb15..80fc2e2 100644
--- a/man/systemd.cgroup.xml
+++ b/man/systemd.cgroup.xml
@@ -135,6 +135,17 @@ along with systemd; If not, see 
http://www.gnu.org/licenses/.
   /varlistentry
 
   varlistentry
+termvarnameMemoryHierarchy=/varname/term
+
+listitem
+  paraTurn on hierarchical accounting and reclaim for this
+  unit. Takes a boolean argument. Note that turning on hierarchical
+  accounting for one unit will also implicitly turn it on for
+  all units contained therein./para
+/listitem
+  /varlistentry
+
+  varlistentry
 
termvarnameMemoryLimit=replaceablebytes/replaceable/varname/term
 
termvarnameMemorySoftLimit=replaceablebytes/replaceable/varname/term
 
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 1/2] cgroup: add the missing setting of variable's value

2013-09-12 Thread Gao feng
set the value of variable r to the return value
of cg_set_attribute.
---
 src/core/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index 3eeb475..fba0b2f 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -264,7 +264,7 @@ void cgroup_context_apply(CGroupContext *c, 
CGroupControllerMask mask, const cha
 log_error(Failed to set memory.limit_in_bytes on %s: 
%s, path, strerror(-r));
 
 sprintf(buf, % PRIu64 \n, c-memory_soft_limit);
-cg_set_attribute(memory, path, memory.soft_limit_in_bytes, 
buf);
+r = cg_set_attribute(memory, path, 
memory.soft_limit_in_bytes, buf);
 if (r  0)
 log_error(Failed to set memory.limit_in_bytes on %s: 
%s, path, strerror(-r));
 }
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 2/2] cgroup: correct the log information

2013-09-12 Thread Gao feng
it should be memory.soft_limit_in_bytes.
---
 src/core/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index fba0b2f..aee93ba 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -266,7 +266,7 @@ void cgroup_context_apply(CGroupContext *c, 
CGroupControllerMask mask, const cha
 sprintf(buf, % PRIu64 \n, c-memory_soft_limit);
 r = cg_set_attribute(memory, path, 
memory.soft_limit_in_bytes, buf);
 if (r  0)
-log_error(Failed to set memory.limit_in_bytes on %s: 
%s, path, strerror(-r));
+log_error(Failed to set memory.soft_limit_in_bytes on 
%s: %s, path, strerror(-r));
 }
 
 if (mask  CGROUP_DEVICE) {
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] implies MemoryAccounting=true

2013-09-11 Thread Gao feng
On 09/11/2013 08:07 PM, Lennart Poettering wrote:
 On Wed, 11.09.13 09:18, Gao feng (gaof...@cn.fujitsu.com) wrote:
 
 The SYSTEMD.CGROUP(5) said if MemoryLimit=bytes is set for unit, it
 implies MemeoryAccounting=true for this unit.

 But seems systemd didn't implement this hint. CPUShares  BlockIO have
 the same problem, this is a shortage? patch needed?


 The logic for this is in src/core/cgroup.c's cgroup_context_get_mask()
 call, which will determine to which cgroup controllers to add a unit
 to. Note that setting MemoryLimit= will not actually propagate to the
 boolean exposed in MemoryAccounting=, it will just have the same effect
 as if it was set...


 Maybe we should also report MemoryAccounting=yes in cgroup_context_dump
 if we set MemoryLimit.
 
 What I don't really like about this is that we allow MemoryLimit to be
 altered dynamically, which would then need to influence
 MemoryAccounting= dynamically too. WHich means that we would have to
 shadow that field so that we can revert to the originally configured
 value. Also, it would have very weird effects if people then try to
 dynamically unset MemoryAccounting which would either possibly be a NOP
 or would have the effect of resseting MemoryLimit... If you follow what
 I mean.
 

make sense, thanks for your explanation :)

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] implies MemoryAccounting=true

2013-09-10 Thread Gao feng
On 09/10/2013 11:50 PM, Lennart Poettering wrote:
 On Mon, 26.08.13 11:19, Gao feng (gaof...@cn.fujitsu.com) wrote:
 
 Hi

 The SYSTEMD.CGROUP(5) said if MemoryLimit=bytes is set for unit, it
 implies MemeoryAccounting=true for this unit.

 But seems systemd didn't implement this hint. CPUShares  BlockIO have
 the same problem, this is a shortage? patch needed?
 
 
 The logic for this is in src/core/cgroup.c's cgroup_context_get_mask()
 call, which will determine to which cgroup controllers to add a unit
 to. Note that setting MemoryLimit= will not actually propagate to the
 boolean exposed in MemoryAccounting=, it will just have the same effect
 as if it was set...
 

Maybe we should also report MemoryAccounting=yes in cgroup_context_dump
if we set MemoryLimit.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 1/3] blkio bandwidth: don't clean up all of entries in blockio_device_bandwidths list

2013-09-10 Thread Gao feng
On 09/10/2013 11:13 PM, Lennart Poettering wrote:
 On Fri, 30.08.13 10:56, Gao feng (gaof...@cn.fujitsu.com) wrote:
 
 if we get BlockIOReadBandwidth=, we should only remove the
 read-bandwidth-entries in blockio_device_bandwidths list.
 
 Thanks! Applied, though with one change:
 
 +read = streq(BlockIOReadBandwidth, lvalue);
 +
  if (isempty(rvalue)) {
 -while (c-blockio_device_bandwidths)
 -cgroup_context_free_blockio_device_bandwidth(c, 
 c-blockio_device_bandwidths);
 +CGroupBlockIODeviceBandwidth *next;
 +
 +LIST_FOREACH_SAFE (device_bandwidths, b, next, 
 c-blockio_device_bandwidths) {
 +if (b-read == read) {
 +LIST_REMOVE(CGroupBlockIODeviceBandwidth, 
 device_bandwidths, c-blockio_device_bandwidths, b);
 +free(b-path);
 +free(b);
 
 I replaced the three previous lines with an invocation of
 cgroup_context_free_blockio_device_bandwidth() again.
 


Thanks for your help!

Gao

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 1/3] blkio bandwidth: don't clean up all of entries in blockio_device_bandwidths list

2013-08-29 Thread Gao feng
if we get BlockIOReadBandwidth=, we should only remove the
read-bandwidth-entries in blockio_device_bandwidths list.
---
 src/core/load-fragment.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/core/load-fragment.c b/src/core/load-fragment.c
index 4714687..6f316cc 100644
--- a/src/core/load-fragment.c
+++ b/src/core/load-fragment.c
@@ -2205,6 +2205,7 @@ int config_parse_blockio_bandwidth(
 CGroupContext *c = data;
 const char *bandwidth;
 off_t bytes;
+bool read;
 size_t n;
 int r;
 
@@ -2212,9 +2213,18 @@ int config_parse_blockio_bandwidth(
 assert(lvalue);
 assert(rvalue);
 
+read = streq(BlockIOReadBandwidth, lvalue);
+
 if (isempty(rvalue)) {
-while (c-blockio_device_bandwidths)
-cgroup_context_free_blockio_device_bandwidth(c, 
c-blockio_device_bandwidths);
+CGroupBlockIODeviceBandwidth *next;
+
+LIST_FOREACH_SAFE (device_bandwidths, b, next, 
c-blockio_device_bandwidths) {
+if (b-read == read) {
+LIST_REMOVE(CGroupBlockIODeviceBandwidth, 
device_bandwidths, c-blockio_device_bandwidths, b);
+free(b-path);
+free(b);
+}
+}
 
 return 0;
 }
@@ -2253,7 +2263,7 @@ int config_parse_blockio_bandwidth(
 b-path = path;
 path = NULL;
 b-bandwidth = (uint64_t) bytes;
-b-read = streq(BlockIOReadBandwidth, lvalue);
+b-read = read;
 
 LIST_PREPEND(CGroupBlockIODeviceBandwidth, device_bandwidths, 
c-blockio_device_bandwidths, b);
 
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 2/3] cgroup: setup BlockIORead/WriteBandwidth in bus_cgroup_set_property

2013-08-29 Thread Gao feng
This patch adds the support for setting up BlockIORead/WriteBandwidth
in bus_cgroup_set_property.
---
 src/core/dbus-cgroup.c | 101 +
 1 file changed, 101 insertions(+)

diff --git a/src/core/dbus-cgroup.c b/src/core/dbus-cgroup.c
index c0e2371..4ef203d 100644
--- a/src/core/dbus-cgroup.c
+++ b/src/core/dbus-cgroup.c
@@ -307,6 +307,107 @@ int bus_cgroup_set_property(
 }
 
 return 1;
+
+} else if (streq(name, BlockIOReadBandwidth) || streq(name, 
BlockIOWriteBandwidth)) {
+DBusMessageIter sub;
+unsigned n = 0;
+bool read = true;
+
+if (streq(name, BlockIOWriteBandwidth))
+read = false;
+
+if (dbus_message_iter_get_arg_type(i) != DBUS_TYPE_ARRAY ||
+dbus_message_iter_get_element_type(i) != DBUS_TYPE_STRUCT)
+return -EINVAL;
+
+dbus_message_iter_recurse(i, sub);
+while (dbus_message_iter_get_arg_type(sub) == 
DBUS_TYPE_STRUCT) {
+DBusMessageIter sub2;
+const char *path;
+uint64_t u64;
+CGroupBlockIODeviceBandwidth *a;
+
+dbus_message_iter_recurse(sub, sub2);
+
+if (bus_iter_get_basic_and_next(sub2, 
DBUS_TYPE_STRING, path, true)  0 ||
+bus_iter_get_basic_and_next(sub2, 
DBUS_TYPE_UINT64, u64, false)  0)
+return -EINVAL;
+
+if (mode != UNIT_CHECK) {
+CGroupBlockIODeviceBandwidth *b;
+bool exist = false;
+
+LIST_FOREACH(device_bandwidths, b, 
c-blockio_device_bandwidths) {
+if (streq(path, b-path)  read == 
b-read) {
+a = b;
+exist = true;
+break;
+}
+}
+
+if (!exist) {
+a = new0(CGroupBlockIODeviceBandwidth, 
1);
+if (!a)
+return -ENOMEM;
+
+a-read = read;
+a-path = strdup(path);
+if (!a-path) {
+free(a);
+return -ENOMEM;
+}
+}
+
+a-bandwidth = u64;
+
+if (!exist)
+
LIST_PREPEND(CGroupBlockIODeviceBandwidth, device_bandwidths,
+ 
c-blockio_device_bandwidths, a);
+}
+
+n++;
+dbus_message_iter_next(sub);
+}
+
+if (mode != UNIT_CHECK) {
+_cleanup_free_ char *buf = NULL;
+_cleanup_fclose_ FILE *f = NULL;
+CGroupBlockIODeviceBandwidth *a;
+CGroupBlockIODeviceBandwidth *next;
+size_t size = 0;
+
+if (n == 0) {
+LIST_FOREACH_SAFE(device_bandwidths, a, next, 
c-blockio_device_bandwidths) {
+if (a-read == read) {
+
LIST_REMOVE(CGroupBlockIODeviceBandwidth, device_bandwidths, 
c-blockio_device_bandwidths, a);
+free(a-path);
+free(a);
+}
+}
+}
+
+f = open_memstream(buf, size);
+if (!f)
+return -ENOMEM;
+
+if (read) {
+fputs(BlockIOReadBandwidth=\n, f);
+LIST_FOREACH(device_bandwidths, a, 
c-blockio_device_bandwidths)
+if (a-read)
+fprintf(f, 
BlockIOReadBandwidth=%s % PRIu64 \n, a-path, a-bandwidth);
+} else {
+fputs(BlockIOWriteBandwidth=\n, f);
+LIST_FOREACH(device_bandwidths, a, 
c-blockio_device_bandwidths)
+if (!a-read)
+  

[systemd-devel] [PATCH 3/3] systemcl: add support for setting BlockIORead/WriteBandwidth for unit

2013-08-29 Thread Gao feng
This patch allows user to set up BlockIOReadBandwidth and BlockIOWriteBandwidth
for unit through systemctl. Such as

systemctl set-property sshd.service BlockIOReadBandwidth=/dev/sda 10
systemctl set-property sshd.service BlockIOWriteBandwidth=/dev/sda 20
---
 src/systemctl/systemctl.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/src/systemctl/systemctl.c b/src/systemctl/systemctl.c
index 4ea301a..bd9abdb 100644
--- a/src/systemctl/systemctl.c
+++ b/src/systemctl/systemctl.c
@@ -3750,6 +3750,43 @@ static int append_assignment(DBusMessageIter *iter, 
const char *assignment) {
 if (!dbus_message_iter_close_container(sub, sub2))
 return log_oom();
 
+} else if (streq(field, BlockIOReadBandwidth) || streq(field, 
BlockIOWriteBandwidth)) {
+DBusMessageIter sub2;
+
+if (!dbus_message_iter_open_container(iter, DBUS_TYPE_VARIANT, 
a(st), sub) ||
+!dbus_message_iter_open_container(sub, DBUS_TYPE_ARRAY, 
(st), sub2))
+return log_oom();
+
+if (!isempty(eq)) {
+const char *path, *bandwidth;
+DBusMessageIter sub3;
+uint64_t u;
+char *e;
+
+e = strchr(eq, ' ');
+if (e) {
+path = strndupa(eq, e - eq);
+bandwidth = e+1;
+} else {
+path = eq;
+bandwidth = ;
+}
+
+r = safe_atou64(bandwidth, u);
+if (r  0) {
+log_error(Failed to parse %s value %s., 
field, bandwidth);
+return -EINVAL;
+}
+if (!dbus_message_iter_open_container(sub2, 
DBUS_TYPE_STRUCT, NULL, sub3) ||
+!dbus_message_iter_append_basic(sub3, 
DBUS_TYPE_STRING, path) ||
+!dbus_message_iter_append_basic(sub3, 
DBUS_TYPE_UINT64, u) ||
+!dbus_message_iter_close_container(sub2, sub3))
+return log_oom();
+}
+
+if (!dbus_message_iter_close_container(sub, sub2))
+return log_oom();
+
 } else {
 log_error(Unknown assignment %s., assignment);
 return -EINVAL;
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] implies MemoryAccounting=true

2013-08-29 Thread Gao feng
On 08/28/2013 07:45 PM, Zbigniew Jędrzejewski-Szmek wrote:
 On Mon, Aug 26, 2013 at 11:19:54AM +0800, Gao feng wrote:
 Hi

 The SYSTEMD.CGROUP(5) said if MemoryLimit=bytes is set for unit, it
 implies MemeoryAccounting=true for this unit.

 But seems systemd didn't implement this hint. CPUShares  BlockIO have
 the same problem, this is a shortage? patch needed?
 Hm, without looking at code actually, I thought that this
 is basically a limitation/requirement of the cgroup controller:
 if a memory share is assigned, memory accounting will be performed
 by the kernelfor it to work.
 

seems you are right, thanks!

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] blcokio bandwidth: add missing set of CGroupBlockIODeviceBandwidth's read

2013-08-27 Thread Gao feng
BlockIOReadBandwidth and BlockIOWriteBandwidth both use
config_parse_blockio_bandwidth to set up CGroupBlockIODeviceBandwidth,
We should set the read value based on the left values
in config files.
---
 src/core/load-fragment.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/core/load-fragment.c b/src/core/load-fragment.c
index e5fc4a3..4714687 100644
--- a/src/core/load-fragment.c
+++ b/src/core/load-fragment.c
@@ -2253,6 +2253,7 @@ int config_parse_blockio_bandwidth(
 b-path = path;
 path = NULL;
 b-bandwidth = (uint64_t) bytes;
+b-read = streq(BlockIOReadBandwidth, lvalue);
 
 LIST_PREPEND(CGroupBlockIODeviceBandwidth, device_bandwidths, 
c-blockio_device_bandwidths, b);
 
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] device cgroup: don't create a new CGroupDeviceAllow when it already in the list

2013-08-26 Thread Gao feng
If a device node is already in the device_allow list of
CGroupContext, we should replace it instead of create a
new one and append this new one to the end of device_allow
list.
---
 src/core/dbus-cgroup.c | 32 +++-
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/src/core/dbus-cgroup.c b/src/core/dbus-cgroup.c
index 9e97b20..30c99dd 100644
--- a/src/core/dbus-cgroup.c
+++ b/src/core/dbus-cgroup.c
@@ -314,21 +314,35 @@ int bus_cgroup_set_property(
 }
 
 if (mode != UNIT_CHECK) {
-a = new0(CGroupDeviceAllow, 1);
-if (!a)
-return -ENOMEM;
-
-a-path = strdup(path);
-if (!a-path) {
-free(a);
-return -ENOMEM;
+CGroupDeviceAllow *b;
+bool exist = false;
+
+LIST_FOREACH(device_allow, b, c-device_allow) 
{
+if (!strcmp(b-path, path)) {
+a = b;
+exist = true;
+break;
+}
+}
+
+if (!exist) {
+a = new0(CGroupDeviceAllow, 1);
+if (!a)
+return -ENOMEM;
+
+a-path = strdup(path);
+if (!a-path) {
+free(a);
+return -ENOMEM;
+}
 }
 
 a-r = !!strchr(rwm, 'r');
 a-w = !!strchr(rwm, 'w');
 a-m = !!strchr(rwm, 'm');
 
-LIST_PREPEND(CGroupDeviceAllow, device_allow, 
c-device_allow, a);
+if (!exist)
+LIST_PREPEND(CGroupDeviceAllow, 
device_allow, c-device_allow, a);
 }
 
 n++;
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 1/3] cgroup: setup BlockIODeviceWeight in bus_cgroup_set_property

2013-08-26 Thread Gao feng
This patch adds the support for setting up BlockIODeviceWeight
in bus_cgroup_set_property. most of the codes are copied from
the case that sets up DeviceAllow.
---
 src/core/dbus-cgroup.c | 85 ++
 1 file changed, 85 insertions(+)

diff --git a/src/core/dbus-cgroup.c b/src/core/dbus-cgroup.c
index 30c99dd..1555c34 100644
--- a/src/core/dbus-cgroup.c
+++ b/src/core/dbus-cgroup.c
@@ -222,6 +222,91 @@ int bus_cgroup_set_property(
 
 return 1;
 
+} else if (streq(name, BlockIODeviceWeight)) {
+DBusMessageIter sub;
+unsigned n = 0;
+
+if (dbus_message_iter_get_arg_type(i) != DBUS_TYPE_ARRAY ||
+dbus_message_iter_get_element_type(i) != DBUS_TYPE_STRUCT)
+return -EINVAL;
+
+dbus_message_iter_recurse(i, sub);
+while (dbus_message_iter_get_arg_type(sub) == 
DBUS_TYPE_STRUCT) {
+DBusMessageIter sub2;
+const char *path;
+uint64_t u64;
+unsigned long ul;
+CGroupBlockIODeviceWeight *a;
+
+dbus_message_iter_recurse(sub, sub2);
+
+if (bus_iter_get_basic_and_next(sub2, 
DBUS_TYPE_STRING, path, true)  0 ||
+bus_iter_get_basic_and_next(sub2, 
DBUS_TYPE_UINT64, u64, false)  0)
+return -EINVAL;
+
+ul = (unsigned long) u64;
+
+if (u64  10 || u64  1000)
+return -EINVAL;
+
+if (mode != UNIT_CHECK) {
+CGroupBlockIODeviceWeight *b;
+bool exist = false;
+
+LIST_FOREACH(device_weights, b, 
c-blockio_device_weights) {
+if (streq(b-path, path)) {
+a = b;
+exist = true;
+break;
+}
+}
+
+if (!exist) {
+a = new0(CGroupBlockIODeviceWeight, 1);
+if (!a)
+return -ENOMEM;
+
+a-path = strdup(path);
+if (!a-path) {
+free(a);
+return -ENOMEM;
+}
+}
+
+a-weight = ul;
+
+if (!exist)
+
LIST_PREPEND(CGroupBlockIODeviceWeight, device_weights,
+ 
c-blockio_device_weights, a);
+}
+
+n++;
+dbus_message_iter_next(sub);
+}
+
+if (mode != UNIT_CHECK) {
+_cleanup_free_ char *buf = NULL;
+_cleanup_fclose_ FILE *f = NULL;
+CGroupBlockIODeviceWeight *a;
+size_t size = 0;
+
+if (n == 0) {
+while (c-blockio_device_weights)
+
cgroup_context_free_blockio_device_weight(c, c-blockio_device_weights);
+}
+
+f = open_memstream(buf, size);
+if (!f)
+return -ENOMEM;
+
+fputs(BlockIODeviceWeight=\n, f);
+LIST_FOREACH(device_weights, a, 
c-blockio_device_weights)
+fprintf(f, BlockIODeviceWeight=%s %lu\n, 
a-path, a-weight);
+fflush(f);
+unit_write_drop_in_private(u, mode, name, buf);
+}
+
+return 1;
 } else if (streq(name, MemoryAccounting)) {
 
 if (dbus_message_iter_get_arg_type(i) != DBUS_TYPE_BOOLEAN)
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 2/3] systemcl: add support for setting BlockIODeviceWeight for unit

2013-08-26 Thread Gao feng
This patch allows user to set up BlockIODeviceWeight for unit
through systemctl. Such as

systemctl set-property sshd.service BlockIODeviceWeight=/dev/sda 100
---
 src/systemctl/systemctl.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/src/systemctl/systemctl.c b/src/systemctl/systemctl.c
index a635891..ff29b0f 100644
--- a/src/systemctl/systemctl.c
+++ b/src/systemctl/systemctl.c
@@ -3695,6 +3695,43 @@ static int append_assignment(DBusMessageIter *iter, 
const char *assignment) {
 if (!dbus_message_iter_close_container(sub, sub2))
 return log_oom();
 
+} else if (streq(field, BlockIODeviceWeight)) {
+DBusMessageIter sub2;
+
+if (!dbus_message_iter_open_container(iter, DBUS_TYPE_VARIANT, 
a(st), sub) ||
+!dbus_message_iter_open_container(sub, DBUS_TYPE_ARRAY, 
(st), sub2))
+return log_oom();
+
+if (!isempty(eq)) {
+const char *path, *weight;
+DBusMessageIter sub3;
+uint64_t u;
+char *e;
+
+e = strchr(eq, ' ');
+if (e) {
+path = strndupa(eq, e - eq);
+weight = e+1;
+} else {
+path = eq;
+weight = ;
+}
+
+r = safe_atou64(weight, u);
+if (r  0) {
+log_error(Failed to parse %s value %s., 
field, weight);
+return -EINVAL;
+}
+if (!dbus_message_iter_open_container(sub2, 
DBUS_TYPE_STRUCT, NULL, sub3) ||
+!dbus_message_iter_append_basic(sub3, 
DBUS_TYPE_STRING, path) ||
+!dbus_message_iter_append_basic(sub3, 
DBUS_TYPE_UINT64, u) ||
+!dbus_message_iter_close_container(sub2, sub3))
+return log_oom();
+}
+
+if (!dbus_message_iter_close_container(sub, sub2))
+return log_oom();
+
 } else {
 log_error(Unknown assignment %s., assignment);
 return -EINVAL;
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 3/3] systemctl: show BlockIODeviceWeight for unit

2013-08-26 Thread Gao feng
We can use systemctl show unitname to show the BlockIODeviceWeight
of unit.
---
 src/systemctl/systemctl.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/systemctl/systemctl.c b/src/systemctl/systemctl.c
index ff29b0f..4ea301a 100644
--- a/src/systemctl/systemctl.c
+++ b/src/systemctl/systemctl.c
@@ -3315,6 +3315,24 @@ static int print_property(const char *name, 
DBusMessageIter *iter) {
 }
 return 0;
 
+} else if (dbus_message_iter_get_element_type(iter) == 
DBUS_TYPE_STRUCT  streq(name, BlockIODeviceWeight)) {
+DBusMessageIter sub, sub2;
+
+dbus_message_iter_recurse(iter, sub);
+while (dbus_message_iter_get_arg_type(sub) == 
DBUS_TYPE_STRUCT) {
+const char *path;
+uint64_t weight;
+
+dbus_message_iter_recurse(sub, sub2);
+
+if (bus_iter_get_basic_and_next(sub2, 
DBUS_TYPE_STRING, path, true) = 0 
+bus_iter_get_basic_and_next(sub2, 
DBUS_TYPE_UINT64, weight, false) = 0)
+printf(%s=%s % PRIu64 \n, name, 
strna(path), weight);
+
+dbus_message_iter_next(sub);
+}
+return 0;
+
 } else if (dbus_message_iter_get_element_type(iter) == 
DBUS_TYPE_STRUCT  (streq(name, BlockIOReadBandwidth) || streq(name, 
BlockIOWriteBandwidth))) {
 DBusMessageIter sub, sub2;
 
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] netns: unix: only allow to find out unix socket in same net namespace

2013-08-25 Thread Gao feng
On 08/26/2013 02:16 AM, James Bottomley wrote:
 On Sun, 2013-08-25 at 19:37 +0200, Kay Sievers wrote:
 On Sun, Aug 25, 2013 at 7:16 PM, James Bottomley
 jbottom...@parallels.com wrote:
 On Wed, 2013-08-21 at 11:51 +0200, Kay Sievers wrote:
 On Wed, Aug 21, 2013 at 9:22 AM, Gao feng gaof...@cn.fujitsu.com wrote:
 On 08/21/2013 03:06 PM, Eric W. Biederman wrote:

 I suspect libvirt should simply not share /run or any other normally
 writable directory with the host.  Sharing /run /var/run or even /tmp
 seems extremely dubious if you want some kind of containment, and
 without strange things spilling through.

 Right, /run or /var cannot be shared. It's not only about sockets,
 many other things will also go really wrong that way.

 This is very narrow thinking about what a container might be and will
 cause trouble as people start to create novel uses for containers in the
 cloud if you try to impose this on our current infrastructure.

 One of the cgroup only container uses we see at Parallels (so no
 separate filesystem and no net namespaces) is pure apache load balancer
 type shared hosting.  In this scenario, base apache is effectively
 brought up in the host environment, but then spawned instances are
 resource limited using cgroups according to what the customer has paid.
 Obviously all apache instances are sharing /var and /run from the host
 (mostly for logging and pid storage and static pages).  The reason some
 hosters do this is that it allows much higher density simple web serving
 (either static pages from quota limited chroots or dynamic pages limited
 by database space constraints) because each instance shares so much
 from the host.  The service is obviously much more basic than giving
 each customer a container running apache, but it's much easier for the
 hoster to administer and it serves the customer just as well for a large
 cross section of use cases and for those it doesn't serve, the hoster
 usually has separate container hosting (for a higher price, of course).

 The container as we talk about has it's own init, and no, it cannot
 share /var or /run.
 
 This is what we would call an IaaS container: bringing up init and
 effectively a new OS inside a container is the closest containers come
 to being like hypervisors.  It's the most common use case of Parallels
 containers in the field, so I'm certainly not telling you it's a bad
 idea.
 
 The stuff you talk about has nothing to do with that, it's not
 different from all services or a multi-instantiated service on the
 host sharing the same /run and /var.
 
 I gave you one example: a really simplistic one.  A more sophisticated
 example is a PaaS or SaaS container where you bring the OS up in the
 host but spawn a particular application into its own container (this is
 essentially similar to what Docker does).  Often in this case, you do
 add separate mount and network namespaces to make the application
 isolated and migrateable with its own IP address.  The reason you share
 init and most of the OS from the host is for elasticity and density,
 which are fast becoming a holy grail type quest of cloud orchestration
 systems: if you don't have to bring up the OS from init and you can just
 start the application from a C/R image (orders of magnitude smaller than
 a full system image) and slap on the necessary namespaces as you clone
 it, you have something that comes online in miliseconds which is a feat
 no hypervisor based virtualisation can match.
 
 I'm not saying don't pursue the IaaS case, it's definitely useful ...
 I'm just saying it would be a serious mistake to think that's the only
 use case for containers and we certainly shouldn't adjust Linux to serve
 only that use case.


The feature you said above VS contianer-reboot-host bug, I prefer to fix
the bug. and this feature can be achieved even container unshares /run directory
with host by default, for libvirt, user can set the container configuration to
make the container shares the /run directory with host.

I would like to say, the reboot from container bug is more urgent and need
to be fixed.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] cgroup: only check once when mode is UNIT_CHECK

2013-08-25 Thread Gao feng
If the mode is UNIT_CHECK,it means we only want to check if
the paramaters are valid. the first round of cycle already
did this check, no need to check again.
---
 src/core/dbus-unit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/core/dbus-unit.c b/src/core/dbus-unit.c
index 4cd3a13..2ea59b2 100644
--- a/src/core/dbus-unit.c
+++ b/src/core/dbus-unit.c
@@ -981,7 +981,7 @@ int bus_unit_set_properties(
 
 if (dbus_message_iter_get_arg_type(sub) == DBUS_TYPE_INVALID) 
{
 
-if (for_real)
+if (for_real || mode == UNIT_CHECK)
 break;
 
 /* Reached EOF. Let's try again, and this time for 
realz... */
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] netns: unix: only allow to find out unix socket in same net namespace

2013-08-25 Thread Gao feng
On 08/26/2013 11:19 AM, James Bottomley wrote:
 On Mon, 2013-08-26 at 09:06 +0800, Gao feng wrote:
 On 08/26/2013 02:16 AM, James Bottomley wrote:
 On Sun, 2013-08-25 at 19:37 +0200, Kay Sievers wrote:
 On Sun, Aug 25, 2013 at 7:16 PM, James Bottomley
 jbottom...@parallels.com wrote:
 On Wed, 2013-08-21 at 11:51 +0200, Kay Sievers wrote:
 On Wed, Aug 21, 2013 at 9:22 AM, Gao feng gaof...@cn.fujitsu.com wrote:
 On 08/21/2013 03:06 PM, Eric W. Biederman wrote:

 I suspect libvirt should simply not share /run or any other normally
 writable directory with the host.  Sharing /run /var/run or even /tmp
 seems extremely dubious if you want some kind of containment, and
 without strange things spilling through.

 Right, /run or /var cannot be shared. It's not only about sockets,
 many other things will also go really wrong that way.

 This is very narrow thinking about what a container might be and will
 cause trouble as people start to create novel uses for containers in the
 cloud if you try to impose this on our current infrastructure.

 One of the cgroup only container uses we see at Parallels (so no
 separate filesystem and no net namespaces) is pure apache load balancer
 type shared hosting.  In this scenario, base apache is effectively
 brought up in the host environment, but then spawned instances are
 resource limited using cgroups according to what the customer has paid.
 Obviously all apache instances are sharing /var and /run from the host
 (mostly for logging and pid storage and static pages).  The reason some
 hosters do this is that it allows much higher density simple web serving
 (either static pages from quota limited chroots or dynamic pages limited
 by database space constraints) because each instance shares so much
 from the host.  The service is obviously much more basic than giving
 each customer a container running apache, but it's much easier for the
 hoster to administer and it serves the customer just as well for a large
 cross section of use cases and for those it doesn't serve, the hoster
 usually has separate container hosting (for a higher price, of course).

 The container as we talk about has it's own init, and no, it cannot
 share /var or /run.

 This is what we would call an IaaS container: bringing up init and
 effectively a new OS inside a container is the closest containers come
 to being like hypervisors.  It's the most common use case of Parallels
 containers in the field, so I'm certainly not telling you it's a bad
 idea.

 The stuff you talk about has nothing to do with that, it's not
 different from all services or a multi-instantiated service on the
 host sharing the same /run and /var.

 I gave you one example: a really simplistic one.  A more sophisticated
 example is a PaaS or SaaS container where you bring the OS up in the
 host but spawn a particular application into its own container (this is
 essentially similar to what Docker does).  Often in this case, you do
 add separate mount and network namespaces to make the application
 isolated and migrateable with its own IP address.  The reason you share
 init and most of the OS from the host is for elasticity and density,
 which are fast becoming a holy grail type quest of cloud orchestration
 systems: if you don't have to bring up the OS from init and you can just
 start the application from a C/R image (orders of magnitude smaller than
 a full system image) and slap on the necessary namespaces as you clone
 it, you have something that comes online in miliseconds which is a feat
 no hypervisor based virtualisation can match.

 I'm not saying don't pursue the IaaS case, it's definitely useful ...
 I'm just saying it would be a serious mistake to think that's the only
 use case for containers and we certainly shouldn't adjust Linux to serve
 only that use case.


 The feature you said above VS contianer-reboot-host bug, I prefer to
 fix
 the bug.
 
 What bug?
 
  and this feature can be achieved even container unshares /run
 directory
 with host by default, for libvirt, user can set the container
 configuration to
 make the container shares the /run directory with host.

 I would like to say, the reboot from container bug is more urgent and
 need
 to be fixed.
 
 Are you talking about the old bug where trying to reboot an lxc
 container from within it would reboot the entire system? 

Yes, we are discussing this problem in this whole thread.

 If so, OpenVZ
 has never suffered from that problem and I thought it was fixed
 upstream.  I've not tested lxc tools, but the latest vzctl from the
 openvz website will bring up a container on the vanilla 3.9 kernel
 (provided you have USER_NS compiled in) can also be used to reboot the
 container, so I see no reason it wouldn't work for lxc as well.
 

I'm using libvirt lxc not lxc-tools.
Not all of users enable user namespace, I trust these container management
tools can have right/proper setting which inhibit this reboot-problem occur.
but I don't think this reboot-problem

[systemd-devel] [PATCH] blkio: fix incorrect setting of cpu_shares

2013-08-23 Thread Gao feng
We should set up blockio_weight not cpu_shares.

Signed-off-by: Gao feng gaof...@cn.fujitsu.com
---
 src/core/dbus-cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/core/dbus-cgroup.c b/src/core/dbus-cgroup.c
index 8ad3d11..9e97b20 100644
--- a/src/core/dbus-cgroup.c
+++ b/src/core/dbus-cgroup.c
@@ -216,7 +216,7 @@ int bus_cgroup_set_property(
 return -EINVAL;
 
 if (mode != UNIT_CHECK) {
-c-cpu_shares = ul;
+c-blockio_weight = ul;
 unit_write_drop_in_private_format(u, mode, name, 
BlockIOWeight=%lu, ul);
 }
 
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] netns: unix: only allow to find out unix socket in same net namespace

2013-08-21 Thread Gao feng
cc libvirt-list

On 08/21/2013 01:30 PM, Eric W. Biederman wrote:
 Gao feng gaof...@cn.fujitsu.com writes:
 
 Unix sockets are private resources of net namespace,
 allowing one net namespace to access to other netns's unix
 sockets is meaningless.
 
 Allowing one net namespace to access another netns's unix socket is
 deliberate behavior.  This is a desired and useful feature, and
 only a misconfiguration of visible files would allow this to be a
 problem.
 
 I'm researching a problem about shutdown from container,
 if the cotainer shares the same file /run/systemd/private
 with host, when we run shutdown -h xxx in container, the
 shutdown message will be send to the systemd-shutdownd
 through unix socket /run/systemd/private, and because
 systemd-shutdownd is running in host, so finally, the host
 will become shutdown.
 
 The simple answer is don't do that then.  I can see no reason
 to share /run outside of the container unless you want this kind of
 behavior.
 
 Quite frankly I want this behavior if I am using network namespaces
 to support multiple routing contexts. That is if I am using scripts
 like:
 
 ip netns add other
 ip netns exec other script
 
 I don't want to have to remember to say 
 ip netns orig exec shutdown -h now
 
 There are more compelling uses and there is no cost in supporting this
 in the kernel.
 
 What kind of misconfiguration caused someone to complain about this?
 

libvirt lxc allows user to set up a container which shares the same root
directory with host.

seems like the unix sockets whose sun_path is an abstract socket address
are net namespace aware.

Should we use abstract type of address instead of a file system pathname
for systemd in this case?
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] netns: unix: only allow to find out unix socket in same net namespace

2013-08-21 Thread Gao feng
On 08/21/2013 03:06 PM, Eric W. Biederman wrote:
 Gao feng gaof...@cn.fujitsu.com writes:
 
 cc libvirt-list

 On 08/21/2013 01:30 PM, Eric W. Biederman wrote:
 Gao feng gaof...@cn.fujitsu.com writes:

 Unix sockets are private resources of net namespace,
 allowing one net namespace to access to other netns's unix
 sockets is meaningless.

 Allowing one net namespace to access another netns's unix socket is
 deliberate behavior.  This is a desired and useful feature, and
 only a misconfiguration of visible files would allow this to be a
 problem.

 I'm researching a problem about shutdown from container,
 if the cotainer shares the same file /run/systemd/private
 with host, when we run shutdown -h xxx in container, the
 shutdown message will be send to the systemd-shutdownd
 through unix socket /run/systemd/private, and because
 systemd-shutdownd is running in host, so finally, the host
 will become shutdown.

 The simple answer is don't do that then.  I can see no reason
 to share /run outside of the container unless you want this kind of
 behavior.

 Quite frankly I want this behavior if I am using network namespaces
 to support multiple routing contexts. That is if I am using scripts
 like:

 ip netns add other
 ip netns exec other script

 I don't want to have to remember to say 
 ip netns orig exec shutdown -h now

 There are more compelling uses and there is no cost in supporting this
 in the kernel.

 What kind of misconfiguration caused someone to complain about this?


 libvirt lxc allows user to set up a container which shares the same root
 directory with host.

 seems like the unix sockets whose sun_path is an abstract socket address
 are net namespace aware.

 Should we use abstract type of address instead of a file system pathname
 for systemd in this case?
 
 I suspect libvirt should simply not share /run or any other normally
 writable directory with the host.  Sharing /run /var/run or even /tmp
 seems extremely dubious if you want some kind of containment, and
 without strange things spilling through.
 

right now I only take note of the unix socket /run/systemd/private,
but there may have many similar unix sockets, they can exist in any
path. the strange problems will still happen.

anyway, I will send a patch to setup a fresh tmpfs for the /run directory of
container first.

Eric, Thanks for your help!
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] netns: unix: only allow to find out unix socket in same net namespace

2013-08-20 Thread Gao feng
Unix sockets are private resources of net namespace,
allowing one net namespace to access to other netns's unix
sockets is meaningless.

I'm researching a problem about shutdown from container,
if the cotainer shares the same file /run/systemd/private
with host, when we run shutdown -h xxx in container, the
shutdown message will be send to the systemd-shutdownd
through unix socket /run/systemd/private, and because
systemd-shutdownd is running in host, so finally, the host
will become shutdown.

We should make sure unix sockets are per net namespace to
avoid this problem.

Signed-off-by: Gao feng gaof...@cn.fujitsu.com
---
 net/unix/af_unix.c |  8 ++--
 net/unix/diag.c| 11 ---
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index c4ce243..98e3689 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -295,7 +295,8 @@ static inline struct sock *unix_find_socket_byname(struct 
net *net,
return s;
 }
 
-static struct sock *unix_find_socket_byinode(struct inode *i)
+static struct sock *unix_find_socket_byinode(struct net *net,
+struct inode *i)
 {
struct sock *s;
 
@@ -304,6 +305,9 @@ static struct sock *unix_find_socket_byinode(struct inode 
*i)
unix_socket_table[i-i_ino  (UNIX_HASH_SIZE - 1)]) {
struct dentry *dentry = unix_sk(s)-path.dentry;
 
+   if (!net_eq(sock_net(s), net))
+   continue;
+
if (dentry  dentry-d_inode == i) {
sock_hold(s);
goto found;
@@ -784,7 +788,7 @@ static struct sock *unix_find_other(struct net *net,
err = -ECONNREFUSED;
if (!S_ISSOCK(inode-i_mode))
goto put_fail;
-   u = unix_find_socket_byinode(inode);
+   u = unix_find_socket_byinode(net, inode);
if (!u)
goto put_fail;
 
diff --git a/net/unix/diag.c b/net/unix/diag.c
index d591091..80ada12 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -218,20 +218,25 @@ done:
return skb-len;
 }
 
-static struct sock *unix_lookup_by_ino(int ino)
+static struct sock *unix_lookup_by_ino(struct net *net, int ino)
 {
int i;
struct sock *sk;
 
spin_lock(unix_table_lock);
for (i = 0; i  ARRAY_SIZE(unix_socket_table); i++) {
-   sk_for_each(sk, unix_socket_table[i])
+   sk_for_each(sk, unix_socket_table[i]) {
+
+   if (!net_eq(sock_net(sk), net))
+   continue;
+
if (ino == sock_i_ino(sk)) {
sock_hold(sk);
spin_unlock(unix_table_lock);
 
return sk;
}
+   }
}
 
spin_unlock(unix_table_lock);
@@ -251,7 +256,7 @@ static int unix_diag_get_exact(struct sk_buff *in_skb,
if (req-udiag_ino == 0)
goto out_nosk;
 
-   sk = unix_lookup_by_ino(req-udiag_ino);
+   sk = unix_lookup_by_ino(net, req-udiag_ino);
err = -ENOENT;
if (sk == NULL)
goto out_nosk;
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] netns: unix: only allow to find out unix socket in same net namespace

2013-08-20 Thread Gao feng
cc contain...@lists.linux-foundation.org

On 08/21/2013 12:31 PM, Gao feng wrote:
 Unix sockets are private resources of net namespace,
 allowing one net namespace to access to other netns's unix
 sockets is meaningless.
 
 I'm researching a problem about shutdown from container,
 if the cotainer shares the same file /run/systemd/private
 with host, when we run shutdown -h xxx in container, the
 shutdown message will be send to the systemd-shutdownd
 through unix socket /run/systemd/private, and because
 systemd-shutdownd is running in host, so finally, the host
 will become shutdown.
 
 We should make sure unix sockets are per net namespace to
 avoid this problem.
 
 Signed-off-by: Gao feng gaof...@cn.fujitsu.com
 ---
  net/unix/af_unix.c |  8 ++--
  net/unix/diag.c| 11 ---
  2 files changed, 14 insertions(+), 5 deletions(-)
 
 diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
 index c4ce243..98e3689 100644
 --- a/net/unix/af_unix.c
 +++ b/net/unix/af_unix.c
 @@ -295,7 +295,8 @@ static inline struct sock *unix_find_socket_byname(struct 
 net *net,
   return s;
  }
  
 -static struct sock *unix_find_socket_byinode(struct inode *i)
 +static struct sock *unix_find_socket_byinode(struct net *net,
 +  struct inode *i)
  {
   struct sock *s;
  
 @@ -304,6 +305,9 @@ static struct sock *unix_find_socket_byinode(struct inode 
 *i)
   unix_socket_table[i-i_ino  (UNIX_HASH_SIZE - 1)]) {
   struct dentry *dentry = unix_sk(s)-path.dentry;
  
 + if (!net_eq(sock_net(s), net))
 + continue;
 +
   if (dentry  dentry-d_inode == i) {
   sock_hold(s);
   goto found;
 @@ -784,7 +788,7 @@ static struct sock *unix_find_other(struct net *net,
   err = -ECONNREFUSED;
   if (!S_ISSOCK(inode-i_mode))
   goto put_fail;
 - u = unix_find_socket_byinode(inode);
 + u = unix_find_socket_byinode(net, inode);
   if (!u)
   goto put_fail;
  
 diff --git a/net/unix/diag.c b/net/unix/diag.c
 index d591091..80ada12 100644
 --- a/net/unix/diag.c
 +++ b/net/unix/diag.c
 @@ -218,20 +218,25 @@ done:
   return skb-len;
  }
  
 -static struct sock *unix_lookup_by_ino(int ino)
 +static struct sock *unix_lookup_by_ino(struct net *net, int ino)
  {
   int i;
   struct sock *sk;
  
   spin_lock(unix_table_lock);
   for (i = 0; i  ARRAY_SIZE(unix_socket_table); i++) {
 - sk_for_each(sk, unix_socket_table[i])
 + sk_for_each(sk, unix_socket_table[i]) {
 +
 + if (!net_eq(sock_net(sk), net))
 + continue;
 +
   if (ino == sock_i_ino(sk)) {
   sock_hold(sk);
   spin_unlock(unix_table_lock);
  
   return sk;
   }
 + }
   }
  
   spin_unlock(unix_table_lock);
 @@ -251,7 +256,7 @@ static int unix_diag_get_exact(struct sk_buff *in_skb,
   if (req-udiag_ino == 0)
   goto out_nosk;
  
 - sk = unix_lookup_by_ino(req-udiag_ino);
 + sk = unix_lookup_by_ino(net, req-udiag_ino);
   err = -ENOENT;
   if (sk == NULL)
   goto out_nosk;
 

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel