Re: [PATCH] fs/jffs2: Add missing call to posix_acl_release

2008-01-07 Thread KaiGai Kohei

Julia Lawall wrote:

From: Julia Lawall <[EMAIL PROTECTED]>

posix_acl_clone does a memory allocation and sets a reference count, so
posix_acl_release is needed afterwards to free it.


The problem was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)

// 
@@
type T;
identifier E;
expression E1, E2;
int ret;
statement S;
@@

  T E;
  <+...
(
  E = \(posix_acl_clone\|posix_acl_alloc\|posix_acl_dup\)(...);
  if (E == NULL) S
|
  if ((E = \(posix_acl_clone\|posix_acl_alloc\|posix_acl_dup\)(...)) == NULL) S
)
  ... when != E2 = E
  when strict
(
  posix_acl_release(E);
|
  E1 = E;
|
+ posix_acl_release(E);
  return;
|
+ posix_acl_release(E);
  return ret;
)
  ...+>
// 

Signed-off-by: Julia Lawall <[EMAIL PROTECTED]>
---

diff -u -p a/fs/jffs2/acl.c b/fs/jffs2/acl.c
--- a/fs/jffs2/acl.c 2008-01-03 09:49:31.0 +0100
+++ b/fs/jffs2/acl.c 2008-01-06 17:38:52.0 +0100
@@ -345,8 +345,10 @@ int jffs2_init_acl_pre(struct inode *dir
if (!clone)
return -ENOMEM;
rc = posix_acl_create_masq(clone, (mode_t *)i_mode);
-   if (rc < 0)
+   if (rc < 0) {
+   posix_acl_release(clone);
return rc;
+   }
if (rc > 0)
jffs2_iset_acl(inode, >i_acl_access, clone);


Indeed, there was a possibility to cause memory leaking.

Acked-by: KaiGai Kohei <[EMAIL PROTECTED]>

--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] fs/jffs2: Add missing call to posix_acl_release

2008-01-07 Thread KaiGai Kohei

Julia Lawall wrote:

From: Julia Lawall [EMAIL PROTECTED]

posix_acl_clone does a memory allocation and sets a reference count, so
posix_acl_release is needed afterwards to free it.


The problem was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)

// smpl
@@
type T;
identifier E;
expression E1, E2;
int ret;
statement S;
@@

  T E;
  +...
(
  E = \(posix_acl_clone\|posix_acl_alloc\|posix_acl_dup\)(...);
  if (E == NULL) S
|
  if ((E = \(posix_acl_clone\|posix_acl_alloc\|posix_acl_dup\)(...)) == NULL) S
)
  ... when != E2 = E
  when strict
(
  posix_acl_release(E);
|
  E1 = E;
|
+ posix_acl_release(E);
  return;
|
+ posix_acl_release(E);
  return ret;
)
  ...+
// /smpl

Signed-off-by: Julia Lawall [EMAIL PROTECTED]
---

diff -u -p a/fs/jffs2/acl.c b/fs/jffs2/acl.c
--- a/fs/jffs2/acl.c 2008-01-03 09:49:31.0 +0100
+++ b/fs/jffs2/acl.c 2008-01-06 17:38:52.0 +0100
@@ -345,8 +345,10 @@ int jffs2_init_acl_pre(struct inode *dir
if (!clone)
return -ENOMEM;
rc = posix_acl_create_masq(clone, (mode_t *)i_mode);
-   if (rc  0)
+   if (rc  0) {
+   posix_acl_release(clone);
return rc;
+   }
if (rc  0)
jffs2_iset_acl(inode, f-i_acl_access, clone);


Indeed, there was a possibility to cause memory leaking.

Acked-by: KaiGai Kohei [EMAIL PROTECTED]

--
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Exporting capability code/name pairs

2008-01-03 Thread KaiGai Kohei
James Morris wrote:
> On Wed, 2 Jan 2008, KaiGai Kohei wrote:
> 
>>> Another issue is that securityfs depends on CONFIG_SECURITY, which might be
>>> undesirable, given that capabilities are a standard feature.
>> We can implement this feature on another pseudo filesystems.
>> Do you think what filesystem is the best candidate?
>> I prefer procfs or sysfs instead.
> 
> Sysfs makes more sense, as this information is system-wide and does not 
> relate to specific processes.

The following patch enables to export any capabilities which are supported
on the working kernel, under the /sys/kernel/capability.
You can obtain the code/name pairs of capabilities with scanning this directory.

Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]>
--
 kernel/Makefile   |9 +
 kernel/capability.c   |   36 
 scripts/mkcapnames.sh |   45 +
 3 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/kernel/Makefile b/kernel/Makefile
index dfa9695..29cd3ac 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -80,3 +80,12 @@ quiet_cmd_ikconfiggz = IKCFG   $@
 targets += config_data.h
 $(obj)/config_data.h: $(obj)/config_data.gz FORCE
$(call if_changed,ikconfiggz)
+
+# cap_names.h contains the code/name pair of capabilities.
+# It is generated using include/linux/capability.h automatically.
+$(obj)/capability.o: $(obj)/cap_names.h
+quiet_cmd_cap_names  = CAPS$@
+  cmd_cap_names  = /bin/sh $(src)/../scripts/mkcapnames.sh > $@
+targets += cap_names.h
+$(obj)/cap_names.h: $(src)/../scripts/mkcapnames.sh 
$(src)/../include/linux/capability.h FORCE
+   $(call if_changed,cap_names)
diff --git a/kernel/capability.c b/kernel/capability.c
index efbd9cd..14b4f4b 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -245,3 +245,39 @@ int capable(int cap)
return __capable(current, cap);
 }
 EXPORT_SYMBOL(capable);
+
+/*
+ * Export the list of capabilities on /sys/kernel/capability
+ */
+#define SYSFS_CAPABILITY_ENTRY(_name, _fmt, ...)   \
+   static ssize_t _name##_show(struct kset *kset, char *buffer)\
+   {   \
+   return scnprintf(buffer, PAGE_SIZE, _fmt, __VA_ARGS__); \
+   }   \
+   static struct subsys_attribute _name##_attr = __ATTR_RO(_name)
+
+/*
+ * capability_attrs[] is generated automatically by scripts/mkcapnames.sh
+ * This script parses include/linux/capability.h
+ */
+#include "cap_names.h"
+
+static struct attribute_group capability_attr_group = {
+   .name = "capability",
+   .attrs = capability_attrs,
+};
+
+static int __init capability_export_names(void)
+{
+   int rc;
+
+   rc = sysfs_create_group(_subsys.kobj,
+   _attr_group);
+   if (rc) {
+   printk(KERN_ERR "Unable to export capabilities\n");
+   return rc;
+   }
+
+   return 0;
+}
+__initcall(capability_export_names);
diff --git a/scripts/mkcapnames.sh b/scripts/mkcapnames.sh
index e69de29..c1c8b1f 100644
--- a/scripts/mkcapnames.sh
+++ b/scripts/mkcapnames.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+#
+# generate a cap_names.h file from include/linux/capability.h
+#
+
+BASEDIR=`dirname $0`
+
+echo '#ifndef CAP_NAMES_H'
+echo '#define CAP_NAMES_H'
+echo
+echo '#ifndef SYSFS_CAPABILITY_ENTRY'
+echo '#error cap_names.h should be included from kernel/capability.c'
+echo '#else'
+
+echo 'SYSFS_CAPABILITY_ENTRY(version, "0x%08x\n", _LINUX_CAPABILITY_VERSION);'
+
+cat ${BASEDIR}/../include/linux/capability.h   \
+| egrep '^#define CAP_[A-Z_]+[ ]+[0-9]+$'  \
+| awk 'BEGIN {
+max_code = -1;
+}
+{
+if ($3 > max_code)
+max_code = $3;
+printf("\tSYSFS_CAPABILITY_ENTRY(%s, \"%%u\\n\", %s);\n", 
tolower($2), $2);
+}
+END {
+printf("\tSYSFS_CAPABILITY_ENTRY(index, \"%%u\\n\", %u);\n", 
max_code);
+}'
+
+echo
+echo 'static struct attribute *capability_attrs[] = {'
+echo '_attr.attr,'
+echo '_attr.attr,'
+
+cat ${BASEDIR}/../include/linux/capability.h\
+| egrep '^#define CAP_[A-Z_]+[ ]+[0-9]+$'   \
+| awk '{ printf ("&%s_attr.attr,\n", tolower($2)); }'
+
+echo 'NULL,'
+echo '};'
+
+echo '#endif   /* SYSFS_CAPABILITY_ENTRY */'
+echo '#endif   /* CAP_NAMES_H */'
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Exporting capability code/name pairs

2008-01-03 Thread KaiGai Kohei
> There is also the issue of compiled code which explicitly raises and
> lowers capabilities around critical code sections (ie., as they were
> intended to be used) is also not well served by this change.
> 
> That is, unless the code was compiled with things like CAP_MAC_ADMIN
> being #define'd then it won't be able to do things like cap_set_flag()
> appropriately.

A new function will be necessary, like:
cap_value_t cap_get_value_by_name(const char *cap_name);

If a program tries to use specific capabilities explicitly, it can
apply pre-defined capability code like CAP_MAC_ADMIN.

However, when we implement a program which can deal with arbitrary
capabilities, it should obtain codes of them dynamically, because
it enables to work itself on the feature kernel.

Thanks,

> Cheers
> 
> Andrew
> 
> KaiGai Kohei wrote:
>> Andrew Morgan wrote:
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA1
>>>
>>> KaiGai Kohei wrote:
>>>> Remaining issues:
>>>> - We have to mount securityfs explicitly, or use /etc/fstab.
>>>>   It can cause a matter when we want to use this feature on
>>>>   very early phase on boot. (like /sbin/init)
>>> I'm not altogether clear how you intend this to work.
>>>
>>> Are you saying that some future version of libcap will require that
>>> securityfs be mounted before it (libcap) will work?
>> Yes, but implementing this feature on securityfs might be not good
>> idea as as James said. If this feature is on procfs or sysfs, it is
>> not necessary to mount securityfs explicitly.
>>
>> Thanks,
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.7 (Darwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iD8DBQFHfD7n+bHCR3gb8jsRAsgcAKDY6qXBR3JV2addHUg5sxyZHJEkDQCfdLOL
> zJlf3T4SQsUNENr3kwR5pr8=
> =v8C5
> -END PGP SIGNATURE-
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Exporting capability code/name pairs

2008-01-03 Thread KaiGai Kohei
 There is also the issue of compiled code which explicitly raises and
 lowers capabilities around critical code sections (ie., as they were
 intended to be used) is also not well served by this change.
 
 That is, unless the code was compiled with things like CAP_MAC_ADMIN
 being #define'd then it won't be able to do things like cap_set_flag()
 appropriately.

A new function will be necessary, like:
cap_value_t cap_get_value_by_name(const char *cap_name);

If a program tries to use specific capabilities explicitly, it can
apply pre-defined capability code like CAP_MAC_ADMIN.

However, when we implement a program which can deal with arbitrary
capabilities, it should obtain codes of them dynamically, because
it enables to work itself on the feature kernel.

Thanks,

 Cheers
 
 Andrew
 
 KaiGai Kohei wrote:
 Andrew Morgan wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 KaiGai Kohei wrote:
 Remaining issues:
 - We have to mount securityfs explicitly, or use /etc/fstab.
   It can cause a matter when we want to use this feature on
   very early phase on boot. (like /sbin/init)
 I'm not altogether clear how you intend this to work.

 Are you saying that some future version of libcap will require that
 securityfs be mounted before it (libcap) will work?
 Yes, but implementing this feature on securityfs might be not good
 idea as as James said. If this feature is on procfs or sysfs, it is
 not necessary to mount securityfs explicitly.

 Thanks,
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (Darwin)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
 
 iD8DBQFHfD7n+bHCR3gb8jsRAsgcAKDY6qXBR3JV2addHUg5sxyZHJEkDQCfdLOL
 zJlf3T4SQsUNENr3kwR5pr8=
 =v8C5
 -END PGP SIGNATURE-
 
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Exporting capability code/name pairs

2008-01-03 Thread KaiGai Kohei
James Morris wrote:
 On Wed, 2 Jan 2008, KaiGai Kohei wrote:
 
 Another issue is that securityfs depends on CONFIG_SECURITY, which might be
 undesirable, given that capabilities are a standard feature.
 We can implement this feature on another pseudo filesystems.
 Do you think what filesystem is the best candidate?
 I prefer procfs or sysfs instead.
 
 Sysfs makes more sense, as this information is system-wide and does not 
 relate to specific processes.

The following patch enables to export any capabilities which are supported
on the working kernel, under the /sys/kernel/capability.
You can obtain the code/name pairs of capabilities with scanning this directory.

Signed-off-by: KaiGai Kohei [EMAIL PROTECTED]
--
 kernel/Makefile   |9 +
 kernel/capability.c   |   36 
 scripts/mkcapnames.sh |   45 +
 3 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/kernel/Makefile b/kernel/Makefile
index dfa9695..29cd3ac 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -80,3 +80,12 @@ quiet_cmd_ikconfiggz = IKCFG   $@
 targets += config_data.h
 $(obj)/config_data.h: $(obj)/config_data.gz FORCE
$(call if_changed,ikconfiggz)
+
+# cap_names.h contains the code/name pair of capabilities.
+# It is generated using include/linux/capability.h automatically.
+$(obj)/capability.o: $(obj)/cap_names.h
+quiet_cmd_cap_names  = CAPS$@
+  cmd_cap_names  = /bin/sh $(src)/../scripts/mkcapnames.sh  $@
+targets += cap_names.h
+$(obj)/cap_names.h: $(src)/../scripts/mkcapnames.sh 
$(src)/../include/linux/capability.h FORCE
+   $(call if_changed,cap_names)
diff --git a/kernel/capability.c b/kernel/capability.c
index efbd9cd..14b4f4b 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -245,3 +245,39 @@ int capable(int cap)
return __capable(current, cap);
 }
 EXPORT_SYMBOL(capable);
+
+/*
+ * Export the list of capabilities on /sys/kernel/capability
+ */
+#define SYSFS_CAPABILITY_ENTRY(_name, _fmt, ...)   \
+   static ssize_t _name##_show(struct kset *kset, char *buffer)\
+   {   \
+   return scnprintf(buffer, PAGE_SIZE, _fmt, __VA_ARGS__); \
+   }   \
+   static struct subsys_attribute _name##_attr = __ATTR_RO(_name)
+
+/*
+ * capability_attrs[] is generated automatically by scripts/mkcapnames.sh
+ * This script parses include/linux/capability.h
+ */
+#include cap_names.h
+
+static struct attribute_group capability_attr_group = {
+   .name = capability,
+   .attrs = capability_attrs,
+};
+
+static int __init capability_export_names(void)
+{
+   int rc;
+
+   rc = sysfs_create_group(kernel_subsys.kobj,
+   capability_attr_group);
+   if (rc) {
+   printk(KERN_ERR Unable to export capabilities\n);
+   return rc;
+   }
+
+   return 0;
+}
+__initcall(capability_export_names);
diff --git a/scripts/mkcapnames.sh b/scripts/mkcapnames.sh
index e69de29..c1c8b1f 100644
--- a/scripts/mkcapnames.sh
+++ b/scripts/mkcapnames.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+#
+# generate a cap_names.h file from include/linux/capability.h
+#
+
+BASEDIR=`dirname $0`
+
+echo '#ifndef CAP_NAMES_H'
+echo '#define CAP_NAMES_H'
+echo
+echo '#ifndef SYSFS_CAPABILITY_ENTRY'
+echo '#error cap_names.h should be included from kernel/capability.c'
+echo '#else'
+
+echo 'SYSFS_CAPABILITY_ENTRY(version, 0x%08x\n, _LINUX_CAPABILITY_VERSION);'
+
+cat ${BASEDIR}/../include/linux/capability.h   \
+| egrep '^#define CAP_[A-Z_]+[ ]+[0-9]+$'  \
+| awk 'BEGIN {
+max_code = -1;
+}
+{
+if ($3  max_code)
+max_code = $3;
+printf(\tSYSFS_CAPABILITY_ENTRY(%s, \%%u\\n\, %s);\n, 
tolower($2), $2);
+}
+END {
+printf(\tSYSFS_CAPABILITY_ENTRY(index, \%%u\\n\, %u);\n, 
max_code);
+}'
+
+echo
+echo 'static struct attribute *capability_attrs[] = {'
+echo 'version_attr.attr,'
+echo 'index_attr.attr,'
+
+cat ${BASEDIR}/../include/linux/capability.h\
+| egrep '^#define CAP_[A-Z_]+[ ]+[0-9]+$'   \
+| awk '{ printf (%s_attr.attr,\n, tolower($2)); }'
+
+echo 'NULL,'
+echo '};'
+
+echo '#endif   /* SYSFS_CAPABILITY_ENTRY */'
+echo '#endif   /* CAP_NAMES_H */'
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Exporting capability code/name pairs

2008-01-02 Thread KaiGai Kohei
Andrew Morgan wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> KaiGai Kohei wrote:
>> Remaining issues:
>> - We have to mount securityfs explicitly, or use /etc/fstab.
>>   It can cause a matter when we want to use this feature on
>>   very early phase on boot. (like /sbin/init)
> 
> I'm not altogether clear how you intend this to work.
> 
> Are you saying that some future version of libcap will require that
> securityfs be mounted before it (libcap) will work?

Yes, but implementing this feature on securityfs might be not good
idea as as James said. If this feature is on procfs or sysfs, it is
not necessary to mount securityfs explicitly.

Thanks,
-- 
KaiGai Kohei <[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Exporting capability code/name pairs

2008-01-02 Thread KaiGai Kohei

James Morris wrote:

On Fri, 28 Dec 2007, KaiGai Kohei wrote:


Remaining issues:
- We have to mount securityfs explicitly, or use /etc/fstab.
  It can cause a matter when we want to use this feature on
  very early phase on boot. (like /sbin/init)


Why can't early userspace itself mount securityfs?


Hmm,,,
It might be possible as load_policy() doing, if necessary.
Please forget the previous my opinion.

I'm not even sure this is a good idea at all.  Existing capabilities will 
never disappear, and, as with syscalls, it's probably up to userland to 
handle new ones not existing.


When we use libcap built on older kernel for newer kernel, libcap cannot
handle newly added capabilities, because it is not exist on the build 
environment.
Therefore, any available capabilities should be exported dynamically by the 
kernel.


In any case, some more technical issues:


kernel/cap_names.sh generates the body of cap_entries[] array,


This needs to be in the scripts directory.


OK, it will be moved.

The generated header should be made idempotent (#ifdef wrapping), and also 
include a warning that it is automatically generated (identifying the 
script which does so), and that is should not be edited.



+   d_caps = securityfs_create_dir("capability", NULL);
+   if (!d_caps)


Wrong way to check for error -- the function returns an ERR_PTR().


+   f_caps[i] = securityfs_create_file(cap_entries[i].name, 0444,
+  d_caps, _entries[i],
+  _entry_fops);
+   if (!f_caps[i])


Ditto.


OK,

Another issue is that securityfs depends on CONFIG_SECURITY, which might 
be undesirable, given that capabilities are a standard feature.


We can implement this feature on another pseudo filesystems.
Do you think what filesystem is the best candidate?
I prefer procfs or sysfs instead.

Thanks,
--
KaiGai Kohei <[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Exporting capability code/name pairs

2008-01-02 Thread KaiGai Kohei

James Morris wrote:

On Fri, 28 Dec 2007, KaiGai Kohei wrote:


Remaining issues:
- We have to mount securityfs explicitly, or use /etc/fstab.
  It can cause a matter when we want to use this feature on
  very early phase on boot. (like /sbin/init)


Why can't early userspace itself mount securityfs?


Hmm,,,
It might be possible as load_policy() doing, if necessary.
Please forget the previous my opinion.

I'm not even sure this is a good idea at all.  Existing capabilities will 
never disappear, and, as with syscalls, it's probably up to userland to 
handle new ones not existing.


When we use libcap built on older kernel for newer kernel, libcap cannot
handle newly added capabilities, because it is not exist on the build 
environment.
Therefore, any available capabilities should be exported dynamically by the 
kernel.


In any case, some more technical issues:


kernel/cap_names.sh generates the body of cap_entries[] array,


This needs to be in the scripts directory.


OK, it will be moved.

The generated header should be made idempotent (#ifdef wrapping), and also 
include a warning that it is automatically generated (identifying the 
script which does so), and that is should not be edited.



+   d_caps = securityfs_create_dir(capability, NULL);
+   if (!d_caps)


Wrong way to check for error -- the function returns an ERR_PTR().


+   f_caps[i] = securityfs_create_file(cap_entries[i].name, 0444,
+  d_caps, cap_entries[i],
+  cap_entry_fops);
+   if (!f_caps[i])


Ditto.


OK,

Another issue is that securityfs depends on CONFIG_SECURITY, which might 
be undesirable, given that capabilities are a standard feature.


We can implement this feature on another pseudo filesystems.
Do you think what filesystem is the best candidate?
I prefer procfs or sysfs instead.

Thanks,
--
KaiGai Kohei [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Exporting capability code/name pairs

2008-01-02 Thread KaiGai Kohei
Andrew Morgan wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 KaiGai Kohei wrote:
 Remaining issues:
 - We have to mount securityfs explicitly, or use /etc/fstab.
   It can cause a matter when we want to use this feature on
   very early phase on boot. (like /sbin/init)
 
 I'm not altogether clear how you intend this to work.
 
 Are you saying that some future version of libcap will require that
 securityfs be mounted before it (libcap) will work?

Yes, but implementing this feature on securityfs might be not good
idea as as James said. If this feature is on procfs or sysfs, it is
not necessary to mount securityfs explicitly.

Thanks,
-- 
KaiGai Kohei [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Exporting capability code/name pairs

2007-12-27 Thread KaiGai Kohei

James Morris wrote:

On Fri, 28 Dec 2007, KaiGai Kohei wrote:


+   snprintf(tmp, sizeof(tmp),
+cap_entry == _entries[0] ? "0x%08x" :  "%u",
+cap_entry->code);
+   len = strlen(tmp);


You don't need to call strlen(), just use scnprintf() and grab the return 
value.


Thanks for your suggestion,

I'll fix it on the next posting.
--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Exporting capability code/name pairs

2007-12-27 Thread KaiGai Kohei
The attached patch enables to export capability code/name pairs
under /capability of securityfs (revision 2).

Inprovements from the first revison:
- simple_read_from_buffer() is used for read method.
- cap_entries[] array is generated from include/linux/capability.h
  automatically.

Remaining issues:
- We have to mount securityfs explicitly, or use /etc/fstab.
  It can cause a matter when we want to use this feature on
  very early phase on boot. (like /sbin/init)

It was also concerned at the past.
  http://marc.info/?l=linux-kernel=112063963623190=2

How do you think an idea that the root of securityfs is mounted on
kernel/security of sysfs when it is initialized?
If securityfs got being available when kernel initializing process,
we can always use this feature and the matter will be resolved.

>>> +static struct cap_entry_data cap_entries[] = {
>>> +/* max number of supported format */
>>> +{ _LINUX_CAPABILITY_VERSION,"version" },
>>> +/* max number of capability */
>>> +{ CAP_LAST_CAP,"index" },
>>> +/* list of capabilities */
>>> +{ CAP_CHOWN,"cap_chown" },
>>> +{ CAP_DAC_OVERRIDE,"cap_dac_override" },
>   - snip -
>>> +{ CAP_MAC_OVERRIDE,"cap_mac_override" },
>>> +{ CAP_MAC_ADMIN,"cap_mac_admin" },
>>> +{ -1,NULL},
>>> +};
>>
>> I don't like this duplication with the list in 
>> include/linux/capability.h.
>> Now when a new cap is added, it needs to be
>>
>> 1. added to capability.h
>> 2. swapped as the new CAP_LAST_CAP in capability.h
>> 3. added to this list...
>>
>> Could you integrate the two lists (not sure how offhand), or at least
>> put them in the same place?

kernel/cap_names.sh generates the body of cap_entries[] array,
and it is invoked when we make the kernel.

Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]>
---
 Makefile |9 +++
 cap_names.sh |   21 
 capability.c |   75 +++
 3 files changed, 105 insertions(+)

diff --git a/kernel/Makefile b/kernel/Makefile
index dfa9695..45d6034 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -80,3 +80,12 @@ quiet_cmd_ikconfiggz = IKCFG   $@
 targets += config_data.h
 $(obj)/config_data.h: $(obj)/config_data.gz FORCE
$(call if_changed,ikconfiggz)
+
+# cap_names.h contains the code/name pair of capabilities.
+# It is generated using include/linux/capability.h automatically.
+$(obj)/capability.o: $(obj)/cap_names.h
+quiet_cmd_cap_names  = CAPS$@
+  cmd_cap_names  = /bin/sh $(src)/cap_names.sh > $@
+targets += cap_names.h
+$(obj)/cap_names.h: $(src)/cap_names.sh $(src)/../include/linux/capability.h 
FORCE
+   $(call if_changed,cap_names)
diff --git a/kernel/cap_names.sh b/kernel/cap_names.sh
index e69de29..7b2fcfe 100644
--- a/kernel/cap_names.sh
+++ b/kernel/cap_names.sh
@@ -0,0 +1,21 @@
+#!/bin/sh
+
+#
+# generate a cap_names.h file from include/linux/capability.h
+#
+
+BASEDIR=`dirname $0`
+
+cat ${BASEDIR}/../include/linux/capability.h   \
+| egrep '^#define CAP_[A-Z_]+[ ]+[0-9]+$'  \
+| awk 'BEGIN {
+   max_code = -1;
+  }
+  {
+   if ($3 > max_code)
+   max_code = $3;
+   printf("\t{ %s, \"%s\" },\n", $2, tolower($2));
+  }
+  END {
+   printf("\t{ %u, \"index\" },\n", max_code);
+  }'
diff --git a/kernel/capability.c b/kernel/capability.c
index efbd9cd..03a9b62 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -245,3 +245,78 @@ int capable(int cap)
return __capable(current, cap);
 }
 EXPORT_SYMBOL(capable);
+
+/*
+ * capability code/name pairs are exported under /sys/security/capability/
+ */
+struct cap_entry_data {
+   unsigned int code;
+   const char *name;
+};
+
+static struct cap_entry_data cap_entries[] = {
+   /* max number of supported format */
+   { _LINUX_CAPABILITY_VERSION, "version" },
+   /* list of capabilities */
+#include "cap_names.h"
+   { -1, NULL},
+};
+
+static ssize_t cap_entry_read(struct file *file, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+   struct cap_entry_data *cap_entry;
+   char tmp[32];
+   int len;
+
+   cap_entry = file->f_dentry->d_inode->i_private;
+   BUG_ON(!cap_entry);
+
+   snprintf(tmp, sizeof(tmp),
+cap_entry == _entries[0] ? "0x%08x" :  "%u",
+cap_entry->code);
+   len = strlen(tmp);
+
+   return simple_read_from_buffer(buffer, count, ppos, tmp, len);
+}
+
+const s

Re: [PATCH] Exporting capability code/name pairs

2007-12-27 Thread KaiGai Kohei

Serge E. Hallyn wrote:

Quoting KaiGai Kohei ([EMAIL PROTECTED]):

This patch enables to export the code/name pairs of capabilities under
/capability of securityfs.

In the current libcap, it obtains the list of capabilities from header file
on the build environment statically. However, it is not enough portable
between different versions of kernels, because an already built libcap
cannot have knowledge about new added capabilities.

Dynamic collection of code/name pairs of capabilities will resolve this
matter.

But it is not perfect one. I have a bit concern about this patch now.

1. I want to generate cap_entries array from linux/capability.h
   automatically. Is there any good idea?
2. We have to mount securityfs explicitly, or using /etc/fstab.
   It can make a matter when we want to use this features
   in very early boot sequence.

Any comment please.


I like the idea, but

  - snip -

+/*
+ * capability code/name pairs are exported under /sys/security/capability/
+ */
+struct cap_entry_data {
+   unsigned int code;
+   const char *name;
+};
+
+static struct cap_entry_data cap_entries[] = {
+   /* max number of supported format */
+   { _LINUX_CAPABILITY_VERSION,"version" },
+   /* max number of capability */
+   { CAP_LAST_CAP, "index" },
+   /* list of capabilities */
+   { CAP_CHOWN,"cap_chown" },
+   { CAP_DAC_OVERRIDE, "cap_dac_override" },

  - snip -

+   { CAP_MAC_OVERRIDE, "cap_mac_override" },
+   { CAP_MAC_ADMIN,"cap_mac_admin" },
+   { -1,   NULL},
+};


I don't like this duplication with the list in include/linux/capability.h.
Now when a new cap is added, it needs to be

1. added to capability.h
2. swapped as the new CAP_LAST_CAP in capability.h
3. added to this list...

Could you integrate the two lists (not sure how offhand), or at least
put them in the same place?


The following script will generate cap_entries[] array.

cat include/linux/capability.h  \
 | egrep '^#define[ \t]+CAP_[A-Z_]+[ \t]+[0-9]+$'   \
 | awk '{ printf("\t{ %s, \"\" },\n", $2, tolower($2)); }'

It is nice to include the result of this script, like as:

static struct cap_entry_data cap_entries[] = {
/* max number of supported format */
{ _LINUX_CAPABILITY_VERSION,"version" },
/* max number of capability */
{ CAP_LAST_CAP, "index" },
#include "capability_names.h"
{ -1,   NULL },
};

I guess we can put this script on Makefile like as kernel/timeconst.pl doing.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Exporting capability code/name pairs

2007-12-27 Thread KaiGai Kohei

Serge E. Hallyn wrote:

Quoting KaiGai Kohei ([EMAIL PROTECTED]):

This patch enables to export the code/name pairs of capabilities under
/capability of securityfs.

In the current libcap, it obtains the list of capabilities from header file
on the build environment statically. However, it is not enough portable
between different versions of kernels, because an already built libcap
cannot have knowledge about new added capabilities.

Dynamic collection of code/name pairs of capabilities will resolve this
matter.

But it is not perfect one. I have a bit concern about this patch now.

1. I want to generate cap_entries array from linux/capability.h
   automatically. Is there any good idea?
2. We have to mount securityfs explicitly, or using /etc/fstab.
   It can make a matter when we want to use this features
   in very early boot sequence.

Any comment please.


I like the idea, but

  - snip -

+/*
+ * capability code/name pairs are exported under /sys/security/capability/
+ */
+struct cap_entry_data {
+   unsigned int code;
+   const char *name;
+};
+
+static struct cap_entry_data cap_entries[] = {
+   /* max number of supported format */
+   { _LINUX_CAPABILITY_VERSION,version },
+   /* max number of capability */
+   { CAP_LAST_CAP, index },
+   /* list of capabilities */
+   { CAP_CHOWN,cap_chown },
+   { CAP_DAC_OVERRIDE, cap_dac_override },

  - snip -

+   { CAP_MAC_OVERRIDE, cap_mac_override },
+   { CAP_MAC_ADMIN,cap_mac_admin },
+   { -1,   NULL},
+};


I don't like this duplication with the list in include/linux/capability.h.
Now when a new cap is added, it needs to be

1. added to capability.h
2. swapped as the new CAP_LAST_CAP in capability.h
3. added to this list...

Could you integrate the two lists (not sure how offhand), or at least
put them in the same place?


The following script will generate cap_entries[] array.

cat include/linux/capability.h  \
 | egrep '^#define[ \t]+CAP_[A-Z_]+[ \t]+[0-9]+$'   \
 | awk '{ printf(\t{ %s, \\ },\n, $2, tolower($2)); }'

It is nice to include the result of this script, like as:

static struct cap_entry_data cap_entries[] = {
/* max number of supported format */
{ _LINUX_CAPABILITY_VERSION,version },
/* max number of capability */
{ CAP_LAST_CAP, index },
#include capability_names.h
{ -1,   NULL },
};

I guess we can put this script on Makefile like as kernel/timeconst.pl doing.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Exporting capability code/name pairs

2007-12-27 Thread KaiGai Kohei
The attached patch enables to export capability code/name pairs
under /capability of securityfs (revision 2).

Inprovements from the first revison:
- simple_read_from_buffer() is used for read method.
- cap_entries[] array is generated from include/linux/capability.h
  automatically.

Remaining issues:
- We have to mount securityfs explicitly, or use /etc/fstab.
  It can cause a matter when we want to use this feature on
  very early phase on boot. (like /sbin/init)

It was also concerned at the past.
  http://marc.info/?l=linux-kernelm=112063963623190w=2

How do you think an idea that the root of securityfs is mounted on
kernel/security of sysfs when it is initialized?
If securityfs got being available when kernel initializing process,
we can always use this feature and the matter will be resolved.

 +static struct cap_entry_data cap_entries[] = {
 +/* max number of supported format */
 +{ _LINUX_CAPABILITY_VERSION,version },
 +/* max number of capability */
 +{ CAP_LAST_CAP,index },
 +/* list of capabilities */
 +{ CAP_CHOWN,cap_chown },
 +{ CAP_DAC_OVERRIDE,cap_dac_override },
   - snip -
 +{ CAP_MAC_OVERRIDE,cap_mac_override },
 +{ CAP_MAC_ADMIN,cap_mac_admin },
 +{ -1,NULL},
 +};

 I don't like this duplication with the list in 
 include/linux/capability.h.
 Now when a new cap is added, it needs to be

 1. added to capability.h
 2. swapped as the new CAP_LAST_CAP in capability.h
 3. added to this list...

 Could you integrate the two lists (not sure how offhand), or at least
 put them in the same place?

kernel/cap_names.sh generates the body of cap_entries[] array,
and it is invoked when we make the kernel.

Signed-off-by: KaiGai Kohei [EMAIL PROTECTED]
---
 Makefile |9 +++
 cap_names.sh |   21 
 capability.c |   75 +++
 3 files changed, 105 insertions(+)

diff --git a/kernel/Makefile b/kernel/Makefile
index dfa9695..45d6034 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -80,3 +80,12 @@ quiet_cmd_ikconfiggz = IKCFG   $@
 targets += config_data.h
 $(obj)/config_data.h: $(obj)/config_data.gz FORCE
$(call if_changed,ikconfiggz)
+
+# cap_names.h contains the code/name pair of capabilities.
+# It is generated using include/linux/capability.h automatically.
+$(obj)/capability.o: $(obj)/cap_names.h
+quiet_cmd_cap_names  = CAPS$@
+  cmd_cap_names  = /bin/sh $(src)/cap_names.sh  $@
+targets += cap_names.h
+$(obj)/cap_names.h: $(src)/cap_names.sh $(src)/../include/linux/capability.h 
FORCE
+   $(call if_changed,cap_names)
diff --git a/kernel/cap_names.sh b/kernel/cap_names.sh
index e69de29..7b2fcfe 100644
--- a/kernel/cap_names.sh
+++ b/kernel/cap_names.sh
@@ -0,0 +1,21 @@
+#!/bin/sh
+
+#
+# generate a cap_names.h file from include/linux/capability.h
+#
+
+BASEDIR=`dirname $0`
+
+cat ${BASEDIR}/../include/linux/capability.h   \
+| egrep '^#define CAP_[A-Z_]+[ ]+[0-9]+$'  \
+| awk 'BEGIN {
+   max_code = -1;
+  }
+  {
+   if ($3  max_code)
+   max_code = $3;
+   printf(\t{ %s, \%s\ },\n, $2, tolower($2));
+  }
+  END {
+   printf(\t{ %u, \index\ },\n, max_code);
+  }'
diff --git a/kernel/capability.c b/kernel/capability.c
index efbd9cd..03a9b62 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -245,3 +245,78 @@ int capable(int cap)
return __capable(current, cap);
 }
 EXPORT_SYMBOL(capable);
+
+/*
+ * capability code/name pairs are exported under /sys/security/capability/
+ */
+struct cap_entry_data {
+   unsigned int code;
+   const char *name;
+};
+
+static struct cap_entry_data cap_entries[] = {
+   /* max number of supported format */
+   { _LINUX_CAPABILITY_VERSION, version },
+   /* list of capabilities */
+#include cap_names.h
+   { -1, NULL},
+};
+
+static ssize_t cap_entry_read(struct file *file, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+   struct cap_entry_data *cap_entry;
+   char tmp[32];
+   int len;
+
+   cap_entry = file-f_dentry-d_inode-i_private;
+   BUG_ON(!cap_entry);
+
+   snprintf(tmp, sizeof(tmp),
+cap_entry == cap_entries[0] ? 0x%08x :  %u,
+cap_entry-code);
+   len = strlen(tmp);
+
+   return simple_read_from_buffer(buffer, count, ppos, tmp, len);
+}
+
+const struct file_operations cap_entry_fops = {
+   .read = cap_entry_read,
+};
+
+int __init cap_names_export(void)
+{
+   struct dentry *d_caps, *f_caps[ARRAY_SIZE(cap_entries)];
+   int i;
+
+   /* init max number of capability*/
+   cap_entries[1].code = ARRAY_SIZE(cap_entries) - 3;
+
+   d_caps = securityfs_create_dir(capability, NULL);
+   if (!d_caps)
+   goto error0;
+
+   memset(f_caps, 0

Re: [PATCH] Exporting capability code/name pairs

2007-12-27 Thread KaiGai Kohei

James Morris wrote:

On Fri, 28 Dec 2007, KaiGai Kohei wrote:


+   snprintf(tmp, sizeof(tmp),
+cap_entry == cap_entries[0] ? 0x%08x :  %u,
+cap_entry-code);
+   len = strlen(tmp);


You don't need to call strlen(), just use scnprintf() and grab the return 
value.


Thanks for your suggestion,

I'll fix it on the next posting.
--
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Exporting capability code/name pairs

2007-12-26 Thread KaiGai Kohei
This patch enables to export the code/name pairs of capabilities under
/capability of securityfs.

In the current libcap, it obtains the list of capabilities from header file
on the build environment statically. However, it is not enough portable
between different versions of kernels, because an already built libcap
cannot have knowledge about new added capabilities.

Dynamic collection of code/name pairs of capabilities will resolve this
matter.

But it is not perfect one. I have a bit concern about this patch now.

1. I want to generate cap_entries array from linux/capability.h
   automatically. Is there any good idea?
2. We have to mount securityfs explicitly, or using /etc/fstab.
   It can make a matter when we want to use this features
   in very early boot sequence.

Any comment please.

usage:
---
# mount -t securityfs none /sys/kernel/security
# cd /sys/kernel/security/capability
# ls
cap_audit_controlcap_kill  cap_setpcap cap_sys_rawio
cap_audit_write  cap_lease cap_setuid  cap_sys_resource
cap_chowncap_linux_immutable   cap_sys_admin   cap_sys_time
cap_dac_override cap_mknod cap_sys_bootcap_sys_tty_config
cap_dac_read_search  cap_net_admin cap_sys_chroot  index
cap_fowner   cap_net_bind_service  cap_sys_module  version
cap_fsetid   cap_net_broadcast cap_sys_nice
cap_ipc_lock cap_setfcap   cap_sys_pacct
cap_ipc_ownercap_setgidcap_sys_ptrace
# cat cap_audit_write ; echo
29
# cat cap_sys_chroot ; echo
18
# cat version ; echo
0x19980330
# cat index; echo
31
#

-- 
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>

Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]>
---
 capability.c |  127 +++
 1 file changed, 127 insertions(+)

diff --git a/kernel/capability.c b/kernel/capability.c
index efbd9cd..5d9bf53 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -245,3 +245,131 @@ int capable(int cap)
return __capable(current, cap);
 }
 EXPORT_SYMBOL(capable);
+
+/*
+ * Capability code/name pair exporting
+ */
+
+/*
+ * capability code/name pairs are exported under /sys/security/capability/
+ */
+struct cap_entry_data {
+   unsigned int code;
+   const char *name;
+};
+
+static struct cap_entry_data cap_entries[] = {
+   /* max number of supported format */
+   { _LINUX_CAPABILITY_VERSION,"version" },
+   /* max number of capability */
+   { CAP_LAST_CAP, "index" },
+   /* list of capabilities */
+   { CAP_CHOWN,"cap_chown" },
+   { CAP_DAC_OVERRIDE, "cap_dac_override" },
+   { CAP_DAC_READ_SEARCH,  "cap_dac_read_search" },
+   { CAP_FOWNER,   "cap_fowner" },
+   { CAP_FSETID,   "cap_fsetid" },
+   { CAP_KILL, "cap_kill" },
+   { CAP_SETGID,   "cap_setgid" },
+   { CAP_SETUID,   "cap_setuid" },
+   { CAP_SETPCAP,  "cap_setpcap" },
+   { CAP_LINUX_IMMUTABLE,  "cap_linux_immutable" },
+   { CAP_NET_BIND_SERVICE, "cap_net_bind_service" },
+   { CAP_NET_BROADCAST,"cap_net_broadcast" },
+   { CAP_NET_ADMIN,"cap_net_admin" },
+   { CAP_NET_RAW,  "cap_net_admin" },
+   { CAP_IPC_LOCK, "cap_ipc_lock" },
+   { CAP_IPC_OWNER,"cap_ipc_owner" },
+   { CAP_SYS_MODULE,   "cap_sys_module" },
+   { CAP_SYS_RAWIO,"cap_sys_rawio" },
+   { CAP_SYS_CHROOT,   "cap_sys_chroot" },
+   { CAP_SYS_PTRACE,   "cap_sys_ptrace" },
+   { CAP_SYS_PACCT,"cap_sys_pacct" },
+   { CAP_SYS_ADMIN,"cap_sys_admin" },
+   { CAP_SYS_BOOT, "cap_sys_boot" },
+   { CAP_SYS_NICE, "cap_sys_nice" },
+   { CAP_SYS_RESOURCE, "cap_sys_resource" },
+   { CAP_SYS_TIME, "cap_sys_time" },
+   { CAP_SYS_TTY_CONFIG,   "cap_sys_tty_config" },
+   { CAP_MKNOD,"cap_mknod" },
+   { CAP_LEASE,"cap_lease" },
+   { CAP_AUDIT_WRITE,  "cap_audit_write" },
+   { CAP_AUDIT_CONTROL,"cap_audit_control" },
+   { CAP_SETFCAP,  "cap_setfcap" },
+   { CAP_MAC_OVERRIDE, "cap_mac_override" },
+   { CAP_MAC_ADM

[PATCH] Exporting capability code/name pairs

2007-12-26 Thread KaiGai Kohei
This patch enables to export the code/name pairs of capabilities under
/capability of securityfs.

In the current libcap, it obtains the list of capabilities from header file
on the build environment statically. However, it is not enough portable
between different versions of kernels, because an already built libcap
cannot have knowledge about new added capabilities.

Dynamic collection of code/name pairs of capabilities will resolve this
matter.

But it is not perfect one. I have a bit concern about this patch now.

1. I want to generate cap_entries array from linux/capability.h
   automatically. Is there any good idea?
2. We have to mount securityfs explicitly, or using /etc/fstab.
   It can make a matter when we want to use this features
   in very early boot sequence.

Any comment please.

usage:
---
# mount -t securityfs none /sys/kernel/security
# cd /sys/kernel/security/capability
# ls
cap_audit_controlcap_kill  cap_setpcap cap_sys_rawio
cap_audit_write  cap_lease cap_setuid  cap_sys_resource
cap_chowncap_linux_immutable   cap_sys_admin   cap_sys_time
cap_dac_override cap_mknod cap_sys_bootcap_sys_tty_config
cap_dac_read_search  cap_net_admin cap_sys_chroot  index
cap_fowner   cap_net_bind_service  cap_sys_module  version
cap_fsetid   cap_net_broadcast cap_sys_nice
cap_ipc_lock cap_setfcap   cap_sys_pacct
cap_ipc_ownercap_setgidcap_sys_ptrace
# cat cap_audit_write ; echo
29
# cat cap_sys_chroot ; echo
18
# cat version ; echo
0x19980330
# cat index; echo
31
#

-- 
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]

Signed-off-by: KaiGai Kohei [EMAIL PROTECTED]
---
 capability.c |  127 +++
 1 file changed, 127 insertions(+)

diff --git a/kernel/capability.c b/kernel/capability.c
index efbd9cd..5d9bf53 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -245,3 +245,131 @@ int capable(int cap)
return __capable(current, cap);
 }
 EXPORT_SYMBOL(capable);
+
+/*
+ * Capability code/name pair exporting
+ */
+
+/*
+ * capability code/name pairs are exported under /sys/security/capability/
+ */
+struct cap_entry_data {
+   unsigned int code;
+   const char *name;
+};
+
+static struct cap_entry_data cap_entries[] = {
+   /* max number of supported format */
+   { _LINUX_CAPABILITY_VERSION,version },
+   /* max number of capability */
+   { CAP_LAST_CAP, index },
+   /* list of capabilities */
+   { CAP_CHOWN,cap_chown },
+   { CAP_DAC_OVERRIDE, cap_dac_override },
+   { CAP_DAC_READ_SEARCH,  cap_dac_read_search },
+   { CAP_FOWNER,   cap_fowner },
+   { CAP_FSETID,   cap_fsetid },
+   { CAP_KILL, cap_kill },
+   { CAP_SETGID,   cap_setgid },
+   { CAP_SETUID,   cap_setuid },
+   { CAP_SETPCAP,  cap_setpcap },
+   { CAP_LINUX_IMMUTABLE,  cap_linux_immutable },
+   { CAP_NET_BIND_SERVICE, cap_net_bind_service },
+   { CAP_NET_BROADCAST,cap_net_broadcast },
+   { CAP_NET_ADMIN,cap_net_admin },
+   { CAP_NET_RAW,  cap_net_admin },
+   { CAP_IPC_LOCK, cap_ipc_lock },
+   { CAP_IPC_OWNER,cap_ipc_owner },
+   { CAP_SYS_MODULE,   cap_sys_module },
+   { CAP_SYS_RAWIO,cap_sys_rawio },
+   { CAP_SYS_CHROOT,   cap_sys_chroot },
+   { CAP_SYS_PTRACE,   cap_sys_ptrace },
+   { CAP_SYS_PACCT,cap_sys_pacct },
+   { CAP_SYS_ADMIN,cap_sys_admin },
+   { CAP_SYS_BOOT, cap_sys_boot },
+   { CAP_SYS_NICE, cap_sys_nice },
+   { CAP_SYS_RESOURCE, cap_sys_resource },
+   { CAP_SYS_TIME, cap_sys_time },
+   { CAP_SYS_TTY_CONFIG,   cap_sys_tty_config },
+   { CAP_MKNOD,cap_mknod },
+   { CAP_LEASE,cap_lease },
+   { CAP_AUDIT_WRITE,  cap_audit_write },
+   { CAP_AUDIT_CONTROL,cap_audit_control },
+   { CAP_SETFCAP,  cap_setfcap },
+   { CAP_MAC_OVERRIDE, cap_mac_override },
+   { CAP_MAC_ADMIN,cap_mac_admin },
+   { -1,   NULL},
+};
+
+static ssize_t cap_entry_read(struct file *file, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+   struct cap_entry_data *cap_entry;
+   size_t len, ofs = *ppos;
+   char tmp[32];
+   int rc;
+
+   cap_entry = file-f_dentry-d_inode-i_private;
+   if (!cap_entry)
+   return

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-06 Thread KaiGai Kohei
Sorry, any TABs are replaced by MUA.
I'll send the patch again.

> The attached patch provides several improvement for pam_cap module.
> 1. It enables pam_cap to drop capabilities from process'es capability
>bounding set.
> 2. It enables to specify allowing inheritable capability set or dropping
>bounding capability set for groups, not only users.
> 3. It provide pam_sm_session() method, not only pam_sm_authenticate()
>and pam_sm_setcred(). A system administrator can select more
>appropriate mode for his purpose.
> 4. In the auth/cred mode, it enables to cache the configuration file,
>to avoid read and analyze it twice.
> (Therefore, most of the part in the original one got replaced)
> 
> The default configuration file is "/etc/security/capability.conf".
> You can describe as follows:
> 
> # kaigai get cap_net_raw and cap_kill, tak get cap_sys_pacct pI.
> # We can omit "i:" in the head of each line.
> i:cap_net_raw,cap_kill kaigai
> cap_sys_pacct  tak
> 
> # ymj and tak lost cap_sys_chroot from cap_bset
> b:cap_sys_chroot   ymj  tak
> 
> # Any user within webadm group get cap_net_bind_service pI.
> i:cap_net_bind_service @webadm
> 
> # Any user within users group lost cap_sys_module from cap_bset
> b:cap_sys_module   @users
> 
> 
> When a user or groups he belongs is on several lines, all configurations
> are simplly compounded.
> 
> In the above example, if tak belongs to webadm and users group,
> he will get cap_sys_pacct and cap_net_bind_service pI, and lost
> cap_sys_chroot and cap_sys_module from his cap_bset.
> 
> Thanks,

Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]>
--
 pam_cap/capability.conf |6 +
 pam_cap/pam_cap.c   |  495 ---
 2 files changed, 305 insertions(+), 196 deletions(-)

diff --git a/pam_cap/capability.conf b/pam_cap/capability.conf
index b543142..707cdc3 100644
--- a/pam_cap/capability.conf
+++ b/pam_cap/capability.conf
@@ -24,6 +24,12 @@ cap_setfcap  morgan
 ## 'everyone else' gets no inheritable capabilities
 none  *

+# user 'kaigai' lost CAP_NET_RAW capability from bounding set
+b:cap_net_raw   kaigai
+
+# group 'acctadm' get CAP_SYS_PACCT inheritable capability
+i:cap_sys_pacct @acctadm
+
 ## if there is no '*' entry, all users not explicitly mentioned will
 ## get all available capabilities. This is a permissive default, and
 ## probably not what you want...
diff --git a/pam_cap/pam_cap.c b/pam_cap/pam_cap.c
index 94c5ebc..a917d5c 100644
--- a/pam_cap/pam_cap.c
+++ b/pam_cap/pam_cap.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 1999,2007 Andrew G. Morgan <[EMAIL PROTECTED]>
+ * Copyright (c) 2007  KaiGai Kohei <[EMAIL PROTECTED]>
  *
  * The purpose of this module is to enforce inheritable capability sets
  * for a specified user.
@@ -13,298 +14,400 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #include 
+#include 

 #include 
 #include 

+#define MODULE_NAME"pam_cap"
 #define USER_CAP_FILE   "/etc/security/capability.conf"
 #define CAP_FILE_BUFFER_SIZE4096
 #define CAP_FILE_DELIMITERS " \t\n"
-#define CAP_COMBINED_FORMAT "%s all-i %s+i"
-#define CAP_DROP_ALL"%s all-i"
+
+#ifndef PR_CAPBSET_DROP
+#define PR_CAPBSET_DROP24
+#endif
+
+extern char const *_cap_names[];

 struct pam_cap_s {
 int debug;
 const char *user;
 const char *conf_filename;
+/* set in read_capabilities_for_user() */
+cap_t result;
+int do_set_inh : 1;
+int do_set_bset : 1;
 };

-/* obtain the inheritable capabilities for the current user */
-
-static char *read_capabilities_for_user(const char *user, const char *source)
+/* obtain the inheritable/bounding capabilities for the current user */
+static int read_capabilities_for_user(struct pam_cap_s *pcs)
 {
-char *cap_string = NULL;
-char buffer[CAP_FILE_BUFFER_SIZE], *line;
+char buffer[CAP_FILE_BUFFER_SIZE];
 FILE *cap_file;
+struct passwd *pwd;
+int line_num = 0;
+int rc = -1;   /* PAM_(AUTH|CRED|SESSION)_ERR */
+
+pwd = getpwnam(pcs->user);
+if (!pwd) {
+   syslog(LOG_ERR, "user %s not in passwd entries", pcs->user);
+   return PAM_AUTH_ERR;
+}

-cap_file = fopen(source, "r");
-if (cap_file == NULL) {
-   D(("failed to open capability file"));
-   return NULL;
+cap_file = fopen(pcs->conf_filename, "r");
+if (!cap_file) {
+   if (errno == ENOENT) {
+   syslog(LOG_NOTICE, "%s is not found",
+  pcs->conf_filename);
+   return PAM_IGNORE;
+   } else {
+   syslog(LOG_ERR, "unable to open '%s

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-06 Thread KaiGai Kohei

Andrew Morgan wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

KaiGai Kohei wrote:

BTW, could you tell me your intention about pam_cap.c is implemented
with pam_sm_authenticate() and pam_sm_setcred()?
I think it can be done with pam_sm_open_session(), and this approach
enables to reduce the iteration of reading /etc/security/capability.conf.

How do you think the idea?


Good question! If you want to add session support you can. I'd prefer it
if you retained support for the auth/cred API too: admin choice and all
that. To remove the second read of the file, you can use a PAM data item
to cache the desired capability info after the first read of the file.


I added session API support with retaining auth/cred API, and
PAM data item caching feature using pam_(get|set)_data. It enables to
kill the seconf read of the configuration file.

Thanks for your suggestion.
Followings are detailed explanation and the patch.


I implemented it as a credential module (which has to get the
authentication return code right to make the credential stack execute
correctly) because I think of capabilities as credentials.

That being said, the credentials vs. session thing is not well
delineated by many applications, so it is arguably useful to provide
both interfaces for the admin to make use of on a per application basis.


The attached patch provides several improvement for pam_cap module.
1. It enables pam_cap to drop capabilities from process'es capability
   bounding set.
2. It enables to specify allowing inheritable capability set or dropping
   bounding capability set for groups, not only users.
3. It provide pam_sm_session() method, not only pam_sm_authenticate()
   and pam_sm_setcred(). A system administrator can select more
   appropriate mode for his purpose.
4. In the auth/cred mode, it enables to cache the configuration file,
   to avoid read and analyze it twice.
(Therefore, most of the part in the original one got replaced)

The default configuration file is "/etc/security/capability.conf".
You can describe as follows:

# kaigai get cap_net_raw and cap_kill, tak get cap_sys_pacct pI.
# We can omit "i:" in the head of each line.
i:cap_net_raw,cap_kill kaigai
cap_sys_pacct  tak

# ymj and tak lost cap_sys_chroot from cap_bset
b:cap_sys_chroot   ymj  tak

# Any user within webadm group get cap_net_bind_service pI.
i:cap_net_bind_service @webadm

# Any user within users group lost cap_sys_module from cap_bset
b:cap_sys_module   @users


When a user or groups he belongs is on several lines, all configurations
are simplly compounded.

In the above example, if tak belongs to webadm and users group,
he will get cap_sys_pacct and cap_net_bind_service pI, and lost
cap_sys_chroot and cap_sys_module from his cap_bset.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>


Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]>
--
 pam_cap/capability.conf |6 +
 pam_cap/pam_cap.c   |  495 ---
 2 files changed, 305 insertions(+), 196 deletions(-)

diff --git a/pam_cap/capability.conf b/pam_cap/capability.conf
index b543142..707cdc3 100644
--- a/pam_cap/capability.conf
+++ b/pam_cap/capability.conf
@@ -24,6 +24,12 @@ cap_setfcap  morgan
 ## 'everyone else' gets no inheritable capabilities
 none  *

+# user 'kaigai' lost CAP_NET_RAW capability from bounding set
+b:cap_net_raw   kaigai
+
+# group 'acctadm' get CAP_SYS_PACCT inheritable capability
+i:cap_sys_pacct @acctadm
+
 ## if there is no '*' entry, all users not explicitly mentioned will
 ## get all available capabilities. This is a permissive default, and
 ## probably not what you want...
diff --git a/pam_cap/pam_cap.c b/pam_cap/pam_cap.c
index 94c5ebc..a917d5c 100644
--- a/pam_cap/pam_cap.c
+++ b/pam_cap/pam_cap.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 1999,2007 Andrew G. Morgan <[EMAIL PROTECTED]>
+ * Copyright (c) 2007  KaiGai Kohei <[EMAIL PROTECTED]>
  *
  * The purpose of this module is to enforce inheritable capability sets
  * for a specified user.
@@ -13,298 +14,400 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #include 
+#include 

 #include 
 #include 

+#define MODULE_NAME"pam_cap"
 #define USER_CAP_FILE   "/etc/security/capability.conf"
 #define CAP_FILE_BUFFER_SIZE4096
 #define CAP_FILE_DELIMITERS " \t\n"
-#define CAP_COMBINED_FORMAT "%s all-i %s+i"
-#define CAP_DROP_ALL"%s all-i"
+
+#ifndef PR_CAPBSET_DROP
+#define PR_CAPBSET_DROP24
+#endif
+
+extern char const *_cap_names[];

 struct pam_cap_s {
 int debug;
 const char *user;
 const char *conf_filename;
+/* set in read_capabilities_for_user() */
+cap_t result;
+int do_set_inh : 1;
+int do_set

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-06 Thread KaiGai Kohei

Andrew Morgan wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

KaiGai Kohei wrote:

BTW, could you tell me your intention about pam_cap.c is implemented
with pam_sm_authenticate() and pam_sm_setcred()?
I think it can be done with pam_sm_open_session(), and this approach
enables to reduce the iteration of reading /etc/security/capability.conf.

How do you think the idea?


Good question! If you want to add session support you can. I'd prefer it
if you retained support for the auth/cred API too: admin choice and all
that. To remove the second read of the file, you can use a PAM data item
to cache the desired capability info after the first read of the file.


I added session API support with retaining auth/cred API, and
PAM data item caching feature using pam_(get|set)_data. It enables to
kill the seconf read of the configuration file.

Thanks for your suggestion.
Followings are detailed explanation and the patch.


I implemented it as a credential module (which has to get the
authentication return code right to make the credential stack execute
correctly) because I think of capabilities as credentials.

That being said, the credentials vs. session thing is not well
delineated by many applications, so it is arguably useful to provide
both interfaces for the admin to make use of on a per application basis.


The attached patch provides several improvement for pam_cap module.
1. It enables pam_cap to drop capabilities from process'es capability
   bounding set.
2. It enables to specify allowing inheritable capability set or dropping
   bounding capability set for groups, not only users.
3. It provide pam_sm_session() method, not only pam_sm_authenticate()
   and pam_sm_setcred(). A system administrator can select more
   appropriate mode for his purpose.
4. In the auth/cred mode, it enables to cache the configuration file,
   to avoid read and analyze it twice.
(Therefore, most of the part in the original one got replaced)

The default configuration file is /etc/security/capability.conf.
You can describe as follows:

# kaigai get cap_net_raw and cap_kill, tak get cap_sys_pacct pI.
# We can omit i: in the head of each line.
i:cap_net_raw,cap_kill kaigai
cap_sys_pacct  tak

# ymj and tak lost cap_sys_chroot from cap_bset
b:cap_sys_chroot   ymj  tak

# Any user within webadm group get cap_net_bind_service pI.
i:cap_net_bind_service @webadm

# Any user within users group lost cap_sys_module from cap_bset
b:cap_sys_module   @users


When a user or groups he belongs is on several lines, all configurations
are simplly compounded.

In the above example, if tak belongs to webadm and users group,
he will get cap_sys_pacct and cap_net_bind_service pI, and lost
cap_sys_chroot and cap_sys_module from his cap_bset.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]


Signed-off-by: KaiGai Kohei [EMAIL PROTECTED]
--
 pam_cap/capability.conf |6 +
 pam_cap/pam_cap.c   |  495 ---
 2 files changed, 305 insertions(+), 196 deletions(-)

diff --git a/pam_cap/capability.conf b/pam_cap/capability.conf
index b543142..707cdc3 100644
--- a/pam_cap/capability.conf
+++ b/pam_cap/capability.conf
@@ -24,6 +24,12 @@ cap_setfcap  morgan
 ## 'everyone else' gets no inheritable capabilities
 none  *

+# user 'kaigai' lost CAP_NET_RAW capability from bounding set
+b:cap_net_raw   kaigai
+
+# group 'acctadm' get CAP_SYS_PACCT inheritable capability
+i:cap_sys_pacct @acctadm
+
 ## if there is no '*' entry, all users not explicitly mentioned will
 ## get all available capabilities. This is a permissive default, and
 ## probably not what you want...
diff --git a/pam_cap/pam_cap.c b/pam_cap/pam_cap.c
index 94c5ebc..a917d5c 100644
--- a/pam_cap/pam_cap.c
+++ b/pam_cap/pam_cap.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 1999,2007 Andrew G. Morgan [EMAIL PROTECTED]
+ * Copyright (c) 2007  KaiGai Kohei [EMAIL PROTECTED]
  *
  * The purpose of this module is to enforce inheritable capability sets
  * for a specified user.
@@ -13,298 +14,400 @@
 #include stdarg.h
 #include stdlib.h
 #include syslog.h
+#include pwd.h
+#include grp.h

 #include sys/capability.h
+#include sys/prctl.h

 #include security/pam_modules.h
 #include security/_pam_macros.h

+#define MODULE_NAMEpam_cap
 #define USER_CAP_FILE   /etc/security/capability.conf
 #define CAP_FILE_BUFFER_SIZE4096
 #define CAP_FILE_DELIMITERS  \t\n
-#define CAP_COMBINED_FORMAT %s all-i %s+i
-#define CAP_DROP_ALL%s all-i
+
+#ifndef PR_CAPBSET_DROP
+#define PR_CAPBSET_DROP24
+#endif
+
+extern char const *_cap_names[];

 struct pam_cap_s {
 int debug;
 const char *user;
 const char *conf_filename;
+/* set in read_capabilities_for_user() */
+cap_t result;
+int do_set_inh : 1;
+int do_set_bset : 1;
 };

-/* obtain

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-06 Thread KaiGai Kohei
Sorry, any TABs are replaced by MUA.
I'll send the patch again.

 The attached patch provides several improvement for pam_cap module.
 1. It enables pam_cap to drop capabilities from process'es capability
bounding set.
 2. It enables to specify allowing inheritable capability set or dropping
bounding capability set for groups, not only users.
 3. It provide pam_sm_session() method, not only pam_sm_authenticate()
and pam_sm_setcred(). A system administrator can select more
appropriate mode for his purpose.
 4. In the auth/cred mode, it enables to cache the configuration file,
to avoid read and analyze it twice.
 (Therefore, most of the part in the original one got replaced)
 
 The default configuration file is /etc/security/capability.conf.
 You can describe as follows:
 
 # kaigai get cap_net_raw and cap_kill, tak get cap_sys_pacct pI.
 # We can omit i: in the head of each line.
 i:cap_net_raw,cap_kill kaigai
 cap_sys_pacct  tak
 
 # ymj and tak lost cap_sys_chroot from cap_bset
 b:cap_sys_chroot   ymj  tak
 
 # Any user within webadm group get cap_net_bind_service pI.
 i:cap_net_bind_service @webadm
 
 # Any user within users group lost cap_sys_module from cap_bset
 b:cap_sys_module   @users
 
 
 When a user or groups he belongs is on several lines, all configurations
 are simplly compounded.
 
 In the above example, if tak belongs to webadm and users group,
 he will get cap_sys_pacct and cap_net_bind_service pI, and lost
 cap_sys_chroot and cap_sys_module from his cap_bset.
 
 Thanks,

Signed-off-by: KaiGai Kohei [EMAIL PROTECTED]
--
 pam_cap/capability.conf |6 +
 pam_cap/pam_cap.c   |  495 ---
 2 files changed, 305 insertions(+), 196 deletions(-)

diff --git a/pam_cap/capability.conf b/pam_cap/capability.conf
index b543142..707cdc3 100644
--- a/pam_cap/capability.conf
+++ b/pam_cap/capability.conf
@@ -24,6 +24,12 @@ cap_setfcap  morgan
 ## 'everyone else' gets no inheritable capabilities
 none  *

+# user 'kaigai' lost CAP_NET_RAW capability from bounding set
+b:cap_net_raw   kaigai
+
+# group 'acctadm' get CAP_SYS_PACCT inheritable capability
+i:cap_sys_pacct @acctadm
+
 ## if there is no '*' entry, all users not explicitly mentioned will
 ## get all available capabilities. This is a permissive default, and
 ## probably not what you want...
diff --git a/pam_cap/pam_cap.c b/pam_cap/pam_cap.c
index 94c5ebc..a917d5c 100644
--- a/pam_cap/pam_cap.c
+++ b/pam_cap/pam_cap.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 1999,2007 Andrew G. Morgan [EMAIL PROTECTED]
+ * Copyright (c) 2007  KaiGai Kohei [EMAIL PROTECTED]
  *
  * The purpose of this module is to enforce inheritable capability sets
  * for a specified user.
@@ -13,298 +14,400 @@
 #include stdarg.h
 #include stdlib.h
 #include syslog.h
+#include pwd.h
+#include grp.h

 #include sys/capability.h
+#include sys/prctl.h

 #include security/pam_modules.h
 #include security/_pam_macros.h

+#define MODULE_NAMEpam_cap
 #define USER_CAP_FILE   /etc/security/capability.conf
 #define CAP_FILE_BUFFER_SIZE4096
 #define CAP_FILE_DELIMITERS  \t\n
-#define CAP_COMBINED_FORMAT %s all-i %s+i
-#define CAP_DROP_ALL%s all-i
+
+#ifndef PR_CAPBSET_DROP
+#define PR_CAPBSET_DROP24
+#endif
+
+extern char const *_cap_names[];

 struct pam_cap_s {
 int debug;
 const char *user;
 const char *conf_filename;
+/* set in read_capabilities_for_user() */
+cap_t result;
+int do_set_inh : 1;
+int do_set_bset : 1;
 };

-/* obtain the inheritable capabilities for the current user */
-
-static char *read_capabilities_for_user(const char *user, const char *source)
+/* obtain the inheritable/bounding capabilities for the current user */
+static int read_capabilities_for_user(struct pam_cap_s *pcs)
 {
-char *cap_string = NULL;
-char buffer[CAP_FILE_BUFFER_SIZE], *line;
+char buffer[CAP_FILE_BUFFER_SIZE];
 FILE *cap_file;
+struct passwd *pwd;
+int line_num = 0;
+int rc = -1;   /* PAM_(AUTH|CRED|SESSION)_ERR */
+
+pwd = getpwnam(pcs-user);
+if (!pwd) {
+   syslog(LOG_ERR, user %s not in passwd entries, pcs-user);
+   return PAM_AUTH_ERR;
+}

-cap_file = fopen(source, r);
-if (cap_file == NULL) {
-   D((failed to open capability file));
-   return NULL;
+cap_file = fopen(pcs-conf_filename, r);
+if (!cap_file) {
+   if (errno == ENOENT) {
+   syslog(LOG_NOTICE, %s is not found,
+  pcs-conf_filename);
+   return PAM_IGNORE;
+   } else {
+   syslog(LOG_ERR, unable to open '%s' (%s),
+  pcs-conf_filename, strerror(errno));
+   return rc;
+   }
 }

-while ((line = fgets(buffer, CAP_FILE_BUFFER_SIZE, cap_file))) {
-   int found_one = 0;
-   const char *cap_text

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-05 Thread KaiGai Kohei

Andrew Morgan wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

KaiGai Kohei wrote:

Andrew Morgan wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

KaiGai Kohei wrote:

+if (!!cap_issubset(*inheritable,
+   cap_combine(target->cap_inheritable,
+   current->cap_bset))) {
+/* no new pI capabilities outside bounding set */
+return -EPERM;
+}
 



Yes, the !! was a bug. The correct check is a single !.

I was in trouble with getting -EPERM at pam_cap.so :-)


(Thus, the correct check says no 'new' pI bits can be outside cap_bset.)

If this condition intends to dominate 'new' pI bits by 'old' pI bits masked
with bounding set, we should not apply cap_combine() here.
I think applying cap_intersect() is correct for the purpose.


The check is not meant to limit existing pI bits.

The check is meant to limit what new bits can be 'added' to pI (in the
case that pE & CAP_SETPCAP is true).


Thanks, I got understood as I wrote in the previous reply.

BTW, could you tell me your intention about pam_cap.c is implemented
with pam_sm_authenticate() and pam_sm_setcred()?
I think it can be done with pam_sm_open_session(), and this approach
enables to reduce the iteration of reading /etc/security/capability.conf.

How do you think the idea?
--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-05 Thread KaiGai Kohei

(Thus, the correct check says no 'new' pI bits can be outside cap_bset.)

If this condition intends to dominate 'new' pI bits by 'old' pI bits masked
with bounding set, we should not apply cap_combine() here.
I think applying cap_intersect() is correct for the purpose.


That would have been my first inclination, but Andrew actually
wanted to be able to keep a pI with bits not in the capability
bounding set.  And it's really not a big problem, since

1. you can never grow cap_bset
2. the capbound.c program just makes sure to call capset
   to take the bit being removed from cap_bset out of
   pI'
3. It could be advantageous for some daemon to keep a bit
   in pI which can never be gained through fP but can be
   gained by a child through (fI).

Does that seem reasonable to you?


OK, I got understood the intention of the condition.
It seems to me reasonable policy.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>


This patch fixes incorrect condition added by per-process capability
bounding set patch.
It intends to limit no new pI capabilities outside bounding set.

Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]>

 commoncap.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- linux-2.6.24-rc3/security/commoncap.c.old   2007-12-06 10:51:48.0 
+0900
+++ linux-2.6.24-rc3/security/commoncap.c   2007-12-06 10:52:15.0 
+0900
@@ -119,9 +119,9 @@ int cap_capset_check (struct task_struct
/* incapable of using this inheritable set */
return -EPERM;
}
-   if (!!cap_issubset(*inheritable,
-  cap_combine(target->cap_inheritable,
-  current->cap_bset))) {
+   if (!cap_issubset(*inheritable,
+ cap_combine(target->cap_inheritable,
+ current->cap_bset))) {
/* no new pI capabilities outside bounding set */
return -EPERM;
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-05 Thread KaiGai Kohei

(Thus, the correct check says no 'new' pI bits can be outside cap_bset.)

If this condition intends to dominate 'new' pI bits by 'old' pI bits masked
with bounding set, we should not apply cap_combine() here.
I think applying cap_intersect() is correct for the purpose.


That would have been my first inclination, but Andrew actually
wanted to be able to keep a pI with bits not in the capability
bounding set.  And it's really not a big problem, since

1. you can never grow cap_bset
2. the capbound.c program just makes sure to call capset
   to take the bit being removed from cap_bset out of
   pI'
3. It could be advantageous for some daemon to keep a bit
   in pI which can never be gained through fP but can be
   gained by a child through (fIpI).

Does that seem reasonable to you?


OK, I got understood the intention of the condition.
It seems to me reasonable policy.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]


This patch fixes incorrect condition added by per-process capability
bounding set patch.
It intends to limit no new pI capabilities outside bounding set.

Signed-off-by: KaiGai Kohei [EMAIL PROTECTED]

 commoncap.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- linux-2.6.24-rc3/security/commoncap.c.old   2007-12-06 10:51:48.0 
+0900
+++ linux-2.6.24-rc3/security/commoncap.c   2007-12-06 10:52:15.0 
+0900
@@ -119,9 +119,9 @@ int cap_capset_check (struct task_struct
/* incapable of using this inheritable set */
return -EPERM;
}
-   if (!!cap_issubset(*inheritable,
-  cap_combine(target-cap_inheritable,
-  current-cap_bset))) {
+   if (!cap_issubset(*inheritable,
+ cap_combine(target-cap_inheritable,
+ current-cap_bset))) {
/* no new pI capabilities outside bounding set */
return -EPERM;
}

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-05 Thread KaiGai Kohei

Andrew Morgan wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

KaiGai Kohei wrote:

Andrew Morgan wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

KaiGai Kohei wrote:

+if (!!cap_issubset(*inheritable,
+   cap_combine(target-cap_inheritable,
+   current-cap_bset))) {
+/* no new pI capabilities outside bounding set */
+return -EPERM;
+}
 



Yes, the !! was a bug. The correct check is a single !.

I was in trouble with getting -EPERM at pam_cap.so :-)


(Thus, the correct check says no 'new' pI bits can be outside cap_bset.)

If this condition intends to dominate 'new' pI bits by 'old' pI bits masked
with bounding set, we should not apply cap_combine() here.
I think applying cap_intersect() is correct for the purpose.


The check is not meant to limit existing pI bits.

The check is meant to limit what new bits can be 'added' to pI (in the
case that pE  CAP_SETPCAP is true).


Thanks, I got understood as I wrote in the previous reply.

BTW, could you tell me your intention about pam_cap.c is implemented
with pam_sm_authenticate() and pam_sm_setcred()?
I think it can be done with pam_sm_open_session(), and this approach
enables to reduce the iteration of reading /etc/security/capability.conf.

How do you think the idea?
--
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-04 Thread KaiGai Kohei

Andrew Morgan wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

KaiGai Kohei wrote:

Serge,

Please tell me the meanings of the following condition.


diff --git a/security/commoncap.c b/security/commoncap.c
index 3a95990..cb71bb0 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -133,6 +119,12 @@ int cap_capset_check (struct task_struct *target,
kernel_cap_t *effective,
 /* incapable of using this inheritable set */
 return -EPERM;
 }
+if (!!cap_issubset(*inheritable,
+   cap_combine(target->cap_inheritable,
+   current->cap_bset))) {
+/* no new pI capabilities outside bounding set */
+return -EPERM;
+}
 
 /* verify restrictions on target's new Permitted set */

 if (!cap_issubset (*permitted,

It seems to me this condition requires the new inheritable capability
set must have a capability more than bounding set, at least.
What is the purpose of this checking?


Yes, the !! was a bug. The correct check is a single !.


I was in trouble with getting -EPERM at pam_cap.so :-)


(Thus, the correct check says no 'new' pI bits can be outside cap_bset.)


If this condition intends to dominate 'new' pI bits by 'old' pI bits masked
with bounding set, we should not apply cap_combine() here.
I think applying cap_intersect() is correct for the purpose.

Thanks,
--
KaiGai Kohei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-04 Thread KaiGai Kohei

Andrew Morgan wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

KaiGai Kohei wrote:

Serge,

Please tell me the meanings of the following condition.


diff --git a/security/commoncap.c b/security/commoncap.c
index 3a95990..cb71bb0 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -133,6 +119,12 @@ int cap_capset_check (struct task_struct *target,
kernel_cap_t *effective,
 /* incapable of using this inheritable set */
 return -EPERM;
 }
+if (!!cap_issubset(*inheritable,
+   cap_combine(target-cap_inheritable,
+   current-cap_bset))) {
+/* no new pI capabilities outside bounding set */
+return -EPERM;
+}
 
 /* verify restrictions on target's new Permitted set */

 if (!cap_issubset (*permitted,

It seems to me this condition requires the new inheritable capability
set must have a capability more than bounding set, at least.
What is the purpose of this checking?


Yes, the !! was a bug. The correct check is a single !.


I was in trouble with getting -EPERM at pam_cap.so :-)


(Thus, the correct check says no 'new' pI bits can be outside cap_bset.)


If this condition intends to dominate 'new' pI bits by 'old' pI bits masked
with bounding set, we should not apply cap_combine() here.
I think applying cap_intersect() is correct for the purpose.

Thanks,
--
KaiGai Kohei
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-03 Thread KaiGai Kohei

Serge,

Please tell me the meanings of the following condition.


diff --git a/security/commoncap.c b/security/commoncap.c
index 3a95990..cb71bb0 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -133,6 +119,12 @@ int cap_capset_check (struct task_struct *target, 
kernel_cap_t *effective,
/* incapable of using this inheritable set */
return -EPERM;
}
+   if (!!cap_issubset(*inheritable,
+  cap_combine(target->cap_inheritable,
+  current->cap_bset))) {
+   /* no new pI capabilities outside bounding set */
+   return -EPERM;
+   }
 
 	/* verify restrictions on target's new Permitted set */

if (!cap_issubset (*permitted,


It seems to me this condition requires the new inheritable capability
set must have a capability more than bounding set, at least.
What is the purpose of this checking?

In the initial state, any process have no inheritable capability set
and full bounding set. Thus, we cannot do capset() always.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-03 Thread KaiGai Kohei

Serge,

Please tell me the meanings of the following condition.


diff --git a/security/commoncap.c b/security/commoncap.c
index 3a95990..cb71bb0 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -133,6 +119,12 @@ int cap_capset_check (struct task_struct *target, 
kernel_cap_t *effective,
/* incapable of using this inheritable set */
return -EPERM;
}
+   if (!!cap_issubset(*inheritable,
+  cap_combine(target-cap_inheritable,
+  current-cap_bset))) {
+   /* no new pI capabilities outside bounding set */
+   return -EPERM;
+   }
 
 	/* verify restrictions on target's new Permitted set */

if (!cap_issubset (*permitted,


It seems to me this condition requires the new inheritable capability
set must have a capability more than bounding set, at least.
What is the purpose of this checking?

In the initial state, any process have no inheritable capability set
and full bounding set. Thus, we cannot do capset() always.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-02 Thread KaiGai Kohei

Andrew Morgan wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

KaiGai Kohei wrote:

There is already a pam_cap module in the libcap2 package. Can we merge
this functionality?

I think it is a good idea.

However, this module already have a feature to modify inheritable
capability set.
How does it to be described in the "/etc/security/capability.conf"?

One idea is like a following convention:

# compatible configuration. We can omit "i:" at the head of line
cap_setfcap tak
# It drops any capabilities from b-set except for cap_net_raw and
cap_fowner
b:cap_net_raw,cap_fownerymj
# It drops only cap_dac_override from b-set.
b:-cap_dac_override kaigai
# It drops only cap_sys_admin from b-set of any user within users group.
b:-cap_sys_admingroup:users


I like the idea of a separate line for bounds.

For ease of parsing, perhaps '!' or some other symbol prefix to the line
could be used to identify lines that refer to cap_bound?

In other modules, @groupname is used to capture a group association.

Lines like this should be supported:

!cap_net_raw @regularusers# suppress from cap_bset
cap_net_raw  @pingers morgan  # add to pI

where morgan is not in group @pingers but is in group @regularusers.


The "@groupname" is intuitive convention. I also think it is good idea.

But !cap_xxx is a bit misunderstandable for me. Someone may misunderstand
this line means any capabilities except for cap_xxx.
Thus, I think that using "b:" and omittable "i:" prefix is better than "!".
In addition, what is your opinion about using "-b:" and "-i:" to represent
dropping capabilities currently they have?

There is one more uncertain case.
When a user belongs to several groups with capabilities configuration,
what capabilities are to be attached for the user?

e.g) When kaigai belong to @pingers and @paccters

b:cap_sys_pacct @paccters
b:cap_net_raw   @pingers
-b:cap_dac_override,cap_net_raw kaigai

If we apply "OR" policy, kaigai get only cap_sys_pacct, because
he got cap_sys_pacct and cap_net_raw came from @paccters and @pingers
but cap_dac_override and cap_net_raw are dropped by the third line.

Thanks,


Cheers

Andrew


Thanks,


Cheers

Andrew

[EMAIL PROTECTED] wrote:

Quoting KaiGai Kohei ([EMAIL PROTECTED]):

Serge E. Hallyn wrote:

The capability bounding set is a set beyond which capabilities
cannot grow.  Currently cap_bset is per-system.  It can be
manipulated through sysctl, but only init can add capabilities.
Root can remove capabilities.  By default it includes all caps
except CAP_SETPCAP.

Serge,

This feature makes me being interested in.
I think you intend to apply this feature for the primary process
of security container.
However, it is also worthwhile to apply when a session is starting up.

The following PAM module enables to drop capability bounding bit
specified by the fifth field in /etc/passwd entry.
This code is just an example now, but considerable feature.

build and install:
# gcc -Wall -c pam_cap_drop.c
# gcc -Wall -shared -Xlinker -x -o pam_cap_drop.so pam_cap_drop.o -lpam
# cp pam_cap_drop.so /lib/security

modify /etc/passwd as follows:

tak:x:1004:100:cap_drop=cap_net_raw,cap_chown:/home/tak:/bin/bash
   ^^
example:
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.23 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.02 ms

--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 1.023/1.130/1.237/0.107 ms

[EMAIL PROTECTED] ~]$ ssh [EMAIL PROTECTED]
[EMAIL PROTECTED]'s password:
Last login: Sat Dec  1 10:09:29 2007 from masu.myhome.cx
[EMAIL PROTECTED] ~]$ export LANG=C
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
ping: icmp open socket: Operation not permitted

[EMAIL PROTECTED] ~]$ su
Password:
pam_cap_bset[6921]: user root does not have 'cap_drop=' property
[EMAIL PROTECTED] tak]# cat /proc/self/status | grep ^Cap
CapInh: 
CapPrm: dffe
CapEff: dffe
[EMAIL PROTECTED] tak]#

Neat.  A bigger-stick version of not adding the account to
group wheel.  I'll use that.

Is there any reason not to have a separate /etc/login.capbounds
config file, though, so the account can still have a full name?
Did you only use that for convenience of proof of concept, or
is there another reason?


# BTW, I replaced the James's address in the Cc: list,
# because MTA does not accept it.

Thanks!  I don't know what happened to my alias for him...

thanks,
-serge


--
KaiGai Kohei <[EMAIL PROTECTED]>


pam_cap_drop.c
********

/*
 * pam_cap_drop.c module -- drop capabilities bounding set
 *

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-02 Thread KaiGai Kohei

Andrew Morgan wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

KaiGai Kohei wrote:

There is already a pam_cap module in the libcap2 package. Can we merge
this functionality?

I think it is a good idea.

However, this module already have a feature to modify inheritable
capability set.
How does it to be described in the /etc/security/capability.conf?

One idea is like a following convention:

# compatible configuration. We can omit i: at the head of line
cap_setfcap tak
# It drops any capabilities from b-set except for cap_net_raw and
cap_fowner
b:cap_net_raw,cap_fownerymj
# It drops only cap_dac_override from b-set.
b:-cap_dac_override kaigai
# It drops only cap_sys_admin from b-set of any user within users group.
b:-cap_sys_admingroup:users


I like the idea of a separate line for bounds.

For ease of parsing, perhaps '!' or some other symbol prefix to the line
could be used to identify lines that refer to cap_bound?

In other modules, @groupname is used to capture a group association.

Lines like this should be supported:

!cap_net_raw @regularusers# suppress from cap_bset
cap_net_raw  @pingers morgan  # add to pI

where morgan is not in group @pingers but is in group @regularusers.


The @groupname is intuitive convention. I also think it is good idea.

But !cap_xxx is a bit misunderstandable for me. Someone may misunderstand
this line means any capabilities except for cap_xxx.
Thus, I think that using b: and omittable i: prefix is better than !.
In addition, what is your opinion about using -b: and -i: to represent
dropping capabilities currently they have?

There is one more uncertain case.
When a user belongs to several groups with capabilities configuration,
what capabilities are to be attached for the user?

e.g) When kaigai belong to @pingers and @paccters

b:cap_sys_pacct @paccters
b:cap_net_raw   @pingers
-b:cap_dac_override,cap_net_raw kaigai

If we apply OR policy, kaigai get only cap_sys_pacct, because
he got cap_sys_pacct and cap_net_raw came from @paccters and @pingers
but cap_dac_override and cap_net_raw are dropped by the third line.

Thanks,


Cheers

Andrew


Thanks,


Cheers

Andrew

[EMAIL PROTECTED] wrote:

Quoting KaiGai Kohei ([EMAIL PROTECTED]):

Serge E. Hallyn wrote:

The capability bounding set is a set beyond which capabilities
cannot grow.  Currently cap_bset is per-system.  It can be
manipulated through sysctl, but only init can add capabilities.
Root can remove capabilities.  By default it includes all caps
except CAP_SETPCAP.

Serge,

This feature makes me being interested in.
I think you intend to apply this feature for the primary process
of security container.
However, it is also worthwhile to apply when a session is starting up.

The following PAM module enables to drop capability bounding bit
specified by the fifth field in /etc/passwd entry.
This code is just an example now, but considerable feature.

build and install:
# gcc -Wall -c pam_cap_drop.c
# gcc -Wall -shared -Xlinker -x -o pam_cap_drop.so pam_cap_drop.o -lpam
# cp pam_cap_drop.so /lib/security

modify /etc/passwd as follows:

tak:x:1004:100:cap_drop=cap_net_raw,cap_chown:/home/tak:/bin/bash
   ^^
example:
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.23 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.02 ms

--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 1.023/1.130/1.237/0.107 ms

[EMAIL PROTECTED] ~]$ ssh [EMAIL PROTECTED]
[EMAIL PROTECTED]'s password:
Last login: Sat Dec  1 10:09:29 2007 from masu.myhome.cx
[EMAIL PROTECTED] ~]$ export LANG=C
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
ping: icmp open socket: Operation not permitted

[EMAIL PROTECTED] ~]$ su
Password:
pam_cap_bset[6921]: user root does not have 'cap_drop=' property
[EMAIL PROTECTED] tak]# cat /proc/self/status | grep ^Cap
CapInh: 
CapPrm: dffe
CapEff: dffe
[EMAIL PROTECTED] tak]#

Neat.  A bigger-stick version of not adding the account to
group wheel.  I'll use that.

Is there any reason not to have a separate /etc/login.capbounds
config file, though, so the account can still have a full name?
Did you only use that for convenience of proof of concept, or
is there another reason?


# BTW, I replaced the James's address in the Cc: list,
# because MTA does not accept it.

Thanks!  I don't know what happened to my alias for him...

thanks,
-serge


--
KaiGai Kohei [EMAIL PROTECTED]


pam_cap_drop.c


/*
 * pam_cap_drop.c module -- drop capabilities bounding set
 *
 * Copyright: 2007 KaiGai Kohei [EMAIL PROTECTED]
 */

#include errno.h
#include pwd.h
#include stdlib.h
#include

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-01 Thread KaiGai Kohei

There is already a pam_cap module in the libcap2 package. Can we merge
this functionality?


I think it is a good idea.

However, this module already have a feature to modify inheritable
capability set.
How does it to be described in the "/etc/security/capability.conf"?

One idea is like a following convention:

# compatible configuration. We can omit "i:" at the head of line
cap_setfcap tak
# It drops any capabilities from b-set except for cap_net_raw and cap_fowner
b:cap_net_raw,cap_fownerymj
# It drops only cap_dac_override from b-set.
b:-cap_dac_override kaigai
# It drops only cap_sys_admin from b-set of any user within users group.
b:-cap_sys_admingroup:users

Thanks,


Cheers

Andrew

[EMAIL PROTECTED] wrote:

Quoting KaiGai Kohei ([EMAIL PROTECTED]):

Serge E. Hallyn wrote:

The capability bounding set is a set beyond which capabilities
cannot grow.  Currently cap_bset is per-system.  It can be
manipulated through sysctl, but only init can add capabilities.
Root can remove capabilities.  By default it includes all caps
except CAP_SETPCAP.

Serge,

This feature makes me being interested in.
I think you intend to apply this feature for the primary process
of security container.
However, it is also worthwhile to apply when a session is starting up.

The following PAM module enables to drop capability bounding bit
specified by the fifth field in /etc/passwd entry.
This code is just an example now, but considerable feature.

build and install:
# gcc -Wall -c pam_cap_drop.c
# gcc -Wall -shared -Xlinker -x -o pam_cap_drop.so pam_cap_drop.o -lpam
# cp pam_cap_drop.so /lib/security

modify /etc/passwd as follows:

tak:x:1004:100:cap_drop=cap_net_raw,cap_chown:/home/tak:/bin/bash
   ^^
example:
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.23 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.02 ms

--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 1.023/1.130/1.237/0.107 ms

[EMAIL PROTECTED] ~]$ ssh [EMAIL PROTECTED]
[EMAIL PROTECTED]'s password:
Last login: Sat Dec  1 10:09:29 2007 from masu.myhome.cx
[EMAIL PROTECTED] ~]$ export LANG=C
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
ping: icmp open socket: Operation not permitted

[EMAIL PROTECTED] ~]$ su
Password:
pam_cap_bset[6921]: user root does not have 'cap_drop=' property
[EMAIL PROTECTED] tak]# cat /proc/self/status | grep ^Cap
CapInh: 
CapPrm: dffe
CapEff: dffe
[EMAIL PROTECTED] tak]#

Neat.  A bigger-stick version of not adding the account to
group wheel.  I'll use that.

Is there any reason not to have a separate /etc/login.capbounds
config file, though, so the account can still have a full name?
Did you only use that for convenience of proof of concept, or
is there another reason?


# BTW, I replaced the James's address in the Cc: list,
# because MTA does not accept it.

Thanks!  I don't know what happened to my alias for him...

thanks,
-serge


--
KaiGai Kohei <[EMAIL PROTECTED]>


pam_cap_drop.c


/*
 * pam_cap_drop.c module -- drop capabilities bounding set
 *
 * Copyright: 2007 KaiGai Kohei <[EMAIL PROTECTED]>
 */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 

#ifndef PR_CAPBSET_DROP
#define PR_CAPBSET_DROP 24
#endif

static char *captable[] = {
"cap_chown",
"cap_dac_override",
"cap_dac_read_search",
"cap_fowner",
"cap_fsetid",
"cap_kill",
"cap_setgid",
"cap_setuid",
"cap_setpcap",
"cap_linux_immutable",
"cap_net_bind_service",
"cap_net_broadcast",
"cap_net_admin",
"cap_net_raw",
"cap_ipc_lock",
"cap_ipc_owner",
"cap_sys_module",
"cap_sys_rawio",
"cap_sys_chroot",
"cap_sys_ptrace",
"cap_sys_pacct",
"cap_sys_admin",
"cap_sys_boot",
"cap_sys_nice",
"cap_sys_resource",
"cap_sys_time",
"cap_sys_tty_config",
"cap_mknod",
"cap_lease",
"cap_audit_write",
"cap_audit_control",
"cap_setfcap",
NULL,
};


PAM_EXTERN int
pam_sm_open_session(pam_handle_t *pamh, int flags,
int argc, const char **argv)
{
struct passwd *pwd;
char *pos, *buf;
c

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-01 Thread KaiGai Kohei

Serge,


Is there any reason not to have a separate /etc/login.capbounds
config file, though, so the account can still have a full name?
Did you only use that for convenience of proof of concept, or
is there another reason?


passwd(5) says the fifth field is optional and only used for
informational purpose (like ulimit, umask).

However, using any other separate config file is conservative
and better. One candidate is "/etc/security/capability.conf"
defined as the config file of pam_cap.

Thanks,
--
KaiGai Kohei <[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-01 Thread KaiGai Kohei

Serge,


Is there any reason not to have a separate /etc/login.capbounds
config file, though, so the account can still have a full name?
Did you only use that for convenience of proof of concept, or
is there another reason?


passwd(5) says the fifth field is optional and only used for
informational purpose (like ulimit, umask).

However, using any other separate config file is conservative
and better. One candidate is /etc/security/capability.conf
defined as the config file of pam_cap.

Thanks,
--
KaiGai Kohei [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-01 Thread KaiGai Kohei

There is already a pam_cap module in the libcap2 package. Can we merge
this functionality?


I think it is a good idea.

However, this module already have a feature to modify inheritable
capability set.
How does it to be described in the /etc/security/capability.conf?

One idea is like a following convention:

# compatible configuration. We can omit i: at the head of line
cap_setfcap tak
# It drops any capabilities from b-set except for cap_net_raw and cap_fowner
b:cap_net_raw,cap_fownerymj
# It drops only cap_dac_override from b-set.
b:-cap_dac_override kaigai
# It drops only cap_sys_admin from b-set of any user within users group.
b:-cap_sys_admingroup:users

Thanks,


Cheers

Andrew

[EMAIL PROTECTED] wrote:

Quoting KaiGai Kohei ([EMAIL PROTECTED]):

Serge E. Hallyn wrote:

The capability bounding set is a set beyond which capabilities
cannot grow.  Currently cap_bset is per-system.  It can be
manipulated through sysctl, but only init can add capabilities.
Root can remove capabilities.  By default it includes all caps
except CAP_SETPCAP.

Serge,

This feature makes me being interested in.
I think you intend to apply this feature for the primary process
of security container.
However, it is also worthwhile to apply when a session is starting up.

The following PAM module enables to drop capability bounding bit
specified by the fifth field in /etc/passwd entry.
This code is just an example now, but considerable feature.

build and install:
# gcc -Wall -c pam_cap_drop.c
# gcc -Wall -shared -Xlinker -x -o pam_cap_drop.so pam_cap_drop.o -lpam
# cp pam_cap_drop.so /lib/security

modify /etc/passwd as follows:

tak:x:1004:100:cap_drop=cap_net_raw,cap_chown:/home/tak:/bin/bash
   ^^
example:
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.23 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.02 ms

--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 1.023/1.130/1.237/0.107 ms

[EMAIL PROTECTED] ~]$ ssh [EMAIL PROTECTED]
[EMAIL PROTECTED]'s password:
Last login: Sat Dec  1 10:09:29 2007 from masu.myhome.cx
[EMAIL PROTECTED] ~]$ export LANG=C
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
ping: icmp open socket: Operation not permitted

[EMAIL PROTECTED] ~]$ su
Password:
pam_cap_bset[6921]: user root does not have 'cap_drop=' property
[EMAIL PROTECTED] tak]# cat /proc/self/status | grep ^Cap
CapInh: 
CapPrm: dffe
CapEff: dffe
[EMAIL PROTECTED] tak]#

Neat.  A bigger-stick version of not adding the account to
group wheel.  I'll use that.

Is there any reason not to have a separate /etc/login.capbounds
config file, though, so the account can still have a full name?
Did you only use that for convenience of proof of concept, or
is there another reason?


# BTW, I replaced the James's address in the Cc: list,
# because MTA does not accept it.

Thanks!  I don't know what happened to my alias for him...

thanks,
-serge


--
KaiGai Kohei [EMAIL PROTECTED]


pam_cap_drop.c


/*
 * pam_cap_drop.c module -- drop capabilities bounding set
 *
 * Copyright: 2007 KaiGai Kohei [EMAIL PROTECTED]
 */

#include errno.h
#include pwd.h
#include stdlib.h
#include stdio.h
#include string.h
#include syslog.h
#include sys/prctl.h
#include sys/types.h

#include security/pam_modules.h

#ifndef PR_CAPBSET_DROP
#define PR_CAPBSET_DROP 24
#endif

static char *captable[] = {
cap_chown,
cap_dac_override,
cap_dac_read_search,
cap_fowner,
cap_fsetid,
cap_kill,
cap_setgid,
cap_setuid,
cap_setpcap,
cap_linux_immutable,
cap_net_bind_service,
cap_net_broadcast,
cap_net_admin,
cap_net_raw,
cap_ipc_lock,
cap_ipc_owner,
cap_sys_module,
cap_sys_rawio,
cap_sys_chroot,
cap_sys_ptrace,
cap_sys_pacct,
cap_sys_admin,
cap_sys_boot,
cap_sys_nice,
cap_sys_resource,
cap_sys_time,
cap_sys_tty_config,
cap_mknod,
cap_lease,
cap_audit_write,
cap_audit_control,
cap_setfcap,
NULL,
};


PAM_EXTERN int
pam_sm_open_session(pam_handle_t *pamh, int flags,
int argc, const char **argv)
{
struct passwd *pwd;
char *pos, *buf;
char *username = NULL;

/* open system logger */
openlog(pam_cap_bset, LOG_PERROR | LOG_PID, LOG_AUTHPRIV);

/* get the unix username */
if (pam_get_item(pamh, PAM_USER, (void *) username) != PAM_SUCCESS || 
!username)
return PAM_USER_UNKNOWN;

/* get the passwd entry */
pwd = getpwnam

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-11-30 Thread KaiGai Kohei
Serge E. Hallyn wrote:
> The capability bounding set is a set beyond which capabilities
> cannot grow.  Currently cap_bset is per-system.  It can be
> manipulated through sysctl, but only init can add capabilities.
> Root can remove capabilities.  By default it includes all caps
> except CAP_SETPCAP.

Serge,

This feature makes me being interested in.
I think you intend to apply this feature for the primary process
of security container.
However, it is also worthwhile to apply when a session is starting up.

The following PAM module enables to drop capability bounding bit
specified by the fifth field in /etc/passwd entry.
This code is just an example now, but considerable feature.

build and install:
# gcc -Wall -c pam_cap_drop.c
# gcc -Wall -shared -Xlinker -x -o pam_cap_drop.so pam_cap_drop.o -lpam
# cp pam_cap_drop.so /lib/security

modify /etc/passwd as follows:

tak:x:1004:100:cap_drop=cap_net_raw,cap_chown:/home/tak:/bin/bash
   ^^
example:
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.23 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.02 ms

--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 1.023/1.130/1.237/0.107 ms

[EMAIL PROTECTED] ~]$ ssh [EMAIL PROTECTED]
[EMAIL PROTECTED]'s password:
Last login: Sat Dec  1 10:09:29 2007 from masu.myhome.cx
[EMAIL PROTECTED] ~]$ export LANG=C
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
ping: icmp open socket: Operation not permitted

[EMAIL PROTECTED] ~]$ su
Password:
pam_cap_bset[6921]: user root does not have 'cap_drop=' property
[EMAIL PROTECTED] tak]# cat /proc/self/status | grep ^Cap
CapInh: 
CapPrm: dffe
CapEff: dffe
[EMAIL PROTECTED] tak]#

# BTW, I replaced the James's address in the Cc: list,
# because MTA does not accept it.
-- 
KaiGai Kohei <[EMAIL PROTECTED]>


pam_cap_drop.c


/*
 * pam_cap_drop.c module -- drop capabilities bounding set
 *
 * Copyright: 2007 KaiGai Kohei <[EMAIL PROTECTED]>
 */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 

#ifndef PR_CAPBSET_DROP
#define PR_CAPBSET_DROP 24
#endif

static char *captable[] = {
"cap_chown",
"cap_dac_override",
"cap_dac_read_search",
"cap_fowner",
"cap_fsetid",
"cap_kill",
"cap_setgid",
"cap_setuid",
"cap_setpcap",
"cap_linux_immutable",
"cap_net_bind_service",
"cap_net_broadcast",
"cap_net_admin",
"cap_net_raw",
"cap_ipc_lock",
"cap_ipc_owner",
"cap_sys_module",
"cap_sys_rawio",
"cap_sys_chroot",
"cap_sys_ptrace",
"cap_sys_pacct",
"cap_sys_admin",
"cap_sys_boot",
"cap_sys_nice",
"cap_sys_resource",
"cap_sys_time",
"cap_sys_tty_config",
"cap_mknod",
"cap_lease",
"cap_audit_write",
"cap_audit_control",
"cap_setfcap",
NULL,
};


PAM_EXTERN int
pam_sm_open_session(pam_handle_t *pamh, int flags,
int argc, const char **argv)
{
struct passwd *pwd;
char *pos, *buf;
char *username = NULL;

/* open system logger */
openlog("pam_cap_bset", LOG_PERROR | LOG_PID, LOG_AUTHPRIV);

/* get the unix username */
if (pam_get_item(pamh, PAM_USER, (void *) ) != PAM_SUCCESS || 
!username)
return PAM_USER_UNKNOWN;

/* get the passwd entry */
pwd = getpwnam(username);
if (!pwd)
return PAM_USER_UNKNOWN;

/* Is there "cap_drop=" ? */
pos = strstr(pwd->pw_gecos, "cap_drop=");
if (pos) {
buf = strdup(pos + sizeof("cap_drop=") - 1);
if (!buf)
return PAM_SESSION_ERR;
pos = strtok(buf, ",");
while (pos) {
int rc, i;

for (i=0; captable[i]; i++) {
if (!strcmp(pos, captable[i])) {
rc = prctl(PR_CAPBSET_DROP, i);
if (rc < 0) {
syslog(LOG_NOTICE, "user %s 
could not drop %s (%s)",
   

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-11-30 Thread KaiGai Kohei
Serge E. Hallyn wrote:
 The capability bounding set is a set beyond which capabilities
 cannot grow.  Currently cap_bset is per-system.  It can be
 manipulated through sysctl, but only init can add capabilities.
 Root can remove capabilities.  By default it includes all caps
 except CAP_SETPCAP.

Serge,

This feature makes me being interested in.
I think you intend to apply this feature for the primary process
of security container.
However, it is also worthwhile to apply when a session is starting up.

The following PAM module enables to drop capability bounding bit
specified by the fifth field in /etc/passwd entry.
This code is just an example now, but considerable feature.

build and install:
# gcc -Wall -c pam_cap_drop.c
# gcc -Wall -shared -Xlinker -x -o pam_cap_drop.so pam_cap_drop.o -lpam
# cp pam_cap_drop.so /lib/security

modify /etc/passwd as follows:

tak:x:1004:100:cap_drop=cap_net_raw,cap_chown:/home/tak:/bin/bash
   ^^
example:
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.23 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.02 ms

--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 1.023/1.130/1.237/0.107 ms

[EMAIL PROTECTED] ~]$ ssh [EMAIL PROTECTED]
[EMAIL PROTECTED]'s password:
Last login: Sat Dec  1 10:09:29 2007 from masu.myhome.cx
[EMAIL PROTECTED] ~]$ export LANG=C
[EMAIL PROTECTED] ~]$ ping 192.168.1.1
ping: icmp open socket: Operation not permitted

[EMAIL PROTECTED] ~]$ su
Password:
pam_cap_bset[6921]: user root does not have 'cap_drop=' property
[EMAIL PROTECTED] tak]# cat /proc/self/status | grep ^Cap
CapInh: 
CapPrm: dffe
CapEff: dffe
[EMAIL PROTECTED] tak]#

# BTW, I replaced the James's address in the Cc: list,
# because MTA does not accept it.
-- 
KaiGai Kohei [EMAIL PROTECTED]


pam_cap_drop.c


/*
 * pam_cap_drop.c module -- drop capabilities bounding set
 *
 * Copyright: 2007 KaiGai Kohei [EMAIL PROTECTED]
 */

#include errno.h
#include pwd.h
#include stdlib.h
#include stdio.h
#include string.h
#include syslog.h
#include sys/prctl.h
#include sys/types.h

#include security/pam_modules.h

#ifndef PR_CAPBSET_DROP
#define PR_CAPBSET_DROP 24
#endif

static char *captable[] = {
cap_chown,
cap_dac_override,
cap_dac_read_search,
cap_fowner,
cap_fsetid,
cap_kill,
cap_setgid,
cap_setuid,
cap_setpcap,
cap_linux_immutable,
cap_net_bind_service,
cap_net_broadcast,
cap_net_admin,
cap_net_raw,
cap_ipc_lock,
cap_ipc_owner,
cap_sys_module,
cap_sys_rawio,
cap_sys_chroot,
cap_sys_ptrace,
cap_sys_pacct,
cap_sys_admin,
cap_sys_boot,
cap_sys_nice,
cap_sys_resource,
cap_sys_time,
cap_sys_tty_config,
cap_mknod,
cap_lease,
cap_audit_write,
cap_audit_control,
cap_setfcap,
NULL,
};


PAM_EXTERN int
pam_sm_open_session(pam_handle_t *pamh, int flags,
int argc, const char **argv)
{
struct passwd *pwd;
char *pos, *buf;
char *username = NULL;

/* open system logger */
openlog(pam_cap_bset, LOG_PERROR | LOG_PID, LOG_AUTHPRIV);

/* get the unix username */
if (pam_get_item(pamh, PAM_USER, (void *) username) != PAM_SUCCESS || 
!username)
return PAM_USER_UNKNOWN;

/* get the passwd entry */
pwd = getpwnam(username);
if (!pwd)
return PAM_USER_UNKNOWN;

/* Is there cap_drop= ? */
pos = strstr(pwd-pw_gecos, cap_drop=);
if (pos) {
buf = strdup(pos + sizeof(cap_drop=) - 1);
if (!buf)
return PAM_SESSION_ERR;
pos = strtok(buf, ,);
while (pos) {
int rc, i;

for (i=0; captable[i]; i++) {
if (!strcmp(pos, captable[i])) {
rc = prctl(PR_CAPBSET_DROP, i);
if (rc  0) {
syslog(LOG_NOTICE, user %s 
could not drop %s (%s),
   username, captable[i], 
strerror(errno));
break;
}
syslog(LOG_NOTICE, user %s drops 
%s\n, username, captable[i]);
goto next

Re: [2.6 patch] make jffs2_get_acl() static

2007-10-24 Thread KaiGai Kohei

Adrian Bunk wrote:

jffs2_get_acl() can now become static again.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>


Acked-by: KaiGai Kohei <[EMAIL PROTECTED]>


---

 fs/jffs2/acl.c |2 +-
 fs/jffs2/acl.h |2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

add2b887d64536f3fe978e62f0774292456f1ddb 
diff --git a/fs/jffs2/acl.c b/fs/jffs2/acl.c

index 9728614..b14e805 100644
--- a/fs/jffs2/acl.c
+++ b/fs/jffs2/acl.c
@@ -176,7 +176,7 @@ static void jffs2_iset_acl(struct inode *inode, struct 
posix_acl **i_acl, struct
spin_unlock(>i_lock);
 }
 
-struct posix_acl *jffs2_get_acl(struct inode *inode, int type)

+static struct posix_acl *jffs2_get_acl(struct inode *inode, int type)
 {
struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
struct posix_acl *acl;
diff --git a/fs/jffs2/acl.h b/fs/jffs2/acl.h
index 76c6ebd..0bb7f00 100644
--- a/fs/jffs2/acl.h
+++ b/fs/jffs2/acl.h
@@ -28,7 +28,6 @@ struct jffs2_acl_header {
 
 #define JFFS2_ACL_NOT_CACHED ((void *)-1)
 
-extern struct posix_acl *jffs2_get_acl(struct inode *inode, int type);

 extern int jffs2_permission(struct inode *, int, struct nameidata *);
 extern int jffs2_acl_chmod(struct inode *);
 extern int jffs2_init_acl_pre(struct inode *, struct inode *, int *);
@@ -40,7 +39,6 @@ extern struct xattr_handler jffs2_acl_default_xattr_handler;
 
 #else
 
-#define jffs2_get_acl(inode, type)		(NULL)

 #define jffs2_permission   (NULL)
 #define jffs2_acl_chmod(inode) (0)
 #define jffs2_init_acl_pre(dir_i,inode,mode)   (0)




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6 patch] make jffs2_get_acl() static

2007-10-24 Thread KaiGai Kohei

Adrian Bunk wrote:

jffs2_get_acl() can now become static again.

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]


Acked-by: KaiGai Kohei [EMAIL PROTECTED]


---

 fs/jffs2/acl.c |2 +-
 fs/jffs2/acl.h |2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

add2b887d64536f3fe978e62f0774292456f1ddb 
diff --git a/fs/jffs2/acl.c b/fs/jffs2/acl.c

index 9728614..b14e805 100644
--- a/fs/jffs2/acl.c
+++ b/fs/jffs2/acl.c
@@ -176,7 +176,7 @@ static void jffs2_iset_acl(struct inode *inode, struct 
posix_acl **i_acl, struct
spin_unlock(inode-i_lock);
 }
 
-struct posix_acl *jffs2_get_acl(struct inode *inode, int type)

+static struct posix_acl *jffs2_get_acl(struct inode *inode, int type)
 {
struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
struct posix_acl *acl;
diff --git a/fs/jffs2/acl.h b/fs/jffs2/acl.h
index 76c6ebd..0bb7f00 100644
--- a/fs/jffs2/acl.h
+++ b/fs/jffs2/acl.h
@@ -28,7 +28,6 @@ struct jffs2_acl_header {
 
 #define JFFS2_ACL_NOT_CACHED ((void *)-1)
 
-extern struct posix_acl *jffs2_get_acl(struct inode *inode, int type);

 extern int jffs2_permission(struct inode *, int, struct nameidata *);
 extern int jffs2_acl_chmod(struct inode *);
 extern int jffs2_init_acl_pre(struct inode *, struct inode *, int *);
@@ -40,7 +39,6 @@ extern struct xattr_handler jffs2_acl_default_xattr_handler;
 
 #else
 
-#define jffs2_get_acl(inode, type)		(NULL)

 #define jffs2_permission   (NULL)
 #define jffs2_acl_chmod(inode) (0)
 #define jffs2_init_acl_pre(dir_i,inode,mode)   (0)




-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [TOMOYO 05/15](repost) Domain transition handler functions.

2007-10-03 Thread KaiGai Kohei

Tetsuo Handa wrote:

James Morris wrote:
I'm pretty sure that the singly linked list idea has been rejected a few 
times.  Just use the existing API.

Too bad...

Well, is there a way to avoid read_lock when reading list?

Currently, TOMOYO Linux avoids read_lock, on the assumption that
(1) First, ptr->next is initialized with NULL.
(2) Later, ptr->next is assigned non-NULL address.
(3) Assigning to ptr->next is done atomically.

Regards.


Is it all of the purpose for the list structure?
If so, you can apply RCU instead to avoid read lock
when scanning the list, like:

rcu_read_lock();
list_for_each_entry(...) {

}
rcu_read_unlock();

--
KaiGai Kohei <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [TOMOYO 05/15](repost) Domain transition handler functions.

2007-10-03 Thread KaiGai Kohei

Tetsuo Handa wrote:

James Morris wrote:
I'm pretty sure that the singly linked list idea has been rejected a few 
times.  Just use the existing API.

Too bad...

Well, is there a way to avoid read_lock when reading list?

Currently, TOMOYO Linux avoids read_lock, on the assumption that
(1) First, ptr-next is initialized with NULL.
(2) Later, ptr-next is assigned non-NULL address.
(3) Assigning to ptr-next is done atomically.

Regards.


Is it all of the purpose for the list structure?
If so, you can apply RCU instead to avoid read lock
when scanning the list, like:

rcu_read_lock();
list_for_each_entry(...) {

}
rcu_read_unlock();

--
KaiGai Kohei [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Wrappers to load bitmaps (Re: [PATCH] Improve ebitmap scanning)

2007-09-13 Thread KaiGai Kohei
Now I'm improving the performance to scan bitmap in SELinux,
with replacing its original bitmap implementation (ebitmap)
by common bitops like find_next_bit().

I posted a patch to replace them, however, it got a bit complex
bacause we had to translate u64 <--> unsigned long by myself
to adjust between the format of security policy and common bitops.
  http://marc.info/?l=selinux=118956715414494=2

I have an idea to provide several wrapper functions to copy u64/u32
to/from unsigned long for each architecture.
Maybe, it will be defined as follows:
  int arraycpy_u64_to_ulong(u64 *src, unsigned long *dest, size_t len);
  int arraycpy_ulong_to_u64(unsigned long *src, u64 *dest, size_t len);

I believe this feature will help getting code simpler and reducing bugs
for any other subsystem, not only SELinux, which loads bitmaps from/to
userspace and handle them using common bitops.

Any comment please.

Stephen Smalley wrote:
> On Thu, 2007-09-13 at 10:37 +0900, KaiGai Kohei wrote:
>> Paul Moore wrote:
>>> On Tuesday, September 11 2007 11:08:44 pm KaiGai Kohei wrote:
>>>> The attached patch applies the standard bitmap operations
>>>> for the iteration macro of ebitmap, and enables to improve
>>>> the performance in AVC-misses case.

  <...snip...>

>> BTW, is there any wrapper to copy an array of u64 to/from architecture 
>> specific
>> unsigned long? If so, it will help implement ebitmap_netlbl_{import|export}()
>> and ebitmap_read() more simply.
> 
> Might want to ask on linux-kernel.  More generally, it might be a good
> idea to cc linux-kernel on your next posting of the patch to get wider
> review of how you are using the native linux bitmap support.
> 
> The patch looks very promising, although a detailed review and testing
> might take a little bit.

-- 
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Wrappers to load bitmaps (Re: [PATCH] Improve ebitmap scanning)

2007-09-13 Thread KaiGai Kohei
Now I'm improving the performance to scan bitmap in SELinux,
with replacing its original bitmap implementation (ebitmap)
by common bitops like find_next_bit().

I posted a patch to replace them, however, it got a bit complex
bacause we had to translate u64 -- unsigned long by myself
to adjust between the format of security policy and common bitops.
  http://marc.info/?l=selinuxm=118956715414494w=2

I have an idea to provide several wrapper functions to copy u64/u32
to/from unsigned long for each architecture.
Maybe, it will be defined as follows:
  int arraycpy_u64_to_ulong(u64 *src, unsigned long *dest, size_t len);
  int arraycpy_ulong_to_u64(unsigned long *src, u64 *dest, size_t len);

I believe this feature will help getting code simpler and reducing bugs
for any other subsystem, not only SELinux, which loads bitmaps from/to
userspace and handle them using common bitops.

Any comment please.

Stephen Smalley wrote:
 On Thu, 2007-09-13 at 10:37 +0900, KaiGai Kohei wrote:
 Paul Moore wrote:
 On Tuesday, September 11 2007 11:08:44 pm KaiGai Kohei wrote:
 The attached patch applies the standard bitmap operations
 for the iteration macro of ebitmap, and enables to improve
 the performance in AVC-misses case.

  ...snip...

 BTW, is there any wrapper to copy an array of u64 to/from architecture 
 specific
 unsigned long? If so, it will help implement ebitmap_netlbl_{import|export}()
 and ebitmap_read() more simply.
 
 Might want to ask on linux-kernel.  More generally, it might be a good
 idea to cc linux-kernel on your next posting of the patch to get wider
 review of how you are using the native linux bitmap support.
 
 The patch looks very promising, although a detailed review and testing
 might take a little bit.

-- 
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/1] file capabilities: introduce cap_setfcap

2007-06-27 Thread KaiGai Kohei

Serge E. Hallyn wrote:

Here's the first patch (of several or many to come) to address some of
Andrew's comments.

Kaigai, IIUC cap_names.h will eventually be automatically updated?  (I
had to manually tweak it for testing as the new kernel sources were not
located on the test system)


The origin of cap_names.h is "/usr/include/linux/capability.h".
Some scripts kicked by Makefile convert it, then cap_names.h will
be generated.

I don't know whether we can expect the kernel headers are always
deployed under "/usr/include/linux", or not.
In Fedora system, the kernel-headers package deploys all headers
there, so cap_names.h will eventually be automatically updated.

Thanks,


thanks,
-serge


From fefcd341e478bd9e490d34abe9efd3c3c4f0b8a0 Mon Sep 17 00:00:00 2001

From: Serge E. Hallyn <[EMAIL PROTECTED]>
Date: Wed, 27 Jun 2007 13:09:20 -0400
Subject: [PATCH 1/1] file capabilities: introduce cap_setfcap

Setting file capabilities previously required the
cap_sys_admin capability, since they are stored as
extended attributes in the security.* namespace.

Introduce CAP_SETFCAP (to mirror CAP_SETPCAP), and
require it for setting file capabilities instead of
CAP_SYS_ADMIN.

Quoting Andrew Morgan,

"CAP_SYS_ADMIN is way too overloaded and this
functionality is special."

Signed-off-by: Serge E. Hallyn <[EMAIL PROTECTED]>
---
 include/linux/capability.h |4 +++-
 security/commoncap.c   |   12 ++--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 89125df..cdfaa10 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -324,7 +324,9 @@ typedef __u32 kernel_cap_t;
 
 #define CAP_AUDIT_CONTROL30
 
-#define CAP_NUMCAPS	 31

+#define CAP_SETFCAP 31
+
+#define CAP_NUMCAPS 32
 
 #ifdef __KERNEL__
 /* 
diff --git a/security/commoncap.c b/security/commoncap.c

index 4e9ff02..24de4fa 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -290,7 +290,11 @@ int cap_bprm_secureexec (struct linux_binprm *bprm)
 int cap_inode_setxattr(struct dentry *dentry, char *name, void *value,
   size_t size, int flags)
 {
-   if (!strncmp(name, XATTR_SECURITY_PREFIX,
+   if (!strcmp(name, XATTR_NAME_CAPS)) {
+   if (!capable(CAP_SETFCAP))
+   return -EPERM;
+   return 0;
+   } else if (!strncmp(name, XATTR_SECURITY_PREFIX,
 sizeof(XATTR_SECURITY_PREFIX) - 1)  &&
!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -299,7 +303,11 @@ int cap_inode_setxattr(struct dentry *dentry, char *name, 
void *value,
 
 int cap_inode_removexattr(struct dentry *dentry, char *name)

 {
-   if (!strncmp(name, XATTR_SECURITY_PREFIX,
+   if (!strcmp(name, XATTR_NAME_CAPS)) {
+   if (!capable(CAP_SETFCAP))
+   return -EPERM;
+   return 0;
+   } else if (!strncmp(name, XATTR_SECURITY_PREFIX,
 sizeof(XATTR_SECURITY_PREFIX) - 1)  &&
!capable(CAP_SYS_ADMIN))
return -EPERM;



--
Open Source Software Promotion Center, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/1] file capabilities: introduce cap_setfcap

2007-06-27 Thread KaiGai Kohei

Serge E. Hallyn wrote:

Here's the first patch (of several or many to come) to address some of
Andrew's comments.

Kaigai, IIUC cap_names.h will eventually be automatically updated?  (I
had to manually tweak it for testing as the new kernel sources were not
located on the test system)


The origin of cap_names.h is /usr/include/linux/capability.h.
Some scripts kicked by Makefile convert it, then cap_names.h will
be generated.

I don't know whether we can expect the kernel headers are always
deployed under /usr/include/linux, or not.
In Fedora system, the kernel-headers package deploys all headers
there, so cap_names.h will eventually be automatically updated.

Thanks,


thanks,
-serge


From fefcd341e478bd9e490d34abe9efd3c3c4f0b8a0 Mon Sep 17 00:00:00 2001

From: Serge E. Hallyn [EMAIL PROTECTED]
Date: Wed, 27 Jun 2007 13:09:20 -0400
Subject: [PATCH 1/1] file capabilities: introduce cap_setfcap

Setting file capabilities previously required the
cap_sys_admin capability, since they are stored as
extended attributes in the security.* namespace.

Introduce CAP_SETFCAP (to mirror CAP_SETPCAP), and
require it for setting file capabilities instead of
CAP_SYS_ADMIN.

Quoting Andrew Morgan,

CAP_SYS_ADMIN is way too overloaded and this
functionality is special.

Signed-off-by: Serge E. Hallyn [EMAIL PROTECTED]
---
 include/linux/capability.h |4 +++-
 security/commoncap.c   |   12 ++--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 89125df..cdfaa10 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -324,7 +324,9 @@ typedef __u32 kernel_cap_t;
 
 #define CAP_AUDIT_CONTROL30
 
-#define CAP_NUMCAPS	 31

+#define CAP_SETFCAP 31
+
+#define CAP_NUMCAPS 32
 
 #ifdef __KERNEL__
 /* 
diff --git a/security/commoncap.c b/security/commoncap.c

index 4e9ff02..24de4fa 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -290,7 +290,11 @@ int cap_bprm_secureexec (struct linux_binprm *bprm)
 int cap_inode_setxattr(struct dentry *dentry, char *name, void *value,
   size_t size, int flags)
 {
-   if (!strncmp(name, XATTR_SECURITY_PREFIX,
+   if (!strcmp(name, XATTR_NAME_CAPS)) {
+   if (!capable(CAP_SETFCAP))
+   return -EPERM;
+   return 0;
+   } else if (!strncmp(name, XATTR_SECURITY_PREFIX,
 sizeof(XATTR_SECURITY_PREFIX) - 1)  
!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -299,7 +303,11 @@ int cap_inode_setxattr(struct dentry *dentry, char *name, 
void *value,
 
 int cap_inode_removexattr(struct dentry *dentry, char *name)

 {
-   if (!strncmp(name, XATTR_SECURITY_PREFIX,
+   if (!strcmp(name, XATTR_NAME_CAPS)) {
+   if (!capable(CAP_SETFCAP))
+   return -EPERM;
+   return 0;
+   } else if (!strncmp(name, XATTR_SECURITY_PREFIX,
 sizeof(XATTR_SECURITY_PREFIX) - 1)  
!capable(CAP_SYS_ADMIN))
return -EPERM;



--
Open Source Software Promotion Center, NEC
KaiGai Kohei [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [PATCH -mm] file caps: make on-disk capabilities future-proof

2007-02-20 Thread KaiGai Kohei
   return -EPERM;
-   }
-
return 0;
 }
 
@@ -148,50 +145,65 @@ static int set_file_caps(struct linux_bi

 {
struct dentry *dentry;
ssize_t rc;
-   struct vfs_cap_data_disk dcaps;
+   struct vfs_cap_data_disk *dcaps;
struct vfs_cap_data caps;
struct inode *inode;
-   int err;
 
 	if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)

return 0;
 
 	dentry = dget(bprm->file->f_dentry);

inode = dentry->d_inode;
-   if (!inode->i_op || !inode->i_op->getxattr) {
-   dput(dentry);
-   return 0;
+   rc = 0;
+   if (!inode->i_op || !inode->i_op->getxattr)
+   goto out;
+
+   rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0);
+   if (rc == -ENODATA) {
+   rc = 0;
+   goto out;
+   }
+   if (rc < 0)
+   goto out;
+   if (rc < sizeof(struct vfs_cap_data_disk)) {
+   rc = -EINVAL;
+   goto out;
+   }
+
+   dcaps = kmalloc(rc, GFP_KERNEL);
+   if (!dcaps) {
+   rc = -ENOMEM;
+   goto out;
+   }
+   rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, dcaps,
+   XATTR_CAPS_SZ);
+   if (rc == -ENODATA) {
+   rc = 0;
+   goto out_free;
}
 
-	rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, ,

-   sizeof(dcaps));
-   dput(dentry);
-
-   if (rc == -ENODATA)
-   return 0;
-
if (rc < 0) {
printk(KERN_NOTICE "%s: Error (%zd) getting xattr\n",
__FUNCTION__, rc);
-   return rc;
+   goto out_free;
}
 
-	if (rc != sizeof(dcaps)) {

-   printk(KERN_NOTICE "%s: got wrong size for getxattr (%zd)\n",
-   __FUNCTION__, rc);
-   return -EPERM;
-   }
-
-   cap_from_disk(, );
-   err = check_cap_sanity();
-   if (err)
-   return err;
+   rc = cap_from_disk(dcaps, , rc);
+   if (rc)
+   goto out_free;
+   rc = check_cap_sanity();
+   if (rc)
+   goto out_free;
 
 	bprm->cap_effective = caps.effective;

bprm->cap_permitted = caps.permitted;
bprm->cap_inheritable = caps.inheritable;
 
-	return 0;

+out_free:
+   kfree(dcaps);
+out:
+   dput(dentry);
+   return rc;
 }
 #else
 static inline int set_file_caps(struct linux_binprm *bprm)



--
KaiGai Kohei <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [PATCH -mm] file caps: make on-disk capabilities future-proof

2007-02-20 Thread KaiGai Kohei
 *dentry;
ssize_t rc;
-   struct vfs_cap_data_disk dcaps;
+   struct vfs_cap_data_disk *dcaps;
struct vfs_cap_data caps;
struct inode *inode;
-   int err;
 
 	if (bprm-file-f_vfsmnt-mnt_flags  MNT_NOSUID)

return 0;
 
 	dentry = dget(bprm-file-f_dentry);

inode = dentry-d_inode;
-   if (!inode-i_op || !inode-i_op-getxattr) {
-   dput(dentry);
-   return 0;
+   rc = 0;
+   if (!inode-i_op || !inode-i_op-getxattr)
+   goto out;
+
+   rc = inode-i_op-getxattr(dentry, XATTR_NAME_CAPS, NULL, 0);
+   if (rc == -ENODATA) {
+   rc = 0;
+   goto out;
+   }
+   if (rc  0)
+   goto out;
+   if (rc  sizeof(struct vfs_cap_data_disk)) {
+   rc = -EINVAL;
+   goto out;
+   }
+
+   dcaps = kmalloc(rc, GFP_KERNEL);
+   if (!dcaps) {
+   rc = -ENOMEM;
+   goto out;
+   }
+   rc = inode-i_op-getxattr(dentry, XATTR_NAME_CAPS, dcaps,
+   XATTR_CAPS_SZ);
+   if (rc == -ENODATA) {
+   rc = 0;
+   goto out_free;
}
 
-	rc = inode-i_op-getxattr(dentry, XATTR_NAME_CAPS, dcaps,

-   sizeof(dcaps));
-   dput(dentry);
-
-   if (rc == -ENODATA)
-   return 0;
-
if (rc  0) {
printk(KERN_NOTICE %s: Error (%zd) getting xattr\n,
__FUNCTION__, rc);
-   return rc;
+   goto out_free;
}
 
-	if (rc != sizeof(dcaps)) {

-   printk(KERN_NOTICE %s: got wrong size for getxattr (%zd)\n,
-   __FUNCTION__, rc);
-   return -EPERM;
-   }
-
-   cap_from_disk(dcaps, caps);
-   err = check_cap_sanity(caps);
-   if (err)
-   return err;
+   rc = cap_from_disk(dcaps, caps, rc);
+   if (rc)
+   goto out_free;
+   rc = check_cap_sanity(caps);
+   if (rc)
+   goto out_free;
 
 	bprm-cap_effective = caps.effective;

bprm-cap_permitted = caps.permitted;
bprm-cap_inheritable = caps.inheritable;
 
-	return 0;

+out_free:
+   kfree(dcaps);
+out:
+   dput(dentry);
+   return rc;
 }
 #else
 static inline int set_file_caps(struct linux_binprm *bprm)



--
KaiGai Kohei [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Implement file posix capabilities

2006-12-01 Thread KaiGai Kohei

Oops, it's my stupid bug.


Ah, this actually makes sense.  The setfcaps usage() statement does

for (i=0; _cap_names[i]; i++) {
printf...

so it expects _cap_names to end with a terminating NULL, but that
doesn't seem to be the case in cap_names.h in libcap.

KaiGai, perhaps setfcaps should do something like

diff setfcaps.c.orig setfcaps.c
25c25
< for (i=0; _cap_names[i]; i++)
---

for (i=0; i<__CAP_BITS; i++)


I fixed the matter as follows:

[EMAIL PROTECTED] libcap-1.10.kg]$ env LANG=C svn diff -r2:3
Index: libcap/_makenames.c
===
--- libcap/_makenames.c (revision 2)
+++ libcap/_makenames.c (revision 3)
@@ -45,6 +45,7 @@
   "#define __CAP_BITS   %d\n"
   "\n"
   "#ifdef LIBCAP_PLEASE_INCLUDE_ARRAY\n"
+  "  int const _cap_names_num = __CAP_BITS;\n"
   "  char const *_cap_names[__CAP_BITS] = {\n", maxcaps);

 for (i=0; i
 #include 

+extern int const _cap_names_num;
 extern char const *_cap_names[];

 static void usage() {
@@ -21,8 +22,8 @@
 int i;

 fputs(message, stderr);
-for (i=0; _cap_names[i]; i++)
-   fprintf(stderr, "%s%s", i%4==0 ? "\n\t" : ", ", _cap_names[i]);
+for (i=0; i < _cap_names_num; i++)
+fprintf(stderr, "%s%s", i%4==0 ? "\n\t" : ", ", _cap_names[i]);
 fputc('\n', stderr);
 exit(0);
 }
[EMAIL PROTECTED] libcap-1.10.kg]$

Because '__CAP_BITS' is decided at compiling time, I think it's not
appropriate to indicate the length of _cap_names[] which is linked
at run time.
Therefore, I add a new integer variable _cap_names_num to represent
the length of _cap_names at run time.

You can download it from http://www.kaigai.gr.jp/index.php?FrontPage#b556e50d

Thanks,
--
KaiGai Kohei <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Implement file posix capabilities

2006-12-01 Thread KaiGai Kohei

Oops, it's my stupid bug.


Ah, this actually makes sense.  The setfcaps usage() statement does

for (i=0; _cap_names[i]; i++) {
printf...

so it expects _cap_names to end with a terminating NULL, but that
doesn't seem to be the case in cap_names.h in libcap.

KaiGai, perhaps setfcaps should do something like

diff setfcaps.c.orig setfcaps.c
25c25
 for (i=0; _cap_names[i]; i++)
---

for (i=0; i__CAP_BITS; i++)


I fixed the matter as follows:

[EMAIL PROTECTED] libcap-1.10.kg]$ env LANG=C svn diff -r2:3
Index: libcap/_makenames.c
===
--- libcap/_makenames.c (revision 2)
+++ libcap/_makenames.c (revision 3)
@@ -45,6 +45,7 @@
   #define __CAP_BITS   %d\n
   \n
   #ifdef LIBCAP_PLEASE_INCLUDE_ARRAY\n
+int const _cap_names_num = __CAP_BITS;\n
 char const *_cap_names[__CAP_BITS] = {\n, maxcaps);

 for (i=0; imaxcaps; ++i) {
Index: libcap/include/sys/capability.h
===
--- libcap/include/sys/capability.h (revision 2)
+++ libcap/include/sys/capability.h (revision 3)
@@ -113,6 +113,7 @@
 extern int capgetp(pid_t pid, cap_t cap_d);
 extern int capsetp(pid_t pid, cap_t cap_d);
 extern char const *_cap_names[];
+extern int const _cap_names_num;

 #endif /* !defined(_POSIX_SOURCE) */

Index: progs/setfcaps.c
===
--- progs/setfcaps.c(revision 2)
+++ progs/setfcaps.c(revision 3)
@@ -14,6 +14,7 @@
 #include errno.h
 #include sys/capability.h

+extern int const _cap_names_num;
 extern char const *_cap_names[];

 static void usage() {
@@ -21,8 +22,8 @@
 int i;

 fputs(message, stderr);
-for (i=0; _cap_names[i]; i++)
-   fprintf(stderr, %s%s, i%4==0 ? \n\t : , , _cap_names[i]);
+for (i=0; i  _cap_names_num; i++)
+fprintf(stderr, %s%s, i%4==0 ? \n\t : , , _cap_names[i]);
 fputc('\n', stderr);
 exit(0);
 }
[EMAIL PROTECTED] libcap-1.10.kg]$

Because '__CAP_BITS' is decided at compiling time, I think it's not
appropriate to indicate the length of _cap_names[] which is linked
at run time.
Therefore, I add a new integer variable _cap_names_num to represent
the length of _cap_names at run time.

You can download it from http://www.kaigai.gr.jp/index.php?FrontPage#b556e50d

Thanks,
--
KaiGai Kohei [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.11-rc4-mm1] connector: Add a fork connector

2005-03-02 Thread Kaigai Kohei
Hello, Guillaume

I tried to measure the process-creation/destruction performance on 
2.6.11-rc4-mm1 plus
some extensiton(Normal/with PAGG/with Fork-Connector).
But I received a following messages endlessly on system console with 
Fork-Connector extensiton.

# on IA-64 environment / When an simple fork() iteration is run in parallel.
skb does not have enough length: requested msg->len=10[28], 
nlh->nlmsg_len=48[32], skb->len=48[must be 30].
skb does not have enough length: requested msg->len=10[28], 
nlh->nlmsg_len=48[32], skb->len=48[must be 30].
skb does not have enough length: requested msg->len=10[28], 
nlh->nlmsg_len=48[32], skb->len=48[must be 30].
  :

Is's generated at drivers/connector/connector.c:__cn_rx_skb(), and this warn 
the length of msg's payload
does not fit in nlmsghdr's length.
This message means netlink packet is not sent to user space.
I was notified occurence of fork() by printk(). :-(

The attached simple *.c file can enable/disable fork-connector and listen the 
fork-notification.
Because It's first experimence for me to write a code to use netlink, point out 
a right how-to-use
if there's some mistakes at user side apprication.

Thanks.

P.S. I can't reproduce lockup on 367th-fork() with your latest patch.

Guillaume Thouvenin wrote:
>   ChangeLog:
> 
> - Add parenthesis around sizeof(struct cn_msg) + CN_FORK_INFO_SIZE
>   in the CN_FORK_MSG_SIZE macro
> - fork_cn_lock is declareed with DEFINE_SPINLOCK()
> - fork_cn_lock is defined as static and local to fork_connector()
> - Create a specific module cn_fork.c in drivers/connector to
>   register the callback.
> - Improve the callback that turns on/off the fork connector
> 
>   I also run the lmbench and results are send in response to another
> thread "A common layer for Accounting packages". When fork connector is
> turned off the overhead is negligible. This patch works with another
> small patch that fix a problem in the connector. Without it, there is a
> message that says "skb does not have enough length". It will be fix in
> the next -mm tree I think.
> 
> 
> Thanks everyone for the comments,
> Guillaume

-- 
Linux Promotion Center, NEC
KaiGai Kohei <[EMAIL PROTECTED]>#include 
#include 
#include 
#include 
#include 
#include 
#include 

void usage(){
  puts("usage: fclisten ");
  puts("  Default -> listening fork-connector");
  puts("  on  -> fork-connector Enable");
  puts("  off -> fork-connector Disable");
  exit(0);
}

#define MODE_LISTEN  (1)
#define MODE_ENABLE  (2)
#define MODE_DISABLE (3)

struct cb_id
{
  __u32   idx;
  __u32   val;
};

struct cn_msg
{
  struct cb_idid;
  __u32   seq;
  __u32   ack;
  __u32   len;/* Length of the following data */
  __u8data[0];
};


int main(int argc, char *argv[]){
  char buf[4096];
  int mode, sockfd, len;
  struct sockaddr_nl ad;
  struct nlmsghdr *hdr = (struct nlmsghdr *)buf;
  struct cn_msg *msg = (struct cn_msg *)(buf+sizeof(struct nlmsghdr));
  
  switch(argc){
  case 1:
mode = MODE_LISTEN;
break;
  case 2:
if (strcasecmp("on",argv[1])==0) {
  mode = MODE_ENABLE;
}else if (strcasecmp("off",argv[1])==0){
  mode = MODE_DISABLE;
}else{
  usage();
}
break;
  default:
usage();
break;
  }
  
  if( (sockfd=socket(PF_NETLINK, SOCK_RAW, NETLINK_NFLOG)) < 0 ){
fprintf(stderr, "Fault on socket().\n");
return( 1 );
  }
  ad.nl_family = AF_NETLINK;
  ad.nl_pad = 0;
  ad.nl_pid = getpid();
  ad.nl_groups = -1;
  if( bind(sockfd, (struct sockaddr *), sizeof(ad)) ){
fprintf(stderr, "Fault on bind to netlink.\n");
return( 2 );
  }

  if (mode==MODE_LISTEN) {
while(-1){
  len = recvfrom(sockfd, buf, 4096, 0, NULL, NULL);
  printf("%d-byte recv Seq=%d\n", len, hdr->nlmsg_seq);
}
  }else{
ad.nl_family = AF_NETLINK;
ad.nl_pad = 0;
ad.nl_pid = 0;
ad.nl_groups = 1;

hdr->nlmsg_len = sizeof(struct nlmsghdr) + sizeof(struct cn_msg) + 
sizeof(int);
hdr->nlmsg_type = 0;
hdr->nlmsg_flags = 0;
hdr->nlmsg_seq = 0;
hdr->nlmsg_pid = getpid();
msg->id.idx = 0xfeed;
msg->id.val = 0xbeef;
msg->seq = msg->ack = 0;
msg->len = sizeof(int);

if (mode==MODE_ENABLE){
  (*(int *)(msg->data)) = 1;
} else {
  (*(int *)(msg->data)) = 0;
}
sendto(sockfd, buf, sizeof(struct nlmsghdr)+sizeof(struct 
cn_msg)+sizeof(int),
   0, (struct sockaddr *), sizeof(ad));
  }
}


Re: [PATCH 2.6.11-rc4-mm1] connector: Add a fork connector

2005-03-02 Thread Kaigai Kohei
Hello, Guillaume

I tried to measure the process-creation/destruction performance on 
2.6.11-rc4-mm1 plus
some extensiton(Normal/with PAGG/with Fork-Connector).
But I received a following messages endlessly on system console with 
Fork-Connector extensiton.

# on IA-64 environment / When an simple fork() iteration is run in parallel.
skb does not have enough length: requested msg-len=10[28], 
nlh-nlmsg_len=48[32], skb-len=48[must be 30].
skb does not have enough length: requested msg-len=10[28], 
nlh-nlmsg_len=48[32], skb-len=48[must be 30].
skb does not have enough length: requested msg-len=10[28], 
nlh-nlmsg_len=48[32], skb-len=48[must be 30].
  :

Is's generated at drivers/connector/connector.c:__cn_rx_skb(), and this warn 
the length of msg's payload
does not fit in nlmsghdr's length.
This message means netlink packet is not sent to user space.
I was notified occurence of fork() by printk(). :-(

The attached simple *.c file can enable/disable fork-connector and listen the 
fork-notification.
Because It's first experimence for me to write a code to use netlink, point out 
a right how-to-use
if there's some mistakes at user side apprication.

Thanks.

P.S. I can't reproduce lockup on 367th-fork() with your latest patch.

Guillaume Thouvenin wrote:
   ChangeLog:
 
 - Add parenthesis around sizeof(struct cn_msg) + CN_FORK_INFO_SIZE
   in the CN_FORK_MSG_SIZE macro
 - fork_cn_lock is declareed with DEFINE_SPINLOCK()
 - fork_cn_lock is defined as static and local to fork_connector()
 - Create a specific module cn_fork.c in drivers/connector to
   register the callback.
 - Improve the callback that turns on/off the fork connector
 
   I also run the lmbench and results are send in response to another
 thread "A common layer for Accounting packages". When fork connector is
 turned off the overhead is negligible. This patch works with another
 small patch that fix a problem in the connector. Without it, there is a
 message that says "skb does not have enough length". It will be fix in
 the next -mm tree I think.
 
 
 Thanks everyone for the comments,
 Guillaume

-- 
Linux Promotion Center, NEC
KaiGai Kohei [EMAIL PROTECTED]#include stdio.h
#include stdlib.h
#include string.h
#include asm/types.h
#include sys/types.h
#include sys/socket.h
#include linux/netlink.h

void usage(){
  puts(usage: fclisten on|off);
  puts(  Default - listening fork-connector);
  puts(  on  - fork-connector Enable);
  puts(  off - fork-connector Disable);
  exit(0);
}

#define MODE_LISTEN  (1)
#define MODE_ENABLE  (2)
#define MODE_DISABLE (3)

struct cb_id
{
  __u32   idx;
  __u32   val;
};

struct cn_msg
{
  struct cb_idid;
  __u32   seq;
  __u32   ack;
  __u32   len;/* Length of the following data */
  __u8data[0];
};


int main(int argc, char *argv[]){
  char buf[4096];
  int mode, sockfd, len;
  struct sockaddr_nl ad;
  struct nlmsghdr *hdr = (struct nlmsghdr *)buf;
  struct cn_msg *msg = (struct cn_msg *)(buf+sizeof(struct nlmsghdr));
  
  switch(argc){
  case 1:
mode = MODE_LISTEN;
break;
  case 2:
if (strcasecmp(on,argv[1])==0) {
  mode = MODE_ENABLE;
}else if (strcasecmp(off,argv[1])==0){
  mode = MODE_DISABLE;
}else{
  usage();
}
break;
  default:
usage();
break;
  }
  
  if( (sockfd=socket(PF_NETLINK, SOCK_RAW, NETLINK_NFLOG))  0 ){
fprintf(stderr, Fault on socket().\n);
return( 1 );
  }
  ad.nl_family = AF_NETLINK;
  ad.nl_pad = 0;
  ad.nl_pid = getpid();
  ad.nl_groups = -1;
  if( bind(sockfd, (struct sockaddr *)ad, sizeof(ad)) ){
fprintf(stderr, Fault on bind to netlink.\n);
return( 2 );
  }

  if (mode==MODE_LISTEN) {
while(-1){
  len = recvfrom(sockfd, buf, 4096, 0, NULL, NULL);
  printf(%d-byte recv Seq=%d\n, len, hdr-nlmsg_seq);
}
  }else{
ad.nl_family = AF_NETLINK;
ad.nl_pad = 0;
ad.nl_pid = 0;
ad.nl_groups = 1;

hdr-nlmsg_len = sizeof(struct nlmsghdr) + sizeof(struct cn_msg) + 
sizeof(int);
hdr-nlmsg_type = 0;
hdr-nlmsg_flags = 0;
hdr-nlmsg_seq = 0;
hdr-nlmsg_pid = getpid();
msg-id.idx = 0xfeed;
msg-id.val = 0xbeef;
msg-seq = msg-ack = 0;
msg-len = sizeof(int);

if (mode==MODE_ENABLE){
  (*(int *)(msg-data)) = 1;
} else {
  (*(int *)(msg-data)) = 0;
}
sendto(sockfd, buf, sizeof(struct nlmsghdr)+sizeof(struct 
cn_msg)+sizeof(int),
   0, (struct sockaddr *)ad, sizeof(ad));
  }
}


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-03-01 Thread Kaigai Kohei
Hello,

> I tested without user space listeners and the cost is negligible. I will
> test with a user space listeners and see the results. I'm going to run
> the test this week after improving the mechanism that switch on/off the
> sending of the message.

I'm also trying to mesure the process-creation/destruction performance on 
following three environment.
Archtechture: i686 / Distribution: Fedora Core 3
* Kernel Preemption is DISABLE
* SMP kernel but UP-machine / Not Hyper Threading
[1] 2.6.11-rc4-mm1 normal
[2] 2.6.11-rc4-mm1 with PAGG based Process Accounting Module
[3] 2.6.11-rc4-mm1 with fork-connector notification (it's enabled)

When 367th-fork() was called after fork-connector notification, kernel was 
locked up.
(User-Space-Listener has been also run until 366th-fork() notification was 
received)

Does this number have any sort of means ?
In my second trial, kernel was also locked up after 366th-fork() notification.
Currently, I don't know its causition. Is there a person encounted it?

# I wanted to say "[2] is faster than [3]" when process-grouping is enable, but 
the plan came off.  :(

Thanks.
-- 
Linux Promotion Center, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Lse-tech] Re: A common layer for Accounting packages

2005-03-01 Thread Kaigai Kohei
Hello,

 I tested without user space listeners and the cost is negligible. I will
 test with a user space listeners and see the results. I'm going to run
 the test this week after improving the mechanism that switch on/off the
 sending of the message.

I'm also trying to mesure the process-creation/destruction performance on 
following three environment.
Archtechture: i686 / Distribution: Fedora Core 3
* Kernel Preemption is DISABLE
* SMP kernel but UP-machine / Not Hyper Threading
[1] 2.6.11-rc4-mm1 normal
[2] 2.6.11-rc4-mm1 with PAGG based Process Accounting Module
[3] 2.6.11-rc4-mm1 with fork-connector notification (it's enabled)

When 367th-fork() was called after fork-connector notification, kernel was 
locked up.
(User-Space-Listener has been also run until 366th-fork() notification was 
received)

Does this number have any sort of means ?
In my second trial, kernel was also locked up after 366th-fork() notification.
Currently, I don't know its causition. Is there a person encounted it?

# I wanted to say "[2] is faster than [3]" when process-grouping is enable, but 
the plan came off.  :(

Thanks.
-- 
Linux Promotion Center, NEC
KaiGai Kohei [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-27 Thread Kaigai Kohei
Hello,
Marcelo Tosatti wrote:
> Yep, the netlink people should be able to help - they known what would be
> required for not sending messages in case there is no listener registered.
>
> Maybe its already possible? I have never used netlink myself.
If we notify the fork/exec/exit-events to user-space directly as you said,
I don't think some hackings on netlink is necessary.
For example, such packets is sent only when /proc/sys/.../process_grouping is 
set,
and user-side daemon set this value, and unset when daemon will exit.
It's not necessary to take too seriously.
>>And, why can't netlink packets send always?
>>If there are fork/exec/exit hooks, and they call CSA or other
>>process-grouping modules,
>>then those modules will decide whether packets for interaction with the
>>daemon should be
>>sent or not.
>
>
> The netlink data will be sent to userspace at fork/exec/exit hooks - one wants
> to avoid that if there are no listeners, so setups which dont want to run the
> accounting daemon dont pay the cost of building and sending the information
> through netlink.
>
> Thats what Andrew asked for if I understand correctly.
Does it mean "netlink packets shouled be sent to userspace unconditionally." ?
I have advocated steadfastly that fork/exec/exit hooks is necessary to support
process-grouping and to account per process-grouping.
It intend to be decided whether packets should be sent or not by hooked 
functions,
in my understanding.
Is it also one of the implementations whether using netlink-socket or not ?
>>In most considerable case, CSA's kernel-loadable-module using such hooks
>>will not be loaded
>>when no accounting daemon is running. Adversely, this module must be loaded
>>when accounting
>>daemon needs CSA's netlink packets.
>>Thus, it is only necessary to refer flag valiable and to execute
>>conditional-jump
>>when no-accounting daemon is running.
>
>
> That would be one hack, although it is uglier than the pure netlink
> selection.
No, I can't agree this opinion.
It means netlink-packets will be sent unconditionally when fork/exec/exit occur.
Nobady can decide which packet is sent user-space, I think.
In addition, the definition of process grouping is lightweight in many cases.
For example, CpuSet can define own process-group by one increment-operation.
I think it's not impossible to implement it in userspace, but it's not 
reasonable.
An implementation as a kernel loadable module is reasonable and enough tiny.
>>In my estimation, we must pay additional cost for an increment-operation,
>>an decrement-op,
>>an comparison-op and an conditional jump-op. It's enough lightweight, I
>>think.
>>
>>For example:
>>If CSA's module isn't loaded, 'privates_for_grouping' will be empty.
>>
>>inline int on_fork_hook(task_struct *parent, task_struct *newtask){
>>  rcu_read_lock();
>>  if( !list_empty(>privates_for_grouping) ){
>>;
>>  }
>>  rcu_read_unlock();
>>}
>
>
> Andrew has been talking about sending data over netlink to implement the
> accounting at userspace, so this piece of code is out of the game, no?
Indeed, I'm not opposed to implement the accounting in userspace and
using netlink-socket for kernel-daemon communication. But definition
of process-grouping based on any grouping policy should be done
in kernel space at reasonability viewpoint.
Thanks.
--
Linux Promotion Center, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-27 Thread KaiGai Kohei
Hi,
Kaigai Kohei <[EMAIL PROTECTED]> wrote:
In my understanding, what Andrew Morton said is "If target functionality can
implement in user space only, then we should not modify the kernel-tree".
fork, exec and exit upcalls sound pretty good to me.  As long as
a) they use the same common machinery and
b) they are next-to-zero cost if something is listening on the netlink
  socket but no accounting daemon is running.

b) would involved being able to avoid sending netlink messages in case there are 
no listeners. AFAIK that isnt possible currently, netlink sends 
packets unconditionally.

Am I wrong? 
In current implementaion, you might be right.
But we should make an effort to achieve the requirement-(b) from now.
And, why can't netlink packets send always?
If there are fork/exec/exit hooks, and they call CSA or other process-grouping 
modules,
then those modules will decide whether packets for interaction with the daemon 
should be
sent or not.
In most considerable case, CSA's kernel-loadable-module using such hooks will 
not be loaded
when no accounting daemon is running. Adversely, this module must be loaded 
when accounting
daemon needs CSA's netlink packets.
Thus, it is only necessary to refer flag valiable and to execute 
conditional-jump
when no-accounting daemon is running.
In my estimation, we must pay additional cost for an increment-operation, an 
decrement-op,
an comparison-op and an conditional jump-op. It's enough lightweight, I think.
For example:
If CSA's module isn't loaded, 'privates_for_grouping' will be empty.
inline int on_fork_hook(task_struct *parent, task_struct *newtask){
  rcu_read_lock();
  if( !list_empty(>privates_for_grouping) ){
;
  }
  rcu_read_unlock();
}
Thanks,
--
Linux Promotion Center, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-27 Thread KaiGai Kohei
Hi,
Kaigai Kohei [EMAIL PROTECTED] wrote:
In my understanding, what Andrew Morton said is If target functionality can
implement in user space only, then we should not modify the kernel-tree.
fork, exec and exit upcalls sound pretty good to me.  As long as
a) they use the same common machinery and
b) they are next-to-zero cost if something is listening on the netlink
  socket but no accounting daemon is running.

b) would involved being able to avoid sending netlink messages in case there are 
no listeners. AFAIK that isnt possible currently, netlink sends 
packets unconditionally.

Am I wrong? 
In current implementaion, you might be right.
But we should make an effort to achieve the requirement-(b) from now.
And, why can't netlink packets send always?
If there are fork/exec/exit hooks, and they call CSA or other process-grouping 
modules,
then those modules will decide whether packets for interaction with the daemon 
should be
sent or not.
In most considerable case, CSA's kernel-loadable-module using such hooks will 
not be loaded
when no accounting daemon is running. Adversely, this module must be loaded 
when accounting
daemon needs CSA's netlink packets.
Thus, it is only necessary to refer flag valiable and to execute 
conditional-jump
when no-accounting daemon is running.
In my estimation, we must pay additional cost for an increment-operation, an 
decrement-op,
an comparison-op and an conditional jump-op. It's enough lightweight, I think.
For example:
If CSA's module isn't loaded, 'privates_for_grouping' will be empty.
inline int on_fork_hook(task_struct *parent, task_struct *newtask){
  rcu_read_lock();
  if( !list_empty(parent-privates_for_grouping) ){
..Calling to any process grouping module..;
  }
  rcu_read_unlock();
}
Thanks,
--
Linux Promotion Center, NEC
KaiGai Kohei [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-27 Thread Kaigai Kohei
Hello,
Marcelo Tosatti wrote:
 Yep, the netlink people should be able to help - they known what would be
 required for not sending messages in case there is no listener registered.

 Maybe its already possible? I have never used netlink myself.
If we notify the fork/exec/exit-events to user-space directly as you said,
I don't think some hackings on netlink is necessary.
For example, such packets is sent only when /proc/sys/.../process_grouping is 
set,
and user-side daemon set this value, and unset when daemon will exit.
It's not necessary to take too seriously.
And, why can't netlink packets send always?
If there are fork/exec/exit hooks, and they call CSA or other
process-grouping modules,
then those modules will decide whether packets for interaction with the
daemon should be
sent or not.


 The netlink data will be sent to userspace at fork/exec/exit hooks - one wants
 to avoid that if there are no listeners, so setups which dont want to run the
 accounting daemon dont pay the cost of building and sending the information
 through netlink.

 Thats what Andrew asked for if I understand correctly.
Does it mean netlink packets shouled be sent to userspace unconditionally. ?
I have advocated steadfastly that fork/exec/exit hooks is necessary to support
process-grouping and to account per process-grouping.
It intend to be decided whether packets should be sent or not by hooked 
functions,
in my understanding.
Is it also one of the implementations whether using netlink-socket or not ?
In most considerable case, CSA's kernel-loadable-module using such hooks
will not be loaded
when no accounting daemon is running. Adversely, this module must be loaded
when accounting
daemon needs CSA's netlink packets.
Thus, it is only necessary to refer flag valiable and to execute
conditional-jump
when no-accounting daemon is running.


 That would be one hack, although it is uglier than the pure netlink
 selection.
No, I can't agree this opinion.
It means netlink-packets will be sent unconditionally when fork/exec/exit occur.
Nobady can decide which packet is sent user-space, I think.
In addition, the definition of process grouping is lightweight in many cases.
For example, CpuSet can define own process-group by one increment-operation.
I think it's not impossible to implement it in userspace, but it's not 
reasonable.
An implementation as a kernel loadable module is reasonable and enough tiny.
In my estimation, we must pay additional cost for an increment-operation,
an decrement-op,
an comparison-op and an conditional jump-op. It's enough lightweight, I
think.

For example:
If CSA's module isn't loaded, 'privates_for_grouping' will be empty.

inline int on_fork_hook(task_struct *parent, task_struct *newtask){
  rcu_read_lock();
  if( !list_empty(parent-privates_for_grouping) ){
..Calling to any process grouping module..;
  }
  rcu_read_unlock();
}


 Andrew has been talking about sending data over netlink to implement the
 accounting at userspace, so this piece of code is out of the game, no?
Indeed, I'm not opposed to implement the accounting in userspace and
using netlink-socket for kernel-daemon communication. But definition
of process-grouping based on any grouping policy should be done
in kernel space at reasonability viewpoint.
Thanks.
--
Linux Promotion Center, NEC
KaiGai Kohei [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-24 Thread Kaigai Kohei
Sorry for this late reply.
>> [1] Is it necessary 'fork/exec/exit' event handling framework ?
   ..
>> Some process-aggregation model have own philosophy and implemantation,
>> so it's hard to integrate. Thus, I think that common 'fork/exec/exit'
>> event handling
>> framework to implement any kinds of process-aggregation.
>
>
> BSD needs an exit hook and ELSA needs a fork hook. I am still
> evaluating whether CSA can use the ELSA module. If CSA can use the
> ELSA module, CSA maybe would be fine with the fork hook.
If CSA can use an ELSA module, then we must modify the kernel-tree
for ELSA's fork-connecter. This means it's hard to implement the fork/exec/exit
event notification to userspace (,or any kernel module) without kernel-support.
How CSA shoule be implemented is interesting and important, but should it be
main subject in this discussion that such a kinds of kernel hook is necessary
to implement process-accounting per process-aggregation reasonable ?
In my understanding, what Andrew Morton said is "If target functionality can
implement in user space only, then we should not modify the kernel-tree".
But, any kind of kernel support was required to handle process lifecycle events
for the accounting per process-aggregation and so on, from our discussion.
I'm also opposed to an adhoc approach, like CSA depending on ELSA.
We should walk hight road.
Thanks,
--
Linux Promotion Center, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-24 Thread Kaigai Kohei
Sorry for this late reply.
 [1] Is it necessary 'fork/exec/exit' event handling framework ?
   ...ommited...
 Some process-aggregation model have own philosophy and implemantation,
 so it's hard to integrate. Thus, I think that common 'fork/exec/exit'
 event handling
 framework to implement any kinds of process-aggregation.


 BSD needs an exit hook and ELSA needs a fork hook. I am still
 evaluating whether CSA can use the ELSA module. If CSA can use the
 ELSA module, CSA maybe would be fine with the fork hook.
If CSA can use an ELSA module, then we must modify the kernel-tree
for ELSA's fork-connecter. This means it's hard to implement the fork/exec/exit
event notification to userspace (,or any kernel module) without kernel-support.
How CSA shoule be implemented is interesting and important, but should it be
main subject in this discussion that such a kinds of kernel hook is necessary
to implement process-accounting per process-aggregation reasonable ?
In my understanding, what Andrew Morton said is If target functionality can
implement in user space only, then we should not modify the kernel-tree.
But, any kind of kernel support was required to handle process lifecycle events
for the accounting per process-aggregation and so on, from our discussion.
I'm also opposed to an adhoc approach, like CSA depending on ELSA.
We should walk hight road.
Thanks,
--
Linux Promotion Center, NEC
KaiGai Kohei [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-23 Thread Kaigai Kohei
Hi, Thanks for your comments.
Andrew Morton wrote:
>> Some process-aggregation model have own philosophy and implemantation,
>> so it's hard to integrate. Thus, I think that common 'fork/exec/exit' event 
handling
>> framework to implement any kinds of process-aggregation.
>
>
> We really want to avoid doing such stuff in-kernel if at all possible, of
> course.
>
> Is it not possible to implement the fork/exec/exit notifications to
> userspace so that a daemon can track the process relationships and perform
> aggregation based upon individual tasks' accounting?  That's what one of
> the accounting systems is proposing doing, I believe.
>
> (In fact, why do we even need the notifications?  /bin/ps can work this
> stuff out).
It's hard to prove that we can't implement the process-aggregation only
in user-space, but there are some difficulties on imaplementation, I think.
For example, each process must have a tag or another identifier to explain
what process-aggregation does it belong, but kernel does not support thoes
kind of information, currently. Thus, we can't guarantee associating one
process-aggregation with one process.
# /proc//loginuid might be candidate, but it's out of original purpose.
We might be able to make alike system, but is it hard to implement strict
process-aggregation without any kernel supports?
I think that well thought out kernel-modification is better than ad-hoc
implementation on user-space.
Thanks.
--
Linux Promotion Center, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-23 Thread Kaigai Kohei
Hi, Thanks for your comments.
Andrew Morton wrote:
 Some process-aggregation model have own philosophy and implemantation,
 so it's hard to integrate. Thus, I think that common 'fork/exec/exit' event 
handling
 framework to implement any kinds of process-aggregation.


 We really want to avoid doing such stuff in-kernel if at all possible, of
 course.

 Is it not possible to implement the fork/exec/exit notifications to
 userspace so that a daemon can track the process relationships and perform
 aggregation based upon individual tasks' accounting?  That's what one of
 the accounting systems is proposing doing, I believe.

 (In fact, why do we even need the notifications?  /bin/ps can work this
 stuff out).
It's hard to prove that we can't implement the process-aggregation only
in user-space, but there are some difficulties on imaplementation, I think.
For example, each process must have a tag or another identifier to explain
what process-aggregation does it belong, but kernel does not support thoes
kind of information, currently. Thus, we can't guarantee associating one
process-aggregation with one process.
# /proc/uid/loginuid might be candidate, but it's out of original purpose.
We might be able to make alike system, but is it hard to implement strict
process-aggregation without any kernel supports?
I think that well thought out kernel-modification is better than ad-hoc
implementation on user-space.
Thanks.
--
Linux Promotion Center, NEC
KaiGai Kohei [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-22 Thread Kaigai Kohei
Hi, Thanks for your comments.
>> I think there are two issues about system accounting framework.
>>
>> Issue: 1) How to define the appropriate unit for accounting ?
>> Current BSD-accountiong make a collection per process accounting
>> information.
>> CSA make additionally a collection per process-aggregation accounting.
>
>
> The 'enhanced acct data collection' patches that were added to
> 2-6-11-rc* tree still do collection of per process data.
Hmm, I have not noticed this extension. But I made sure about it.
The following your two patches implements enhanced data collection, didn't it?
- ChangeLog for 2.6.11-rc1
[PATCH] enhanced I/O accounting data patch
[PATCH] enhanced Memory accounting data collection
Since making a collection per process accounting is unified to the stock kernel,
I want to have a discussion about remaining half, "How to define the appropriate
unit for accounting ?"
We can agree that only per process-accounting is so rigid, I think.
Then, process-aggregation should be provided in one way or another.
[1] Is it necessary 'fork/exec/exit' event handling framework ?
The common agreement for the method of dealing with process aggregation
has not been constructed yet, I understood. And, we will not able to
integrate each process aggregation model because of its diverseness.
For example, a process which belong to JOB-A must not belong any other
'JOB-X' in CSA-model. But, In ELSA-model, a process in BANK-B can concurrently
belong to BANK-B1 which is a child of BANK-B.
And, there are other defferences:
Whether a process not to belong to any process-aggregation is permitted or not ?
Whether a process-aggregation should be inherited to child process or not ?
(There is possibility not to be inherited in a rule-based process aggregation 
like CKRM)
Some process-aggregation model have own philosophy and implemantation,
so it's hard to integrate. Thus, I think that common 'fork/exec/exit' event 
handling
framework to implement any kinds of process-aggregation.
[2] What implementation should be adopted ?
I think registerable hooks on fork/execve/exit is necessary, not only exit() 
hook.
Because a rule or policy based process-aggregation model requirees to catch
the transition of a process status.
It might be enough to hook the exit() event only in process-accounting,
but it's not kind for another customer.
Thus, I recommend SGI's PAGG.
In my understanding, the reason for not to include such a framework is that
increase of unidentifiable (proprietary) modules is worried.
But, SI can divert LSM to implemente process-aggregation if they ignore
the LSM's original purpose, for example.
# I'm strongly opposed to such a movement as a SELinux's user :-)
So, I think such a fork/execve/exit hooks is harmless now.
Is this the time to unify it?
Thanks.
> CSA added those per-process data to per-aggregation ("job") data
> structure at do_exit() time when a process termintes.
>
>>
>> It is appropriate to make the fork-exit event handling framework for
>> definition
>> of the process-aggregation, such as PAGG.
>>
>> This system-accounting per process-aggregation is quite useful,
>> thought I tried the SGI's implementation named 'job' in past days.
>>
>>
>> Issue: 2) What items should be collected for accounting information ?
>> BSD-accounting collects PID/UID/GID, User/Sys/Elapsed-Time, and # of
>> minor/major page faults. SGI's CSA collects VM/RSS size on exit time,
>> Integral-VM/RSS, and amount of block-I/O additionally.
>
>
> These data are now collected in 2.6.11-rc* code. Note that these data
> are still per-process.
>
>>
>> I think it's hard to implement the accounting-engine as a kernel loadable
>> module using any kinds of framework. Because, we must put callback
>> functions
>> into all around the kernel for this purpose.
>>
>> Thus, I make a proposion as follows:
>> We should separate the process-aggregation functionality and collecting
>> accounting informations.
>
>
> I totally agree with this! Actually that was what we have done. The data
> collection part of code has been unified.
>
>> Something of framework to implement process-aggregation is necessary.
>> And, making a collection of accounting information should be merged
>> into BSD-accounting and implemented as a part of monolithic kernel
>> as Guillaume said.
>
>
> This sounds good. I am interested in learning how ELSA saves off
> the per-process accounting data before the data got disposed. If
> that scheme works for CSA, we would be very happy to adopt the
> scheme. The current BSD scheme is very insufficient. The code is
> very BSD centric and it provides no way to handle process-aggregation.
>
> Thanks,
>  - jay
>
>>
>> Thanks.
--
Linux Promotion Center, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-22 Thread Kaigai Kohei
Hi, Thanks for your comments.
 I think there are two issues about system accounting framework.

 Issue: 1) How to define the appropriate unit for accounting ?
 Current BSD-accountiong make a collection per process accounting
 information.
 CSA make additionally a collection per process-aggregation accounting.


 The 'enhanced acct data collection' patches that were added to
 2-6-11-rc* tree still do collection of per process data.
Hmm, I have not noticed this extension. But I made sure about it.
The following your two patches implements enhanced data collection, didn't it?
- ChangeLog for 2.6.11-rc1
[PATCH] enhanced I/O accounting data patch
[PATCH] enhanced Memory accounting data collection
Since making a collection per process accounting is unified to the stock kernel,
I want to have a discussion about remaining half, How to define the appropriate
unit for accounting ?
We can agree that only per process-accounting is so rigid, I think.
Then, process-aggregation should be provided in one way or another.
[1] Is it necessary 'fork/exec/exit' event handling framework ?
The common agreement for the method of dealing with process aggregation
has not been constructed yet, I understood. And, we will not able to
integrate each process aggregation model because of its diverseness.
For example, a process which belong to JOB-A must not belong any other
'JOB-X' in CSA-model. But, In ELSA-model, a process in BANK-B can concurrently
belong to BANK-B1 which is a child of BANK-B.
And, there are other defferences:
Whether a process not to belong to any process-aggregation is permitted or not ?
Whether a process-aggregation should be inherited to child process or not ?
(There is possibility not to be inherited in a rule-based process aggregation 
like CKRM)
Some process-aggregation model have own philosophy and implemantation,
so it's hard to integrate. Thus, I think that common 'fork/exec/exit' event 
handling
framework to implement any kinds of process-aggregation.
[2] What implementation should be adopted ?
I think registerable hooks on fork/execve/exit is necessary, not only exit() 
hook.
Because a rule or policy based process-aggregation model requirees to catch
the transition of a process status.
It might be enough to hook the exit() event only in process-accounting,
but it's not kind for another customer.
Thus, I recommend SGI's PAGG.
In my understanding, the reason for not to include such a framework is that
increase of unidentifiable (proprietary) modules is worried.
But, SI can divert LSM to implemente process-aggregation if they ignore
the LSM's original purpose, for example.
# I'm strongly opposed to such a movement as a SELinux's user :-)
So, I think such a fork/execve/exit hooks is harmless now.
Is this the time to unify it?
Thanks.
 CSA added those per-process data to per-aggregation (job) data
 structure at do_exit() time when a process termintes.


 It is appropriate to make the fork-exit event handling framework for
 definition
 of the process-aggregation, such as PAGG.

 This system-accounting per process-aggregation is quite useful,
 thought I tried the SGI's implementation named 'job' in past days.


 Issue: 2) What items should be collected for accounting information ?
 BSD-accounting collects PID/UID/GID, User/Sys/Elapsed-Time, and # of
 minor/major page faults. SGI's CSA collects VM/RSS size on exit time,
 Integral-VM/RSS, and amount of block-I/O additionally.


 These data are now collected in 2.6.11-rc* code. Note that these data
 are still per-process.


 I think it's hard to implement the accounting-engine as a kernel loadable
 module using any kinds of framework. Because, we must put callback
 functions
 into all around the kernel for this purpose.

 Thus, I make a proposion as follows:
 We should separate the process-aggregation functionality and collecting
 accounting informations.


 I totally agree with this! Actually that was what we have done. The data
 collection part of code has been unified.

 Something of framework to implement process-aggregation is necessary.
 And, making a collection of accounting information should be merged
 into BSD-accounting and implemented as a part of monolithic kernel
 as Guillaume said.


 This sounds good. I am interested in learning how ELSA saves off
 the per-process accounting data before the data got disposed. If
 that scheme works for CSA, we would be very happy to adopt the
 scheme. The current BSD scheme is very insufficient. The code is
 very BSD centric and it provides no way to handle process-aggregation.

 Thanks,
  - jay


 Thanks.
--
Linux Promotion Center, NEC
KaiGai Kohei [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-20 Thread Kaigai Kohei
Hello, everyone.
Andrew Morton wrote:
> Jay Lan <[EMAIL PROTECTED]> wrote:
>
>>Since the need of Linux system accounting has gone beyond what BSD
>>accounting provides, i think it is a good idea to create a thin layer
>>of common code for various accounting packages, such as BSD accounting,
>>CSA, ELSA, etc. The hook to do_exit() at exit.c was changed to invoke
>>a routine in the common code which would then invoke those accounting
>>packages that register to the acct_common to handle do_exit situation.
>
>
> This all seems to be heading in the wrong direction.  Do we really want to
> have lots of different system accounting packages all hooking into a
> generic we-cant-decide-what-to-do-so-we-added-some-pointless-overhead
> framework?
>
> Can't we get _one_ accounting system in there, get it right, avoid the
> framework?
I think there are two issues about system accounting framework.
Issue: 1) How to define the appropriate unit for accounting ?
Current BSD-accountiong make a collection per process accounting information.
CSA make additionally a collection per process-aggregation accounting.
It is appropriate to make the fork-exit event handling framework for definition
of the process-aggregation, such as PAGG.
This system-accounting per process-aggregation is quite useful,
thought I tried the SGI's implementation named 'job' in past days.
Issue: 2) What items should be collected for accounting information ?
BSD-accounting collects PID/UID/GID, User/Sys/Elapsed-Time, and # of
minor/major page faults. SGI's CSA collects VM/RSS size on exit time,
Integral-VM/RSS, and amount of block-I/O additionally.
I think it's hard to implement the accounting-engine as a kernel loadable
module using any kinds of framework. Because, we must put callback functions
into all around the kernel for this purpose.
Thus, I make a proposion as follows:
We should separate the process-aggregation functionality and collecting
accounting informations.
Something of framework to implement process-aggregation is necessary.
And, making a collection of accounting information should be merged
into BSD-accounting and implemented as a part of monolithic kernel
as Guillaume said.
Thanks.
--
Linux Promotion Center, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-20 Thread Kaigai Kohei
Hello, everyone.
Andrew Morton wrote:
 Jay Lan [EMAIL PROTECTED] wrote:

Since the need of Linux system accounting has gone beyond what BSD
accounting provides, i think it is a good idea to create a thin layer
of common code for various accounting packages, such as BSD accounting,
CSA, ELSA, etc. The hook to do_exit() at exit.c was changed to invoke
a routine in the common code which would then invoke those accounting
packages that register to the acct_common to handle do_exit situation.


 This all seems to be heading in the wrong direction.  Do we really want to
 have lots of different system accounting packages all hooking into a
 generic we-cant-decide-what-to-do-so-we-added-some-pointless-overhead
 framework?

 Can't we get _one_ accounting system in there, get it right, avoid the
 framework?
I think there are two issues about system accounting framework.
Issue: 1) How to define the appropriate unit for accounting ?
Current BSD-accountiong make a collection per process accounting information.
CSA make additionally a collection per process-aggregation accounting.
It is appropriate to make the fork-exit event handling framework for definition
of the process-aggregation, such as PAGG.
This system-accounting per process-aggregation is quite useful,
thought I tried the SGI's implementation named 'job' in past days.
Issue: 2) What items should be collected for accounting information ?
BSD-accounting collects PID/UID/GID, User/Sys/Elapsed-Time, and # of
minor/major page faults. SGI's CSA collects VM/RSS size on exit time,
Integral-VM/RSS, and amount of block-I/O additionally.
I think it's hard to implement the accounting-engine as a kernel loadable
module using any kinds of framework. Because, we must put callback functions
into all around the kernel for this purpose.
Thus, I make a proposion as follows:
We should separate the process-aggregation functionality and collecting
accounting informations.
Something of framework to implement process-aggregation is necessary.
And, making a collection of accounting information should be merged
into BSD-accounting and implemented as a part of monolithic kernel
as Guillaume said.
Thanks.
--
Linux Promotion Center, NEC
KaiGai Kohei [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/