Re: [PATCH] kobject: kobject_uevent() returns manageable value

2006-12-11 Thread Mauricio Lin

Hi Aneesh,

The patch update sounds good.

BR,

Mauricio Lin.

On 12/11/06, Aneesh Kumar K.V <[EMAIL PROTECTED]> wrote:

Since the kobject_uevent() function does not return an integer value
indicating whether its operation completed successfully, it is worth
changing it to report a proper status (success or error) instead of
returning void.

CC: Mauricio Lin <[EMAIL PROTECTED]>
Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 include/linux/kobject.h |    8 
 lib/kobject_uevent.c    |   44 ++--
 2 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index d1c8d28..fc93c53 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -265,8 +265,8 @@ extern int __must_check subsys_create_file(struct subsystem *,
struct subsys_attribute *);

 #if defined(CONFIG_HOTPLUG)
-void kobject_uevent(struct kobject *kobj, enum kobject_action action);
-void kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
+int kobject_uevent(struct kobject *kobj, enum kobject_action action);
+int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
char *envp[]);

 int add_uevent_var(char **envp, int num_envp, int *cur_index,
@@ -274,8 +274,8 @@ int add_uevent_var(char **envp, int num_envp, int *cur_index,
const char *format, ...)
__attribute__((format (printf, 7, 8)));
 #else
-static inline void kobject_uevent(struct kobject *kobj, enum kobject_action action) { }
-static inline void kobject_uevent_env(struct kobject *kobj,
+static inline int kobject_uevent(struct kobject *kobj, enum kobject_action action) { }
+static inline int kobject_uevent_env(struct kobject *kobj,
+static inline int kobject_uevent_env(struct kobject *kobj,
  enum kobject_action action,
  char *envp[])
 { }
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index a192276..84272ed 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -63,8 +63,11 @@ static char *action_to_string(enum kobject_action action)
  * @action: action that is happening (usually KOBJ_MOVE)
  * @kobj: struct kobject that the action is happening to
  * @envp_ext: pointer to environmental data
+ *
+ * Returns 0 if kobject_uevent() is completed with success or the
+ * corresponding error when it fails.
  */
-void kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
+int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
char *envp_ext[])
 {
char **envp;
@@ -79,14 +82,16 @@ void kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
u64 seq;
char *seq_buff;
int i = 0;
-   int retval;
+   int retval = 0;
int j;

pr_debug("%s\n", __FUNCTION__);

action_string = action_to_string(action);
-   if (!action_string)
-   return;
+   if (!action_string) {
+   pr_debug("kobject attempted to send uevent without action_string!\n");
+   return -EINVAL;
+   }

/* search the kset we belong to */
top_kobj = kobj;
@@ -95,31 +100,39 @@ void kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
top_kobj = top_kobj->parent;
} while (!top_kobj->kset && top_kobj->parent);
}
-   if (!top_kobj->kset)
-   return;
+   if (!top_kobj->kset) {
+   pr_debug("kobject attempted to send uevent without kset!\n");
+   return -EINVAL;
+   }

kset = top_kobj->kset;
uevent_ops = kset->uevent_ops;

/*  skip the event, if the filter returns zero. */
if (uevent_ops && uevent_ops->filter)
-   if (!uevent_ops->filter(kset, kobj))
-   return;
+   if (!uevent_ops->filter(kset, kobj)) {
+   pr_debug("kobject filter function caused the event to drop!\n");
+   return 0;
+   }

/* environment index */
envp = kzalloc(NUM_ENVP * sizeof (char *), GFP_KERNEL);
if (!envp)
-   return;
+   return -ENOMEM;

/* environment values */
buffer = kmalloc(BUFFER_SIZE, GFP_KERNEL);
-   if (!buffer)
+   if (!buffer) {
+   retval = -ENOMEM;
goto exit;
+   }

/* complete object path */
devpath = kobject_get_path(kobj, GFP_KERNEL);
-   if (!devpath)
+   if (!devpath) {
+   retval = -ENOENT;
goto exit;
+   }

/* originating subsystem */
if (uevent_ops && uevent_ops->name)
@@ -204,7 +217,7 @@ exit:
kfree(devpath);
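kfree(buffer);
kfree(envp);
-   return;
+   return retval;
 }

 EXPORT_SYMBOL_GPL(kobject_uevent_env);
@@ -214,10 +227,13 @@ EXPORT_SYMBOL_GPL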

Re: [take26-resend1 7/8] kevent: Signal notifications.

2006-12-11 Thread Mauricio Lin

Hi Evgeniy,

I have used kobject_uevent() to notify userspace about some events.
For instance, when memory consumption reaches a predefined watermark,
a signal is sent to userspace to allow applications to free resources.
But I am not sure whether kobject_uevent() is the most appropriate way
to do that: if I have many different levels of notification (using
kobject_uevent()) from kernel space to user space, how can the
application know which level of kernel notification the signal was
sent from?

The application should perform a specific task according to the type
of notification it receives, and I do not know whether the current
kernel provides something like that. Do you know of any implementation
of this in the current kernel (2.6.19)?

After reading about your Kevent implementation, I guess that your
patches are able to do what I need, right? Will they be included in
the mainline kernel? Do you have examples of how I can use your socket
and/or signal notifications to establish communication between kernel
and user space?

BR,

Mauricio Lin.
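One way to tell notification levels apart with the existing interface is sketched here. It is an assumption, not something proposed in this thread. `kobject_uevent_env()`, whose signature appears in the patch above, takes an extra NULL-terminated environment array. The kernel side could therefore pass something like `{"MEM_LEVEL=2", NULL}`, where `MEM_LEVEL` is a hypothetical variable name. In this kernel's lib/kobject_uevent.c, the netlink payload is the `action@devpath` string followed by NUL-terminated `KEY=VALUE` strings, so a listener only has to look the key up. `field_len()` and `uevent_get()` are hypothetical helpers.

```c
#include <stddef.h>
#include <string.h>

/* Length of the NUL-terminated field at p, never looking past max bytes. */
static size_t field_len(const char *p, size_t max)
{
	const char *nul = memchr(p, '\0', max);

	return nul ? (size_t)(nul - p) : max;
}

/*
 * Look up key in a uevent payload of len bytes: an "action@devpath"
 * header followed by NUL-terminated "KEY=VALUE" strings.
 * Returns a pointer to VALUE inside buf, or NULL if the key is absent.
 */
static const char *uevent_get(const char *buf, size_t len, const char *key)
{
	size_t klen = strlen(key);
	size_t off = field_len(buf, len) + 1;	/* skip the header */

	while (off < len) {
		const char *field = buf + off;
		size_t flen = field_len(field, len - off);

		/* match "key=" exactly, so "LEVEL" does not match "MEM_LEVEL" */
		if (flen > klen && memcmp(field, key, klen) == 0 &&
		    field[klen] == '=')
			return field + klen + 1;
		off += flen + 1;
	}
	return NULL;
}
```

The application would then branch on the returned value (for example "1", "2", "3") rather than on the mere arrival of a message.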

On 12/11/06, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:


Signal notifications.

This type of notification allows signals to be delivered through the
kevent queue. An example application, signal.c, can be found on the
project homepage.

If the KEVENT_SIGNAL_NOMASK bit is set in the raw_u64 id, the signal
is delivered only through the queue; otherwise both delivery types
are used: the old one, through an update of the mask of pending
signals, and the queue.

If a signal is delivered only through the kevent queue, the mask of
pending signals is not updated at all. This is equivalent to putting
the signal into the blocked mask, except that the signal is still
delivered through the kevent queue.
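The following userspace model makes the two delivery modes concrete. It covers only the rule described above and assumes that a kevent is already registered for the signal. `MODEL_SIGNAL_NOMASK`, `struct delivery` and `deliver_signal()` are hypothetical. The real `KEVENT_SIGNAL_NOMASK` value comes from the kevent headers, which are not shown here.

```c
#include <stdint.h>

/* Hypothetical stand-in for KEVENT_SIGNAL_NOMASK; the real value lives in
 * the kevent headers, which are not part of this message. */
#define MODEL_SIGNAL_NOMASK (UINT64_C(1) << 63)

struct delivery {
	uint64_t pending_mask;	/* classic pending-signal bitmask */
	unsigned int queued;	/* notifications placed on the kevent queue */
};

/*
 * Deliver signal sig (1..64) to a task that has a kevent registered for
 * it; id_flags models the raw_u64 id of that kevent. The kevent queue
 * always gets the event; the pending mask only when NOMASK is clear.
 */
static void deliver_signal(struct delivery *d, int sig, uint64_t id_flags)
{
	d->queued++;
	if (!(id_flags & MODEL_SIGNAL_NOMASK))
		d->pending_mask |= UINT64_C(1) << (sig - 1);
}
```

In the NOMASK case the classic pending mask stays untouched. This is why the description compares that mode to blocking the signal while still queueing it.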

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>


diff --git a/include/linux/sched.h b/include/linux/sched.h
index fc4a987..ef38a3c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -80,6 +80,7 @@ struct sched_param {
 #include <linux/resource.h>
 #include <linux/timer.h>
 #include <linux/hrtimer.h>
+#include <linux/kevent_storage.h>

 #include <asm/processor.h>

@@ -1013,6 +1014,10 @@ struct task_struct {
 #ifdef CONFIG_TASK_DELAY_ACCT
struct task_delay_info *delays;
 #endif
+#ifdef CONFIG_KEVENT_SIGNAL
+   struct kevent_storage st;
+   u32 kevent_signals;
+#endif
 };

 static inline pid_t process_group(struct task_struct *tsk)
diff --git a/kernel/fork.c b/kernel/fork.c
index 1c999f3..e5b5b14 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -46,6 +46,7 @@
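 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
+#include <linux/kevent.h>

 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>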
@@ -115,6 +116,9 @@ void __put_task_struct(struct task_struct *tsk)
WARN_ON(atomic_read(&tsk->usage));
WARN_ON(tsk == current);

+#ifdef CONFIG_KEVENT_SIGNAL
+   kevent_storage_fini(&tsk->st);
+#endif
security_task_free(tsk);
free_uid(tsk->user);
put_group_info(tsk->group_info);
@@ -1121,6 +1125,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
if (retval)
goto bad_fork_cleanup_namespace;

+#ifdef CONFIG_KEVENT_SIGNAL
+   kevent_storage_init(p, &p->st);
+#endif
+
p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
/*
 * Clear TID on mm_release()?
diff --git a/kernel/kevent/kevent_signal.c b/kernel/kevent/kevent_signal.c
new file mode 100644
index 000..0edd2e4
--- /dev/null
+++ b/kernel/kevent/kevent_signal.c
@@ -0,0 +1,92 @@
+/*
+ * kevent_signal.c
+ *
+ * 2006 Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
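+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kevent.h>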
+
+static int kevent_signal_callback(struct kevent *k)
+{
+   struct task_struct *tsk = k->st->origin;
+   int sig = k->event.id.raw[0];
+   int ret = 0;
+
+   if (sig == tsk->kevent_signals)
+   ret = 1;
+
+   if (ret && (k->event.id.raw_u64 & KEVENT_SIGNAL_NOMASK))
+   tsk->kevent_signals |= 0x8000;
+
+   return ret;
+}
+
+int kevent_signal_enqueue(struct kevent *k)
+{
+   int err;
+
+   err = kevent_storage_enqueue(>st, k);
+   if (err)
+   goto err_out_exit;
+
+   if (k->event.req_flags & KEVENT_REQ_ALWAYS_QUEUE) {
+   keve

Re: kobject_uevent() question

2006-12-11 Thread Mauricio Lin

Hi Aneesh,

I have posted a patch for that as well. You can check it at
http://lkml.org/lkml/2006/11/30/315.

BR,

Mauricio Lin.

On 12/10/06, Aneesh Kumar K.V <[EMAIL PROTECTED]> wrote:

Greg KH wrote:
> On Tue, Nov 28, 2006 at 07:38:01PM +, Mauricio Lin wrote:
>> Hi Greg,
>>
>> It is working now. The failure was in the kobject_uevent() function. As
>> the kset of my kobject was not set properly, the kobject_uevent()
>> function just returned without reporting anything.
>>
>> I wonder why kobject_uevent() does not return an integer to indicate
>> whether the operation completed successfully or not.
>
> Feel free to send patches fixing this issue :)
>
> thanks,
>
>

Something like this ?

-aneesh




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
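Greg's reply is what the patches on this page set out to address: with a `void` return, a kobject whose kset was never set up fails silently. The following is a userspace model only, not kernel code. `struct kset_model`, `struct kobj_model`, `find_kset()` and `uevent_model()` are hypothetical stand-ins. The model reproduces the parent walk from the "search the kset we belong to" loop and the early returns of Aneesh's revision: `-EINVAL` when no ancestor has a kset, and 0 when the kset's filter drops the event.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct kset / struct kobject;
 * only the fields this model needs are present. */
struct kset_model {
	int (*filter)(const char *name);	/* may be NULL */
};

struct kobj_model {
	const char *name;
	struct kobj_model *parent;
	struct kset_model *kset;
};

/* Same walk as the "search the kset we belong to" loop:
 * climb the parent chain until some ancestor has a kset. */
static struct kset_model *find_kset(const struct kobj_model *kobj)
{
	while (!kobj->kset && kobj->parent)
		kobj = kobj->parent;
	return kobj->kset;
}

/* Models the early checks of kobject_uevent_env() after Aneesh's patch:
 * -EINVAL when no kset is reachable, 0 when the filter drops the event. */
static int uevent_model(const struct kobj_model *kobj)
{
	const struct kset_model *kset = find_kset(kobj);

	if (!kset)
		return -EINVAL;
	if (kset->filter && !kset->filter(kobj->name))
		return 0;	/* dropped on purpose: not an error */
	/* building the environment and broadcasting are omitted here */
	return 0;
}
```

With an `int` result the diagnosis moves to the caller. A single check of the return value replaces a printk() in each `if` branch, which is the motivation given in the patch description.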


Re: [PATCH 2.6.19] kobject: kobject_uevent() returns manageable value

2006-12-01 Thread Mauricio Lin

Hi Andrew,

On 12/1/06, Andrew Morton <[EMAIL PROTECTED]> wrote:

On Thu, 30 Nov 2006 16:58:47 -0400
Mauricio Lin <[EMAIL PROTECTED]> wrote:

> Since the kobject_uevent() function does not return an integer value
> indicating whether its operation completed successfully, it is worth
> changing it to report a proper status (success or error) instead of
> returning void.
>
> Having kobject_uevent() return the status as an integer provides an
> easier way to detect possible failures in the function. With the
> void return style, people may waste time figuring out whether a
> failure to "send to" or "receive from" an event is a bug in the
> kernel or in user space. Furthermore, the current way to find where
> the error occurs in kobject_uevent() requires adding a printk() to
> each "if" condition that can lead to failure.

Admirable idea, but we have large changes pending against that code
and none of this patch applies.


I have used kobject_uevent() to send events to user space, but when
an event does not arrive I have to figure out whether the problem is
in the kernel or in the user space code. When I mentioned this issue
to Greg KH, he recommended that I provide a patch to fix it.



A patch against Greg's driver tree, or against latest -mm would suit, thanks.


OK. That's good. Thanks.

BR,

Mauricio Lin.


[PATCH 2.6.19] kobject: kobject_uevent() returns manageable value

2006-11-30 Thread Mauricio Lin
[PATCH 2.6.19] kobject: kobject_uevent() returns manageable value

Since the kobject_uevent() function does not return an integer value
indicating whether its operation completed successfully, it is worth
changing it to report a proper status (success or error) instead of
returning void.

Having kobject_uevent() return the status as an integer provides an
easier way to detect possible failures in the function. With the
void return style, people may waste time figuring out whether a
failure to "send to" or "receive from" an event is a bug in the
kernel or in user space. Furthermore, the current way to find where
the error occurs in kobject_uevent() requires adding a printk() to
each "if" condition that can lead to failure.

Signed-off-by: Mauricio Lin <[EMAIL PROTECTED]>

Index: kernel/linux-2.6.19-rc6/include/linux/kobject.h
===
--- linux-2.6.19-rc6.orig/include/linux/kobject.h   2006-11-29 16:15:19.0 -0400
+++ linux-2.6.19-rc6/include/linux/kobject.h    2006-11-29 16:22:40.0 -0400
@@ -263,14 +263,17 @@
struct subsys_attribute *);
 
 #if defined(CONFIG_HOTPLUG)
-void kobject_uevent(struct kobject *kobj, enum kobject_action action);
+int kobject_uevent(struct kobject *kobj, enum kobject_action action);
 
 int add_uevent_var(char **envp, int num_envp, int *cur_index,
char *buffer, int buffer_size, int *cur_len,
const char *format, ...)
__attribute__((format (printf, 7, 8)));
 #else
-static inline void kobject_uevent(struct kobject *kobj, enum kobject_action action) { }
+static inline int kobject_uevent(struct kobject *kobj, enum kobject_action action)
+{
+   return 0;
+}
 
 static inline int add_uevent_var(char **envp, int num_envp, int *cur_index,
  char *buffer, int buffer_size, int *cur_len, 
Index: kernel/linux-2.6.19-rc6/lib/kobject_uevent.c
===
--- linux-2.6.19-rc6.orig/lib/kobject_uevent.c  2006-11-29 16:15:12.0 -0400
+++ linux-2.6.19-rc6/lib/kobject_uevent.c   2006-11-29 16:31:16.0 -0400
@@ -60,8 +60,11 @@
  *
  * @action: action that is happening (usually KOBJ_ADD and KOBJ_REMOVE)
  * @kobj: struct kobject that the action is happening to
+ *
+ * Returns 0 if kobject_uevent() is completed with success or the
+ * corresponding error when it fails.
  */
-void kobject_uevent(struct kobject *kobj, enum kobject_action action)
+int kobject_uevent(struct kobject *kobj, enum kobject_action action)
 {
char **envp;
char *buffer;
@@ -81,7 +84,7 @@
 
action_string = action_to_string(action);
if (!action_string)
-   return;
+   return -EINVAL;
 
/* search the kset we belong to */
top_kobj = kobj;
@@ -91,7 +94,7 @@
} while (!top_kobj->kset && top_kobj->parent);
}
if (!top_kobj->kset)
-   return;
+   return -EINVAL;
 
kset = top_kobj->kset;
uevent_ops = kset->uevent_ops;
@@ -99,22 +102,27 @@
/*  skip the event, if the filter returns zero. */
if (uevent_ops && uevent_ops->filter)
if (!uevent_ops->filter(kset, kobj))
-   return;
+   return -EINVAL;
 
/* environment index */
envp = kzalloc(NUM_ENVP * sizeof (char *), GFP_KERNEL);
if (!envp)
-   return;
+   return -ENOMEM;
 
/* environment values */
buffer = kmalloc(BUFFER_SIZE, GFP_KERNEL);
-   if (!buffer)
-   goto exit;
+   if (!buffer) {
+   kfree(envp);
+   return -ENOMEM;
+   }
 
/* complete object path */
devpath = kobject_get_path(kobj, GFP_KERNEL);
-   if (!devpath)
-   goto exit;
+   if (!devpath) {
+   kfree(envp);
+   kfree(buffer);
+   return -ENOMEM;
+   }
 
/* originating subsystem */
if (uevent_ops && uevent_ops->name)
@@ -179,7 +187,11 @@
}
 
NETLINK_CB(skb).dst_group = 1;
-   netlink_broadcast(uevent_sock, skb, 0, 1, GFP_KERNEL);
+   retval = netlink_broadcast(uevent_sock, skb, 0, 1,
+  GFP_KERNEL);
+   if (retval)
+   pr_debug ("%s - netlink_broadcast() returned "
+ "%d\n", __FUNCTION__, retval);
}
}
 #endif
@@ -198,7 +210,7 @@
kfree(devpath);
kfree(buffer);
kfree(envp);
-   return;
+   return 0;
 }
EXPORT_SYMBOL_GPL(kobject_uevent);
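This version frees `envp` and `buffer` by hand in each error branch. It also still ends with `return 0`, so a non-zero result from `netlink_broadcast()` is only logged. Aneesh's later revision at the top of this page keeps the single `exit:` label and carries the error in `retval`. A userspace sketch of that single-exit pattern follows. `malloc()`/`free()` stand in for `kzalloc()`/`kmalloc()`/`kfree()`, and `model_alloc()`, `build_event()` and the buffer sizes are hypothetical. The errno values mirror the revision, which uses `-ENOENT` for a failed `kobject_get_path()` where this version uses `-ENOMEM`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical fault-injection hook: allocation number fail_at
 * (1-based) returns NULL; fail_at == 0 means never fail. */
static int alloc_count, fail_at;

static void *model_alloc(size_t n)
{
	if (fail_at && ++alloc_count == fail_at)
		return NULL;
	return malloc(n);
}

/* Single-exit cleanup: every failure after the first allocation sets
 * retval and jumps to one label; free(NULL) is a no-op, like kfree(NULL). */
static int build_event(void)
{
	char **envp;
	char *buffer = NULL, *devpath = NULL;
	int retval = 0;

	envp = model_alloc(32 * sizeof(char *));
	if (!envp)
		return -ENOMEM;		/* nothing to free yet */

	buffer = model_alloc(2048);
	if (!buffer) {
		retval = -ENOMEM;
		goto exit;
	}

	devpath = model_alloc(256);	/* stands in for kobject_get_path() */
	if (!devpath) {
		retval = -ENOENT;
		goto exit;
	}

	/* ... fill the environment and broadcast the event ... */

exit:
	free(devpath);
	free(buffer);
	free(envp);
	return retval;
}
```

Keeping one cleanup path means a new allocation only adds one `free()` at the label, instead of one per later error branch.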

Re: How do you accurately determine a process' RAM usage?

2005-07-20 Thread Mauricio Lin
Hi Brady,

On 7/20/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Mauricio Lin wrote:
> > Hi,
> >
> > On 7/12/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >
> >>Andrew Morton wrote:
> >>
> >>>OK, please let us know how it goes.
> >>
> >>It went very well. I could find no problems at all.
> >>I've updated my script to use the new method, so please merge smaps :)
> >>http://www.pixelbeat.org/scripts/ps_mem.py
> >>
> >>Usually the shared mem reported by /proc/$$/statm
> >>is the same as summing all the shared values in /proc/$$/smaps
> >>but there can be large discrepancies.
> >
> >
> > Have you checked how the statm shared is calculated? I guess it does
> > something like:
> > shared = mm->rss - mm->anon_rss
> 
> yes
> 
> > But in smaps output you can have anonymous area like:
> >
> > b6e0e000-b6e13000 rw-p
> > Size:          20 KB
> > Rss:            4 KB
> > Shared_Clean:   0 KB
> > Shared_Dirty:   4 KB
> > Private_Clean:  0 KB
> > Private_Dirty:  0 KB
> >
> > Note that it presents 4 KB of shared memory in an area considered anonymous.
> >
> > ANDREW: anon_rss is the rss for anonymous areas, right?
> 
> I see your point and I'm not sure.
> The following shell gets the shared values for the
> first httpd process:
> 
> FIRST_HTTPD=`ps -C httpd -o pid= | head -1 | tr -d ' '`
> HTTPD_STATM_SHARED=$(expr 4 '*' `cut -f3 -d' ' /proc/$FIRST_HTTPD/statm`)
> HTTPD_SMAPS_SHARED=$(grep Shared /proc/$FIRST_HTTPD/smaps | tr -s ' ' | cut -f2 -d' ' | ( tr '\n' +; echo 0 ) | bc)
> 
> 
> This shows that "smaps" reports 3060 KB more shared mem than "statm".
> However adding up all the anon sections in smaps only gives 2456 KB?

You are adding up all Shared_Clean and Shared_Dirty as Shared, right?

> 
> When doing this I also noticed that there are duplicate
> entries in smaps. Any ideas why?

Each address pair at the start of an entry gives the start and end address of a
memory area (VMA), such as:

b7f7d000-b7f7e000 

This means that a specific memory area starts at virtual address
b7f7d000 and ends at b7f7e000.

A mapped file like /lib/ld-2.3.3.so is organized into several memory
areas; each one can be a text section, a data section or bss. So it is
normal to find the same filename mapped in more than one memory
area.

You can find more information about VMAs in the book Linux Kernel
Development (chapter 14) by Robert Love.

For instance:

> grep -F - /proc/$FIRST_HTTPD/smaps | sort | uniq -d -c
> 
>2 b7f7d000-b7f7e000 r-xp  03:05 246646
> /usr/lib/httpd/modules/mod_auth_anon.so
This is a text section.

>2 b7f7e000-b7f7f000 rwxp  03:05 246646
> /usr/lib/httpd/modules/mod_auth_anon.so
This should be a data section.

IMHO, bss section corresponds to the anonymous area where the mapping
is not backed by a file.

BR,

Mauricio Lin.


Re: How do you accurately determine a process' RAM usage?

2005-07-19 Thread Mauricio Lin
Hi,

On 7/12/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Andrew Morton wrote:
> > OK, please let us know how it goes.
> 
> It went very well. I could find no problems at all.
> I've updated my script to use the new method, so please merge smaps :)
> http://www.pixelbeat.org/scripts/ps_mem.py
> 
> Usually the shared mem reported by /proc/$$/statm
> is the same as summing all the shared values in /proc/$$/smaps
> but there can be large discrepancies.

Have you checked how the statm shared is calculated? I guess it does
something like:
shared = mm->rss - mm->anon_rss

But in smaps output you can have anonymous area like:

b6e0e000-b6e13000 rw-p
Size:          20 KB
Rss:            4 KB
Shared_Clean:   0 KB
Shared_Dirty:   4 KB
Private_Clean:  0 KB
Private_Dirty:  0 KB

Note that it presents 4 KB of shared memory in an area considered anonymous.

ANDREW: anon_rss is the rss for anonymous areas, right?

> In the real world you can see this with a newly started apache.
> On my system statm reported that apache was using 35MB,
> whereas smaps reported the correct amount of 11MB.

How do you know that 11MB is the correct shared value and that 35MB
is the wrong one?

BR,

Mauricio Lin.


Re: oom with 2.6.11

2005-03-15 Thread Mauricio Lin
Hi Christian,

On Fri, 11 Mar 2005 16:09:24 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote:
> Mauricio Lin wrote:
> > Hi Christian,
> >
> > I would like to know what are the kernel versions this problem happened.
> >
> > Did this problem start from 2.6.11-rc2-bk10?
> 
> i noticed it first at 2.6.11, then again with 2.6.11-rc5-bk2. suspecting
> pppd to be the culprit to chew up all RAM after being terminated by my ISP
> once a day - i just have to wait (must be around 2a.m.).

Have you tried with 2.6.10 in order to check this problem?

BR,

Mauricio Lin.


Re: oom with 2.6.11

2005-03-11 Thread Mauricio Lin
Hi Christian,

I would like to know in which kernel versions this problem happened.

Did this problem start with 2.6.11-rc2-bk10?

BR,

Mauricio Lin.

On Thu, 10 Mar 2005 16:12:27 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote:
> ok,
> 
> as "promised", the OOM happened again with the same plain 2.6.11,
> details here.
> 
> http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt
> 
> the following is quite long, but please read on
> (if anyone is reading at all :))
> 
> this time it happened at 08:01, and i could imagine some heavy cron jobs
> were going on. but as i said: "it did not happen before". there is also
> output of SYSRQ-T/M/P. i did SYSRQ-E to recover the machine, but then
> decided to reboot back to 2.6.11-rc5-bk2.
> 
> i had a look at the changelogs too and noticed that ChangeLog-2.6.11
> contains 7 occurrences of "OOM" in the patch descriptions:
> 
> [PATCH] mm: overcommit updates, 2005-01-03
> [PATCH] vmscan: count writeback pages in nr_scanned, 2005-01-08
> [PATCH] possible rq starvation on oom, 2005-01-13
> [PATCH] mm: adjust dirty threshold for lowmem-only mappings, 2005-01-25
> [PATCH] mm: oom-killer tunable, 2005-02-02
> [PATCH] mm: fix several oom killer bugs, 2005-02-02
> [PATCH] Fix oops in alloc_zeroed_user_highpage() when [...],2005-02-09
> 
> release dates:
> 2.6.11-rc5-bk1  26-Feb-2005
> 2.6.11-rc5-bk2  27-Feb-2005  <
> 2.6.11-rc5-bk3  28-Feb-2005
> 2.6.11-rc5-bk4  01-Mar-2005
> 2.6.11  02-Mar-2005
> 
> so i really don't see any patches that *could* have something to do with
> the issue here.
> 
> now comes the weird part:
> 
> i was going to compile 2.6.11-rc5-bk4, to sort out the "bad" kernel.
> compiling went fine. ok, finished some email, ok, suddenly my swap was
> used up again, and no memory left - uh oh! OOM again, with 2.6.11-rc5-bk2!
> 
> to summarize it:
> i've run 2.6.11-rc2-bk10 during whole february, then switched to
> 2.6.11-rc5-bk2 on 28.02.2005, then to 2.6.11 on 05.03.2005 - and only
> noticed with 2.6.11 first, now with 2.6.11-rc5-bk2 too.
> 
> there is an interesting part in the logfiles:
> 
> http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt
> http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11_2.txt
> http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11-rc5-bk2.txt
> 
> every last message before the "OOM" messages is something with pppd:
> 
> Mar 10 13:45:55 sheep pppd[1567]: Starting link
> Mar 10 14:12:29 sheep kernel: oom-killer: gfp_mask=0x1d2
> 
> Mar  8 00:59:58 sheep pppd[418]: Starting link
> Mar  8 01:27:33 sheep kernel: oom-killer: gfp_mask=0xd0
> 
> Mar  9 07:33:49 sheep pppd[30937]: Starting link
> Mar  9 08:01:35 sheep kernel: oom-killer: gfp_mask=0x1d2
> 
> and 30min later OOM kicks in. normally, pppd (pppoe) gives messages like this:
> 
> Mar 10 14:23:38 sheep pppd[26365]: Starting link
> Mar 10 14:23:38 sheep pppd[26365]: Serial connection established.
> Mar 10 14:23:38 sheep pppd[26365]: Connect: ppp0 <--> /dev/pts/0
> Mar 10 14:23:38 sheep pppoe[26383]: PADS: Service-Name: ''
> Mar 10 14:23:38 sheep pppoe[26383]: PPP session is 6804
> Mar 10 14:23:39 sheep pppd[26365]: CHAP authentication succeeded
> Mar 10 14:23:40 sheep pppd[26365]: Local IP address changed to
> [...]
> 
> is this strange? or not?
> 
> i hope someone has a hint for me, because "going back to the stable
> kernel" would mean "being bound to 2.6.11-rc2-bk10" :(
> 
> thank you for any hints,
> Christian.
> 
> PS: Steven, i've cc'ed you because you have trouble with new 2.6.11
> kernels and pppd too. maybe unrelated, maybe not.
> --
> BOFH excuse #185:
> 
> system consumed all the paper for paging
>


Re: oom with 2.6.11

2005-03-09 Thread Mauricio Lin
Hi Christian,

I found the 2.6.11-rc3 patch. The oom killer modification from
Arcangeli was included in 2.6.11-rc3, right? If so, since your
2.6.11-rc3 BK snapshot ran without OOM for ~30 days, the problem is
not related to Arcangeli's modification.

Does anyone have an idea?

BR,

Mauricio Lin.

On Wed, 9 Mar 2005 09:18:31 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi Christian,
> 
> Could you check the mm/oom_kill.c for your kernel 2.6.11-rc3?
> 
> During the 2.6.11-rc development, the oom killer was changed by Andrea
> Arcangeli. I do not remember exactly in which version this
> modification was included; perhaps 2.6.11-rc4.
> 
> Now this oom killer modification is part of 2.6.11 vanilla kernel.
> 
> Please send me the mm/oom_kill.c of 2.6.11-rc3 so I can confirm my suspicion.
> 
> BR,
> 
> Mauricio Lin.
> 
> On Tue, 08 Mar 2005 16:21:21 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote:
> > hallo list,
> >
> > today my machine went out of memory and noticing it several hours after
> > the first OOM message in the log, i wonder
> >1) why this happened at all and
> >2) why almost every service was killed despite the clever algorithms
> >   documented in mm/oom_kill.c.
> >
> > the first oom message went to the syslog at 01:27, i was away and no heavy
> > tasks were scheduled:
> >
> > http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt
> >
> > mysqld got killed by the oom killer, so i have to suspect mysql for being
> > the reason for oom here, even though i know that mysqld is running all day
> > long. several other tasks got killed, but "Free swap" stays at 0kB and the
> > oom killer kills almost every other task, with no success in freeing ram.
> >
> > the log stops at 03:21, perhaps syslog-ng got killed.
> > at around 07:31 i noticed the mess, did SYSRQ-E and now i was able to
> > login again. i pressed SYSRQ-M/T/P too, they are all in the log. at this
> > time loadavg was at 249 ;)
> >
> > i went to runlevel 2, then up again to 3 and all services are up and
> > running again.
> >
> > some 2.6.11-rc3 BK snapshot was running pretty stable (no OOM) for ~30
> > days before i switched to 2.6.11 (vanilla) a few days ago. i have to (not)
> > reproduce the problem the next night, i wonder if it will happen again.
> >
> > do you vm-gurus have any idea to the points asked above?
> >
> > more infos about the box here: http://nerdbynature.de/bits/sheep/2.6.11/oom/
> >
> > thank you for your comments,
> > Christian.
> > --
> > BOFH excuse #281:
> >
> > The co-locator cannot verify the frame-relay gateway to the ISDN server.
> >
>


Re: oom with 2.6.11

2005-03-09 Thread Mauricio Lin
Hi Christian,

Could you check the mm/oom_kill.c for your kernel 2.6.11-rc3?

During the 2.6.11-rc development, the oom killer was changed by Andrea
Arcangeli. I do not remember exactly in which version this
modification was included; perhaps 2.6.11-rc4.

Now this oom killer modification is part of 2.6.11 vanilla kernel.

Please send me the mm/oom_kill.c of 2.6.11-rc3 so I can confirm my suspicion.

BR,

Mauricio Lin. 

On Tue, 08 Mar 2005 16:21:21 +0100, Christian Kujau <[EMAIL PROTECTED]> wrote:
> hallo list,
> 
> today my machine went out of memory and noticing it several hours after
> the first OOM message in the log, i wonder
>1) why this happened at all and
>2) why almost every service was killed despite the clever algorithms
>   documented in mm/oom_kill.c.
> 
> the first oom message went to the syslog at 01:27, i was away and no heavy
> tasks were scheduled:
> 
> http://nerdbynature.de/bits/sheep/2.6.11/oom/oom_2.6.11.txt
> 
> mysqld got killed by the oom killer, so i have to suspect mysql for being
> the reason for oom here, even though i know that mysqld is running all day
> long. several other tasks got killed, but "Free swap" stays at 0kB and the
> oom killer kills almost every other task, with no success in freeing ram.
> 
> the log stops at 03:21, perhaps syslog-ng got killed.
> at around 07:31 i noticed the mess, did SYSRQ-E and now i was able to
> login again. i pressed SYSRQ-M/T/P too, they are all in the log. at this
> time loadavg was at 249 ;)
> 
> i went to runlevel 2, then up again to 3 and all services are up and
> running again.
> 
> some 2.6.11-rc3 BK snapshot was running pretty stable (no OOM) for ~30
> days before i switched to 2.6.11 (vanilla) a few days ago. i have to (not)
> reproduce the problem the next night, i wonder if it will happen again.
> 
> do you vm-gurus have any idea to the points asked above?
> 
> more infos about the box here: http://nerdbynature.de/bits/sheep/2.6.11/oom/
> 
> thank you for your comments,
> Christian.
> --
> BOFH excuse #281:
> 
> The co-locator cannot verify the frame-relay gateway to the ISDN server.
>


Re: [PATCH] A new entry for /proc

2005-03-03 Thread Mauricio Lin
Hi all,

I am sending some modifications to the smaps PATCH.

BTW, thanks Hugh for all your suggestions. The page_table_lock has
already been added to smaps.

BR,

Mauricio Lin.


diff -uprN linux-2.6.11-rc4-bk9/Documentation/filesystems/proc.txt linux-2.6.11-rc4-bk9-smaps/Documentation/filesystems/proc.txt
--- linux-2.6.11-rc4-bk9/Documentation/filesystems/proc.txt 2005-02-28 06:24:09.0 -0400
+++ linux-2.6.11-rc4-bk9-smaps/Documentation/filesystems/proc.txt   2005-02-28 06:28:10.0 -0400
@@ -133,6 +133,7 @@ Table 1-1: Process specific entries in /
  statm   Process memory status information  
  status  Process status in human readable form  
  wchan   If CONFIG_KALLSYMS is set, a pre-decoded wchan
+ smaps  Extension based on maps, presenting the rss size for each mapped file
 ..
 
 For example, to get the status information of a process, all you have to do is
diff -uprN linux-2.6.11-rc4-bk9/Makefile linux-2.6.11-rc4-bk9-smaps/Makefile
--- linux-2.6.11-rc4-bk9/Makefile   2005-02-28 06:24:59.0 -0400
+++ linux-2.6.11-rc4-bk9-smaps/Makefile 2005-02-28 06:28:10.0 -0400
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 11
-EXTRAVERSION = -rc4-bk9
+EXTRAVERSION = -rc4-bk9-smaps
 NAME=Woozy Numbat
 
 # *DOCUMENTATION*
diff -uprN linux-2.6.11-rc4-bk9/fs/proc/base.c linux-2.6.11-rc4-bk9-smaps/fs/proc/base.c
--- linux-2.6.11-rc4-bk9/fs/proc/base.c 2005-02-28 06:24:41.0 -0400
+++ linux-2.6.11-rc4-bk9-smaps/fs/proc/base.c   2005-02-28 06:28:10.0 -0400
@@ -11,6 +11,28 @@
  *  go into icache. We cache the reference to task_struct upon lookup too.
  *  Eventually it should become a filesystem in its own. We don't use the
  *  rest of procfs anymore.
+ *
+ *
+ *  Changelog:
+ *  17-Jan-2005
+ *  Allan Bezerra
+ *  Bruna Moreira <[EMAIL PROTECTED]>
+ *  Edjard Mota <[EMAIL PROTECTED]>
+ *  Ilias Biris <[EMAIL PROTECTED]>
+ *  Mauricio Lin <[EMAIL PROTECTED]>
+ *
+ *  Embedded Linux Lab - 10LE Instituto Nokia de Tecnologia - INdT
+ *
+ *  A new process specific entry (smaps) included in /proc. It shows the
+ *  size of rss for each memory area. The maps entry lacks information
+ *  about physical memory size (rss) for each mapped file, i.e.,
+ *  rss information for executables and library files.
+ *  This additional information is useful for any tools that need to know
+ *  about physical memory consumption for a process specific library.
+ *  
+ *  Changelog:
+ *  21-Feb-2005
+ *  Embedded Linux Lab - 10LE Instituto Nokia de Tecnologia - INdT 
  */
 
 #include 
@@ -61,6 +83,7 @@ enum pid_directory_inos {
PROC_TGID_MAPS,
PROC_TGID_MOUNTS,
PROC_TGID_WCHAN,
+   PROC_TGID_SMAPS,
 #ifdef CONFIG_SCHEDSTATS
PROC_TGID_SCHEDSTAT,
 #endif
@@ -92,6 +115,7 @@ enum pid_directory_inos {
PROC_TID_MAPS,
PROC_TID_MOUNTS,
PROC_TID_WCHAN,
+   PROC_TID_SMAPS,
 #ifdef CONFIG_SCHEDSTATS
PROC_TID_SCHEDSTAT,
 #endif
@@ -134,6 +158,7 @@ static struct pid_entry tgid_base_stuff[
E(PROC_TGID_ROOT,  "root",S_IFLNK|S_IRWXUGO),
E(PROC_TGID_EXE,   "exe", S_IFLNK|S_IRWXUGO),
E(PROC_TGID_MOUNTS,"mounts",  S_IFREG|S_IRUGO),
+   E(PROC_TGID_SMAPS, "smaps",   S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
E(PROC_TGID_ATTR,  "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -164,6 +189,7 @@ static struct pid_entry tid_base_stuff[]
E(PROC_TID_ROOT,   "root",S_IFLNK|S_IRWXUGO),
E(PROC_TID_EXE,"exe", S_IFLNK|S_IRWXUGO),
E(PROC_TID_MOUNTS, "mounts",  S_IFREG|S_IRUGO),
+   E(PROC_TID_SMAPS,  "smaps",   S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
E(PROC_TID_ATTR,   "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -488,6 +514,25 @@ static struct file_operations proc_maps_
.release= seq_release,
 };
 
+extern struct seq_operations proc_pid_smaps_op;
+static int smaps_open(struct inode *inode, struct file *file)
+{
+   struct task_struct *task = proc_task(inode);
+   int ret = seq_open(file, &proc_pid_smaps_op);
+   if (!ret) {
+   struct seq_file *m = file->private_data;
+   m->private = task;
+   }
+   return ret;
+}
+
+static struct file_operations proc_smaps_operations = {
+   .open   = smaps_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+};
+
 extern struct seq_operations mounts_op;
 static int mounts_open(struct inode *inode, struct file *file)
 {
@@ -1447,6 +1492,10 @@ static struct dentry *proc_pident_lookup
case PROC_TGID_MOUNTS:
inode->i_fop = &proc_mounts_operations;
break
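
The smaps entry added by this patch reports resident memory per mapping. As a rough userspace illustration of how a tool might consume it, the sketch below sums per-mapping resident sizes from smaps-style text. It assumes each mapping contributes one line of the form "Rss:   <n> kB", which is the layout later kernels use; the exact labels printed by this 2005 patch are not shown in this excerpt, and the helper name sum_rss_kb is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: sum every "Rss:" value (in kB) found in a buffer
 * holding smaps-style text. Assumes one "Rss:   <n> kB" line per mapping.
 * Returns the total in kB, or -1 if an Rss line has no number.
 */
long sum_rss_kb(const char *text)
{
	long total = 0;
	const char *line = text;

	assert(text != NULL);
	while (line && *line) {
		const char *next = strchr(line, '\n');
		long kb;

		if (strncmp(line, "Rss:", 4) == 0) {
			if (sscanf(line + 4, "%ld", &kb) != 1)
				return -1;
			total += kb;
		}
		/* Advance to the start of the next line, or stop at the end. */
		line = next ? next + 1 : NULL;
	}
	return total;
}
```

Lines that do not start with "Rss:" (mapping headers, Size lines and so on) are simply skipped, so the helper keeps working if more fields are added later.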

Re: [PATCH] A new entry for /proc

2005-03-02 Thread Mauricio Lin
Hi Hugh,

How about mapping and unmapping each pte?

I mean: remove the pte++, call pte_offset_map for each incremented
address, and then pte_unmap it. That way each incremented address is
used as the index to get the next pte via pte_offset_map.

BR,

Mauricio Lin.

On Wed, 2 Mar 2005 19:07:15 +0000 (GMT), Hugh Dickins <[EMAIL PROTECTED]> wrote:
> On Wed, 2 Mar 2005, Mauricio Lin wrote:
> > Does anyone know if the place I put pte_unmap is logical and safe
> > after several pte increments?
> 
> The place is logical and safe, but it's still not quite right.
> You should have found several examples of loops having the same
> problem, and what do they do? 
> 
> >   pte = pte_offset_map(pmd, address);
> >   address &= ~PMD_MASK;
> >   end = address + size;
> >   if (end > PMD_SIZE)
> >   end = PMD_SIZE;
> >   do {
> >   pte_t page = *pte;
> >
> >   address += PAGE_SIZE;
> >   pte++;
> >   if (pte_none(page) || (!pte_present(page)))
> >   continue;
> >   *rss += PAGE_SIZE;
> >   } while (address < end);
> >   pte_unmap(pte);
> 
> pte_unmap(pte - 1);
> 
> which works because it's a do {} while () loop which has certainly
> incremented pte at least once.  But some people probably loathe that
> style, and would prefer to save orig_pte then pte_unmap(orig_pte).
> 
> Hugh
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] A new entry for /proc

2005-03-02 Thread Mauricio Lin
Does anyone know if the place I put pte_unmap is logical and safe
after several pte increments?

pte = pte_offset_map(pmd, address);
address &= ~PMD_MASK;
end = address + size;
if (end > PMD_SIZE)
	end = PMD_SIZE;
do {
	pte_t page = *pte;

	address += PAGE_SIZE;
	pte++;
	if (pte_none(page) || (!pte_present(page)))
		continue;
	*rss += PAGE_SIZE;
} while (address < end);
pte_unmap(pte);

BR,

Mauricio Lin.


Re: [PATCH] A new entry for /proc

2005-03-01 Thread Mauricio Lin
Hi,

Here are some values from the experiments. The values are the elapsed
real time used by the process, in seconds. Each row corresponds to
one cat /proc/pid/smaps command.

Old smaps
19.41
19.31
21.38
20.16

New smaps
16.82
16.75
16.75
16.79


BR,

Mauricio Lin.

On Tue, 1 Mar 2005 10:17:56 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Well,
> 
> It is working better now. You are right, Hugh. Now the new version is
> faster than the old one. I removed the struct page and its related
> function.
> 
> Thanks,
> 
> BR,
> 
> Mauricio Lin.
> 
> On Tue, 1 Mar 2005 04:08:15 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> > On Mon, 28 Feb 2005 20:41:31 +0000 (GMT), Hugh Dickins <[EMAIL PROTECTED]> wrote:
> > > On Mon, 28 Feb 2005, Mauricio Lin wrote:
> > > >
> > > > Now I am testing with /proc/pid/smaps and the values are showing that
> > > > the old one is faster than the new one. So I will keep using the old
> > > > smaps version.
> > >
> > > Sorry, I don't have time for more than the briefest look.
> > >
> > > It appears that your old resident_mem_size method is just checking
> > > pte_present, whereas your new smaps_pte_range method is also doing
> > > pte_page (yet no prior check for pfn_valid: wrong) and checking
> > > !PageReserved i.e. accessing the struct page corresponding to each
> > > pte.  So it's not a fair comparison, your new method is accessing
> > > many more cachelines than your old method.
> > >
> > > Though it's correct to check pfn_valid and !PageReserved to get the
> > > same total rss as would be reported elsewhere, I'd suggest that it's
> > > really not worth the overhead of those struct page accesses: just
> > > stick with the pte_present test.
> > So, I can remove the PageReserved macro without any problems, right?
> >
> >
> > >
> > > Your smaps_pte_range is missing pte_unmap?
> > Yes, but I already fixed this problem. Paul Mundt pointed out the
> > missing unmap.
> >
> > Thanks,
> >
> > Let me perform new experiments now.
> >
> > BR,
> >
> > Mauricio Lin.
> >
>


Re: [PATCH] A new entry for /proc

2005-03-01 Thread Mauricio Lin
Hi,
 
> The most important thing about a /proc file format is that it has
> a documented means of being extended in the future. Without such
> documentation, it is impossible to write a reliable parser.
> 
> The "Name: value" stuff is rather slow. Right now procps (ps, top, etc.)
> is using a perfect hash function to parse the /proc/*/status files.
> ("man gperf") This is just plain gross, but needed for decent performance.

So, changing the output format is important, right?
 
> Extending the /proc/*/maps file might be possible. It is commonly used
> by debuggers I think, so you'd better at least verify that gdb is OK.
> The procps "pmap" tool uses it too. To satisfy the procps parser:
> 
> a. no more than 31 flags
> b. no '/' prior to the filename
> c. nothing after the filename
> d. no new fields inserted prior to the inode number
> 
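
The procps constraints quoted above are easier to see against a parser. The following sketch, written for illustration only, reads the fixed leading fields of a /proc/<pid>/maps line positionally and then treats everything from the first '/' to the end of the line as the filename. The struct and function names are hypothetical, the sample lines in the test are made up, and procps's actual parser is not reproduced here; the sketch only mimics rules b. to d.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one /proc/<pid>/maps line, in the order the kernel prints them. */
struct maps_entry {
	unsigned long start, end, offset, inode;
	unsigned int dev_major, dev_minor;
	char perms[5];
	const char *path;        /* points into the input line, or "" */
};

/*
 * Parse one maps line. The leading fields are scanned positionally, so a
 * new field inserted before the inode would be misread (rule d). The path
 * is taken as "from the first '/' to end of line", so a '/' before the
 * filename or text after it would corrupt the result (rules b and c).
 * Returns 0 on success, -1 on a malformed line.
 */
static int parse_maps_line(const char *line, struct maps_entry *e)
{
	int consumed = 0;

	assert(line != NULL && e != NULL);
	if (sscanf(line, "%lx-%lx %4s %lx %x:%x %lu%n",
		   &e->start, &e->end, e->perms, &e->offset,
		   &e->dev_major, &e->dev_minor, &e->inode, &consumed) != 7)
		return -1;
	e->path = strchr(line + consumed, '/');
	if (e->path == NULL)
		e->path = "";    /* no '/' found: treat as an anonymous mapping */
	return 0;
}
```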

Yes, smaps is probably more feasible for a tracking environment. Do you
know of any public kernel (I mean a kernel version for tracking and
debugging) where I can post the smaps PATCH so that it can be included?

BR,

Mauricio Lin.


Re: [PATCH] A new entry for /proc

2005-03-01 Thread Mauricio Lin
Well,

It is working better now. You are right, Hugh. Now the new version is
faster than the old one. I removed the struct page and its related
function.

Thanks,

BR,

Mauricio Lin.

On Tue, 1 Mar 2005 04:08:15 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> On Mon, 28 Feb 2005 20:41:31 +0000 (GMT), Hugh Dickins <[EMAIL PROTECTED]> wrote:
> > On Mon, 28 Feb 2005, Mauricio Lin wrote:
> > >
> > > Now I am testing with /proc/pid/smaps and the values are showing that
> > > the old one is faster than the new one. So I will keep using the old
> > > smaps version.
> >
> > Sorry, I don't have time for more than the briefest look.
> >
> > It appears that your old resident_mem_size method is just checking
> > pte_present, whereas your new smaps_pte_range method is also doing
> > pte_page (yet no prior check for pfn_valid: wrong) and checking
> > !PageReserved i.e. accessing the struct page corresponding to each
> > pte.  So it's not a fair comparison, your new method is accessing
> > many more cachelines than your old method.
> >
> > Though it's correct to check pfn_valid and !PageReserved to get the
> > same total rss as would be reported elsewhere, I'd suggest that it's
> > really not worth the overhead of those struct page accesses: just
> > stick with the pte_present test.
> So, I can remove the PageReserved macro without any problems, right?
> 
> 
> >
> > Your smaps_pte_range is missing pte_unmap?
> Yes, but I already fixed this problem. Paul Mundt pointed out the
> missing unmap.
> 
> Thanks,
> 
> Let me perform new experiments now.
> 
> BR,
> 
> Mauricio Lin.
>


Re: [PATCH] A new entry for /proc

2005-03-01 Thread Mauricio Lin
On Mon, 28 Feb 2005 20:41:31 +0000 (GMT), Hugh Dickins <[EMAIL PROTECTED]> wrote:
> On Mon, 28 Feb 2005, Mauricio Lin wrote:
> >
> > Now I am testing with /proc/pid/smaps and the values are showing that
> > the old one is faster than the new one. So I will keep using the old
> > smaps version.
> 
> Sorry, I don't have time for more than the briefest look.
> 
> It appears that your old resident_mem_size method is just checking
> pte_present, whereas your new smaps_pte_range method is also doing
> pte_page (yet no prior check for pfn_valid: wrong) and checking
> !PageReserved i.e. accessing the struct page corresponding to each
> pte.  So it's not a fair comparison, your new method is accessing
> many more cachelines than your old method.
> 
> Though it's correct to check pfn_valid and !PageReserved to get the
> same total rss as would be reported elsewhere, I'd suggest that it's
> really not worth the overhead of those struct page accesses: just
> stick with the pte_present test.
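
Hugh's cost argument can be illustrated with another userspace model (again not kernel code). count_present() reads only the pte values, while count_present_strict() also applies a pfn_valid-style bounds check and reads a per-frame reserved flag, standing in for the struct page access behind pte_page() and PageReserved(). The mock pte layout (frame number shifted left by PAGE_SHIFT, plus a present bit), MAX_PFN and the frame_reserved array are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT  12
#define PTE_PRESENT 0x1UL
#define MAX_PFN     1024UL               /* assumed number of valid frames */

typedef unsigned long pte_t;             /* mock: pfn << PAGE_SHIFT | flags */

static bool frame_reserved[MAX_PFN];     /* stands in for PageReserved() */

/* Cheap variant: only the pte itself is read. */
static size_t count_present(const pte_t *ptes, size_t n)
{
	size_t count = 0;

	assert(ptes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (ptes[i] & PTE_PRESENT)
			count++;
	return count;
}

/*
 * Stricter variant: also validate the frame number (like pfn_valid) and
 * skip reserved frames, which means reading per-frame metadata for every
 * present pte.
 */
static size_t count_present_strict(const pte_t *ptes, size_t n)
{
	size_t count = 0;

	assert(ptes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		unsigned long pfn = ptes[i] >> PAGE_SHIFT;

		if (!(ptes[i] & PTE_PRESENT))
			continue;
		if (pfn >= MAX_PFN || frame_reserved[pfn])
			continue;
		count++;
	}
	return count;
}
```

In the kernel the second variant's extra read goes to a separate struct page for every present pte, which is where the additional cachelines Hugh mentions come from; the two counts only differ when reserved or invalid frames are mapped.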
So, I can remove the PageReserved macro without any problems, right?


> 
> Your smaps_pte_range is missing pte_unmap?
Yes, but I already fixed this problem. Paul Mundt pointed out the
missing unmap.

Thanks,

Let me perform new experiments now.

BR,

Mauricio Lin.


Re: [PATCH] A new entry for /proc

2005-02-28 Thread Mauricio Lin
Hi,

Just a short explanation of the mistake.

I ran cat /proc/pid/status instead of cat /proc/pid/smaps.

So I was testing the /proc/pid/status and not the /proc/pid/smaps.

Now I am testing with /proc/pid/smaps and the values are showing that
the old one is faster than the new one. So I will keep using the old
smaps version.

Any suggestions?

BR,

Mauricio Lin.

On Mon, 28 Feb 2005 05:43:05 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi all,
> 
> I made a mistake. In fact, the old smaps is still faster than the new one.
> 
> Take a look:
> 
> Old smaps
> real 19.52
> user 2.15
> sys 17.27
> 
> New smaps
> real 25.93
> user 3.19
> sys 22.31
> 
> Any comments?
> 
> BR,
> 
> Mauricio Lin.
> 
> On Fri, 25 Feb 2005 11:14:36 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I tested the two smaps entries using the time command.
> >
> > I tested 100.000 cat commands with smaps for each version.
> >
> > I checked the difference between the two versions and the new one is
> > faster than the old one. So Hugh is correct about the loop performance.
> >
> > Thanks!!!
> >
> > Mauricio Lin.
> >
> > On Thu, 24 Feb 2005 03:52:55 -0800, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > Mauricio Lin <[EMAIL PROTECTED]> wrote:
> > > >
> > > > But can I use jiffies to measure this kind of performance? AFAIK, if
> > > >  it is more efficient, then it is faster, right? How can I measure how
> > > >  fast it is? Any ideas?
> > >
> > > umm,
> > >
> > > time ( for i in $(seq 100); do cat /proc/nnn/smaps; done > /dev/null )
> > >
> > > ?
> > >
> >
>


Re: [PATCH] A new entry for /proc

2005-02-28 Thread Mauricio Lin
Hi all,

I made a mistake. In fact, the old smaps is still faster than the new one.

Take a look:

Old smaps
real 19.52
user 2.15
sys 17.27

New smaps
real 25.93
user 3.19
sys 22.31

Any comments?

BR,

Mauricio Lin.

On Fri, 25 Feb 2005 11:14:36 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi all,
> 
> I tested the two smaps entries using the time command.
> 
> I tested 100.000 cat commands with smaps for each version.
> 
> I checked the difference between the two versions and the new one is
> faster than the old one. So Hugh is correct about the loop performance.
> 
> Thanks!!!
> 
> Mauricio Lin.
> 
> On Thu, 24 Feb 2005 03:52:55 -0800, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > Mauricio Lin <[EMAIL PROTECTED]> wrote:
> > >
> > > But can I use jiffies to measure this kind of performance? AFAIK, if
> > >  it is more efficient, then it is faster, right? How can I measure how
> > >  fast it is? Any ideas?
> >
> > umm,
> >
> > time ( for i in $(seq 100); do cat /proc/nnn/smaps; done > /dev/null )
> >
> > ?
> >
>


Re: [PATCH] A new entry for /proc

2005-02-25 Thread Mauricio Lin
Hi all,

I tested the two smaps entries using the time command.

I tested 100.000 cat commands with smaps for each version.

I checked the difference between the two versions and the new one is
faster than the old one. So Hugh is correct about the loop performance.

Thanks!!!

Mauricio Lin.

On Thu, 24 Feb 2005 03:52:55 -0800, Andrew Morton <[EMAIL PROTECTED]> wrote:
> Mauricio Lin <[EMAIL PROTECTED]> wrote:
> >
> > But can I use jiffies to measure this kind of performance? AFAIK, if
> >  it is more efficient, then it is faster, right? How can I measure how
> >  fast it is? Any ideas?
> 
> umm,
> 
> time ( for i in $(seq 100); do cat /proc/nnn/smaps; done > /dev/null )
> 
> ?
>
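
Andrew's one-liner times the whole loop from the shell, including the cost of starting cat on every iteration. If that overhead matters, a small C harness can time repeated reads directly. This is a minimal sketch: time_reads() is a hypothetical helper, the 4096-byte buffer is arbitrary, and the result is elapsed wall-clock time (comparable to the real figure from time), not the user/sys split.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Read the whole file at path `iterations` times and return the elapsed
 * wall-clock time in seconds, or -1.0 if the file cannot be opened.
 */
static double time_reads(const char *path, int iterations)
{
	char buf[4096];
	struct timespec t0, t1;

	assert(path != NULL && iterations > 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < iterations; i++) {
		FILE *f = fopen(path, "r");

		if (f == NULL)
			return -1.0;
		while (fread(buf, 1, sizeof(buf), f) > 0)
			;       /* discard the contents, like cat > /dev/null */
		fclose(f);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

For example, time_reads("/proc/self/smaps", 100) would read the calling process's own smaps 100 times on a kernel that provides that file.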


Re: [PATCH] A new entry for /proc

2005-02-24 Thread Mauricio Lin
Hi Andrew,

But can I use jiffies to measure this kind of performance? AFAIK, if
it is more efficient, then it is faster, right? How can I measure how
fast it is? Any ideas?

BR,

Mauricio Lin.


On Thu, 24 Feb 2005 01:09:47 -0800, Andrew Morton <[EMAIL PROTECTED]> wrote:
> Mauricio Lin <[EMAIL PROTECTED]> wrote:
> >
> > You said that the old smaps version is not efficient because the way
> >  it access each pte.
> 
> Nick is talking about changing the kenrel so that it "refcounts pagetable
> pages".  I'm not sure why.
> 
> I assume that this means that each pte page's refcount will be incremented
> by one for each instantiated pte.  If so, then /proc/pid/smaps can become a
> lot more efficient.  Just add up the page refcounts on all the pte pages -
> no need to walk the ptes themselves.
> 
> Maybe?
>


Re: [PATCH] A new entry for /proc

2005-02-24 Thread Mauricio Lin
Hi Hugh,

You said that the old smaps version is not efficient because of the way
it accesses each pte. So I changed it to use pgd_range, pud_range,
pmd_range and pte_range. Now I am trying to compare the efficiency of
the old and new smaps, but something is wrong.

I put timers before and after the function that executes the
traversal algorithm in order to measure the elapsed time.
Both versions (old and new smaps) show 0 jiffies as elapsed time.

Is something wrong? Any ideas?
 
BR,

Mauricio Lin.

On Tue, 22 Feb 2005 09:13:01 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi All,
> 
> Here goes the new smaps patch. As suggested by Hugh in another discussion, the
>  inefficient loop was removed and replaced by smaps_pgd_range,
> smaps_pud_range, smaps_pmd and smaps_pte_range functions. I mantained
> the old resident_mem_size function between comments just for anyone
> who wants to verify it. BTW, we are using smaps to figure out which
> shared libraries that have heavy physical memory comsumption.
> 
> diff -uprN linux-2.6.11-rc4-bk9/Documentation/filesystems/proc.txt
> linux-2.6.11-rc4-bk9-smaps/Documentation/filesystems/proc.txt
> --- linux-2.6.11-rc4-bk9/Documentation/filesystems/proc.txt 2005-02-20
> 11:35:13.0 -0400
> +++ linux-2.6.11-rc4-bk9-smaps/Documentation/filesystems/proc.txt   
> 2005-02-20
> 11:29:42.0 -0400
> @@ -133,6 +133,7 @@ Table 1-1: Process specific entries in /
>   statm   Process memory status information
>   status  Process status in human readable form
>   wchan   If CONFIG_KALLSYMS is set, a pre-decoded wchan
> + smaps  Extension based on maps, presenting the rss size for each mapped file
>  
> ..
> 
>  For example, to get the status information of a process, all you have to do 
> is
> diff -uprN linux-2.6.11-rc4-bk9/Makefile linux-2.6.11-rc4-bk9-smaps/Makefile
> --- linux-2.6.11-rc4-bk9/Makefile   2005-02-20 11:36:00.0 -0400
> +++ linux-2.6.11-rc4-bk9-smaps/Makefile 2005-02-20 11:31:44.0 -0400
> @@ -1,7 +1,7 @@
>  VERSION = 2
>  PATCHLEVEL = 6
>  SUBLEVEL = 11
> -EXTRAVERSION = -rc4-bk9
> +EXTRAVERSION = -rc4-bk9-smaps
>  NAME=Woozy Numbat
> 
>  # *DOCUMENTATION*
> diff -uprN linux-2.6.11-rc4-bk9/fs/proc/base.c
> linux-2.6.11-rc4-bk9-smaps/fs/proc/base.c
> --- linux-2.6.11-rc4-bk9/fs/proc/base.c 2005-02-20 11:35:22.0 -0400
> +++ linux-2.6.11-rc4-bk9-smaps/fs/proc/base.c   2005-02-20
> 11:28:00.0 -0400
> @@ -11,6 +11,28 @@
>   *  go into icache. We cache the reference to task_struct upon lookup too.
>   *  Eventually it should become a filesystem in its own. We don't use the
>   *  rest of procfs anymore.
> + *
> + *
> + *  Changelog:
> + *  17-Jan-2005
> + *  Allan Bezerra
> + *  Bruna Moreira <[EMAIL PROTECTED]>
> + *  Edjard Mota <[EMAIL PROTECTED]>
> + *  Ilias Biris <[EMAIL PROTECTED]>
> + *  Mauricio Lin <[EMAIL PROTECTED]>
> + *
> + *  Embedded Linux Lab - 10LE Instituto Nokia de Tecnologia - INdT
> + *
> + *  A new process specific entry (smaps) included in /proc. It shows the
> + *  size of rss for each memory area. The maps entry lacks information
> + *  about physical memory size (rss) for each mapped file, i.e.,
> + *  rss information for executables and library files.
> + *  This additional information is useful for any tools that need to know
> + *  about physical memory consumption for a process specific library.
> + *
> + *  Changelog:
> + *  21-Feb-2005
> + *  Embedded Linux Lab - 10LE Instituto Nokia de Tecnologia - INdT
>   */
> 
> #include <asm/uaccess.h>
> @@ -61,6 +83,7 @@ enum pid_directory_inos {
> PROC_TGID_MAPS,
> PROC_TGID_MOUNTS,
> PROC_TGID_WCHAN,
> +   PROC_TGID_SMAPS,
>  #ifdef CONFIG_SCHEDSTATS
> PROC_TGID_SCHEDSTAT,
>  #endif
> @@ -92,6 +115,7 @@ enum pid_directory_inos {
> PROC_TID_MAPS,
> PROC_TID_MOUNTS,
> PROC_TID_WCHAN,
> +   PROC_TID_SMAPS,
>  #ifdef CONFIG_SCHEDSTATS
> PROC_TID_SCHEDSTAT,
>  #endif
> @@ -134,6 +158,7 @@ static struct pid_entry tgid_base_stuff[
> E(PROC_TGID_ROOT,  "root",S_IFLNK|S_IRWXUGO),
> E(PROC_TGID_EXE,   "exe", S_IFLNK|S_IRWXUGO),
> E(PROC_TGID_MOUNTS,"mounts",  S_IFREG|S_IRUGO),
> +   E(PROC_TGID_SMAPS, "smaps",   S_IFREG|S_IRUGO),
>  #ifdef CONFIG_SECURITY
> E(PROC_TGID_ATTR,  "attr",S_IFDIR|S_IRUGO|S_IXUGO),
>  #endif
> @@ -164,6 +189,7 @@ static struct pid_entry tid_base_stuff[]
> E(PROC_TID_ROOT,   "root",S_IFLNK|S_IRWXUGO),
> E(PROC_TID_EXE,"exe", S_IFLNK|S_IRWXUGO),
> E(PROC_TID_MOUNTS, "mounts",  S_IFREG|S_IRUGO),
> +   E(PROC_TID_SMAPS,  "smaps",   S_IFREG|S_IRUGO),
>  #ifdef CONFIG_SECURITY
> E(PROC_TID_ATTR,   "attr",S_IFDIR|S_IRUGO|S_IXUGO),
>  #endif
> @@ -488,6 +514,25 @@ static struct file_operations proc_maps_
> .release= seq_release,
>  };
>  
> +extern struct seq_operations proc_pid_smaps_op;
> +static int smaps_open(struct inode *inode, struct file *file

Re: [PATCH] A new entry for /proc

2005-02-22 Thread Mauricio Lin
Hi All,

Here is the new smaps patch. As Hugh suggested in another discussion, the
inefficient loop has been removed and replaced by the smaps_pgd_range,
smaps_pud_range, smaps_pmd and smaps_pte_range functions. I kept the old
resident_mem_size function inside comments for anyone who wants to
compare. BTW, we are using smaps to figure out which shared libraries
have heavy physical memory consumption.


diff -uprN linux-2.6.11-rc4-bk9/Documentation/filesystems/proc.txt linux-2.6.11-rc4-bk9-smaps/Documentation/filesystems/proc.txt
--- linux-2.6.11-rc4-bk9/Documentation/filesystems/proc.txt 2005-02-20 11:35:13.0 -0400
+++ linux-2.6.11-rc4-bk9-smaps/Documentation/filesystems/proc.txt   2005-02-20 11:29:42.0 -0400
@@ -133,6 +133,7 @@ Table 1-1: Process specific entries in /
  statm   Process memory status information  
  status  Process status in human readable form  
  wchan   If CONFIG_KALLSYMS is set, a pre-decoded wchan
+ smaps  Extension based on maps, presenting the rss size for each mapped file
 ..
 
 For example, to get the status information of a process, all you have to do is
diff -uprN linux-2.6.11-rc4-bk9/Makefile linux-2.6.11-rc4-bk9-smaps/Makefile
--- linux-2.6.11-rc4-bk9/Makefile   2005-02-20 11:36:00.0 -0400
+++ linux-2.6.11-rc4-bk9-smaps/Makefile 2005-02-20 11:31:44.0 -0400
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 11
-EXTRAVERSION = -rc4-bk9
+EXTRAVERSION = -rc4-bk9-smaps
 NAME=Woozy Numbat
 
 # *DOCUMENTATION*
diff -uprN linux-2.6.11-rc4-bk9/fs/proc/base.c linux-2.6.11-rc4-bk9-smaps/fs/proc/base.c
--- linux-2.6.11-rc4-bk9/fs/proc/base.c 2005-02-20 11:35:22.0 -0400
+++ linux-2.6.11-rc4-bk9-smaps/fs/proc/base.c   2005-02-20 11:28:00.0 -0400
@@ -11,6 +11,28 @@
  *  go into icache. We cache the reference to task_struct upon lookup too.
  *  Eventually it should become a filesystem in its own. We don't use the
  *  rest of procfs anymore.
+ *
+ *
+ *  Changelog:
+ *  17-Jan-2005
+ *  Allan Bezerra
+ *  Bruna Moreira <[EMAIL PROTECTED]>
+ *  Edjard Mota <[EMAIL PROTECTED]>
+ *  Ilias Biris <[EMAIL PROTECTED]>
+ *  Mauricio Lin <[EMAIL PROTECTED]>
+ *
+ *  Embedded Linux Lab - 10LE Instituto Nokia de Tecnologia - INdT
+ *
+ *  A new process specific entry (smaps) included in /proc. It shows the
+ *  size of rss for each memory area. The maps entry lacks information
+ *  about physical memory size (rss) for each mapped file, i.e.,
+ *  rss information for executables and library files.
+ *  This additional information is useful for any tools that need to know
+ *  about physical memory consumption for a process specific library.
+ *  
+ *  Changelog:
+ *  21-Feb-2005
+ *  Embedded Linux Lab - 10LE Instituto Nokia de Tecnologia - INdT 
  */
 
 #include <asm/uaccess.h>
@@ -61,6 +83,7 @@ enum pid_directory_inos {
PROC_TGID_MAPS,
PROC_TGID_MOUNTS,
PROC_TGID_WCHAN,
+   PROC_TGID_SMAPS,
 #ifdef CONFIG_SCHEDSTATS
PROC_TGID_SCHEDSTAT,
 #endif
@@ -92,6 +115,7 @@ enum pid_directory_inos {
PROC_TID_MAPS,
PROC_TID_MOUNTS,
PROC_TID_WCHAN,
+   PROC_TID_SMAPS,
 #ifdef CONFIG_SCHEDSTATS
PROC_TID_SCHEDSTAT,
 #endif
@@ -134,6 +158,7 @@ static struct pid_entry tgid_base_stuff[
E(PROC_TGID_ROOT,  "root",S_IFLNK|S_IRWXUGO),
E(PROC_TGID_EXE,   "exe", S_IFLNK|S_IRWXUGO),
E(PROC_TGID_MOUNTS,"mounts",  S_IFREG|S_IRUGO),
+   E(PROC_TGID_SMAPS, "smaps",   S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
E(PROC_TGID_ATTR,  "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -164,6 +189,7 @@ static struct pid_entry tid_base_stuff[]
E(PROC_TID_ROOT,   "root",S_IFLNK|S_IRWXUGO),
E(PROC_TID_EXE,"exe", S_IFLNK|S_IRWXUGO),
E(PROC_TID_MOUNTS, "mounts",  S_IFREG|S_IRUGO),
+   E(PROC_TID_SMAPS,  "smaps",   S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
E(PROC_TID_ATTR,   "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -488,6 +514,25 @@ static struct file_operations proc_maps_
.release= seq_release,
 };
 
+extern struct seq_operations proc_pid_smaps_op;
+static int smaps_open(struct inode *inode, struct file *file)
+{
+   struct task_struct *task = proc_task(inode);
+   int ret = seq_open(file, &proc_pid_smaps_op);
+   if (!ret) {
+   struct seq_file *m = file->private_data;
+   m->private = task;
+   }
+   return ret;
+}
+
+static struct file_operations proc_smaps_operations = {
+   .open   = smaps_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+};
+
 extern struct seq_operations mounts_op;
 static int mounts_open(struct inode *inode, struct file *file)
 {
@@ -1447,6 +1492,10 @@ static struct dentry *proc_pident_lookup
case PROC_TGID_MOUNTS

Re: /proc/*/statm, exactly what does "shared" mean?

2005-02-16 Thread Mauricio Lin
Hi Hugh,

Thanks for your suggestion. I did not know that kernel 2.4.29 had
changed the statm implementation. As far as I can see, the statm
implementation differs between 2.4 and 2.6.

Let me see if I can use the 2.4.29 statm approach to improve smaps for
kernel 2.6.11-rc.

BR,

Mauricio Lin.

On Wed, 16 Feb 2005 12:00:55 +0000 (GMT), Hugh Dickins <[EMAIL PROTECTED]> wrote:
> On Wed, 16 Feb 2005, Mauricio Lin wrote:
> > Well, for each vma it is checked how many pages are mapped to rss. So
> > I have to check per page if it is allocated in physical memory. I know
> > that this is a heavy function, but do you have any suggestion to
> > improve this?  What do you mean "needs refactoring into pgd_range,
> > pud_range, pmd_range, pte_range levels like 2.4's statm"? Could you
> > give more details, please?
> 
> Just look at, say, linux-2.4.29/fs/proc/array.c proc_pid_statm:
> which calls statm_pgd_range which calls statm_pmd_range which
> calls statm_pte_range which scans along the array of ptes doing
> the pte examination you're doing.  There are plenty of examples
> in 2.6.11-rc mm/memory.c of how to do it with pud level too.
> 
> Whereas your way starts at the top and descends the tree each time
> for every leaf, repeatedly mapping and unmapping the page table if
> that pagetable is in highmem.  You took follow_page as your starting
> point, which is good for a single pte, but inefficient for many.
> 
> Your function(s) will still be heavyweight, but somewhat faster.
> 
> Hugh
>


Re: /proc/*/statm, exactly what does "shared" mean?

2005-02-16 Thread Mauricio Lin
Hi all,

Sorry for replying to this email so late; I was away on a trip.

On Sat, 12 Feb 2005 15:42:15 +0000 (GMT), Hugh Dickins <[EMAIL PROTECTED]> wrote:
> On Sat, 12 Feb 2005, Richard F. Rebel wrote:
> >
> > That said, many mod_perl users are *VERY* interested in being able to
> > detect and observe how "shared" our forked children are.  Shared meaning
> > private pages shared with children (copy on write).  Is it even possible
> > to do this in 2.6 kernels?  If so, any pointers would be very helpful.
> 
> Not in any of the vanilla kernels.
> 
> Mauricio has a /proc/<pid>/smaps patch, in which he returns to looking
> at every pte slot of every vma of the process as /proc/<pid>/statm did
> in 2.4.  I suggest you ask him offline for his latest version (the last
> I saw did not include support for 2.6.11's pud level; 
I added the pud level in the last patch I sent to the linux-kernel list,
as Marcelo Tosatti suggested.

> and looped in an
> inefficient way, repeatedly locating, mapping and unmapping the page
> table for each pte slot - needs refactoring into pgd_range, pud_range,
> pmd_range, pte_range levels like 2.4's statm).
Well, for each vma the code checks how many of its pages are resident
(counted in rss), so I have to check each page to see whether it is in
physical memory. I know that this is a heavy function, but do you have
any suggestions to improve it? What do you mean by "needs refactoring
into pgd_range, pud_range, pmd_range, pte_range levels like 2.4's
statm"? Could you give more details, please?

BR,

Mauricio Lin.


Re: User space out of memory approach

2005-01-28 Thread Mauricio Lin
Hi Andrea,


On Fri, 28 Jan 2005 09:58:24 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi Andrea,
> 
> On Thu, 27 Jan 2005 23:11:29 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> 
> wrote:
> > On Thu, Jan 27, 2005 at 02:54:13PM -0400, Mauricio Lin wrote:
> > > Hi Andrea,
> > >
> > > On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> 
> > > wrote:
> > > > On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > > > > Sometimes the first application to be killed is XFree. AFAIK the
> > > >
> > > > This makes more sense now. You need somebody trapping sigterm in order
> > > > to lockup and X sure traps it to recover the text console.
> > > >
> > > > Can you replace this:
> > > >
> > > > if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> > > > force_sig(SIGTERM, p);
> > > > } else {
> > > > force_sig(SIGKILL, p);
> > > > }
> > > >
> > > > with this?
> > > >
> > > > force_sig(SIGKILL, p);
> > > >
> > > > in mm/oom_kill.c.
> > >
> > > Nice. Your suggestion made the error goes away.
> > >
> > > We are still testing in order to compare between your OOM Killer and
> > > Original OOM Killer.
> >
> > Ok, thanks for the confirmation. So my theory was right.
> >
> > Basically we've to make this patch, now that you already edited the
> > code, can you diff and send a patch that will be the 6/5 in the serie?
> 
> OK. I will send the patch.

As you know, Andrew generated the patch. Here are some test results for
your OOM Killer and the original OOM Killer. We ran 10 experiments with
each OOM Killer; the values below are averages.

"Invocations" is the number of times the out_of_memory function is
called, "Selections" is the number of times the select_bad_process
function is called, and "Killed" is the number of processes killed.

Original OOM Killer
Invocations average = 51620/10 = 5162
Selections average = 30/10 = 3
Killed average = 38/10 = 3.8

Andrea OOM Killer
Invocations average = 213/10 = 21.3
Selections average = 213/10 = 21.3
Killed average = 52/10 = 5.2

As you can see, the number of invocations dropped significantly with
your OOM Killer (an average of 21.3 versus 5162).

I did not know about this problem when I was moving the original
ranking algorithm to userland. As Thomaz mentioned, there were the
invocation madness, the reentrancy problems, and those strange timers
and counters such as now, since, last, lastkill and count. With those
problems solved, I guess I can now move some of the OOM Killer logic
to userland more safely, right?

BTW, will your OOM Killer be included in the kernel tree?

BR,

Mauricio Lin.


Re: User space out of memory approach

2005-01-28 Thread Mauricio Lin
Hi Andrea,

On Thu, 27 Jan 2005 23:11:29 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Thu, Jan 27, 2005 at 02:54:13PM -0400, Mauricio Lin wrote:
> > Hi Andrea,
> >
> > On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> 
> > wrote:
> > > On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > > > Sometimes the first application to be killed is XFree. AFAIK the
> > >
> > > This makes more sense now. You need somebody trapping sigterm in order
> > > to lockup and X sure traps it to recover the text console.
> > >
> > > Can you replace this:
> > >
> > > if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
> > > force_sig(SIGTERM, p);
> > > } else {
> > > force_sig(SIGKILL, p);
> > > }
> > >
> > > with this?
> > >
> > > force_sig(SIGKILL, p);
> > >
> > > in mm/oom_kill.c.
> >
> > Nice. Your suggestion made the error go away.
> >
> > We are still testing in order to compare your OOM Killer with the
> > original OOM Killer.
> 
> Ok, thanks for the confirmation. So my theory was right.
> 
> Basically we have to make this patch; now that you have already edited the
> code, can you diff and send a patch that will be 6/5 in the series?

OK. I will send the patch.

> (then after fixing this last very longstanding [now deadlock prone too]
> bug, we can think about how to make a 7/5 that will wait a few seconds
> after sending a sigterm before falling back to a sigkill; that shouldn't be
> difficult, but the above 6/5 will already make the code correct)
> 
> Note, if you add swap it'll work around it too, since then the memhog will
> be allowed to grow to a larger rss than X. With 128m of ram and no swap,
> X is one of the biggest, with xshm involved from some client app
> allocating lots of pictures. I never noticed since I always tested
> it either with swap or on higher-mem systems, and my test box runs
> with an idle X too, which isn't that big ;).

Well, we prefer to test with reduced memory resources, because we also
have the OOM Killer on small devices with few resources in mind.

BR,

Mauricio Lin.


Re: User space out of memory approach

2005-01-27 Thread Mauricio Lin
Hi Andrea,

On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > Sometimes the first application to be killed is XFree. AFAIK the
> 
> This makes more sense now. You need somebody trapping sigterm in order
> to lockup and X sure traps it to recover the text console.
> 
> Can you replace this:
> 
>         if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
>                 force_sig(SIGTERM, p);
>         } else {
>                 force_sig(SIGKILL, p);
>         }
> 
> with this?
> 
>         force_sig(SIGKILL, p);
> 
> in mm/oom_kill.c.

Nice. Your suggestion made the error go away.

We are still testing in order to compare your OOM Killer with the
original OOM Killer.

BR,

Mauricio Lin.


Re: User space out of memory approach

2005-01-26 Thread Mauricio Lin
Hi Andrea,

On Wed, 26 Jan 2005 01:49:01 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 25, 2005 at 08:11:19PM -0400, Mauricio Lin wrote:
> > Sometimes the first application to be killed is XFree. AFAIK the
> 
> This makes more sense now. You need somebody trapping sigterm in order
> to lockup and X sure traps it to recover the text console.
> 
> Can you replace this:
> 
>         if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) {
>                 force_sig(SIGTERM, p);
>         } else {
>                 force_sig(SIGKILL, p);
>         }
> 
> with this?

OK, let me test it. I will let you know if I have any news.

> 
>         force_sig(SIGKILL, p);
> 
> in mm/oom_kill.c.

BR,

Mauricio Lin.


Re: User space out of memory approach

2005-01-25 Thread Mauricio Lin
Hi Thomas,

On Tue, 25 Jan 2005 22:39:39 +0100, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> On Tue, 2005-01-25 at 17:13 -0400, Mauricio Lin wrote:
> > Hi Andrea,
> >
> > Your OOM Killer patch was tested and a strange behaviour was found.
> > Basically, as a normal user we started some applications such as openoffice,
> > mozilla and emacs.
> > And as root (in another tty) we started a simple program that calls
> > malloc in an endless loop, as below:
> >
> > int main (void)
> > {
> >   int * mem;
> >   for (;;)
> >     mem = (int *) malloc(sizeof(int));
> >   return 0;
> > }
> >
> >
> > Using the original OOM Killer, the malloc program is the first application
> > killed and the system is restored to a usable state. After applying your
> > patch and running the same experiment, the OOM Killer does not kill the
> > malloc program, and it enters a kind of endless loop, as below:
> >
> > 1) out_of_memory is invoked;
> > 2) select_bad_process is invoked;
> 
> Which process is selected ?
Sometimes the first application to be killed is XFree. AFAIK the
malloc program is never killed, because the OOM Killer never stops working.
Usually we are not able to check the kernel log file after
rebooting the system, because nothing was written there (perhaps
syslogd or klogd were killed during the OOM). But I can see the printk
messages on the screen while the OOM Killer is running. This does not happen
with the original OOM Killer.

I added some printk calls to trace the OOM Killer, and IMHO this is what is going on:

The out_of_memory function is invoked, and it then invokes
select_bad_process, which starts to walk through the tasks. But during
the do_each_thread / while_each_thread loop the condition:

if ((unlikely(test_tsk_thread_flag(p, TIF_MEMDIE)) ||
     (p->flags & PF_EXITING)) &&
    !(p->flags & PF_DEAD))
        return ERR_PTR(-1UL);

is true, so select_bad_process returns early through that return
statement.

Execution then returns to the point where select_bad_process was
called, i.e., the out_of_memory function, where the condition:

if (PTR_ERR(p) == -1UL)
        goto out;

is also true, so it jumps to the "out" label and leaves the
out_of_memory function without killing anything. But since the system is
still out of memory, out_of_memory is invoked again, which invokes
select_bad_process again. During the do_each_thread / while_each_thread
loop the same condition is true again, so select_bad_process returns
early, and out_of_memory jumps to "out" and returns again. This behaviour
repeats continuously for a long time, until I stop waiting and reboot
the system by hand.

> Can you please show the kernel messages ?

OK. We will try to reach a situation where the printk messages are
written entirely to the log file, and then show you the kernel messages. But
as I said: usually the printk messages are not written to the log
file when using Andrea's patch, whereas with the original OOM Killer we can
see the messages in the log file. The syslog.conf file is the same for
both OOM Killers (Andrea's and the original). Do you have any idea what is
happening to the log file?

If you do not mind, could you run the same test case I described in my
last email? I would like to know if this problem happens to other
people as well.

We tested on the laptop and desktop machines with 128MB of RAM and
swap space disabled.


BR,

Mauricio Lin.


Re: User space out of memory approach

2005-01-25 Thread Mauricio Lin
Hi Andrea,

Your OOM Killer patch was tested and a strange behaviour was found.
Basically, as a normal user we started some applications such as openoffice,
mozilla and emacs.
And as root (in another tty) we started a simple program that calls
malloc in an endless loop, as below:

#include <stdlib.h>

int main (void)
{
  int * mem;
  for (;;)
    mem = (int *) malloc(sizeof(int));  /* never freed: the heap only grows */
  return 0;
}


Using the original OOM Killer, the malloc program is the first application
killed and the system is restored to a usable state. After applying your
patch and running the same experiment, the OOM Killer does not kill the
malloc program, and it enters a kind of endless loop, as below:

1) out_of_memory is invoked;
2) select_bad_process is invoked;
3) the following condition is fulfilled:
   if ((unlikely(test_tsk_thread_flag(p, TIF_MEMDIE)) ||
        (p->flags & PF_EXITING)) &&
       !(p->flags & PF_DEAD))
           return ERR_PTR(-1UL);
4) steps 1, 2 and 3 above are executed again;

This loop (steps 1 to 4) goes on for a long time (and nothing
is killed) until I give up and reboot the system after waiting for
some minutes.

Any comments? What do you think about our test case? Could you
run the same test case, with the malloc program as root and other
graphical applications as a normal user?

Let me know what you think.

BR,

Mauricio Lin.

On Sat, 22 Jan 2005 04:32:19 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Fri, Jan 21, 2005 at 05:45:13PM -0400, Mauricio Lin wrote:
> > Hi Andrew,
> >
> > I have another question. You included an oom_adj entry in /proc for
> > each process. This was the approach you used to allow someone
> > or something to influence the ranking algorithm from userland, right?
> > So if I have another ranking algorithm in user space, I can use it
> > to complement the kernel decision as necessary. Was that your idea?
> 
> Yes, you should use your userspace algorithm to tune the oom killer via
> the oom_adj and you can check the effect of your changes with oom_score.
> I posted a one liner ugly script to do that a few days ago on l-k.
> 
> The oom_adj has this effect on the badness() code:
> 
>         /*
>          * Adjust the score by oomkilladj.
>          */
>         if (p->oomkilladj) {
>                 if (p->oomkilladj > 0)
>                         points <<= p->oomkilladj;
>                 else
>                         points >>= -(p->oomkilladj);
>         }
> 
> The bigger the points become, the more likely the task will be chosen
> by the oom killer.
>


Re: [PATCH] A new entry for /proc

2005-01-24 Thread Mauricio Lin
Hi Tosatti and Andrew,

On Mon, 17 Jan 2005 15:30:23 -0200, Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:
> 
> Hi Mauricio,
> 
> You want to update your patch to handle the new 4level pagetables, which
> introduce a new indirection table: the PUD.

Here is the smaps patch, updated for kernel 2.6.11-rc2-bk2 with PUD support.


diff -uprN linux-2.6.11-rc2/Documentation/filesystems/proc.txt linux-2.6.11-rc2-smaps/Documentation/filesystems/proc.txt
--- linux-2.6.11-rc2/Documentation/filesystems/proc.txt 2004-12-24 17:34:29.0 -0400
+++ linux-2.6.11-rc2-smaps/Documentation/filesystems/proc.txt   2005-01-24 17:15:03.0 -0400
@@ -133,6 +133,7 @@ Table 1-1: Process specific entries in /
  statm   Process memory status information  
  status  Process status in human readable form  
  wchan   If CONFIG_KALLSYMS is set, a pre-decoded wchan
+ smaps  Extension of maps, presenting the rss size for each mapped file
 ..
 
 For example, to get the status information of a process, all you have to do is
diff -uprN linux-2.6.11-rc2/Makefile linux-2.6.11-rc2-smaps/Makefile
--- linux-2.6.11-rc2/Makefile   2005-01-24 17:42:02.0 -0400
+++ linux-2.6.11-rc2-smaps/Makefile 2005-01-24 11:57:42.0 -0400
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 11
-EXTRAVERSION = -rc2-bk2
+EXTRAVERSION = -rc2-bk2-smaps
 NAME=Woozy Numbat
 
 # *DOCUMENTATION*
diff -uprN linux-2.6.11-rc2/fs/proc/base.c linux-2.6.11-rc2-smaps/fs/proc/base.c
--- linux-2.6.11-rc2/fs/proc/base.c 2005-01-24 17:41:51.0 -0400
+++ linux-2.6.11-rc2-smaps/fs/proc/base.c   2005-01-24 17:02:37.0 -0400
@@ -11,6 +11,24 @@
  *  go into icache. We cache the reference to task_struct upon lookup too.
  *  Eventually it should become a filesystem in its own. We don't use the
  *  rest of procfs anymore.
+ *
+ *
+ *  Changelog:
+ *  24-Jan-2005
+ *  Allan Bezerra
+ *  Bruna Moreira <[EMAIL PROTECTED]>
+ *  Edjard Mota <[EMAIL PROTECTED]>
+ *  Ilias Biris <[EMAIL PROTECTED]>
+ *  Mauricio Lin <[EMAIL PROTECTED]>
+ *
+ *  Embedded Linux Lab - 10LE Instituto Nokia de Tecnologia - INdT
+ *
+ *  A new process specific entry (smaps) included in /proc. It shows the
+ *  size of rss for each memory area. The maps entry lacks information
+ *  about physical memory size (rss) for each mapped file, i.e.,
+ *  rss information for executables and library files.
+ *  This additional information is useful for any tools that need to know
+ *  about physical memory consumption for a process specific library.
  */
 
 #include <asm/uaccess.h>
@@ -61,6 +79,7 @@ enum pid_directory_inos {
PROC_TGID_MAPS,
PROC_TGID_MOUNTS,
PROC_TGID_WCHAN,
+   PROC_TGID_SMAPS,
 #ifdef CONFIG_SCHEDSTATS
PROC_TGID_SCHEDSTAT,
 #endif
@@ -87,6 +106,7 @@ enum pid_directory_inos {
PROC_TID_MAPS,
PROC_TID_MOUNTS,
PROC_TID_WCHAN,
+   PROC_TID_SMAPS,
 #ifdef CONFIG_SCHEDSTATS
PROC_TID_SCHEDSTAT,
 #endif
@@ -124,6 +144,7 @@ static struct pid_entry tgid_base_stuff[
E(PROC_TGID_ROOT,  "root",S_IFLNK|S_IRWXUGO),
E(PROC_TGID_EXE,   "exe", S_IFLNK|S_IRWXUGO),
E(PROC_TGID_MOUNTS,"mounts",  S_IFREG|S_IRUGO),
+   E(PROC_TGID_SMAPS, "smaps",   S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
E(PROC_TGID_ATTR,  "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -149,6 +170,7 @@ static struct pid_entry tid_base_stuff[]
E(PROC_TID_ROOT,   "root",S_IFLNK|S_IRWXUGO),
E(PROC_TID_EXE,"exe", S_IFLNK|S_IRWXUGO),
E(PROC_TID_MOUNTS, "mounts",  S_IFREG|S_IRUGO),
+   E(PROC_TID_SMAPS,  "smaps",   S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
E(PROC_TID_ATTR,   "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -456,6 +478,26 @@ static struct file_operations proc_maps_
.release= seq_release,
 };
 
+extern struct seq_operations proc_pid_smaps_op;
+static int smaps_open(struct inode *inode, struct file *file)
+{
+   struct task_struct *task = proc_task(inode);
+   int ret = seq_open(file, &proc_pid_smaps_op);
+   if (!ret) {
+   struct seq_file *m = file->private_data;
+   m->private = task;
+   }
+   return ret;
+}
+
+static struct file_operations proc_smaps_operations = {
+   .open   = smaps_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+};
+
+
 extern struct seq_operations mounts_op;
 static int mounts_open(struct inode *inode, struct file *file)
 {
@@ -1300,6 +1342,10 @@ static struct dentry *proc_pident_lookup
case PROC_TGID_MOUNTS:
inode->i_fop = &proc_mounts_operations;
break;
+   case PROC_TID_SMAPS:
+   case PROC_TGID_SMAPS:
+   inode->i_fop = &proc_smaps_operations;
+   break

Re: User space out of memory approach

2005-01-21 Thread Mauricio Lin
Hi Andrew,

I have another question. You included an oom_adj entry in /proc for
each process. This was the approach you used in order to allow someone
or something to interfere with the ranking algorithm from userland, right?
So if I have another ranking algorithm in user space, I can use it
to complement the kernel decision as necessary. Was that your idea?

BR,

Mauricio Lin.


On Fri, 21 Jan 2005 17:27:11 -0400, Mauricio Lin <[EMAIL PROTECTED]> wrote:
> Hi Andrea,
> 
> I applied your patch and I am checking your code. It is really very
> interesting work. I have a question about the
> __set_current_state(TASK_INTERRUPTIBLE) call you put in the out_of_memory
> function. Don't you think it would be better to use set_current_state
> instead of __set_current_state? AFAIK set_current_state is more
> appropriate for SMP systems, right?
> 
> BR,
> 
> Mauricio Lin.
> 
> 
> On Tue, 11 Jan 2005 09:38:37 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> > On Tue, Jan 11, 2005 at 01:35:47AM +0100, Thomas Gleixner wrote:
> > > confirmed fix for this available. It was posted more than once.
> >
> > I posted 6 patches (1/4,2/4,3/4,4/4,5/4,6/4); they should all be
> > applied to mainline, and they're self contained. They add the userspace
> > ratings too.
> >
> > Those patches fix a longstanding PF_MEMDIE race too, and they optimize
> > used_math as well.
> >
> > I'm running with all 6 patches applied with an uptime of 6 days on SMP
> > and no problems at all. All 6 patches are applied to the kotd too
> > (plus the other bits posted on l-k as well for the write throttling;
> > just one bit is still missing but I'll add it soon):
> >
> > ftp://ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD
> >
> >
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: User space out of memory approach

2005-01-21 Thread Mauricio Lin
Hi Andrea,

I applied your patch and I am checking your code. It is really very
interesting work. I have a question about the
__set_current_state(TASK_INTERRUPTIBLE) call you put in the out_of_memory
function. Don't you think it would be better to use set_current_state
instead of __set_current_state? AFAIK set_current_state is more
appropriate for SMP systems, right?

BR,

Mauricio Lin.


On Tue, 11 Jan 2005 09:38:37 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 11, 2005 at 01:35:47AM +0100, Thomas Gleixner wrote:
> > confirmed fix for this available. It was posted more than once.
> 
> I posted 6 patches (1/4,2/4,3/4,4/4,5/4,6/4); they should all be
> applied to mainline, and they're self contained. They add the userspace
> ratings too.
> 
> Those patches fix a longstanding PF_MEMDIE race too, and they optimize
> used_math as well.
> 
> I'm running with all 6 patches applied with an uptime of 6 days on SMP
> and no problems at all. All 6 patches are applied to the kotd too
> (plus the other bits posted on l-k as well for the write throttling;
> just one bit is still missing but I'll add it soon):
> 
> ftp://ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD
> 
>


Re: [PATCH] A new entry for /proc

2005-01-17 Thread Mauricio Lin
Hi Tosatti,


On Mon, 17 Jan 2005 15:30:23 -0200, Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:
> 
> Hi Mauricio,
> 
> On Mon, Jan 17, 2005 at 03:02:14PM -0400, Mauricio Lin wrote:
> > Hi Andrew,
> >
> > I figured out the error. This patch works for other editors as well.
> 
> <snip>
> 
> > diff -uprN linux-2.6.10/fs/proc/task_mmu.c linux-2.6.10-smaps/fs/proc/task_mmu.c
> > --- linux-2.6.10/fs/proc/task_mmu.c   2004-12-24 17:34:01.0 -0400
> > +++ linux-2.6.10-smaps/fs/proc/task_mmu.c 2005-01-17 14:55:17.0 -0400
> > @@ -81,6 +81,76 @@ static int show_map(struct seq_file *m,
> >   return 0;
> >  }
> >
> > +static void resident_mem_size(struct mm_struct *mm,
> > +   unsigned long start_address,
> > +   unsigned long end_address,
> > +   unsigned long *size)
> > +{
> > + pgd_t *pgd;
> > + pmd_t *pmd;
> > + pte_t *ptep, pte;
> > + unsigned long each_page;
> > +
> > + for (each_page = start_address; each_page < end_address;
> > +  each_page += PAGE_SIZE) {
> > + pgd = pgd_offset(mm, each_page);
> > + if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
> > + continue;
> > +
> > + pmd = pmd_offset(pgd, each_page);
> > +
> > + if (pmd_none(*pmd))
> > + continue;
> > +
> > + if (unlikely(pmd_bad(*pmd)))
> > + continue;
> > +
> > + if (pmd_present(*pmd)) {
> > + ptep = pte_offset_map(pmd, each_page);
> > + if (!ptep)
> > + continue;
> > + pte = *ptep;
> > + pte_unmap(ptep);
> > + if (pte_present(pte))
> > + *size += PAGE_SIZE;
> > + }
> > +     }
> > +}
> 
> You want to update your patch to handle the new 4-level pagetables, which
> introduce a new indirection table: the PUD.
> 
> Check 2.6.11-rc1 - mm/rmap.c.

OK, I will check the new page tables in 2.6.11-rc1 and change
the algorithm that walks the page table entries.

Thanks.

BR,

Mauricio Lin.


Re: [PATCH] A new entry for /proc

2005-01-17 Thread Mauricio Lin
Hi Andrew,

I figured out the error. This patch works for other editors as well.


diff -uprN linux-2.6.10/Documentation/filesystems/proc.txt linux-2.6.10-smaps/Documentation/filesystems/proc.txt
--- linux-2.6.10/Documentation/filesystems/proc.txt 2004-12-24 17:34:29.0 -0400
+++ linux-2.6.10-smaps/Documentation/filesystems/proc.txt   2005-01-17 11:29:31.0 -0400
@@ -133,6 +133,7 @@ Table 1-1: Process specific entries in /
  statm   Process memory status information  
  status  Process status in human readable form  
  wchan   If CONFIG_KALLSYMS is set, a pre-decoded wchan
+ smaps  Extension of maps, presenting the rss size for each mapped file
 ..
 
 For example, to get the status information of a process, all you have to do is
diff -uprN linux-2.6.10/fs/proc/base.c linux-2.6.10-smaps/fs/proc/base.c
--- linux-2.6.10/fs/proc/base.c 2004-12-24 17:35:00.0 -0400
+++ linux-2.6.10-smaps/fs/proc/base.c   2005-01-17 12:11:01.0 -0400
@@ -11,6 +11,24 @@
  *  go into icache. We cache the reference to task_struct upon lookup too.
  *  Eventually it should become a filesystem in its own. We don't use the
  *  rest of procfs anymore.
+ *
+ *
+ *  Changelog:
+ *  17-Jan-2005
+ *  Allan Bezerra
+ *  Bruna Moreira <[EMAIL PROTECTED]>
+ *  Edjard Mota <[EMAIL PROTECTED]>
+ *  Ilias Biris <[EMAIL PROTECTED]>
+ *  Mauricio Lin <[EMAIL PROTECTED]>
+ *
+ *  Embedded Linux Lab - 10LE Instituto Nokia de Tecnologia - INdT
+ *
+ *  A new process specific entry (smaps) included in /proc. It shows the
+ *  size of rss for each memory area. The maps entry lacks information
+ *  about physical memory size (rss) for each mapped file, i.e.,
+ *  rss information for executables and library files.
+ *  This additional information is useful for any tools that need to know
+ *  about physical memory consumption for a process specific library.
  */
 
 #include <asm/uaccess.h>
@@ -60,6 +78,7 @@ enum pid_directory_inos {
PROC_TGID_MAPS,
PROC_TGID_MOUNTS,
PROC_TGID_WCHAN,
+   PROC_TGID_SMAPS,
 #ifdef CONFIG_SCHEDSTATS
PROC_TGID_SCHEDSTAT,
 #endif
@@ -86,6 +105,7 @@ enum pid_directory_inos {
PROC_TID_MAPS,
PROC_TID_MOUNTS,
PROC_TID_WCHAN,
+   PROC_TID_SMAPS,
 #ifdef CONFIG_SCHEDSTATS
PROC_TID_SCHEDSTAT,
 #endif
@@ -123,6 +143,7 @@ static struct pid_entry tgid_base_stuff[
E(PROC_TGID_ROOT,  "root",S_IFLNK|S_IRWXUGO),
E(PROC_TGID_EXE,   "exe", S_IFLNK|S_IRWXUGO),
E(PROC_TGID_MOUNTS,"mounts",  S_IFREG|S_IRUGO),
+   E(PROC_TGID_SMAPS, "smaps",   S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
E(PROC_TGID_ATTR,  "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -148,6 +169,7 @@ static struct pid_entry tid_base_stuff[]
E(PROC_TID_ROOT,   "root",S_IFLNK|S_IRWXUGO),
E(PROC_TID_EXE,"exe", S_IFLNK|S_IRWXUGO),
E(PROC_TID_MOUNTS, "mounts",  S_IFREG|S_IRUGO),
+   E(PROC_TID_SMAPS,  "smaps",   S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
E(PROC_TID_ATTR,   "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -497,6 +519,25 @@ static struct file_operations proc_maps_
.release= seq_release,
 };
 
+extern struct seq_operations proc_pid_smaps_op;
+static int smaps_open(struct inode *inode, struct file *file)
+{
+   struct task_struct *task = proc_task(inode);
+   int ret = seq_open(file, &proc_pid_smaps_op);
+   if (!ret) {
+   struct seq_file *m = file->private_data;
+   m->private = task;
+   }
+   return ret;
+}
+
+static struct file_operations proc_smaps_operations = {
+   .open   = smaps_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+};
+
 extern struct seq_operations mounts_op;
 static int mounts_open(struct inode *inode, struct file *file)
 {
@@ -1341,6 +1382,11 @@ static struct dentry *proc_pident_lookup
case PROC_TGID_MOUNTS:
inode->i_fop = &proc_mounts_operations;
break;
+   case PROC_TID_SMAPS:
+   case PROC_TGID_SMAPS:
+   inode->i_fop = &proc_smaps_operations;
+   break;
+
 #ifdef CONFIG_SECURITY
case PROC_TID_ATTR:
inode->i_nlink = 2;
diff -uprN linux-2.6.10/fs/proc/task_mmu.c linux-2.6.10-smaps/fs/proc/task_mmu.c
--- linux-2.6.10/fs/proc/task_mmu.c 2004-12-24 17:34:01.0 -0400
+++ linux-2.6.10-smaps/fs/proc/task_mmu.c   2005-01-17 14:55:17.0 -0400
@@ -81,6 +81,76 @@ static int show_map(struct seq_file *m, 
return 0;
 }
 
+static void resident_mem_si

Re: [PATCH] A new entry for /proc

2005-01-17 Thread Mauricio Lin
Hi Andrew,

Sorry for the patch errors.

Here is the fixed patch. I used xemacs to copy it; the other editors
I tried (emacs and pico) do not copy the patch correctly, and copying
it through webmail does not work either.

diff -uprN linux-2.6.10/Documentation/filesystems/proc.txt linux-2.6.10-smaps/Documentation/filesystems/proc.txt
--- linux-2.6.10/Documentation/filesystems/proc.txt 2004-12-24 17:34:29.0 -0400
+++ linux-2.6.10-smaps/Documentation/filesystems/proc.txt   2005-01-17 11:29:31.0 -0400
@@ -133,6 +133,7 @@ Table 1-1: Process specific entries in /
  statm   Process memory status information  
  status  Process status in human readable form  
  wchan   If CONFIG_KALLSYMS is set, a pre-decoded wchan
+ smaps  Extension of maps, presenting the rss size for each mapped file
 ..
 
 For example, to get the status information of a process, all you have to do is
diff -uprN linux-2.6.10/fs/proc/base.c linux-2.6.10-smaps/fs/proc/base.c
--- linux-2.6.10/fs/proc/base.c 2004-12-24 17:35:00.0 -0400
+++ linux-2.6.10-smaps/fs/proc/base.c   2005-01-17 12:11:01.0 -0400
@@ -11,6 +11,24 @@
  *  go into icache. We cache the reference to task_struct upon lookup too.
  *  Eventually it should become a filesystem in its own. We don't use the
  *  rest of procfs anymore.
+ *
+ *
+ *  Changelog:
+ *  17-Jan-2005
+ *  Allan Bezerra
+ *  Bruna Moreira <[EMAIL PROTECTED]>
+ *  Edjard Mota <[EMAIL PROTECTED]>
+ *  Ilias Biris <[EMAIL PROTECTED]>
+ *  Mauricio Lin <[EMAIL PROTECTED]>
+ *
+ *  Embedded Linux Lab - 10LE Instituto Nokia de Tecnologia - INdT
+ *
+ *  A new process specific entry (smaps) included in /proc. It shows the
+ *  size of rss for each memory area. The maps entry lacks information
+ *  about physical memory size (rss) for each mapped file, i.e.,
+ *  rss information for executables and library files.
+ *  This additional information is useful for any tools that need to know
+ *  about physical memory consumption for a process specific library.
  */
 
 #include <asm/uaccess.h>
@@ -60,6 +78,7 @@ enum pid_directory_inos {
PROC_TGID_MAPS,
PROC_TGID_MOUNTS,
PROC_TGID_WCHAN,
+   PROC_TGID_SMAPS,
 #ifdef CONFIG_SCHEDSTATS
PROC_TGID_SCHEDSTAT,
 #endif
@@ -86,6 +105,7 @@ enum pid_directory_inos {
PROC_TID_MAPS,
PROC_TID_MOUNTS,
PROC_TID_WCHAN,
+   PROC_TID_SMAPS,
 #ifdef CONFIG_SCHEDSTATS
PROC_TID_SCHEDSTAT,
 #endif
@@ -123,6 +143,7 @@ static struct pid_entry tgid_base_stuff[
E(PROC_TGID_ROOT,  "root",S_IFLNK|S_IRWXUGO),
E(PROC_TGID_EXE,   "exe", S_IFLNK|S_IRWXUGO),
E(PROC_TGID_MOUNTS,"mounts",  S_IFREG|S_IRUGO),
+   E(PROC_TGID_SMAPS, "smaps",   S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
E(PROC_TGID_ATTR,  "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -148,6 +169,7 @@ static struct pid_entry tid_base_stuff[]
E(PROC_TID_ROOT,   "root",S_IFLNK|S_IRWXUGO),
E(PROC_TID_EXE,"exe", S_IFLNK|S_IRWXUGO),
E(PROC_TID_MOUNTS, "mounts",  S_IFREG|S_IRUGO),
+   E(PROC_TID_SMAPS,  "smaps",   S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
E(PROC_TID_ATTR,   "attr",S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -497,6 +519,25 @@ static struct file_operations proc_maps_
.release= seq_release,
 };
 
+extern struct seq_operations proc_pid_smaps_op;
+static int smaps_open(struct inode *inode, struct file *file)
+{
+   struct task_struct *task = proc_task(inode);
+   int ret = seq_open(file, &proc_pid_smaps_op);
+   if (!ret) {
+   struct seq_file *m = file->private_data;
+   m->private = task;
+   }
+   return ret;
+}
+
+static struct file_operations proc_smaps_operations = {
+   .open   = smaps_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+};
+
 extern struct seq_operations mounts_op;
 static int mounts_open(struct inode *inode, struct file *file)
 {
@@ -1341,6 +1382,11 @@ static struct dentry *proc_pident_lookup
case PROC_TGID_MOUNTS:
inode->i_fop = &proc_mounts_operations;
break;
+   case PROC_TID_SMAPS:
+   case PROC_TGID_SMAPS:
+   inode->i_fop = &proc_smaps_operations;
+   break;
+
 #ifdef CONFIG_SECURITY
case PROC_TID_ATTR:
inode->i_nlink = 2;
diff -uprN linux-2.6.10/fs/proc/task_mmu.c linux-2.6.10-smaps/fs/proc/task_mmu.c
--- linux-2.6.10/fs/proc/task_mmu.c 2004-12-24 17:34:01.0 -0400
+++ linux-2.6.10-smaps/fs/proc/task_mmu.c   2005-01-17 09:29:38.0 
-
