i_version changes

2008-02-09 Thread Christoph Hellwig
I think the i_version changes that hit mainline about a week ago are
not as nice as they should be.

First there's a complete lack of documentation on this, which is very
bad.  Please document what the new semantics for i_version on regular
files are supposed to be, and how it differs from the existing
semantics for directories.

Second abusing one of the rather scarce superblock mount flags is
a bad idea.  It would be much better to set this through ->setattr
and an extension of struct iattr.  Especially as we need to convert
file_update_time to update c and mtime through ->setattr anyway.

Third using the MS_ flag but then actually having a filesystem
mount option to enable it is more than confusing.  After all MS_
options (at least the exported parts) are the mount ABI for common
options.  Also this option doesn't show up in ->show_options,
which is something Miklos will beat you up for :)
I'm also not convinced this should be optional behaviour, either you
do update i_version for a given filesystem or you don't - having
an obscure mount option will only give you confusion.

Beyond those, is there any good reason for making inode_inc_iversion inline,
especially after the first patch introduced it properly out of line?

And as a last note please stop pushing this kind of core change
through specific filesystem trees.  If this had been in -mm we
would have caught this a lot earlier, and it would have also meant you'd
get input and possibly even implementations from other filesystem
maintainers.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: move headers out of include/linux

2008-02-09 Thread Christoph Hellwig
On Sat, Feb 09, 2008 at 10:39:33AM +0100, Christoph Hellwig wrote:
> Move ext4 headers out of include/linux.  This is just the trivial move,
> there are some more things that could be done later.
> 
> Ted, is anything of these shared with e2fsprogs or can we rip out all
> that #ifdef __KERNEL__ junk?
> 
> Note that I plan to submit similar patches for ext2 and ext3 as well,
> so the diverging from them argument doesn't count.
> 
> Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>

Looks like the patch is too big for vger.  Here's a link instead:

http://verein.lst.de/~hch/ext4-move-headers



Re: [sample] mem_notify v6: usage example

2008-02-09 Thread Pavel Machek
On Sat 2008-02-09 11:07:09, Jon Masters wrote:
> This really needs to be triggered via a generic kernel event in the
> final version - I picture glibc having a reservation API and having
> generic support for freeing such reservations.

Not sure what you are talking about. This seems very right to me.

We want memory-low notification, not yet another generic communication
mechanism.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH 0/8][for -mm] mem_notify v6

2008-02-09 Thread KOSAKI Motohiro
Hi Rik

> More importantly, all gtk+ programs, as well as most databases and other
> system daemons have a poll() loop as their main loop.

Not only gtk+; maybe all modern GUI programs :)


Re: [PATCH 0/8][for -mm] mem_notify v6

2008-02-09 Thread Jon Masters

Yo,

Interesting patch series (I am being yuppie and reading this thread  
from my iPhone on a treadmill at the gym - so further comments later).  
I think that this is broadly along the lines that I was thinking, but  
this should be an RFC only patch series for now.


Some initial questions:

Where is the netlink interface? Polling an FD is so last century :)

What testing have you done?

Still, it is good to start with some code - eventually we might just  
have a full reservation API created. Rik and I and others have bounced  
ideas around for a while and I hope we can pitch in. I will play with  
these patches later.


Jon.



On Feb 9, 2008, at 10:19, "KOSAKI Motohiro" <[EMAIL PROTECTED]> wrote:



Hi

/dev/mem_notify is a low-memory notification device.
It can avoid swapping and OOM by cooperating with user processes.

The Linux Today article gives a very nice description (great work by
Jake Edge):

http://www.linuxworld.com/news/2008/020508-kernel.html

  When memory gets tight, it is quite possible that applications have
  memory allocated (often caches for better performance) that they
  could free. After all, it is generally better to lose some
  performance than to face the consequences of being chosen by the
  OOM killer. But, currently, there is no way for a process to know
  that the kernel is feeling memory pressure. The patch provides a way
  for interested programs to monitor the /dev/mem_notify file to be
  notified if memory starts to run low.

You need not be annoyed by OOM any longer :)
Any comments are welcome!

patch list
  [1/8] introduce poll_wait_exclusive() new API
  [2/8] introduce wake_up_locked_nr() new API
  [3/8] introduce /dev/mem_notify new device (the core of this patch series)
  [4/8] memory_pressure_notify() caller
  [5/8] add new mem_notify field to /proc/zoneinfo
  [6/8] (optional) fixed incorrect shrink_zone
  [7/8] ignore very small zone for prevent incorrect low mem notify.
  [8/8] support fasync feature


related discussion:
--
LKML OOM notifications requirement discussion
   http://www.gossamer-threads.com/lists/linux/kernel/832802?nohighlight=1#832802
OOM notifications patch [Marcelo Tosatti]
   http://marc.info/?l=linux-kernel&m=119273914027743&w=2
mem notifications v3 [Marcelo Tosatti]
   http://marc.info/?l=linux-mm&m=119852828327044&w=2
Thrashing notification patch  [Daniel Spang]
   http://marc.info/?l=linux-mm&m=119427416315676&w=2
mem notification v4
   http://marc.info/?l=linux-mm&m=120035840523718&w=2
mem notification v5
   http://marc.info/?l=linux-mm&m=120114835421602&w=2

Changelog
-
v5 -> v6 (by KOSAKI Motohiro)
  o rebase to 2.6.24-mm1
  o fixed thundering herd guard formula.

v4 -> v5 (by KOSAKI Motohiro)
  o rebase to 2.6.24-rc8-mm1
  o change display order of /proc/zoneinfo
  o ignore very small zone
  o support fcntl(F_SETFL, FASYNC)
  o fixed some trivial bugs.

v3 -> v4 (by KOSAKI Motohiro)
  o rebase to 2.6.24-rc6-mm1
  o avoid wake up all.
  o add judgement point to __free_one_page().
  o add zone awareness.

v2 -> v3 (by Marcelo Tosatti)
  o changes the notification point to happen whenever
the VM moves an anonymous page to the inactive list.
  o implement notification rate limit.

v1(oom notify) -> v2 (by Marcelo Tosatti)
  o name change
  o notify timing change from just swap thrashing to
just before thrashing.
  o also works with swapless device.




Re: [PATCH 0/8][for -mm] mem_notify v6

2008-02-09 Thread Rik van Riel
On Sun, 10 Feb 2008 01:33:49 +0900
"KOSAKI Motohiro" <[EMAIL PROTECTED]> wrote:

> > Where is the netlink interface? Polling an FD is so last century :)
> 
> To be honest, I don't know of anyone who uses netlink for this, or why
> one would want to receive low-memory notifications via netlink.
> 
> poll() is the old way, but it works well enough.

More importantly, all gtk+ programs, as well as most databases and other
system daemons have a poll() loop as their main loop.

A file descriptor fits that main loop perfectly.

-- 
All rights reversed.
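
[Editorial sketch] The point about poll()-based main loops can be
illustrated roughly as follows. This is a minimal sketch, not code from
the patch series: main_loop_once() and the EV_* values are hypothetical
names, and notify_fd is assumed to be a descriptor obtained from
open("/dev/mem_notify", O_RDONLY) as in the posted usage example. Any
readable descriptor (a pipe, for instance) behaves the same way from
poll()'s point of view.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result bits for one pass of the main loop. */
enum loop_event { EV_NONE = 0, EV_WORK = 1, EV_LOW_MEMORY = 2 };

/*
 * Wait on two descriptors at once. work_fd is whatever the program
 * already polls (socket, X connection, ...); notify_fd stands in for an
 * open /dev/mem_notify. Returns a bitmask of enum loop_event values.
 */
static int main_loop_once(int work_fd, int notify_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = work_fd,   .events = POLLIN },
        { .fd = notify_fd, .events = POLLIN },
    };
    int events = EV_NONE;
    int ret;

    assert(work_fd >= 0 && notify_fd >= 0);

    ret = poll(fds, 2, timeout_ms);
    if (ret <= 0)
        return EV_NONE;             /* timeout or error: nothing ready */

    if (fds[0].revents & POLLIN)
        events |= EV_WORK;          /* normal application input */
    if (fds[1].revents & POLLIN)
        events |= EV_LOW_MEMORY;    /* caller should drop its caches */
    return events;
}
```

The low-memory descriptor is just one more entry in the pollfd array, so
an existing loop needs no extra thread or signal handler to react to it.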


Re: [sample] mem_notify v6: usage example

2008-02-09 Thread KOSAKI Motohiro
Hi Jon

> This really needs to be triggered via a generic kernel event in the
> final version - I picture glibc having a reservation API and having
> generic support for freeing such reservations.

To be honest, I doubt the idea of a generic reservation framework.

In the end, we want to drop application caches, not only dataless memory,
but an automatic drop mechanism is only able to drop dataless memory.

Also, many applications have their own memory management subsystem.
I am afraid nobody would use a framework that is too complex.

What do you think?
I hope to see your API. Please post it.

Thanks!
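
[Editorial sketch] The argument that applications manage their own
caches, and must decide themselves what to drop, could look roughly like
this. It is only an illustration under stated assumptions: struct
app_cache, cache_put(), cache_bytes() and cache_drop_all() are
hypothetical names, and the cached data is assumed to be re-creatable.

```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_SLOTS 8   /* hypothetical size, chosen for illustration */

struct app_cache {
    void  *slot[CACHE_SLOTS];    /* cached, re-creatable data */
    size_t bytes[CACHE_SLOTS];   /* size of each cached buffer */
};

/* Store a buffer of len bytes in the first free slot; 0 on success. */
static int cache_put(struct app_cache *c, size_t len)
{
    assert(c != NULL && len > 0);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->slot[i] == NULL) {
            c->slot[i] = malloc(len);
            if (c->slot[i] == NULL)
                return -1;
            c->bytes[i] = len;
            return 0;
        }
    }
    return -1;   /* cache full */
}

/* Total bytes currently held by the cache. */
static size_t cache_bytes(const struct app_cache *c)
{
    size_t total = 0;

    assert(c != NULL);
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->slot[i] != NULL)
            total += c->bytes[i];
    return total;
}

/* Called on a low-memory notification: free everything, report bytes freed. */
static size_t cache_drop_all(struct app_cache *c)
{
    size_t freed = cache_bytes(c);

    for (int i = 0; i < CACHE_SLOTS; i++) {
        free(c->slot[i]);
        c->slot[i] = NULL;
        c->bytes[i] = 0;
    }
    return freed;
}
```

Only the application knows that these buffers hold data it can rebuild,
which is why the decision to free them sits in application code here.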


Re: [sample] mem_notify v6: usage example

2008-02-09 Thread Jon Masters
This really needs to be triggered via a generic kernel event in the  
final version - I picture glibc having a reservation API and having  
generic support for freeing such reservations.


Jon



On Feb 9, 2008, at 10:55, "KOSAKI Motohiro" <[EMAIL PROTECTED]> wrote:



This is a usage example for /dev/mem_notify.

Daniel Spang created the original version.
KOSAKI added the fasync-related code.


Signed-off-by: Daniel Spang <[EMAIL PROTECTED]>
Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 Documentation/mem_notify.c |  120 +
 1 file changed, 120 insertions(+)

Index: b/Documentation/mem_notify.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ b/Documentation/mem_notify.c2008-02-10 00:44:00.0 +0900
@@ -0,0 +1,120 @@
+/*
+ * Allocate 10 MB each second. Exit on notification.
+ */
+
+#define _GNU_SOURCE
+
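+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <errno.h>
+#include <signal.h>
+#include <pthread.h>
+#include <sys/mman.h>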
+
+int count = 0;
+int size = 10;
+
+void *do_alloc()
+{
+for(;;) {
+int *buffer;
+buffer = mmap(NULL,  size*1024*1024,
+  PROT_READ | PROT_WRITE,
+  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+if (buffer == MAP_FAILED) {
+perror("mmap");
+exit(EXIT_FAILURE);
+}
+memset(buffer, 1 , size*1024*1024);
+
+printf("-");
+fflush(stdout);
+
+count++;
+sleep(1);
+}
+}
+
+int wait_for_notification(struct pollfd *pfd)
+{
+int ret;
+read(pfd->fd, 0, 0);
+ret = poll(pfd, 1, -1);  /* wake up when low memory */
+if (ret == -1 && errno != EINTR) {
+perror("poll");
+exit(EXIT_FAILURE);
+}
+return ret;
+}
+
+void do_free()
+{
+int fd;
+struct pollfd pfd;
+
+fd = open("/dev/mem_notify", O_RDONLY);
+if (fd == -1) {
+perror("open");
+exit(EXIT_FAILURE);
+}
+
+pfd.fd = fd;
+pfd.events = POLLIN;
+for(;;)
+if (wait_for_notification(&pfd) > 0) {
+printf("\nGot notification, allocated %d MB\n",
+   size * count);
+exit(EXIT_SUCCESS);
+}
+}
+
+void do_free_signal()
+{
+int fd;
+int flags;
+
+fd = open("/dev/mem_notify", O_RDONLY);
+if (fd == -1) {
+perror("open");
+exit(EXIT_FAILURE);
+}
+
+fcntl(fd, F_SETOWN, getpid());
+fcntl(fd, F_SETSIG, SIGUSR1);
+
+flags = fcntl(fd, F_GETFL);
+fcntl(fd, F_SETFL, flags|FASYNC); /* when low memory, receive SIGUSR1 */
+
+for(;;)
+sleep(1);
+}
+
+
+void daniel_exit(int signo)
+{
+printf("\nGot notification %d, allocated %d MB\n",
+   signo, size * count);
+exit(EXIT_SUCCESS);
+
+}
+
+int main(int argc, char *argv[])
+{
+pthread_t allocator;
+
+if(argc == 2 && (strcmp(argv[1], "-sig") == 0)) {
+printf("run signal mode\n");
+signal(SIGUSR1, daniel_exit);
+pthread_create(&allocator, NULL, do_alloc, NULL);
+do_free_signal();
+} else {
+printf("run poll mode\n");
+pthread_create(&allocator, NULL, do_alloc, NULL);
+do_free();
+}
+return 0;
+}




Re: [PATCH 0/8][for -mm] mem_notify v6

2008-02-09 Thread KOSAKI Motohiro
Hi

> Interesting patch series (I am being yuppie and reading this thread
> from my iPhone on a treadmill at the gym - so further comments later).
> I think that this is broadly along the lines that I was thinking, but
> this should be an RFC only patch series for now.

Sorry, I will fix that in the next post.


> Some initial questions:

Thank you.
Any discussion is welcome.

> Where is the netlink interface? Polling an FD is so last century :)

To be honest, I don't know of anyone who uses netlink for this, or why
one would want to receive low-memory notifications via netlink.

poll() is the old way, but it works well enough.

Also, netlink has a weak point here:
in the end, the netlink philosophy is a read/write (message) model.

I am afraid that many low-memory messages would be queued in the
netlink buffer under heavy pressure,
which would make the memory pressure worse.
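
[Editorial sketch] The concern about queued messages can be made
concrete with a toy model. This is not how netlink or the posted patch
is implemented; struct msg_channel, struct state_channel and the helper
functions are hypothetical and only contrast "one message per event"
with "one flag that is overwritten".

```c
#include <assert.h>
#include <stddef.h>

/* Message model: each pressure event is queued until the reader consumes it. */
struct msg_channel {
    size_t queued;      /* undelivered messages */
    size_t msg_size;    /* bytes per message (hypothetical) */
};

/* State model: one flag, overwritten in place, nothing is queued. */
struct state_channel {
    int under_pressure;
};

static void msg_post(struct msg_channel *c)
{
    assert(c != NULL);
    c->queued++;
}

static void state_post(struct state_channel *c)
{
    assert(c != NULL);
    c->under_pressure = 1;
}

/* Bytes held by notifications the reader has not yet consumed. */
static size_t msg_backlog(const struct msg_channel *c)
{
    return c->queued * c->msg_size;
}

static size_t state_backlog(const struct state_channel *c)
{
    (void)c;
    return 0;   /* the flag is fixed-size; repeated events add nothing */
}
```

With a slow reader, the message backlog grows with every event, at
exactly the moment memory is scarce; the state model stays constant.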


> Still, it is good to start with some code - eventually we might just
> have a full reservation API created. Rik and I and others have bounced
> ideas around for a while and I hope we can pitch in. I will play with
> these patches later.

Great.
Any ideas and discussion are welcome.


[sample] mem_notify v6: usage example

2008-02-09 Thread KOSAKI Motohiro
This is a usage example for /dev/mem_notify.

Daniel Spang created the original version.
KOSAKI added the fasync-related code.


Signed-off-by: Daniel Spang <[EMAIL PROTECTED]>
Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 Documentation/mem_notify.c |  120 +
 1 file changed, 120 insertions(+)

Index: b/Documentation/mem_notify.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ b/Documentation/mem_notify.c2008-02-10 00:44:00.0 +0900
@@ -0,0 +1,120 @@
+/*
+ * Allocate 10 MB each second. Exit on notification.
+ */
+
+#define _GNU_SOURCE
+
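+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <errno.h>
+#include <signal.h>
+#include <pthread.h>
+#include <sys/mman.h>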
+
+int count = 0;
+int size = 10;
+
+void *do_alloc()
+{
+for(;;) {
+int *buffer;
+buffer = mmap(NULL,  size*1024*1024,
+  PROT_READ | PROT_WRITE,
+  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+if (buffer == MAP_FAILED) {
+perror("mmap");
+exit(EXIT_FAILURE);
+}
+memset(buffer, 1 , size*1024*1024);
+
+printf("-");
+fflush(stdout);
+
+count++;
+sleep(1);
+}
+}
+
+int wait_for_notification(struct pollfd *pfd)
+{
+int ret;
+read(pfd->fd, 0, 0);
+ret = poll(pfd, 1, -1);  /* wake up when low memory */
+if (ret == -1 && errno != EINTR) {
+perror("poll");
+exit(EXIT_FAILURE);
+}
+return ret;
+}
+
+void do_free()
+{
+   int fd;
+   struct pollfd pfd;
+
+fd = open("/dev/mem_notify", O_RDONLY);
+if (fd == -1) {
+perror("open");
+exit(EXIT_FAILURE);
+}
+
+   pfd.fd = fd;
+pfd.events = POLLIN;
+for(;;)
+if (wait_for_notification(&pfd) > 0) {
+printf("\nGot notification, allocated %d MB\n",
+   size * count);
+exit(EXIT_SUCCESS);
+}
+}
+
+void do_free_signal()
+{
+   int fd;
+   int flags;
+
+fd = open("/dev/mem_notify", O_RDONLY);
+if (fd == -1) {
+perror("open");
+exit(EXIT_FAILURE);
+}
+
+   fcntl(fd, F_SETOWN, getpid());
+   fcntl(fd, F_SETSIG, SIGUSR1);
+
+   flags = fcntl(fd, F_GETFL);
+   fcntl(fd, F_SETFL, flags|FASYNC); /* when low memory, receive SIGUSR1 */
+
+   for(;;)
+   sleep(1);
+}
+
+
+void daniel_exit(int signo)
+{
+   printf("\nGot notification %d, allocated %d MB\n",
+  signo, size * count);
+   exit(EXIT_SUCCESS);
+
+}
+
+int main(int argc, char *argv[])
+{
+pthread_t allocator;
+
+   if(argc == 2 && (strcmp(argv[1], "-sig") == 0)) {
+   printf("run signal mode\n");
+   signal(SIGUSR1, daniel_exit);
+   pthread_create(&allocator, NULL, do_alloc, NULL);
+   do_free_signal();
+   } else {
+   printf("run poll mode\n");
+   pthread_create(&allocator, NULL, do_alloc, NULL);
+   do_free();
+   }
+   return 0;
+}


[PATCH 8/8][for -mm] mem_notify v6: support fasync feature

2008-02-09 Thread KOSAKI Motohiro
Implement FASYNC capability for /dev/mem_notify.


fd = open("/dev/mem_notify", O_RDONLY);

fcntl(fd, F_SETOWN, getpid());
fcntl(fd, F_SETSIG, SIGUSR1);

flags = fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags|FASYNC);  /* when low memory, receive SIGUSR1 */



ChangeLog
v5 -> v6:
   o rewrite usage example
   o clean up the calculation of the number of tasks to wake up.

v5:  new
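
[Editorial sketch] The wakeup-count calculation in the diff below splits
nr_wakeup between fasync waiters and poll waiters in proportion to how
many of each exist. A standalone rendering of that arithmetic, assuming
the usual round-up division macro; struct wakeup_split and
split_wakeups() are hypothetical names used only for this illustration:

```c
#include <assert.h>

/* Integer division rounding up, for positive operands. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct wakeup_split {
    int fasync;   /* fasync waiters to signal */
    int poll;     /* poll waiters to wake */
};

/*
 * Split nr_wakeup proportionally between the two kinds of waiters,
 * rounding up and clamping each share to the number actually waiting.
 */
static struct wakeup_split split_wakeups(int nr_watcher, int nr_fasync_wait,
                                         int nr_wakeup)
{
    struct wakeup_split s;
    int nr_poll_wait = nr_watcher - nr_fasync_wait;

    assert(nr_watcher > 0);
    assert(nr_fasync_wait >= 0 && nr_fasync_wait <= nr_watcher);

    s.fasync = DIV_ROUND_UP(nr_fasync_wait * nr_wakeup, nr_watcher);
    if (s.fasync > nr_fasync_wait)
        s.fasync = nr_fasync_wait;

    s.poll = DIV_ROUND_UP(nr_poll_wait * nr_wakeup, nr_watcher);
    if (s.poll > nr_poll_wait)
        s.poll = nr_poll_wait;

    return s;
}
```

With 10 watchers, 4 of them fasync, and nr_wakeup = 5, the split is 2
fasync and 3 poll. Because both shares round up, the total can exceed
nr_wakeup by one: 3 watchers, 1 fasync, nr_wakeup = 1 gives 1 + 1.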

Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 mm/mem_notify.c |  109 +---
 1 file changed, 104 insertions(+), 5 deletions(-)

Index: b/mm/mem_notify.c
===
--- a/mm/mem_notify.c   2008-02-03 20:37:25.0 +0900
+++ b/mm/mem_notify.c   2008-02-03 20:48:04.0 +0900
@@ -24,18 +24,58 @@
 #define MAX_WAKEUP_TASKS (100)

 struct mem_notify_file_info {
-   unsigned long last_proc_notify;
+   unsigned long last_proc_notify;
+   struct file  *file;
+
+   /* for fasync */
+   struct list_head  fa_list;
+   int   fa_fd;
 };

 static DECLARE_WAIT_QUEUE_HEAD(mem_wait);
 static atomic_long_t nr_under_memory_pressure_zones = ATOMIC_LONG_INIT(0);
 static atomic_t nr_watcher_task = ATOMIC_INIT(0);
+static LIST_HEAD(mem_notify_fasync_list);
+static DEFINE_SPINLOCK(mem_notify_fasync_lock);
+static atomic_t nr_fasync_task = ATOMIC_INIT(0);

 atomic_long_t last_mem_notify = ATOMIC_LONG_INIT(INITIAL_JIFFIES);

+static void mem_notify_kill_fasync_nr(int nr)
+{
+   struct mem_notify_file_info *iter, *saved_iter;
+   LIST_HEAD(l_fired);
+
+   if (!nr)
+   return;
+
+   spin_lock(&mem_notify_fasync_lock);
+
+   list_for_each_entry_safe_reverse(iter, saved_iter,
+&mem_notify_fasync_list,
+fa_list) {
+   struct fown_struct *fown;
+
+   fown = &iter->file->f_owner;
+   send_sigio(fown, iter->fa_fd, POLL_IN);
+
+   list_del(&iter->fa_list);
+   list_add(&iter->fa_list, &l_fired);
+   if (!--nr)
+   break;
+   }
+
+   /* rotate moving for FIFO wakeup */
+   list_splice(&l_fired, &mem_notify_fasync_list);
+
+   spin_unlock(&mem_notify_fasync_lock);
+}
+
 void __memory_pressure_notify(struct zone *zone, int pressure)
 {
int nr_wakeup;
+   int nr_poll_wakeup = 0;
+   int nr_fasync_wakeup = 0;
int flags;

spin_lock_irqsave(&mem_wait.lock, flags);
@@ -48,6 +88,8 @@ void __memory_pressure_notify(struct zon

if (pressure) {
int nr_watcher = atomic_read(&nr_watcher_task);
+   int nr_fasync_wait_tasks = atomic_read(&nr_fasync_task);
+   int nr_poll_wait_tasks = nr_watcher - nr_fasync_wait_tasks;

atomic_long_set(&last_mem_notify, jiffies);
if (!nr_watcher)
@@ -57,10 +99,27 @@ void __memory_pressure_notify(struct zon
if (unlikely(nr_wakeup > MAX_WAKEUP_TASKS))
nr_wakeup = MAX_WAKEUP_TASKS;

-   wake_up_locked_nr(&mem_wait, nr_wakeup);
+   /*
+    * nr_fasync_wakeup =
+    *     nr_fasync_wait_tasks * nr_wakeup / nr_watcher
+    */
+   nr_fasync_wakeup = DIV_ROUND_UP(nr_fasync_wait_tasks *
+   nr_wakeup, nr_watcher);
+   if (unlikely(nr_fasync_wakeup > nr_fasync_wait_tasks))
+   nr_fasync_wakeup = nr_fasync_wait_tasks;
+
+   nr_poll_wakeup = DIV_ROUND_UP(nr_poll_wait_tasks *
+ nr_wakeup, nr_watcher);
+   if (unlikely(nr_poll_wakeup > nr_poll_wait_tasks))
+   nr_poll_wakeup = nr_poll_wait_tasks;
+
+   wake_up_locked_nr(&mem_wait, nr_poll_wakeup);
}
 out:
spin_unlock_irqrestore(&mem_wait.lock, flags);
+
+   if (nr_fasync_wakeup)
+   mem_notify_kill_fasync_nr(nr_fasync_wakeup);
 }

 static int mem_notify_open(struct inode *inode, struct file *file)
@@ -75,6 +134,9 @@ static int mem_notify_open(struct inode
}

info->last_proc_notify = INITIAL_JIFFIES;
+   INIT_LIST_HEAD(&info->fa_list);
+   info->file = file;
+   info->fa_fd = -1;
file->private_data = info;
atomic_inc(&nr_watcher_task);
 out:
@@ -83,7 +145,16 @@ out:

 static int mem_notify_release(struct inode *inode, struct file *file)
 {
-   kfree(file->private_data);
+   struct mem_notify_file_info *info = file->private_data;
+
+   spin_lock(&mem_notify_fasync_lock);
+   if (!list_empty(&info->fa_list)) {
+   list_del(&info->fa_list);
+   atomic_dec(&nr_fasync_

[PATCH 7/8][for -mm] mem_notify v6: ignore very small zone for prevent incorrect low mem notify

2008-02-09 Thread KOSAKI Motohiro
On x86, ZONE_DMA is very small,
which causes undesirable low-memory notifications.
Such a zone should be ignored.

But on some other architectures, ZONE_DMA is 4GB,
which is too large to be ignored.

Therefore, whether to ignore a zone is decided by its size.

ChangeLog:
v5: new


Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 include/linux/mem_notify.h |3 +++
 mm/page_alloc.c|6 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: b/include/linux/mem_notify.h
===
--- a/include/linux/mem_notify.h2008-01-23 22:06:04.0 +0900
+++ b/include/linux/mem_notify.h2008-01-23 22:08:02.0 +0900
@@ -22,6 +22,9 @@ static inline void memory_pressure_notif
unsigned long target;
unsigned long pages_high, pages_free, pages_reserve;

+   if (unlikely(zone->mem_notify_status == -1))
+   return;
+
if (pressure) {
target = atomic_long_read(&last_mem_notify) + MEM_NOTIFY_FREQ;
if (likely(time_before(jiffies, target)))
Index: b/mm/page_alloc.c
===
--- a/mm/page_alloc.c   2008-01-23 22:07:57.0 +0900
+++ b/mm/page_alloc.c   2008-01-23 22:08:02.0 +0900
@@ -3470,7 +3470,11 @@ static void __meminit free_area_init_cor
zone->zone_pgdat = pgdat;

zone->prev_priority = DEF_PRIORITY;
-   zone->mem_notify_status = 0;
+
+   if (zone->present_pages < (pgdat->node_present_pages / 10))
+   zone->mem_notify_status = -1;
+   else
+   zone->mem_notify_status = 0;

zone_pcp_init(zone);
INIT_LIST_HEAD(&zone->active_list);
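
[Editorial sketch] The size test added to free_area_init_core() above
can be read in isolation as follows. initial_notify_status() is a
hypothetical helper for illustration; the return values mirror the
mem_notify_status convention in the diff (-1 = zone ignored, 0 = zone
watched, no pressure yet).

```c
#include <assert.h>

/*
 * A zone holding less than one tenth of its node's present pages is
 * excluded from low-memory notification.
 */
static int initial_notify_status(unsigned long zone_present_pages,
                                 unsigned long node_present_pages)
{
    assert(zone_present_pages <= node_present_pages);

    if (zone_present_pages < node_present_pages / 10)
        return -1;   /* too small: never notify for this zone */
    return 0;        /* normal zone */
}
```

For example, a 16 MB zone (4096 pages of 4 KB) on a 1 GB node (262144
pages) is below the 26214-page cutoff and is ignored, while a zone that
makes up the whole node is watched.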


[PATCH 6/8][for -mm] mem_notify v6: (optional) fixed incorrect shrink_zone

2008-02-09 Thread KOSAKI Motohiro
On x86, ZONE_DMA is very small.
It is often not used at all.

Unfortunately, when NR_ACTIVE == 0 and NR_INACTIVE == 0,
shrink_zone() still tries to reclaim 1 page, because of the "+ 1" in

zone->nr_scan_active +=
(zone_page_state(zone, NR_ACTIVE) >> priority) + 1;

This causes unnecessary low-memory notifications ;-)
I fixed it.

ChangeLog
v5: new


Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 mm/vmscan.c |   21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

Index: b/mm/vmscan.c
===
--- a/mm/vmscan.c   2008-02-03 20:27:53.0 +0900
+++ b/mm/vmscan.c   2008-02-03 20:33:13.0 +0900
@@ -947,7 +947,7 @@ static inline void note_zone_scanning_pr

 static inline int zone_is_near_oom(struct zone *zone)
 {
-   return zone->pages_scanned >= (zone_page_state(zone, NR_ACTIVE)
+   return zone->pages_scanned > (zone_page_state(zone, NR_ACTIVE)
+ zone_page_state(zone, NR_INACTIVE))*3;
 }

@@ -1196,18 +1196,29 @@ static unsigned long shrink_zone(int pri
unsigned long nr_inactive;
unsigned long nr_to_scan;
unsigned long nr_reclaimed = 0;
+   unsigned long tmp;
+   unsigned long zone_active;
+   unsigned long zone_inactive;

if (scan_global_lru(sc)) {
/*
 * Add one to nr_to_scan just to make sure that the kernel
 * will slowly sift through the active list.
 */
-   zone->nr_scan_active +=
-   (zone_page_state(zone, NR_ACTIVE) >> priority) + 1;
+   zone_active = zone_page_state(zone, NR_ACTIVE);
+   tmp = (zone_active >> priority) + 1;
+   if (unlikely(tmp > zone_active))
+   tmp = zone_active;
+   zone->nr_scan_active += tmp;
nr_active = zone->nr_scan_active;
-   zone->nr_scan_inactive +=
-   (zone_page_state(zone, NR_INACTIVE) >> priority) + 1;
+
+   zone_inactive = zone_page_state(zone, NR_INACTIVE);
+   tmp = (zone_inactive >> priority) + 1;
+   if (unlikely(tmp > zone_inactive))
+   tmp = zone_inactive;
+   zone->nr_scan_inactive += tmp;
nr_inactive = zone->nr_scan_inactive;
+
if (nr_inactive >= sc->swap_cluster_max)
zone->nr_scan_inactive = 0;
else
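
[Editorial sketch] The effect of the clamp added to shrink_zone() above
can be checked with plain arithmetic. scan_step_old() and
scan_step_clamped() are hypothetical names; they isolate the per-call
increment before and after the patch, and the priority value 12 in the
examples is just an illustrative input.

```c
#include <assert.h>

/* Increment before the patch: always at least 1, even for an empty list. */
static unsigned long scan_step_old(unsigned long nr_pages, int priority)
{
    assert(priority >= 0 && priority < (int)(sizeof(unsigned long) * 8));
    return (nr_pages >> priority) + 1;
}

/* Increment after the patch: never more than the pages on the list. */
static unsigned long scan_step_clamped(unsigned long nr_pages, int priority)
{
    unsigned long tmp;

    assert(priority >= 0 && priority < (int)(sizeof(unsigned long) * 8));
    tmp = (nr_pages >> priority) + 1;
    if (tmp > nr_pages)
        tmp = nr_pages;
    return tmp;
}
```

For an empty list the old form yields 1 and the clamped form yields 0,
so an unused zone no longer asks for a scan. For a list of 8192 pages at
priority 12 both forms yield 3.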


[PATCH 5/8][for -mm] mem_notify v6: add new mem_notify field to /proc/zoneinfo

2008-02-09 Thread KOSAKI Motohiro
Show the new member of struct zone in /proc/zoneinfo.

ChangeLog:
v5: change the display order so the new field comes last.


Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 mm/vmstat.c |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

Index: b/mm/vmstat.c
===
--- a/mm/vmstat.c   2008-01-23 22:06:05.0 +0900
+++ b/mm/vmstat.c   2008-01-23 22:08:00.0 +0900
@@ -795,10 +795,12 @@ static void zoneinfo_show_print(struct s
seq_printf(m,
   "\n  all_unreclaimable: %u"
   "\n  prev_priority: %i"
-  "\n  start_pfn: %lu",
-  zone_is_all_unreclaimable(zone),
+  "\n  start_pfn: %lu"
+  "\n  mem_notify_status: %i",
+  zone_is_all_unreclaimable(zone),
   zone->prev_priority,
-  zone->zone_start_pfn);
+  zone->zone_start_pfn,
+  zone->mem_notify_status);
seq_putc(m, '\n');
 }


[PATCH 4/8][for -mm] mem_notify v6: memory_pressure_notify() caller

2008-02-09 Thread KOSAKI Motohiro
The notification point is whenever the VM moves an
anonymous page to the inactive list - this is a pretty good indication
that there are unused anonymous pages present which will very likely be
swapped out soon.

The zone is judged to be out of trouble in the following situations:
 o memory pressure decreases and the VM stops moving anonymous pages to the
   inactive list.
 o free pages increase above (pages_high+lowmem_reserve)*2.


ChangeLog:
v5: add out of trouble notify to exit of balance_pgdat().
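
[Editorial sketch] The second "out of trouble" condition is an upward
threshold crossing checked in __free_one_page() in the diff below. A
standalone version of that test, with hypothetical helper names
notify_threshold() and crossed_out_of_trouble():

```c
#include <assert.h>

/* Free-page level above which the zone counts as out of trouble. */
static unsigned long notify_threshold(unsigned long pages_high,
                                      unsigned long lowmem_reserve)
{
    return (pages_high + lowmem_reserve) * 2;
}

/*
 * True only for the free operation that moves the zone from at-or-below
 * the threshold to above it, and only if the zone was flagged as under
 * pressure (notify_status == 1).
 */
static int crossed_out_of_trouble(int notify_status,
                                  unsigned long prev_free,
                                  unsigned long now_free,
                                  unsigned long threshold)
{
    assert(now_free >= prev_free);   /* freeing only adds pages */
    return notify_status == 1 &&
           prev_free <= threshold &&
           now_free > threshold;
}
```

Comparing the count before and after the free means the event fires once
at the crossing, not on every free while the zone is above the line.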


Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 mm/page_alloc.c |   12 
 mm/vmscan.c |   26 ++
 2 files changed, 38 insertions(+)

Index: b/mm/vmscan.c
===
--- a/mm/vmscan.c   2008-01-23 22:06:08.0 +0900
+++ b/mm/vmscan.c   2008-01-23 22:07:57.0 +0900
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include <linux/mem_notify.h>

 #include 
 #include 
@@ -1089,10 +1090,14 @@ static void shrink_active_list(unsigned
struct page *page;
struct pagevec pvec;
int reclaim_mapped = 0;
+   bool inactivated_anon = 0;

if (sc->may_swap)
reclaim_mapped = calc_reclaim_mapped(sc, zone, priority);

+   if (!reclaim_mapped)
+   memory_pressure_notify(zone, 0);
+
lru_add_drain();
spin_lock_irq(&zone->lru_lock);
pgmoved = sc->isolate_pages(nr_pages, &l_hold, &pgscanned, sc->order,
@@ -1116,6 +1121,13 @@ static void shrink_active_list(unsigned
if (!reclaim_mapped ||
(total_swap_pages == 0 && PageAnon(page)) ||
page_referenced(page, 0, sc->mem_cgroup)) {
+   /* deal with the case where there is no
+* swap but an anonymous page would be
+* moved to the inactive list.
+*/
+   if (!total_swap_pages && reclaim_mapped &&
+   PageAnon(page))
+   inactivated_anon = 1;
list_add(&page->lru, &l_active);
continue;
}
@@ -1123,8 +1135,12 @@ static void shrink_active_list(unsigned
list_add(&page->lru, &l_active);
continue;
}
+   if (PageAnon(page))
+   inactivated_anon = 1;
list_add(&page->lru, &l_inactive);
}
+   if (inactivated_anon)
+   memory_pressure_notify(zone, 1);

pagevec_init(&pvec, 1);
pgmoved = 0;
@@ -1158,6 +1174,8 @@ static void shrink_active_list(unsigned
pagevec_strip(&pvec);
spin_lock_irq(&zone->lru_lock);
}
+   if (!reclaim_mapped)
+   memory_pressure_notify(zone, 0);

pgmoved = 0;
while (!list_empty(&l_active)) {
@@ -1659,6 +1677,14 @@ out:
goto loop_again;
}

+   for (i = pgdat->nr_zones - 1; i >= 0; i--) {
+   struct zone *zone = pgdat->node_zones + i;
+
+   if (!populated_zone(zone))
+   continue;
+   memory_pressure_notify(zone, 0);
+   }
+
return nr_reclaimed;
 }

Index: b/mm/page_alloc.c
===
--- a/mm/page_alloc.c   2008-01-23 22:06:08.0 +0900
+++ b/mm/page_alloc.c   2008-01-23 23:09:32.0 +0900
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include <linux/mem_notify.h>

 #include 
 #include 
@@ -435,6 +436,8 @@ static inline void __free_one_page(struc
unsigned long page_idx;
int order_size = 1 << order;
int migratetype = get_pageblock_migratetype(page);
+   unsigned long prev_free;
+   unsigned long notify_threshold;

if (unlikely(PageCompound(page)))
destroy_compound_page(page, order);
@@ -444,6 +447,7 @@ static inline void __free_one_page(struc
VM_BUG_ON(page_idx & (order_size - 1));
VM_BUG_ON(bad_range(zone, page));

+   prev_free = zone_page_state(zone, NR_FREE_PAGES);
__mod_zone_page_state(zone, NR_FREE_PAGES, order_size);
while (order < MAX_ORDER-1) {
unsigned long combined_idx;
@@ -465,6 +469,14 @@ static inline void __free_one_page(struc
list_add(&page->lru,
&zone->free_area[order].free_list[migratetype]);
zone->free_area[order].nr_free++;
+
+   notify_threshold = (zone->pages_high +
+   zone->lowmem_reserve[MAX_NR_ZONES-1]) * 2;
+
+   if (unlikely((zone->mem_notify_status == 1) &&
+(prev_free <= notify_threshold) &&
+(zone_page_state(zone, NR_FREE_PAGES) > notify_threshold)))
+   

[PATCH 3/8][for -mm] mem_notify v6: introduce /dev/mem_notify new device (the core of this patch series)

2008-02-09 Thread KOSAKI Motohiro
The core of this patch series.
Add the /dev/mem_notify device for notifying user processes of low memory.



fd = open("/dev/mem_notify", O_RDONLY);
if (fd < 0) {
exit(1);
}
pollfds.fd = fd;
pollfds.events = POLLIN;
pollfds.revents = 0;
err = poll(&pollfds, 1, -1); // wake up at low memory

...


ChangeLog
 v5 -> v6:
 o improve the formula for the number of wakeup tasks when there are only a few tasks.
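
[Editorial sketch] memory_pressure_notify() in the header below rate
limits notifications with MEM_NOTIFY_FREQ (HZ/5, i.e. at most one every
fifth of a second) and a wraparound-safe tick comparison. The same idea
in standalone form; tick_before() follows the signed-difference
technique of the kernel's time_before(), and rate_limit_allows() is a
hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/* Wraparound-safe "a is earlier than b" for a free-running tick counter. */
static int tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/*
 * Allow a new notification only once min_interval ticks have passed
 * since the last one.
 */
static int rate_limit_allows(unsigned long now, unsigned long last_notify,
                             unsigned long min_interval)
{
    assert(min_interval <= (unsigned long)LONG_MAX);
    return !tick_before(now, last_notify + min_interval);
}
```

Working on the difference rather than comparing the raw values keeps the
check correct when the counter wraps: with last_notify = ULONG_MAX - 10
and an interval of 50, tick 38 is still too early and tick 39 is allowed.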



Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 Documentation/devices.txt  |1
 drivers/char/mem.c |5 +
 include/linux/mem_notify.h |   42 +++
 include/linux/mmzone.h |1
 mm/Makefile|2
 mm/mem_notify.c|  123 +
 mm/page_alloc.c|1
 7 files changed, 174 insertions(+), 1 deletion(-)

Index: b/drivers/char/mem.c
===
--- a/drivers/char/mem.c2008-02-03 20:59:43.0 +0900
+++ b/drivers/char/mem.c2008-02-03 21:00:24.0 +0900
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include <linux/mem_notify.h>

 #include 
 #include 
@@ -869,6 +870,9 @@ static int memory_open(struct inode * in
filp->f_op = &oldmem_fops;
break;
 #endif
+   case 13:
+   filp->f_op = &mem_notify_fops;
+   break;
default:
return -ENXIO;
}
@@ -901,6 +905,7 @@ static const struct {
 #ifdef CONFIG_CRASH_DUMP
{12,"oldmem",S_IRUSR | S_IWUSR | S_IRGRP, &oldmem_fops},
 #endif
+   {13, "mem_notify", S_IRUGO, &mem_notify_fops},
 };

 static struct class *mem_class;
Index: b/include/linux/mem_notify.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ b/include/linux/mem_notify.h2008-02-03 21:01:41.0 +0900
@@ -0,0 +1,42 @@
+/*
+ * Notify applications of memory pressure via /dev/mem_notify
+ *
+ * Copyright (C) 2008 Marcelo Tosatti <[EMAIL PROTECTED]>,
+ *KOSAKI Motohiro <[EMAIL PROTECTED]>
+ *
+ * Released under the GPL, see the file COPYING for details.
+ */
+
+#ifndef _LINUX_MEM_NOTIFY_H
+#define _LINUX_MEM_NOTIFY_H
+
+#define MEM_NOTIFY_FREQ (HZ/5)
+
+extern atomic_long_t last_mem_notify;
+extern struct file_operations mem_notify_fops;
+
+extern void __memory_pressure_notify(struct zone *zone, int pressure);
+
+static inline void memory_pressure_notify(struct zone *zone, int pressure)
+{
+   unsigned long target;
+   unsigned long pages_high, pages_free, pages_reserve;
+
+   if (pressure) {
+   target = atomic_long_read(&last_mem_notify) + MEM_NOTIFY_FREQ;
+   if (likely(time_before(jiffies, target)))
+   return;
+
+   pages_high = zone->pages_high;
+   pages_free = zone_page_state(zone, NR_FREE_PAGES);
+   pages_reserve = zone->lowmem_reserve[MAX_NR_ZONES-1];
+   if (unlikely(pages_free > (pages_high+pages_reserve)*2))
+   return;
+
+   } else if (likely(!zone->mem_notify_status))
+   return;
+
+   __memory_pressure_notify(zone, pressure);
+}
+
+#endif /* _LINUX_MEM_NOTIFY_H */
Index: b/include/linux/mmzone.h
===
--- a/include/linux/mmzone.h 2008-02-03 20:59:43.0 +0900
+++ b/include/linux/mmzone.h 2008-02-03 20:59:46.0 +0900
@@ -283,6 +283,7 @@ struct zone {
 */
int prev_priority;

+   int mem_notify_status;

ZONE_PADDING(_pad2_)
/* Rarely used or read-mostly fields */
Index: b/mm/Makefile
===
--- a/mm/Makefile   2008-02-03 20:59:43.0 +0900
+++ b/mm/Makefile   2008-02-03 20:59:46.0 +0900
@@ -11,7 +11,7 @@ obj-y := bootmem.o filemap.o mempool.o
   page_alloc.o page-writeback.o pdflush.o \
   readahead.o swap.o truncate.o vmscan.o \
   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
-  page_isolation.o $(mmu-y)
+  page_isolation.o mem_notify.o $(mmu-y)

 obj-$(CONFIG_PROC_PAGE_MONITOR) += pagewalk.o
 obj-$(CONFIG_BOUNCE)   += bounce.o
Index: b/mm/mem_notify.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ b/mm/mem_notify.c   2008-02-03 21:02:30.0 +0900
@@ -0,0 +1,123 @@
+/*
+ * Notify applications of memory pressure via /dev/mem_notify
+ *
+ * Copyright (C) 2008 Marcelo Tosatti <[EMAIL PROTECTED]>,
+ *KOSAKI Motohiro <[EMAIL PROTECTED]>
+ *
+ * Released under the GPL, see the

[PATCH 2/8][for -mm] mem_notify v6: introduce wake_up_locked_nr() new API

2008-02-09 Thread KOSAKI Motohiro
Introduce the new APIs wake_up_locked_nr() and wake_up_locked_all().
They are similar to wake_up_nr() and wake_up_all(), but they do not take the
wait queue lock; the caller must already hold it.
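
The locked/unlocked split described above can be illustrated with a small
userspace sketch. Everything in it is hypothetical: struct locked_wq, the
lk_* functions and the integer "lock" are single-threaded stand-ins for
wait_queue_head_t and its spinlock, not kernel code. The toy also treats
every sleeper as an exclusive waiter, so nr limits all wakeups.

```c
#include <assert.h>

/* Hypothetical single-threaded stand-in for wait_queue_head_t. */
struct locked_wq {
    int locked;     /* 1 while the toy "spinlock" is held */
    int sleepers;   /* tasks currently waiting */
    int events;     /* state that must change together with a wakeup */
};

static void lk_lock(struct locked_wq *q)
{
    assert(!q->locked);     /* a real spinlock would deadlock here */
    q->locked = 1;
}

static void lk_unlock(struct locked_wq *q)
{
    assert(q->locked);
    q->locked = 0;
}

/* "_locked" variant: the caller must already hold the lock. nr == 0 means all. */
static int lk_wake_up_locked_nr(struct locked_wq *q, int nr)
{
    int n;

    assert(q->locked);
    n = (nr == 0 || nr > q->sleepers) ? q->sleepers : nr;
    q->sleepers -= n;
    return n;
}

/* Plain variant: takes the lock itself, then delegates to the locked one. */
int lk_wake_up_nr(struct locked_wq *q, int nr)
{
    int n;

    lk_lock(q);
    n = lk_wake_up_locked_nr(q, nr);
    lk_unlock(q);
    return n;
}

/* Updates state and wakes waiters inside one critical section. */
int lk_post_event(struct locked_wq *q, int nr)
{
    int n;

    lk_lock(q);
    q->events++;
    /* lk_wake_up_nr() cannot be used here: it would take the lock twice. */
    n = lk_wake_up_locked_nr(q, nr);
    lk_unlock(q);
    return n;
}

/* Builds a queue with `sleepers` waiters, posts one event, returns the number woken. */
int lk_demo(int sleepers, int nr)
{
    struct locked_wq q = { 0, sleepers, 0 };

    return lk_post_event(&q, nr);
}
```

The point of the locked variants is the lk_post_event() case: a caller that
needs to change its own state and wake a bounded number of waiters without
dropping the lock in between.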

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 include/linux/wait.h |   12 
 kernel/sched.c   |5 +++--
 2 files changed, 11 insertions(+), 6 deletions(-)

Index: b/include/linux/wait.h
===
--- a/include/linux/wait.h  2008-02-03 20:27:54.0 +0900
+++ b/include/linux/wait.h  2008-02-03 20:32:12.0 +0900
@@ -142,7 +142,8 @@ static inline void __remove_wait_queue(w
 }

 void FASTCALL(__wake_up(wait_queue_head_t *q, unsigned int mode, int nr, void *key));
-extern void FASTCALL(__wake_up_locked(wait_queue_head_t *q, unsigned int mode));
+void FASTCALL(__wake_up_locked(wait_queue_head_t *q, unsigned int mode,
+  int nr, void *key));
 extern void FASTCALL(__wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr));
 void FASTCALL(__wake_up_bit(wait_queue_head_t *, void *, int));
 int FASTCALL(__wait_on_bit(wait_queue_head_t *, struct wait_bit_queue *, int (*)(void *), unsigned));
@@ -155,10 +156,13 @@ wait_queue_head_t *FASTCALL(bit_waitqueu
 #define wake_up(x) __wake_up(x, TASK_NORMAL, 1, NULL)
 #define wake_up_nr(x, nr)  __wake_up(x, TASK_NORMAL, nr, NULL)
 #define wake_up_all(x) __wake_up(x, TASK_NORMAL, 0, NULL)
-#define wake_up_locked(x)  __wake_up_locked((x), TASK_NORMAL)

-#define wake_up_interruptible(x)   __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
-#define wake_up_interruptible_nr(x, nr) __wake_up(x, TASK_INTERRUPTIBLE, nr, NULL)
+#define wake_up_locked(x)  __wake_up_locked((x), TASK_NORMAL, 1, NULL)
+#define wake_up_locked_nr(x, nr) __wake_up_locked((x), TASK_NORMAL, nr, NULL)
+#define wake_up_locked_all(x)  __wake_up_locked((x), TASK_NORMAL, 0, NULL)
+
+#define wake_up_interruptible(x)   __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
+#define wake_up_interruptible_nr(x, nr) __wake_up(x, TASK_INTERRUPTIBLE, nr, NULL)
 #define wake_up_interruptible_all(x)   __wake_up(x, TASK_INTERRUPTIBLE, 0, NULL)
 #define wake_up_interruptible_sync(x)  __wake_up_sync((x), TASK_INTERRUPTIBLE, 1)

Index: b/kernel/sched.c
===
--- a/kernel/sched.c 2008-02-03 20:27:54.0 +0900
+++ b/kernel/sched.c 2008-02-03 20:29:09.0 +0900
@@ -4115,9 +4115,10 @@ EXPORT_SYMBOL(__wake_up);
 /*
  * Same as __wake_up but called with the spinlock in wait_queue_head_t held.
  */
-void __wake_up_locked(wait_queue_head_t *q, unsigned int mode)
+void __wake_up_locked(wait_queue_head_t *q, unsigned int mode,
+ int nr_exclusive, void *key)
 {
-   __wake_up_common(q, mode, 1, 0, NULL);
+   __wake_up_common(q, mode, nr_exclusive, 0, key);
 }

 /**


[PATCH 1/8][for -mm] mem_notify v6: introduce poll_wait_exclusive()

2008-02-09 Thread KOSAKI Motohiro
There are two ways of adding an item to a wait queue:
  1. add_wait_queue()
  2. add_wait_queue_exclusive()
and add_wait_queue_exclusive() is a very useful API.

Unfortunately, there is no poll_wait_exclusive() counterpart to poll_wait().
That means there is no way to wake up only one of the processes that are
polling: wake_up() wakes every process sleeping via poll_wait(), not just one.

This patch introduces a new API, poll_wait_exclusive(), which allows waking
up only one process.


unsigned int kosaki_poll(struct file *file,
 struct poll_table_struct *wait)
{
poll_wait_exclusive(file, &kosaki_wait_queue, wait);
if (data_exist)
return POLLIN | POLLRDNORM;
return 0;
}
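
To make the difference between the two registration modes concrete, here is
a simplified userspace model of how a wakeup walks a wait queue. It is only
a sketch: the toy_* names and the fixed-size array are hypothetical, and no
task is actually put to sleep. It assumes what the kernel wait queue code
does: non-exclusive waiters are queued at the head, exclusive waiters at the
tail, and a wakeup stops after nr exclusive waiters (0 meaning no limit).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_WQ_EXCLUSIVE 0x01
#define TOY_MAX_WAITERS  16

struct toy_waiter {
    int flags;
    int woken;
};

struct toy_waitqueue {
    struct toy_waiter *items[TOY_MAX_WAITERS];
    size_t count;
};

/* Non-exclusive waiters go to the head, exclusive ones to the tail. */
static void toy_add_wait(struct toy_waitqueue *q, struct toy_waiter *w,
                         int exclusive)
{
    size_t i;

    assert(q->count < TOY_MAX_WAITERS);
    w->woken = 0;
    if (exclusive) {
        w->flags = TOY_WQ_EXCLUSIVE;
        q->items[q->count] = w;
    } else {
        w->flags = 0;
        for (i = q->count; i > 0; i--)
            q->items[i] = q->items[i - 1];
        q->items[0] = w;
    }
    q->count++;
}

/* Wake every non-exclusive waiter, but stop after nr exclusive ones. */
static int toy_wake_up(struct toy_waitqueue *q, int nr_exclusive)
{
    int woken = 0;
    size_t i;

    for (i = 0; i < q->count; i++) {
        struct toy_waiter *w = q->items[i];

        w->woken = 1;
        woken++;
        /* With nr_exclusive == 0 the counter never hits zero: wake all. */
        if ((w->flags & TOY_WQ_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
    return woken;
}

/* Queues the given mix of waiters and returns how many one wakeup reaches. */
int toy_demo_wake(int n_plain, int n_exclusive, int nr)
{
    struct toy_waiter waiters[TOY_MAX_WAITERS];
    struct toy_waitqueue q = { { NULL }, 0 };
    int i, n = 0;

    assert(n_plain + n_exclusive <= TOY_MAX_WAITERS);
    for (i = 0; i < n_exclusive; i++)
        toy_add_wait(&q, &waiters[n++], 1);
    for (i = 0; i < n_plain; i++)
        toy_add_wait(&q, &waiters[n++], 0);
    return toy_wake_up(&q, nr);
}
```

In this model, three pollers registered the poll_wait() way are all woken by
a wake_up()-style call (nr = 1), while three pollers registered exclusively
yield a single wakeup.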


Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 fs/eventpoll.c   |7 +--
 fs/select.c  |9 ++---
 include/linux/poll.h |   13 +++--
 3 files changed, 22 insertions(+), 7 deletions(-)



Index: b/fs/eventpoll.c
===
--- a/fs/eventpoll.c 2008-01-23 19:22:19.0 +0900
+++ b/fs/eventpoll.c 2008-01-23 21:11:56.0 +0900
@@ -675,7 +675,7 @@ out_unlock:
  * target file wakeup lists.
  */
 static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
-poll_table *pt)
+poll_table *pt, int exclusive)
 {
struct epitem *epi = ep_item_from_epqueue(pt);
struct eppoll_entry *pwq;
@@ -684,7 +684,10 @@ static void ep_ptable_queue_proc(struct
init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
pwq->whead = whead;
pwq->base = epi;
-   add_wait_queue(whead, &pwq->wait);
+   if (exclusive)
+   add_wait_queue_exclusive(whead, &pwq->wait);
+   else
+   add_wait_queue(whead, &pwq->wait);
list_add_tail(&pwq->llink, &epi->pwqlist);
epi->nwait++;
} else {
Index: b/fs/select.c
===
--- a/fs/select.c   2008-01-23 19:22:22.0 +0900
+++ b/fs/select.c   2008-01-23 21:11:56.0 +0900
@@ -48,7 +48,7 @@ struct poll_table_page {
  * poll table.
  */
 static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
-  poll_table *p);
+  poll_table *p, int exclusive);

 void poll_initwait(struct poll_wqueues *pwq)
 {
@@ -117,7 +117,7 @@ static struct poll_table_entry *poll_get

 /* Add a new entry */
 static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
-   poll_table *p)
+  poll_table *p, int exclusive)
 {
struct poll_table_entry *entry = poll_get_entry(p);
if (!entry)
@@ -126,7 +126,10 @@ static void __pollwait(struct file *filp
entry->filp = filp;
entry->wait_address = wait_address;
init_waitqueue_entry(&entry->wait, current);
-   add_wait_queue(wait_address, &entry->wait);
+   if (exclusive)
+   add_wait_queue_exclusive(wait_address, &entry->wait);
+   else
+   add_wait_queue(wait_address, &entry->wait);
 }

 #define FDS_IN(fds, n) (fds->in + n)
Index: b/include/linux/poll.h
===
--- a/include/linux/poll.h  2008-01-23 19:22:55.0 +0900
+++ b/include/linux/poll.h  2008-02-03 20:28:27.0 +0900
@@ -28,7 +28,8 @@ struct poll_table_struct;
 /*
  * structures and helpers for f_op->poll implementations
  */
-typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *, struct poll_table_struct *);
+typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *,
+   struct poll_table_struct *, int);

 typedef struct poll_table_struct {
poll_queue_proc qproc;
@@ -37,7 +38,15 @@ typedef struct poll_table_struct {
 static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
 {
if (p && wait_address)
-   p->qproc(filp, wait_address, p);
+   p->qproc(filp, wait_address, p, 0);
+}
+
+static inline void poll_wait_exclusive(struct file *filp,
+  wait_queue_head_t *wait_address,
+  poll_table *p)
+{
+   if (p && wait_address)
+   p->qproc(filp, wait_address, p, 1);
 }

 static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)


[PATCH 0/8][for -mm] mem_notify v6

2008-02-09 Thread KOSAKI Motohiro
Hi

/dev/mem_notify is a low-memory notification device.
It can help avoid swapping and OOM by cooperating with user processes.

The Linux Today article describes it very well (great work by Jake Edge):
http://www.linuxworld.com/news/2008/020508-kernel.html


When memory gets tight, it is quite possible that applications have memory
allocated—often caches for better performance—that they could free.
After all, it is generally better to lose some performance than to face the
consequences of being chosen by the OOM killer.
But, currently, there is no way for a process to know that the kernel is
feeling memory pressure.
The patch provides a way for interested programs to monitor the /dev/mem_notify
 file to be notified if memory starts to run low.
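
For illustration, a userspace consumer of such a device might look like the
sketch below. It uses only standard POSIX calls (open, poll, close). The
function name wait_for_mem_pressure() is hypothetical, and the assumption
that a notification shows up as POLLIN on the descriptor is mine, not
something the cover letter states.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Wait for one notification on the device at `path`.
 * Returns 1 if the descriptor became readable, 0 on timeout,
 * and -1 if the device could not be opened or poll() failed.
 */
int wait_for_mem_pressure(const char *path, int timeout_ms)
{
    struct pollfd pfd;
    int ret;

    assert(path != NULL);

    pfd.fd = open(path, O_RDONLY);
    if (pfd.fd < 0)
        return -1;
    pfd.events = POLLIN;    /* assumption: pressure is reported as POLLIN */
    pfd.revents = 0;

    ret = poll(&pfd, 1, timeout_ms);
    close(pfd.fd);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

An application would typically call wait_for_mem_pressure("/dev/mem_notify", -1)
from a helper thread and drop its caches each time the call returns 1. A
long-running program would keep the descriptor open instead of reopening it
for every wait.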



You need not be annoyed by OOM any longer :)
Any comments are welcome!

patch list
   [1/8] introduce poll_wait_exclusive() new API
   [2/8] introduce wake_up_locked_nr() new API
   [3/8] introduce /dev/mem_notify new device (the core of this patch series)
   [4/8] memory_pressure_notify() caller
   [5/8] add new mem_notify field to /proc/zoneinfo
   [6/8] (optional) fixed incorrect shrink_zone
   [7/8] ignore very small zone for prevent incorrect low mem notify.
   [8/8] support fasync feature


related discussion:
--
 LKML OOM notifications requirement discussion

http://www.gossamer-threads.com/lists/linux/kernel/832802?nohighlight=1#832802
 OOM notifications patch [Marcelo Tosatti]
http://marc.info/?l=linux-kernel&m=119273914027743&w=2
 mem notifications v3 [Marcelo Tosatti]
http://marc.info/?l=linux-mm&m=119852828327044&w=2
 Thrashing notification patch  [Daniel Spang]
http://marc.info/?l=linux-mm&m=119427416315676&w=2
 mem notification v4
http://marc.info/?l=linux-mm&m=120035840523718&w=2
 mem notification v5
http://marc.info/?l=linux-mm&m=120114835421602&w=2

Changelog
-
 v5 -> v6 (by KOSAKI Motohiro)
   o rebase to 2.6.24-mm1
   o fixed thundering herd guard formula.

 v4 -> v5 (by KOSAKI Motohiro)
   o rebase to 2.6.24-rc8-mm1
   o change display order of /proc/zoneinfo
   o ignore very small zone
   o support fcntl(F_SETFL, FASYNC)
   o fixed some trivial bugs.

 v3 -> v4 (by KOSAKI Motohiro)
   o rebase to 2.6.24-rc6-mm1
   o avoid wake up all.
   o add judgement point to __free_one_page().
   o add zone awareness.

 v2 -> v3 (by Marcelo Tosatti)
   o changes the notification point to happen whenever
 the VM moves an anonymous page to the inactive list.
   o implement notification rate limit.

 v1(oom notify) -> v2 (by Marcelo Tosatti)
   o name change
   o notify timing change from just swap thrashing to
 just before thrashing.
   o also works with swapless device.
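
The notification rate limit mentioned in the v2 -> v3 entry is what
memory_pressure_notify() in include/linux/mem_notify.h does with
last_mem_notify, MEM_NOTIFY_FREQ and time_before(). The sketch below models
that check in userspace. The toy_* names and the tick rate of 250 are
assumptions made for illustration; toy_time_before() mirrors the
signed-difference comparison the kernel's time_before() is based on.

```c
#include <assert.h>
#include <limits.h>

#define TOY_HZ          250             /* assumed tick rate, illustration only */
#define TOY_NOTIFY_FREQ (TOY_HZ / 5)    /* mirrors MEM_NOTIFY_FREQ (HZ/5) */

/* Wraparound-safe "a is before b" for a free-running tick counter. */
static int toy_time_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Returns 1 if a notification may be sent at `now`, given the last one at `last`. */
int toy_may_notify(unsigned long now, unsigned long last)
{
    unsigned long target = last + TOY_NOTIFY_FREQ;

    /* The signed-difference trick needs intervals below half the counter range. */
    assert(TOY_NOTIFY_FREQ < (unsigned long)LONG_MAX);

    return !toy_time_before(now, target);
}
```

Comparing through a signed difference keeps the check correct when the tick
counter wraps around, which a plain `now < target` comparison would not.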


[PATCH] efs: move headers out of include/linux/

2008-02-09 Thread Christoph Hellwig
Merge include/linux/efs_fs{_i,_dir}.h into fs/efs/efs.h.  efs_vh.h
remains there because this is the IRIX volume header and shouldn't
really be handled by efs but by the partitioning code.  efs_sb.h
remains there for now because it's exported to userspace.  Of course
this is wrong and aboot should have a copy of its own, but I'll leave
that to a separate patch to avoid any contention.


Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>

Index: linux-2.6/fs/efs/dir.c
===
--- linux-2.6.orig/fs/efs/dir.c 2008-02-09 10:11:47.0 +0100
+++ linux-2.6/fs/efs/dir.c  2008-02-09 10:12:33.0 +0100
@@ -5,8 +5,8 @@
  */
 
 #include 
-#include 
 #include 
+#include "efs.h"
 
 static int efs_readdir(struct file *, void *, filldir_t);
 
Index: linux-2.6/fs/efs/efs.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/fs/efs/efs.h  2008-02-09 10:16:16.0 +0100
@@ -0,0 +1,140 @@
+/*
+ * Copyright (c) 1999 Al Smith
+ *
+ * Portions derived from work (c) 1995,1996 Christian Vogelgsang.
+ * Portions derived from IRIX header files (c) 1988 Silicon Graphics
+ */
+#ifndef _EFS_EFS_H_
+#define _EFS_EFS_H_
+
+#include 
+#include 
+
+#define EFS_VERSION "1.0a"
+
+static const char cprt[] = "EFS: "EFS_VERSION" - (c) 1999 Al Smith <[EMAIL PROTECTED]>";
+
+
+/* 1 block is 512 bytes */
+#define EFS_BLOCKSIZE_BITS  9
+#define EFS_BLOCKSIZE   (1 << EFS_BLOCKSIZE_BITS)
+
+typedef int32_t efs_block_t;
+typedef uint32_t   efs_ino_t;
+
+#define EFS_DIRECTEXTENTS   12
+
+/*
+ * layout of an extent, in memory and on disk. 8 bytes exactly.
+ */
+typedef union extent_u {
+   unsigned char raw[8];
+   struct extent_s {
+   unsigned int ex_magic:8; /* magic # (zero) */
+   unsigned int ex_bn:24;   /* basic block */
+   unsigned int ex_length:8;/* numblocks in this extent */
+   unsigned int ex_offset:24;   /* logical offset into file */
+   } cooked;
+} efs_extent;
+
+typedef struct edevs {
+   __be16  odev;
+   __be32  ndev;
+} efs_devs;
+
+/*
+ * extent based filesystem inode as it appears on disk.  The efs inode
+ * is exactly 128 bytes long.
+ */
+struct efs_dinode {
+   __be16  di_mode;/* mode and type of file */
+   __be16  di_nlink;   /* number of links to file */
+   __be16  di_uid; /* owner's user id */
+   __be16  di_gid; /* owner's group id */
+   __be32  di_size;/* number of bytes in file */
+   __be32  di_atime;   /* time last accessed */
+   __be32  di_mtime;   /* time last modified */
+   __be32  di_ctime;   /* time created */
+   __be32  di_gen; /* generation number */
+   __be16  di_numextents;  /* # of extents */
+   u_char  di_version; /* version of inode */
+   u_char  di_spare;   /* spare - used by AFS */
+   union di_addr {
+   efs_extent  di_extents[EFS_DIRECTEXTENTS];
+   efs_devs di_dev; /* device for IFCHR/IFBLK */
+   } di_u;
+};
+
+/* efs inode storage in memory */
+struct efs_inode_info {
+   int numextents;
+   int lastextent;
+
+   efs_extent  extents[EFS_DIRECTEXTENTS];
+   struct inode vfs_inode;
+};
+
+#include 
+
+#define EFS_DIRBSIZE_BITS  EFS_BLOCKSIZE_BITS
+#define EFS_DIRBSIZE   (1 << EFS_DIRBSIZE_BITS)
+
+struct efs_dentry {
+   __be32  inode;
+   unsigned char   namelen;
+   char name[3];
+};
+
+#define EFS_DENTSIZE   (sizeof(struct efs_dentry) - 3 + 1)
+#define EFS_MAXNAMELEN  ((1 << (sizeof(char) * 8)) - 1)
+
+#define EFS_DIRBLK_HEADERSIZE  4
+#define EFS_DIRBLK_MAGIC   0xbeef  /* moo */
+
+struct efs_dir {
+   __be16  magic;
+   unsigned char   firstused;
+   unsigned char   slots;
+
+   unsigned char   space[EFS_DIRBSIZE - EFS_DIRBLK_HEADERSIZE];
+};
+
+#define EFS_MAXENTS \
+   ((EFS_DIRBSIZE - EFS_DIRBLK_HEADERSIZE) / \
+(EFS_DENTSIZE + sizeof(char)))
+
+#define EFS_SLOTAT(dir, slot) EFS_REALOFF((dir)->space[slot])
+
+#define EFS_REALOFF(offset) ((offset << 1))
+
+
+static inline struct efs_inode_info *INODE_INFO(struct inode *inode)
+{
+   return container_of(inode, struct efs_inode_info, vfs_inode);
+}
+
+static inline struct efs_sb_info *SUPER_INFO(struct super_block *sb)
+{
+   return sb->s_fs_info;
+}
+
+struct statfs;
+struct fid;
+
+extern const struct inode_operations efs_dir_inode_operations;
+extern const struct file_operations efs_dir_operations;
+extern const struct address_space_operations efs_symlink_aops;
+
+extern struct inode *efs_iget(struct super_block *, unsigned long);
+exte

Question about do_sync_read()

2008-02-09 Thread Manish Katiyar
Hi,

In the implementation of file systems for 2.6 kernels,
generic_file_read is often replaced with do_sync_read(). In this
function we call "filp->f_op->aio_read" unconditionally,
whereas most of the time aio_read is initialized to
generic_file_aio_read(). Wouldn't it be a good idea to change the
following code

    for (;;) {
        ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos);
        if (ret != -EIOCBRETRY)
            break;
        wait_on_retry_sync_kiocb(&kiocb);

to

    for (;;) {
        if (filp->f_op->aio_read)
            ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos);
        else
            ret = generic_file_aio_read(&kiocb, &iov, 1, kiocb.ki_pos);
        if (ret != -EIOCBRETRY)
            break;
        wait_on_retry_sync_kiocb(&kiocb);

Just to have a fallback mechanism, as we do in many places in the VFS layer.
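
The fallback pattern in question (call the method if the operations table
provides one, otherwise use a generic implementation) can be shown in
isolation with a small userspace model. All names below are hypothetical
stand-ins: struct toy_file_ops is not struct file_operations, and the sketch
says nothing about whether such a fallback is appropriate in do_sync_read().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Hypothetical operations table with one optional method. */
struct toy_file_ops {
    long (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_ops *ops;
    const char *data;   /* backing contents, NUL-terminated */
    size_t pos;         /* current read offset, never past the end */
};

/* Generic implementation, used when ops->read is NULL. */
static long toy_generic_read(struct toy_file *f, char *buf, size_t len)
{
    size_t avail = strlen(f->data) - f->pos;
    size_t n = len < avail ? len : avail;

    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (long)n;
}

/* Dispatcher: prefer the specific method, fall back to the generic one. */
long toy_read(struct toy_file *f, char *buf, size_t len)
{
    assert(f != NULL && f->ops != NULL);
    if (f->ops->read)
        return f->ops->read(f, buf, len);
    return toy_generic_read(f, buf, len);
}

/* A specific implementation that always reports end of file. */
static long toy_empty_read(struct toy_file *f, char *buf, size_t len)
{
    (void)f;
    (void)buf;
    (void)len;
    return 0;
}

const struct toy_file_ops toy_default_ops = { NULL };
const struct toy_file_ops toy_empty_ops = { toy_empty_read };

/* Reads up to 8 bytes of "hello" through the given operations table. */
long toy_demo_read(const struct toy_file_ops *ops)
{
    char buf[8];
    struct toy_file f = { ops, "hello", 0 };

    return toy_read(&f, buf, sizeof(buf));
}
```

With toy_default_ops the dispatcher falls back to toy_generic_read(); with
toy_empty_ops it calls the table's own method.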

-- 
Thanks & Regards,

Manish Katiyar  ( http://mkatiyar.googlepages.com )
3rd Floor, Fair Winds Block
EGL Software Park
Off Intermediate Ring Road
Bangalore 560071, India
***