i_version changes
I think the i_version changes that hit mainline about a week ago are not as nice as they should be. First, there's a complete lack of documentation, which is very bad. Please document what the new semantics for i_version on regular files are supposed to be, and how they differ from the existing semantics for directories. Second, abusing one of the rather scarce superblock mount flags is a bad idea. It would be much better to set this through ->setattr and an extension of struct iattr, especially as we need to convert file_update_time to update ctime and mtime through ->setattr anyway. Third, using the MS_ flag but then actually having a filesystem mount option to enable it is more than confusing. After all, MS_ options (at least the exported parts) are the mount ABI for common options. Also, this option doesn't show up in ->show_options, which is something Miklos will beat you up for :) I'm also not convinced this should be optional behaviour; either you update i_version for a given filesystem or you don't - having an obscure mount option will only cause confusion. Beyond those points, is there any good reason for making inode_inc_iversion inline, especially after the first patch introduced it properly out of line? And as a last note, please stop pushing these kinds of core changes through specific filesystem trees. If this had been in -mm we would have caught this a lot earlier, and it would have also meant you'd get input and possibly even implementations from other filesystem maintainers. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ext4: move headers out of include/linux
On Sat, Feb 09, 2008 at 10:39:33AM +0100, Christoph Hellwig wrote: > Move the ext4 headers out of include/linux. This is just the trivial move; > there are some more things that could be done later. > > Ted, are any of these shared with e2fsprogs, or can we rip out all > that #ifdef __KERNEL__ junk? > > Note that I plan to submit similar patches for ext2 and ext3 as well, > so the "diverging from them" argument doesn't count. > > Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]> Looks like the patch is too big for vger. Here's a link instead: http://verein.lst.de/~hch/ext4-move-headers
Re: [sample] mem_notify v6: usage example
On Sat 2008-02-09 11:07:09, Jon Masters wrote: > This really needs to be triggered via a generic kernel > event in the final version - I picture glibc having a > reservation API and having generic support for freeing > such reservations. Not sure what you are talking about. This seems very right to me. We want memory-low notification, not yet another generic communication mechanism. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH 0/8][for -mm] mem_notify v6
Hi Rik > More importantly, all gtk+ programs, as well as most databases and other > system daemons have a poll() loop as their main loop. Not only gtk+ - maybe all modern GUI programs :)
Re: [PATCH 0/8][for -mm] mem_notify v6
Yo, Interesting patch series (I am being yuppie and reading this thread from my iPhone on a treadmill at the gym - so further comments later). I think that this is broadly along the lines that I was thinking, but this should be an RFC-only patch series for now. Some initial questions: Where is the netlink interface? Polling an FD is so last century :) What testing have you done? Still, it is good to start with some code - eventually we might just have a full reservation API created. Rik and I and others have bounced ideas around for a while and I hope we can pitch in. I will play with these patches later. Jon.

On Feb 9, 2008, at 10:19, "KOSAKI Motohiro" <[EMAIL PROTECTED]> wrote:

Hi

/dev/mem_notify is a low-memory notification device. It can avoid swap thrashing and the OOM killer by cooperating with user processes. The Linux Today article is a very nice description (great work by Jake Edge): http://www.linuxworld.com/news/2008/020508-kernel.html

When memory gets tight, it is quite possible that applications have memory allocated - often caches for better performance - that they could free. After all, it is generally better to lose some performance than to face the consequences of being chosen by the OOM killer. But, currently, there is no way for a process to know that the kernel is feeling memory pressure. The patch provides a way for interested programs to monitor the /dev/mem_notify file to be notified if memory starts to run low.

You need not be annoyed by OOM any longer :) Please send any comments!

patch list
[1/8] introduce poll_wait_exclusive() new API
[2/8] introduce wake_up_locked_nr() new API
[3/8] introduce /dev/mem_notify new device (the core of this patch series)
[4/8] memory_pressure_notify() caller
[5/8] add new mem_notify field to /proc/zoneinfo
[6/8] (optional) fix incorrect shrink_zone
[7/8] ignore very small zones to prevent incorrect low-memory notification
[8/8] support fasync feature

related discussion:
-- LKML OOM notifications requirement discussion http://www.gossamer-threads.com/lists/linux/kernel/832802?nohighlight=1#832802
OOM notifications patch [Marcelo Tosatti] http://marc.info/?l=linux-kernel&m=119273914027743&w=2
mem notifications v3 [Marcelo Tosatti] http://marc.info/?l=linux-mm&m=119852828327044&w=2
Thrashing notification patch [Daniel Spang] http://marc.info/?l=linux-mm&m=119427416315676&w=2
mem notification v4 http://marc.info/?l=linux-mm&m=120035840523718&w=2
mem notification v5 http://marc.info/?l=linux-mm&m=120114835421602&w=2

Changelog
v5 -> v6 (by KOSAKI Motohiro)
 o rebase to 2.6.24-mm1
 o fixed thundering herd guard formula.
v4 -> v5 (by KOSAKI Motohiro)
 o rebase to 2.6.24-rc8-mm1
 o change display order of /proc/zoneinfo
 o ignore very small zone
 o support fcntl(F_SETFL, FASYNC)
 o fixed some trivial bugs.
v3 -> v4 (by KOSAKI Motohiro)
 o rebase to 2.6.24-rc6-mm1
 o avoid wake up all.
 o add judgement point to __free_one_page().
 o add zone awareness.
v2 -> v3 (by Marcelo Tosatti)
 o changed the notification point to happen whenever the VM moves an anonymous page to the inactive list.
 o implement notification rate limit.
v1 (oom notify) -> v2 (by Marcelo Tosatti)
 o name change
 o notify timing change from just swap thrashing to just before thrashing.
 o also works with swapless device.
Re: [PATCH 0/8][for -mm] mem_notify v6
On Sun, 10 Feb 2008 01:33:49 +0900 "KOSAKI Motohiro" <[EMAIL PROTECTED]> wrote: > > Where is the netlink interface? Polling an FD is so last century :) > > to be honest, I don't know anyone use netlink and why hope receive > low memory notify by netlink. > > poll() is old way, but it works good enough. More importantly, all gtk+ programs, as well as most databases and other system daemons have a poll() loop as their main loop. A file descriptor fits that main loop perfectly. -- All rights reversed.
Re: [sample] mem_notify v6: usage example
Hi Jon > This really needs to be triggered via a generic kernel event in the > final version - I picture glibc having a reservation API and having > generic support for freeing such reservations. To be honest, I have doubts about the idea of a generic reservation framework. In the end, we want applications to drop their caches, not only data-less memory; but an automatic drop mechanism is only able to drop data-less memory. Also, many applications have their own memory management subsystems; I'm afraid nobody would use a framework that is too complex. What do you think? I'd like to see your API; please post it. Thanks!
Re: [sample] mem_notify v6: usage example
This really needs to be triggered via a generic kernel event in the final version - I picture glibc having a reservation API and having generic support for freeing such reservations. Jon On Feb 9, 2008, at 10:55, "KOSAKI Motohiro" <[EMAIL PROTECTED]> wrote: > [full usage-example patch quoted; trimmed here - see the original "[sample] mem_notify v6: usage example" posting]
Re: [PATCH 0/8][for -mm] mem_notify v6
Hi > Interesting patch series (I am being yuppie and reading this thread > from my iPhone on a treadmill at the gym - so further comments later). > I think that this is broadly along the lines that I was thinking, but > this should be an RFC only patch series for now. Sorry - I'll fix that in the next post. > Some initial questions: Thank you. Any discussion is welcome. > Where is the netlink interface? Polling an FD is so last century :) To be honest, I don't know of anyone who uses netlink for this, or why one would want to receive low-memory notifications via netlink. poll() is the old way, but it works well enough. And netlink has a weak point: in the end, netlink's philosophy is a read/write model. I'm afraid many low-memory messages would queue up in the netlink buffer under heavy pressure, and that queuing would itself worsen memory pressure. > Still, it is good to start with some code - eventually we might just > have a full reservation API created. Rik and I and others have bounced > ideas around for a while and I hope we can pitch in. I will play with > these patches later. Great. Any ideas and discussion are welcome.
[sample] mem_notify v6: usage example
This is a usage example of /dev/mem_notify. Daniel Spang created the original version; KOSAKI added the fasync-related code.

Signed-off-by: Daniel Spang <[EMAIL PROTECTED]>
Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>
---
 Documentation/mem_notify.c | 120 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)

Index: b/Documentation/mem_notify.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/Documentation/mem_notify.c	2008-02-10 00:44:00.000000000 +0900
@@ -0,0 +1,120 @@
+/*
+ * Allocate 10 MB each second. Exit on notification.
+ */
+
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <signal.h>
+#include <poll.h>
+#include <pthread.h>
+#include <sys/mman.h>
+
+int count = 0;
+int size = 10;
+
+void *do_alloc()
+{
+	for (;;) {
+		int *buffer;
+		buffer = mmap(NULL, size*1024*1024,
+			      PROT_READ | PROT_WRITE,
+			      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (buffer == MAP_FAILED) {
+			perror("mmap");
+			exit(EXIT_FAILURE);
+		}
+		memset(buffer, 1, size*1024*1024);
+
+		printf("-");
+		fflush(stdout);
+
+		count++;
+		sleep(1);
+	}
+}
+
+int wait_for_notification(struct pollfd *pfd)
+{
+	int ret;
+	read(pfd->fd, 0, 0);
+	ret = poll(pfd, 1, -1);	/* wake up when low memory */
+	if (ret == -1 && errno != EINTR) {
+		perror("poll");
+		exit(EXIT_FAILURE);
+	}
+	return ret;
+}
+
+void do_free()
+{
+	int fd;
+	struct pollfd pfd;
+
+	fd = open("/dev/mem_notify", O_RDONLY);
+	if (fd == -1) {
+		perror("open");
+		exit(EXIT_FAILURE);
+	}
+
+	pfd.fd = fd;
+	pfd.events = POLLIN;
+	for (;;)
+		if (wait_for_notification(&pfd) > 0) {
+			printf("\nGot notification, allocated %d MB\n",
+			       size * count);
+			exit(EXIT_SUCCESS);
+		}
+}
+
+void do_free_signal()
+{
+	int fd;
+	int flags;
+
+	fd = open("/dev/mem_notify", O_RDONLY);
+	if (fd == -1) {
+		perror("open");
+		exit(EXIT_FAILURE);
+	}
+
+	fcntl(fd, F_SETOWN, getpid());
+	fcntl(fd, F_SETSIG, SIGUSR1);
+
+	flags = fcntl(fd, F_GETFL);
+	fcntl(fd, F_SETFL, flags|FASYNC);	/* when low memory, receive SIGUSR1 */
+
+	for (;;)
+		sleep(1);
+}
+
+
+void daniel_exit(int signo)
+{
+	printf("\nGot notification %d, allocated %d MB\n",
+	       signo, size * count);
+	exit(EXIT_SUCCESS);
+}
+
+int main(int argc, char *argv[])
+{
+	pthread_t allocator;
+
+	if (argc == 2 && (strcmp(argv[1], "-sig") == 0)) {
+		printf("run signal mode\n");
+		signal(SIGUSR1, daniel_exit);
+		pthread_create(&allocator, NULL, do_alloc, NULL);
+		do_free_signal();
+	} else {
+		printf("run poll mode\n");
+		pthread_create(&allocator, NULL, do_alloc, NULL);
+		do_free();
+	}
+	return 0;
+}
[PATCH 8/8][for -mm] mem_notify v6: support fasync feature
implement FASYNC capability to /dev/mem_notify. fd = open("/dev/mem_notify", O_RDONLY); fcntl(fd, F_SETOWN, getpid()); fcntl(fd, F_SETSIG, SIGUSR1); flags = fcntl(fd, F_GETFL); fcntl(fd, F_SETFL, flags|FASYNC); /* when low memory, receive SIGUSR1 */ ChangeLog v5 -> v6: o rewrite usage example o cleanups number of wakeup tasks calculation. v5: new Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]> --- mm/mem_notify.c | 109 +--- 1 file changed, 104 insertions(+), 5 deletions(-) Index: b/mm/mem_notify.c === --- a/mm/mem_notify.c 2008-02-03 20:37:25.0 +0900 +++ b/mm/mem_notify.c 2008-02-03 20:48:04.0 +0900 @@ -24,18 +24,58 @@ #define MAX_WAKEUP_TASKS (100) struct mem_notify_file_info { - unsigned long last_proc_notify; + unsigned long last_proc_notify; + struct file *file; + + /* for fasync */ + struct list_head fa_list; + int fa_fd; }; static DECLARE_WAIT_QUEUE_HEAD(mem_wait); static atomic_long_t nr_under_memory_pressure_zones = ATOMIC_LONG_INIT(0); static atomic_t nr_watcher_task = ATOMIC_INIT(0); +static LIST_HEAD(mem_notify_fasync_list); +static DEFINE_SPINLOCK(mem_notify_fasync_lock); +static atomic_t nr_fasync_task = ATOMIC_INIT(0); atomic_long_t last_mem_notify = ATOMIC_LONG_INIT(INITIAL_JIFFIES); +static void mem_notify_kill_fasync_nr(int nr) +{ + struct mem_notify_file_info *iter, *saved_iter; + LIST_HEAD(l_fired); + + if (!nr) + return; + + spin_lock(&mem_notify_fasync_lock); + + list_for_each_entry_safe_reverse(iter, saved_iter, +&mem_notify_fasync_list, +fa_list) { + struct fown_struct *fown; + + fown = &iter->file->f_owner; + send_sigio(fown, iter->fa_fd, POLL_IN); + + list_del(&iter->fa_list); + list_add(&iter->fa_list, &l_fired); + if (!--nr) + break; + } + + /* rotate moving for FIFO wakeup */ + list_splice(&l_fired, &mem_notify_fasync_list); + + spin_unlock(&mem_notify_fasync_lock); +} + void __memory_pressure_notify(struct zone *zone, int pressure) { int nr_wakeup; + int nr_poll_wakeup = 0; + int nr_fasync_wakeup = 0; int flags; 
spin_lock_irqsave(&mem_wait.lock, flags); @@ -48,6 +88,8 @@ void __memory_pressure_notify(struct zon if (pressure) { int nr_watcher = atomic_read(&nr_watcher_task); + int nr_fasync_wait_tasks = atomic_read(&nr_fasync_task); + int nr_poll_wait_tasks = nr_watcher - nr_fasync_wait_tasks; atomic_long_set(&last_mem_notify, jiffies); if (!nr_watcher) @@ -57,10 +99,27 @@ void __memory_pressure_notify(struct zon if (unlikely(nr_wakeup > MAX_WAKEUP_TASKS)) nr_wakeup = MAX_WAKEUP_TASKS; - wake_up_locked_nr(&mem_wait, nr_wakeup); + /* nr_wakeup + nr_fasync_wakeup = nr_fasync_wait_taks x +nr_watcher + */ + nr_fasync_wakeup = DIV_ROUND_UP(nr_fasync_wait_tasks * + nr_wakeup, nr_watcher); + if (unlikely(nr_fasync_wakeup > nr_fasync_wait_tasks)) + nr_fasync_wakeup = nr_fasync_wait_tasks; + + nr_poll_wakeup = DIV_ROUND_UP(nr_poll_wait_tasks * + nr_wakeup, nr_watcher); + if (unlikely(nr_poll_wakeup > nr_poll_wait_tasks)) + nr_poll_wakeup = nr_poll_wait_tasks; + + wake_up_locked_nr(&mem_wait, nr_poll_wakeup); } out: spin_unlock_irqrestore(&mem_wait.lock, flags); + + if (nr_fasync_wakeup) + mem_notify_kill_fasync_nr(nr_fasync_wakeup); } static int mem_notify_open(struct inode *inode, struct file *file) @@ -75,6 +134,9 @@ static int mem_notify_open(struct inode } info->last_proc_notify = INITIAL_JIFFIES; + INIT_LIST_HEAD(&info->fa_list); + info->file = file; + info->fa_fd = -1; file->private_data = info; atomic_inc(&nr_watcher_task); out: @@ -83,7 +145,16 @@ out: static int mem_notify_release(struct inode *inode, struct file *file) { - kfree(file->private_data); + struct mem_notify_file_info *info = file->private_data; + + spin_lock(&mem_notify_fasync_lock); + if (!list_empty(&info->fa_list)) { + list_del(&info->fa_list); + atomic_dec(&nr_fasync_
[PATCH 7/8][for -mm] mem_notify v6: ignore very small zone for prevent incorrect low mem notify
On x86, ZONE_DMA is very small, which causes undesirable low-memory notifications, so it should be ignored. But on some other architectures ZONE_DMA covers 4GB, which is too large to ignore. Therefore, whether a zone is ignored is decided by its size.

ChangeLog: v5: new

Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>
---
 include/linux/mem_notify.h |    3 +++
 mm/page_alloc.c            |    6 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: b/include/linux/mem_notify.h
===================================================================
--- a/include/linux/mem_notify.h	2008-01-23 22:06:04.000000000 +0900
+++ b/include/linux/mem_notify.h	2008-01-23 22:08:02.000000000 +0900
@@ -22,6 +22,9 @@ static inline void memory_pressure_notif
 	unsigned long target;
 	unsigned long pages_high, pages_free, pages_reserve;
 
+	if (unlikely(zone->mem_notify_status == -1))
+		return;
+
 	if (pressure) {
 		target = atomic_long_read(&last_mem_notify) + MEM_NOTIFY_FREQ;
 		if (likely(time_before(jiffies, target)))
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c	2008-01-23 22:07:57.000000000 +0900
+++ b/mm/page_alloc.c	2008-01-23 22:08:02.000000000 +0900
@@ -3470,7 +3470,11 @@ static void __meminit free_area_init_cor
 		zone->zone_pgdat = pgdat;
 		zone->prev_priority = DEF_PRIORITY;
-		zone->mem_notify_status = 0;
+
+		if (zone->present_pages < (pgdat->node_present_pages / 10))
+			zone->mem_notify_status = -1;
+		else
+			zone->mem_notify_status = 0;
 
 		zone_pcp_init(zone);
 		INIT_LIST_HEAD(&zone->active_list);
[PATCH 6/8][for -mm] mem_notify v6: (optional) fixed incorrect shrink_zone
On x86, ZONE_DMA is very small and often not used at all. Unfortunately, when NR_ACTIVE==0 and NR_INACTIVE==0, shrink_zone() still tries to reclaim 1 page, because of

	zone->nr_scan_active += (zone_page_state(zone, NR_ACTIVE) >> priority) + 1;

(the trailing "+ 1"). That causes unnecessary low-memory notifications ;-) I fixed it.

ChangeLog v5: new

Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>
---
 mm/vmscan.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c	2008-02-03 20:27:53.000000000 +0900
+++ b/mm/vmscan.c	2008-02-03 20:33:13.000000000 +0900
@@ -947,7 +947,7 @@ static inline void note_zone_scanning_pr
 static inline int zone_is_near_oom(struct zone *zone)
 {
-	return zone->pages_scanned >= (zone_page_state(zone, NR_ACTIVE)
+	return zone->pages_scanned > (zone_page_state(zone, NR_ACTIVE)
 				+ zone_page_state(zone, NR_INACTIVE))*3;
 }
 
@@ -1196,18 +1196,29 @@ static unsigned long shrink_zone(int pri
 	unsigned long nr_inactive;
 	unsigned long nr_to_scan;
 	unsigned long nr_reclaimed = 0;
+	unsigned long tmp;
+	unsigned long zone_active;
+	unsigned long zone_inactive;
 
 	if (scan_global_lru(sc)) {
 		/*
 		 * Add one to nr_to_scan just to make sure that the kernel
 		 * will slowly sift through the active list.
 		 */
-		zone->nr_scan_active +=
-			(zone_page_state(zone, NR_ACTIVE) >> priority) + 1;
+		zone_active = zone_page_state(zone, NR_ACTIVE);
+		tmp = (zone_active >> priority) + 1;
+		if (unlikely(tmp > zone_active))
+			tmp = zone_active;
+		zone->nr_scan_active += tmp;
 		nr_active = zone->nr_scan_active;
-		zone->nr_scan_inactive +=
-			(zone_page_state(zone, NR_INACTIVE) >> priority) + 1;
+
+		zone_inactive = zone_page_state(zone, NR_INACTIVE);
+		tmp = (zone_inactive >> priority) + 1;
+		if (unlikely(tmp > zone_inactive))
+			tmp = zone_inactive;
+		zone->nr_scan_inactive += tmp;
 		nr_inactive = zone->nr_scan_inactive;
+
 		if (nr_inactive >= sc->swap_cluster_max)
 			zone->nr_scan_inactive = 0;
 		else
[PATCH 5/8][for -mm] mem_notify v6: add new mem_notify field to /proc/zoneinfo
Show the new member of struct zone via /proc/zoneinfo.

ChangeLog: v5: change display order to last.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>
---
 mm/vmstat.c | 8 +++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Index: b/mm/vmstat.c
===================================================================
--- a/mm/vmstat.c	2008-01-23 22:06:05.000000000 +0900
+++ b/mm/vmstat.c	2008-01-23 22:08:00.000000000 +0900
@@ -795,10 +795,12 @@ static void zoneinfo_show_print(struct s
 	seq_printf(m,
 		   "\n  all_unreclaimable: %u"
 		   "\n  prev_priority:     %i"
-		   "\n  start_pfn:         %lu",
-		   zone_is_all_unreclaimable(zone),
+		   "\n  start_pfn:         %lu"
+		   "\n  mem_notify_status: %i",
+		   zone_is_all_unreclaimable(zone),
 		   zone->prev_priority,
-		   zone->zone_start_pfn);
+		   zone->zone_start_pfn,
+		   zone->mem_notify_status);
 	seq_putc(m, '\n');
 }
[PATCH 4/8][for -mm] mem_notify v6: memory_pressure_notify() caller
the notification point to happen whenever the VM moves an anonymous page to the inactive list - this is a pretty good indication that there are unused anonymous pages present which will be very likely swapped out soon. and, It is judged out of trouble at the fllowing situations. o memory pressure decrease and stop moves an anonymous page to the inactive list. o free pages increase than (pages_high+lowmem_reserve)*2. ChangeLog: v5: add out of trouble notify to exit of balance_pgdat(). Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]> Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]> --- mm/page_alloc.c | 12 mm/vmscan.c | 26 ++ 2 files changed, 38 insertions(+) Index: b/mm/vmscan.c === --- a/mm/vmscan.c 2008-01-23 22:06:08.0 +0900 +++ b/mm/vmscan.c 2008-01-23 22:07:57.0 +0900 @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -1089,10 +1090,14 @@ static void shrink_active_list(unsigned struct page *page; struct pagevec pvec; int reclaim_mapped = 0; + bool inactivated_anon = 0; if (sc->may_swap) reclaim_mapped = calc_reclaim_mapped(sc, zone, priority); + if (!reclaim_mapped) + memory_pressure_notify(zone, 0); + lru_add_drain(); spin_lock_irq(&zone->lru_lock); pgmoved = sc->isolate_pages(nr_pages, &l_hold, &pgscanned, sc->order, @@ -1116,6 +1121,13 @@ static void shrink_active_list(unsigned if (!reclaim_mapped || (total_swap_pages == 0 && PageAnon(page)) || page_referenced(page, 0, sc->mem_cgroup)) { + /* deal with the case where there is no +* swap but an anonymous page would be +* moved to the inactive list. 
+*/ + if (!total_swap_pages && reclaim_mapped && + PageAnon(page)) + inactivated_anon = 1; list_add(&page->lru, &l_active); continue; } @@ -1123,8 +1135,12 @@ static void shrink_active_list(unsigned list_add(&page->lru, &l_active); continue; } + if (PageAnon(page)) + inactivated_anon = 1; list_add(&page->lru, &l_inactive); } + if (inactivated_anon) + memory_pressure_notify(zone, 1); pagevec_init(&pvec, 1); pgmoved = 0; @@ -1158,6 +1174,8 @@ static void shrink_active_list(unsigned pagevec_strip(&pvec); spin_lock_irq(&zone->lru_lock); } + if (!reclaim_mapped) + memory_pressure_notify(zone, 0); pgmoved = 0; while (!list_empty(&l_active)) { @@ -1659,6 +1677,14 @@ out: goto loop_again; } + for (i = pgdat->nr_zones - 1; i >= 0; i--) { + struct zone *zone = pgdat->node_zones + i; + + if (!populated_zone(zone)) + continue; + memory_pressure_notify(zone, 0); + } + return nr_reclaimed; } Index: b/mm/page_alloc.c === --- a/mm/page_alloc.c 2008-01-23 22:06:08.0 +0900 +++ b/mm/page_alloc.c 2008-01-23 23:09:32.0 +0900 @@ -44,6 +44,7 @@ #include #include #include +#include #include #include @@ -435,6 +436,8 @@ static inline void __free_one_page(struc unsigned long page_idx; int order_size = 1 << order; int migratetype = get_pageblock_migratetype(page); + unsigned long prev_free; + unsigned long notify_threshold; if (unlikely(PageCompound(page))) destroy_compound_page(page, order); @@ -444,6 +447,7 @@ static inline void __free_one_page(struc VM_BUG_ON(page_idx & (order_size - 1)); VM_BUG_ON(bad_range(zone, page)); + prev_free = zone_page_state(zone, NR_FREE_PAGES); __mod_zone_page_state(zone, NR_FREE_PAGES, order_size); while (order < MAX_ORDER-1) { unsigned long combined_idx; @@ -465,6 +469,14 @@ static inline void __free_one_page(struc list_add(&page->lru, &zone->free_area[order].free_list[migratetype]); zone->free_area[order].nr_free++; + + notify_threshold = (zone->pages_high + + zone->lowmem_reserve[MAX_NR_ZONES-1]) * 2; + + if (unlikely((zone->mem_notify_status == 1) && 
+(prev_free <= notify_threshold) && +(zone_page_state(zone, NR_FREE_PAGES) > notify_threshold))) +
[PATCH 3/8][for -mm] mem_notify v6: introduce /dev/mem_notify new device (the core of this patch series)
the core of this patch series. add /dev/mem_notify device for notification low memory to user process. fd = open("/dev/mem_notify", O_RDONLY); if (fd < 0) { exit(1); } pollfds.fd = fd; pollfds.events = POLLIN; pollfds.revents = 0; err = poll(&pollfds, 1, -1); // wake up at low memory ... ChangeLog v5 -> v6: o improve number of wakeup tasks fomula when task is a few. Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]> Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]> --- Documentation/devices.txt |1 drivers/char/mem.c |5 + include/linux/mem_notify.h | 42 +++ include/linux/mmzone.h |1 mm/Makefile|2 mm/mem_notify.c| 123 + mm/page_alloc.c|1 7 files changed, 174 insertions(+), 1 deletion(-) Index: b/drivers/char/mem.c === --- a/drivers/char/mem.c2008-02-03 20:59:43.0 +0900 +++ b/drivers/char/mem.c2008-02-03 21:00:24.0 +0900 @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -869,6 +870,9 @@ static int memory_open(struct inode * in filp->f_op = &oldmem_fops; break; #endif + case 13: + filp->f_op = &mem_notify_fops; + break; default: return -ENXIO; } @@ -901,6 +905,7 @@ static const struct { #ifdef CONFIG_CRASH_DUMP {12,"oldmem",S_IRUSR | S_IWUSR | S_IRGRP, &oldmem_fops}, #endif + {13, "mem_notify", S_IRUGO, &mem_notify_fops}, }; static struct class *mem_class; Index: b/include/linux/mem_notify.h === --- /dev/null 1970-01-01 00:00:00.0 + +++ b/include/linux/mem_notify.h2008-02-03 21:01:41.0 +0900 @@ -0,0 +1,42 @@ +/* + * Notify applications of memory pressure via /dev/mem_notify + * + * Copyright (C) 2008 Marcelo Tosatti <[EMAIL PROTECTED]>, + *KOSAKI Motohiro <[EMAIL PROTECTED]> + * + * Released under the GPL, see the file COPYING for details. 
+ */ + +#ifndef _LINUX_MEM_NOTIFY_H +#define _LINUX_MEM_NOTIFY_H + +#define MEM_NOTIFY_FREQ (HZ/5) + +extern atomic_long_t last_mem_notify; +extern struct file_operations mem_notify_fops; + +extern void __memory_pressure_notify(struct zone *zone, int pressure); + +static inline void memory_pressure_notify(struct zone *zone, int pressure) +{ + unsigned long target; + unsigned long pages_high, pages_free, pages_reserve; + + if (pressure) { + target = atomic_long_read(&last_mem_notify) + MEM_NOTIFY_FREQ; + if (likely(time_before(jiffies, target))) + return; + + pages_high = zone->pages_high; + pages_free = zone_page_state(zone, NR_FREE_PAGES); + pages_reserve = zone->lowmem_reserve[MAX_NR_ZONES-1]; + if (unlikely(pages_free > (pages_high+pages_reserve)*2)) + return; + + } else if (likely(!zone->mem_notify_status)) + return; + + __memory_pressure_notify(zone, pressure); +} + +#endif /* _LINUX_MEM_NOTIFY_H */ Index: b/include/linux/mmzone.h === --- a/include/linux/mmzone.h2008-02-03 20:59:43.0 +0900 +++ b/include/linux/mmzone.h2008-02-03 20:59:46.0 +0900 @@ -283,6 +283,7 @@ struct zone { */ int prev_priority; + int mem_notify_status; ZONE_PADDING(_pad2_) /* Rarely used or read-mostly fields */ Index: b/mm/Makefile === --- a/mm/Makefile 2008-02-03 20:59:43.0 +0900 +++ b/mm/Makefile 2008-02-03 20:59:46.0 +0900 @@ -11,7 +11,7 @@ obj-y := bootmem.o filemap.o mempool.o page_alloc.o page-writeback.o pdflush.o \ readahead.o swap.o truncate.o vmscan.o \ prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \ - page_isolation.o $(mmu-y) + page_isolation.o mem_notify.o $(mmu-y) obj-$(CONFIG_PROC_PAGE_MONITOR) += pagewalk.o obj-$(CONFIG_BOUNCE) += bounce.o Index: b/mm/mem_notify.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ b/mm/mem_notify.c 2008-02-03 21:02:30.0 +0900 @@ -0,0 +1,123 @@ +/* + * Notify applications of memory pressure via /dev/mem_notify + * + * Copyright (C) 2008 Marcelo Tosatti <[EMAIL PROTECTED]>, + *KOSAKI Motohiro <[EMAIL PROTECTED]> + * + * Released under the 
GPL, see the
[PATCH 2/8][for -mm] mem_notify v6: introduce wake_up_locked_nr() new API
Introduce the new APIs wake_up_locked_nr() and wake_up_locked_all(). They are similar to wake_up_nr() and wake_up_all(), but don't take the waitqueue lock (the caller already holds it).

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>
---
 include/linux/wait.h | 12 ++++++++----
 kernel/sched.c       |  5 +++--
 2 files changed, 11 insertions(+), 6 deletions(-)

Index: b/include/linux/wait.h
===================================================================
--- a/include/linux/wait.h	2008-02-03 20:27:54.000000000 +0900
+++ b/include/linux/wait.h	2008-02-03 20:32:12.000000000 +0900
@@ -142,7 +142,8 @@ static inline void __remove_wait_queue(w
 }
 
 void FASTCALL(__wake_up(wait_queue_head_t *q, unsigned int mode, int nr, void *key));
-extern void FASTCALL(__wake_up_locked(wait_queue_head_t *q, unsigned int mode));
+void FASTCALL(__wake_up_locked(wait_queue_head_t *q, unsigned int mode,
+			       int nr, void *key));
 extern void FASTCALL(__wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr));
 void FASTCALL(__wake_up_bit(wait_queue_head_t *, void *, int));
 int FASTCALL(__wait_on_bit(wait_queue_head_t *, struct wait_bit_queue *, int (*)(void *), unsigned));
@@ -155,10 +156,13 @@ wait_queue_head_t *FASTCALL(bit_waitqueu
 #define wake_up(x)			__wake_up(x, TASK_NORMAL, 1, NULL)
 #define wake_up_nr(x, nr)		__wake_up(x, TASK_NORMAL, nr, NULL)
 #define wake_up_all(x)			__wake_up(x, TASK_NORMAL, 0, NULL)
-#define wake_up_locked(x)		__wake_up_locked((x), TASK_NORMAL)
-#define wake_up_interruptible(x)	__wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
-#define wake_up_interruptible_nr(x, nr)	__wake_up(x, TASK_INTERRUPTIBLE, nr, NULL)
+#define wake_up_locked(x)		__wake_up_locked((x), TASK_NORMAL, 1, NULL)
+#define wake_up_locked_nr(x, nr)	__wake_up_locked((x), TASK_NORMAL, nr, NULL)
+#define wake_up_locked_all(x)		__wake_up_locked((x), TASK_NORMAL, 0, NULL)
+
+#define wake_up_interruptible(x)	__wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
+#define wake_up_interruptible_nr(x, nr)	__wake_up(x, TASK_INTERRUPTIBLE, nr, NULL)
 #define wake_up_interruptible_all(x)	__wake_up(x, TASK_INTERRUPTIBLE, 0, NULL)
 #define wake_up_interruptible_sync(x)	__wake_up_sync((x), TASK_INTERRUPTIBLE, 1)
Index: b/kernel/sched.c
===================================================================
--- a/kernel/sched.c	2008-02-03 20:27:54.000000000 +0900
+++ b/kernel/sched.c	2008-02-03 20:29:09.000000000 +0900
@@ -4115,9 +4115,10 @@ EXPORT_SYMBOL(__wake_up);
 /*
  * Same as __wake_up but called with the spinlock in wait_queue_head_t held.
  */
-void __wake_up_locked(wait_queue_head_t *q, unsigned int mode)
+void __wake_up_locked(wait_queue_head_t *q, unsigned int mode,
+		      int nr_exclusive, void *key)
 {
-	__wake_up_common(q, mode, 1, 0, NULL);
+	__wake_up_common(q, mode, nr_exclusive, 0, key);
 }
 
 /**
[PATCH 1/8][for -mm] mem_notify v6: introduce poll_wait_exclusive()
There are two ways of adding an item to a wait queue:

 1. add_wait_queue()
 2. add_wait_queue_exclusive()

add_wait_queue_exclusive() is a very useful API, but unfortunately there is
no poll_wait_exclusive() counterpart to poll_wait(). That means there is no
way to wake up only one of the processes polling on a wait queue: wake_up()
wakes all processes sleeping via poll_wait(), not just one.

This patch introduces a new API, poll_wait_exclusive(), to allow waking up
only one process.

unsigned int kosaki_poll(struct file *file, struct poll_table_struct *wait)
{
	poll_wait_exclusive(file, &kosaki_wait_queue, wait);
	if (data_exist)
		return POLLIN | POLLRDNORM;
	return 0;
}

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>

---
 fs/eventpoll.c       |    7 +++++--
 fs/select.c          |    9 ++++++---
 include/linux/poll.h |   13 +++++++++++--
 3 files changed, 22 insertions(+), 7 deletions(-)

Index: b/fs/eventpoll.c
===================================================================
--- a/fs/eventpoll.c	2008-01-23 19:22:19.000000000 +0900
+++ b/fs/eventpoll.c	2008-01-23 21:11:56.000000000 +0900
@@ -675,7 +675,7 @@ out_unlock:
  * target file wakeup lists.
  */
 static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
-				 poll_table *pt)
+				 poll_table *pt, int exclusive)
 {
 	struct epitem *epi = ep_item_from_epqueue(pt);
 	struct eppoll_entry *pwq;
@@ -684,7 +684,10 @@ static void ep_ptable_queue_proc(struct
 		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
 		pwq->whead = whead;
 		pwq->base = epi;
-		add_wait_queue(whead, &pwq->wait);
+		if (exclusive)
+			add_wait_queue_exclusive(whead, &pwq->wait);
+		else
+			add_wait_queue(whead, &pwq->wait);
 		list_add_tail(&pwq->llink, &epi->pwqlist);
 		epi->nwait++;
 	} else {

Index: b/fs/select.c
===================================================================
--- a/fs/select.c	2008-01-23 19:22:22.000000000 +0900
+++ b/fs/select.c	2008-01-23 21:11:56.000000000 +0900
@@ -48,7 +48,7 @@ struct poll_table_page {
  * poll table.
  */
 static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
-		       poll_table *p);
+		       poll_table *p, int exclusive);
 
 void poll_initwait(struct poll_wqueues *pwq)
 {
@@ -117,7 +117,7 @@ static struct poll_table_entry *poll_get
 
 /* Add a new entry */
 static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
-		       poll_table *p)
+		       poll_table *p, int exclusive)
 {
 	struct poll_table_entry *entry = poll_get_entry(p);
 	if (!entry)
@@ -126,7 +126,10 @@ static void __pollwait(struct file *filp
 	entry->filp = filp;
 	entry->wait_address = wait_address;
 	init_waitqueue_entry(&entry->wait, current);
-	add_wait_queue(wait_address, &entry->wait);
+	if (exclusive)
+		add_wait_queue_exclusive(wait_address, &entry->wait);
+	else
+		add_wait_queue(wait_address, &entry->wait);
 }
 
 #define FDS_IN(fds, n)		(fds->in + n)

Index: b/include/linux/poll.h
===================================================================
--- a/include/linux/poll.h	2008-01-23 19:22:55.000000000 +0900
+++ b/include/linux/poll.h	2008-02-03 20:28:27.000000000 +0900
@@ -28,7 +28,8 @@ struct poll_table_struct;
 /*
  * structures and helpers for f_op->poll implementations
  */
-typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *, struct poll_table_struct *);
+typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *,
+				struct poll_table_struct *, int);
 
 typedef struct poll_table_struct {
 	poll_queue_proc qproc;
@@ -37,7 +38,15 @@ typedef struct poll_table_struct {
 static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
 {
 	if (p && wait_address)
-		p->qproc(filp, wait_address, p);
+		p->qproc(filp, wait_address, p, 0);
+}
+
+static inline void poll_wait_exclusive(struct file *filp,
+				       wait_queue_head_t *wait_address,
+				       poll_table *p)
+{
+	if (p && wait_address)
+		p->qproc(filp, wait_address, p, 1);
 }
 
 static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
[PATCH 0/8][for -mm] mem_notify v6
Hi,

/dev/mem_notify is a low memory notification device. It can help avoid swap
thrashing and the OOM killer by cooperating with user processes.

The Linux Today article is a very nice description (great work by Jake Edge):
http://www.linuxworld.com/news/2008/020508-kernel.html

    When memory gets tight, it is quite possible that applications have
    memory allocated—often caches for better performance—that they could
    free. After all, it is generally better to lose some performance than
    to face the consequences of being chosen by the OOM killer. But,
    currently, there is no way for a process to know that the kernel is
    feeling memory pressure. The patch provides a way for interested
    programs to monitor the /dev/mem_notify file to be notified if memory
    starts to run low.

You need not be annoyed by OOM any longer :)

Comments welcome!

patch list
 [1/8] introduce poll_wait_exclusive() new API
 [2/8] introduce wake_up_locked_nr() new API
 [3/8] introduce /dev/mem_notify new device (the core of this patch series)
 [4/8] memory_pressure_notify() caller
 [5/8] add new mem_notify field to /proc/zoneinfo
 [6/8] (optional) fix incorrect shrink_zone
 [7/8] ignore very small zones to prevent incorrect low memory notification
 [8/8] support fasync feature

related discussion:
--------------------------------------------------
 LKML OOM notifications requirement discussion
	http://www.gossamer-threads.com/lists/linux/kernel/832802?nohighlight=1#832802
 OOM notifications patch [Marcelo Tosatti]
	http://marc.info/?l=linux-kernel&m=119273914027743&w=2
 mem notifications v3 [Marcelo Tosatti]
	http://marc.info/?l=linux-mm&m=119852828327044&w=2
 Thrashing notification patch [Daniel Spang]
	http://marc.info/?l=linux-mm&m=119427416315676&w=2
 mem notification v4
	http://marc.info/?l=linux-mm&m=120035840523718&w=2
 mem notification v5
	http://marc.info/?l=linux-mm&m=120114835421602&w=2

Changelog
--------------------------------------------------
v5 -> v6 (by KOSAKI Motohiro)
 o rebase to 2.6.24-mm1
 o fixed thundering herd guard formula.
v4 -> v5 (by KOSAKI Motohiro)
 o rebase to 2.6.24-rc8-mm1
 o change display order of /proc/zoneinfo
 o ignore very small zone
 o support fcntl(F_SETFL, FASYNC)
 o fixed some trivial bugs.

v3 -> v4 (by KOSAKI Motohiro)
 o rebase to 2.6.24-rc6-mm1
 o avoid wake up all.
 o add judgement point to __free_one_page().
 o add zone awareness.

v2 -> v3 (by Marcelo Tosatti)
 o changes the notification point to happen whenever the VM moves an
   anonymous page to the inactive list.
 o implement notification rate limit.

v1 (oom notify) -> v2 (by Marcelo Tosatti)
 o name change
 o notify timing change from just swap thrashing to just before thrashing.
 o also works with swapless device.
[PATCH] efs: move headers out of include/linux/
Merge include/linux/efs_fs{_i,_dir}.h into fs/efs/efs.h. efs_vh.h remains
there because it is the IRIX volume header and shouldn't really be handled
by efs but by the partitioning code. efs_sb.h remains there for now because
it's exported to userspace. Of course this is wrong and aboot should have a
copy of its own, but I'll leave that to a separate patch to avoid any
contention.

Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>

Index: linux-2.6/fs/efs/dir.c
===================================================================
--- linux-2.6.orig/fs/efs/dir.c	2008-02-09 10:11:47.000000000 +0100
+++ linux-2.6/fs/efs/dir.c	2008-02-09 10:12:33.000000000 +0100
@@ -5,8 +5,8 @@
  */
 
 #include
-#include
 #include
+#include "efs.h"
 
 static int efs_readdir(struct file *, void *, filldir_t);

Index: linux-2.6/fs/efs/efs.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/fs/efs/efs.h	2008-02-09 10:16:16.000000000 +0100
@@ -0,0 +1,140 @@
+/*
+ * Copyright (c) 1999 Al Smith
+ *
+ * Portions derived from work (c) 1995,1996 Christian Vogelgsang.
+ * Portions derived from IRIX header files (c) 1988 Silicon Graphics
+ */
+#ifndef _EFS_EFS_H_
+#define _EFS_EFS_H_
+
+#include
+#include
+
+#define EFS_VERSION "1.0a"
+
+static const char cprt[] = "EFS: "EFS_VERSION" - (c) 1999 Al Smith <[EMAIL PROTECTED]>";
+
+
+/* 1 block is 512 bytes */
+#define	EFS_BLOCKSIZE_BITS	9
+#define	EFS_BLOCKSIZE		(1 << EFS_BLOCKSIZE_BITS)
+
+typedef	int32_t		efs_block_t;
+typedef uint32_t	efs_ino_t;
+
+#define	EFS_DIRECTEXTENTS	12
+
+/*
+ * layout of an extent, in memory and on disk. 8 bytes exactly.
+ */
+typedef union extent_u {
+	unsigned char raw[8];
+	struct extent_s {
+		unsigned int	ex_magic:8;	/* magic # (zero) */
+		unsigned int	ex_bn:24;	/* basic block */
+		unsigned int	ex_length:8;	/* numblocks in this extent */
+		unsigned int	ex_offset:24;	/* logical offset into file */
+	} cooked;
+} efs_extent;
+
+typedef struct edevs {
+	__be16		odev;
+	__be32		ndev;
+} efs_devs;
+
+/*
+ * extent based filesystem inode as it appears on disk.  The efs inode
+ * is exactly 128 bytes long.
+ */
+struct efs_dinode {
+	__be16		di_mode;	/* mode and type of file */
+	__be16		di_nlink;	/* number of links to file */
+	__be16		di_uid;		/* owner's user id */
+	__be16		di_gid;		/* owner's group id */
+	__be32		di_size;	/* number of bytes in file */
+	__be32		di_atime;	/* time last accessed */
+	__be32		di_mtime;	/* time last modified */
+	__be32		di_ctime;	/* time created */
+	__be32		di_gen;		/* generation number */
+	__be16		di_numextents;	/* # of extents */
+	u_char		di_version;	/* version of inode */
+	u_char		di_spare;	/* spare - used by AFS */
+	union di_addr {
+		efs_extent	di_extents[EFS_DIRECTEXTENTS];
+		efs_devs	di_dev;	/* device for IFCHR/IFBLK */
+	} di_u;
+};
+
+/* efs inode storage in memory */
+struct efs_inode_info {
+	int		numextents;
+	int		lastextent;
+
+	efs_extent	extents[EFS_DIRECTEXTENTS];
+	struct inode	vfs_inode;
+};
+
+#include
+
+#define EFS_DIRBSIZE_BITS	EFS_BLOCKSIZE_BITS
+#define EFS_DIRBSIZE		(1 << EFS_DIRBSIZE_BITS)
+
+struct efs_dentry {
+	__be32		inode;
+	unsigned char	namelen;
+	char		name[3];
+};
+
+#define EFS_DENTSIZE	(sizeof(struct efs_dentry) - 3 + 1)
+#define EFS_MAXNAMELEN	((1 << (sizeof(char) * 8)) - 1)
+
+#define EFS_DIRBLK_HEADERSIZE	4
+#define EFS_DIRBLK_MAGIC	0xbeef	/* moo */
+
+struct efs_dir {
+	__be16		magic;
+	unsigned char	firstused;
+	unsigned char	slots;
+
+	unsigned char	space[EFS_DIRBSIZE - EFS_DIRBLK_HEADERSIZE];
+};
+
+#define EFS_MAXENTS \
+	((EFS_DIRBSIZE - EFS_DIRBLK_HEADERSIZE) / \
+	 (EFS_DENTSIZE + sizeof(char)))
+
+#define EFS_SLOTAT(dir, slot) EFS_REALOFF((dir)->space[slot])
+
+#define EFS_REALOFF(offset) ((offset << 1))
+
+
+static inline struct efs_inode_info *INODE_INFO(struct inode *inode)
+{
+	return container_of(inode, struct efs_inode_info, vfs_inode);
+}
+
+static inline struct efs_sb_info *SUPER_INFO(struct super_block *sb)
+{
+	return sb->s_fs_info;
+}
+
+struct statfs;
+struct fid;
+
+extern const struct inode_operations efs_dir_inode_operations;
+extern const struct file_operations efs_dir_operations;
+extern const struct address_space_operations efs_symlink_aops;
+
+extern struct inode *efs_iget(struct super_block *, unsigned long);
+exte
Question about do_sync_read()
Hi,

In the implementation of file systems for 2.6 kernels, generic_file_read()
is often replaced with do_sync_read(). In this function we call
filp->f_op->aio_read unconditionally, where most of the time aio_read is
initialized to generic_file_aio_read(). Wouldn't it be a good idea to
change the following code

241         for (;;) {
242                 ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos);
243                 if (ret != -EIOCBRETRY)
244                         break;
245                 wait_on_retry_sync_kiocb(&kiocb);

to

241         for (;;) {
242                 if (filp->f_op->aio_read)
243                         ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos);
244                 else
245                         ret = generic_file_aio_read(&kiocb, &iov, 1, kiocb.ki_pos);
246                 if (ret != -EIOCBRETRY)
247                         break;
248                 wait_on_retry_sync_kiocb(&kiocb);

just to have a fallback mechanism, as we do in many places in the VFS layer?

--
Thanks & Regards,
Manish Katiyar
( http://mkatiyar.googlepages.com )
3rd Floor, Fair Winds Block
EGL Software Park
Off Intermediate Ring Road
Bangalore 560071, India