Re: [patch resend v4] update ctime and mtime for mmaped write
[EMAIL PROTECTED] wrote: But if you didn't notice until now, then the current implementation must be pretty reasonable for you use as well. Oh, I definitely noticed. As soon as I tried to port my application to 2.6, it broke - as evidenced by my complaints last year. The current solution is simple - since it's running on dedicated boxes, leave them on 2.4. Well I didn't know that was a change in behaviour vs 2.4 (or maybe I did and forgot). That was probably a bit silly, unless there was a good reason for it. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> But if you didn't notice until now, then the current implementation > must be pretty reasonable for you use as well. Oh, I definitely noticed. As soon as I tried to port my application to 2.6, it broke - as evidenced by my complaints last year. The current solution is simple - since it's running on dedicated boxes, leave them on 2.4. I've now got the hint on how to make it work on 2.6 (sync_file_range()), so I can try again. But the pressure to upgrade is not strong, so it might be a while. You may recall, this subthread started when I responding to "the only reason to use msync(MS_ASYNC) is to update timestamps" with a counterexample. I still think the purpose of the call is a hint to the kernel that writing to the specified page(s) is complete and now would be a good time to clean them. Which has very little to do with timestamps. Now, my application, which leaves less than a second between the MS_ASYNC and a subsequent MS_SYNC to check whether it's done, broke, but I can imagine similar cases where MS_ASYNC would remain a useful hint to reduce the sort of memory hogging generally associated with "dd if=/dev/zero" type operations. Reading between the lines of the standard, that seems (to me, at least) to obviously be the intended purpose of msync(MS_ASYNC). I wonder if there's any historical documentation describing the original intent behind creating the call. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
[EMAIL PROTECTED] wrote: Linux _will_ write all modified data to permanent storage locations. Since 2.6.17 it will do this regardless of msync(). Before 2.6.17 you do need msync() to enable data to be written back. But it will not start I/O immediately, which is not a requirement in the standard, or at least it's pretty vague about that. As I've said before, I disagree, but I'm not going to start a major flame war about it. The most relevant paragraph is: # When MS_ASYNC is specified, msync() returns immediately once all the # write operations are initiated or queued for servicing; when MS_SYNC is # specified, msync() will not return until all write operations are # completed as defined for synchronised I/O data integrity completion. # Either MS_ASYNC or MS_SYNC is specified, but not both. Note two things: 1) In the paragraphs before, what msync does is defined independently of the MS_* flags. Only the time of the return to user space varies. Thus, whatever the delay between calling msync() and the data being written, it should be the same whether MS_SYNC or MS_ASYNC is used. The implementation intended is: - Start all I/O - If MS_SYNC, wait for I/O to complete - Return to user space set_page_dirty queues pages for IO, and the writeout daemon will service that queue when it sees fit. IMO it is sufficient that you cannot say the implementation does not meet the standard. 2) "all the write operations are initiated or queued for servicing". It is a common convention in English (and most languages, I expect) that in the "or" is a preference for the first alternative. The second is a permitted alternative if the first is not possible. And "queued for servicing", especially "initiated or queued for servicing", to me imples queuing waiting for some resource. To have the resource being waited for be a timer expiry seems like rather a cheat to me. It's perhaps doesn't break the letter of the standard, but definitely bends it. It feels like a fiddle. Still, the basic hint function of msync(MS_ASYNC) *is* being accomplished: "I don't expect to write this page any more, so now would be a good time to clean it." It would just make my life easier if the kernel procrastinated less. But if you didn't notice until now, then the current implementation must be pretty reasonable for you use as well. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
[EMAIL PROTECTED] wrote: Linux _will_ write all modified data to permanent storage locations. Since 2.6.17 it will do this regardless of msync(). Before 2.6.17 you do need msync() to enable data to be written back. But it will not start I/O immediately, which is not a requirement in the standard, or at least it's pretty vague about that. As I've said before, I disagree, but I'm not going to start a major flame war about it. The most relevant paragraph is: # When MS_ASYNC is specified, msync() returns immediately once all the # write operations are initiated or queued for servicing; when MS_SYNC is # specified, msync() will not return until all write operations are # completed as defined for synchronised I/O data integrity completion. # Either MS_ASYNC or MS_SYNC is specified, but not both. Note two things: 1) In the paragraphs before, what msync does is defined independently of the MS_* flags. Only the time of the return to user space varies. Thus, whatever the delay between calling msync() and the data being written, it should be the same whether MS_SYNC or MS_ASYNC is used. The implementation intended is: - Start all I/O - If MS_SYNC, wait for I/O to complete - Return to user space set_page_dirty queues pages for IO, and the writeout daemon will service that queue when it sees fit. IMO it is sufficient that you cannot say the implementation does not meet the standard. 2) all the write operations are initiated or queued for servicing. It is a common convention in English (and most languages, I expect) that in the or is a preference for the first alternative. The second is a permitted alternative if the first is not possible. And queued for servicing, especially initiated or queued for servicing, to me imples queuing waiting for some resource. To have the resource being waited for be a timer expiry seems like rather a cheat to me. It's perhaps doesn't break the letter of the standard, but definitely bends it. It feels like a fiddle. Still, the basic hint function of msync(MS_ASYNC) *is* being accomplished: I don't expect to write this page any more, so now would be a good time to clean it. It would just make my life easier if the kernel procrastinated less. But if you didn't notice until now, then the current implementation must be pretty reasonable for you use as well. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
But if you didn't notice until now, then the current implementation must be pretty reasonable for you use as well. Oh, I definitely noticed. As soon as I tried to port my application to 2.6, it broke - as evidenced by my complaints last year. The current solution is simple - since it's running on dedicated boxes, leave them on 2.4. I've now got the hint on how to make it work on 2.6 (sync_file_range()), so I can try again. But the pressure to upgrade is not strong, so it might be a while. You may recall, this subthread started when I responding to the only reason to use msync(MS_ASYNC) is to update timestamps with a counterexample. I still think the purpose of the call is a hint to the kernel that writing to the specified page(s) is complete and now would be a good time to clean them. Which has very little to do with timestamps. Now, my application, which leaves less than a second between the MS_ASYNC and a subsequent MS_SYNC to check whether it's done, broke, but I can imagine similar cases where MS_ASYNC would remain a useful hint to reduce the sort of memory hogging generally associated with dd if=/dev/zero type operations. Reading between the lines of the standard, that seems (to me, at least) to obviously be the intended purpose of msync(MS_ASYNC). I wonder if there's any historical documentation describing the original intent behind creating the call. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
[EMAIL PROTECTED] wrote: But if you didn't notice until now, then the current implementation must be pretty reasonable for you use as well. Oh, I definitely noticed. As soon as I tried to port my application to 2.6, it broke - as evidenced by my complaints last year. The current solution is simple - since it's running on dedicated boxes, leave them on 2.4. Well I didn't know that was a change in behaviour vs 2.4 (or maybe I did and forgot). That was probably a bit silly, unless there was a good reason for it. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> Linux _will_ write all modified data to permanent storage locations. > Since 2.6.17 it will do this regardless of msync(). Before 2.6.17 you > do need msync() to enable data to be written back. > > But it will not start I/O immediately, which is not a requirement in > the standard, or at least it's pretty vague about that. As I've said before, I disagree, but I'm not going to start a major flame war about it. The most relevant paragraph is: # When MS_ASYNC is specified, msync() returns immediately once all the # write operations are initiated or queued for servicing; when MS_SYNC is # specified, msync() will not return until all write operations are # completed as defined for synchronised I/O data integrity completion. # Either MS_ASYNC or MS_SYNC is specified, but not both. Note two things: 1) In the paragraphs before, what msync does is defined independently of the MS_* flags. Only the time of the return to user space varies. Thus, whatever the delay between calling msync() and the data being written, it should be the same whether MS_SYNC or MS_ASYNC is used. The implementation intended is: - Start all I/O - If MS_SYNC, wait for I/O to complete - Return to user space 2) "all the write operations are initiated or queued for servicing". It is a common convention in English (and most languages, I expect) that in the "or" is a preference for the first alternative. The second is a permitted alternative if the first is not possible. And "queued for servicing", especially "initiated or queued for servicing", to me imples queuing waiting for some resource. To have the resource being waited for be a timer expiry seems like rather a cheat to me. It's perhaps doesn't break the letter of the standard, but definitely bends it. It feels like a fiddle. Still, the basic hint function of msync(MS_ASYNC) *is* being accomplished: "I don't expect to write this page any more, so now would be a good time to clean it." It would just make my life easier if the kernel procrastinated less. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On 27 Mar 2007 16:09:33 -0400 [EMAIL PROTECTED] wrote: > What part of "The msync() function writes all modified data to > permanent storage locations [...] For mappings to files, the msync() > function ensures that all write operations are completed as defined > for synchronised I/O data integrity completion." suggests that it's not > supposed to do disk I/O? How is that uselessly vague? > Because for MS_ASYNC, "msync() shall return immediately once all the write operations are initiated or queued for servicing". ie: the writes can complete one millisecond or one week later. We chose 30 seconds. And this is not completely fatuous - before 2.6.17, MAP_SHARED pages could float about in memory in a dirty state for arbitrarily long periods - potentially for the entire application lifetime. It was quite reasonable for our MS_ASYNC implementation to do what it did: tell the VM about the dirtiness of these pages so they get written back soon. Post-2.6.17 we preserved that behaviour. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> > Suggest you use msync(MS_ASYNC), then > > sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE). > > Thank you; I didn't know about that. And I can handle -ENOSYS by falling > back to the old behavior. > > > We can fix your application, and we'll break someone else's. > > If you can point to an application that it'll break, I'd be a lot more > understanding. Nobody did, last year. > > > I don't think it's solveable, really - the range of applications is so > > broad, and the "standard" is so vague as to be useless. > > I agree that standards are sometimes vague, but that one seemed about > as clear as it's possible to be without imposing unreasonably on > the file system and device driver layers. > > What part of "The msync() function writes all modified data to > permanent storage locations [...] For mappings to files, the msync() > function ensures that all write operations are completed as defined > for synchronised I/O data integrity completion." suggests that it's not > supposed to do disk I/O? How is that uselessly vague? Linux _will_ write all modified data to permanent storage locations. Since 2.6.17 it will do this regardless of msync(). Before 2.6.17 you do need msync() to enable data to be written back. But it will not start I/O immediately, which is not a requirement in the standard, or at least it's pretty vague about that. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> Suggest you use msync(MS_ASYNC), then > sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE). Thank you; I didn't know about that. And I can handle -ENOSYS by falling back to the old behavior. > We can fix your application, and we'll break someone else's. If you can point to an application that it'll break, I'd be a lot more understanding. Nobody did, last year. > I don't think it's solveable, really - the range of applications is so > broad, and the "standard" is so vague as to be useless. I agree that standards are sometimes vague, but that one seemed about as clear as it's possible to be without imposing unreasonably on the file system and device driver layers. What part of "The msync() function writes all modified data to permanent storage locations [...] For mappings to files, the msync() function ensures that all write operations are completed as defined for synchronised I/O data integrity completion." suggests that it's not supposed to do disk I/O? How is that uselessly vague? It says to me that msync's raison d'ĂȘtre is to write data from RAM to stable storage. If an application calls it too often, that's the application's fault just as if it called sync(2) too often. > This is why we've > been extending these things with linux-specific goodies which permit > applications to actually tell the kernel what they want to be done in a > more finely-grained fashion. Well, I still think the current Linux behavior is a bug, but there's a usable (and run-time compatible) workaround that doesn't unreasonably complicate the code, and that's good enough. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On 27 Mar 2007 15:24:52 -0400 [EMAIL PROTECTED] wrote: > > * MS_ASYNC does not start I/O (it used to, up to 2.5.67). > > Yes, I noticed. See > http://www.ussg.iu.edu/hypermail/linux/kernel/0602.1/0450.html > for a bug report on the subject February 2006. Suggest you use msync(MS_ASYNC), then sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE). The new (post 2.6.17) MAP_SHARED dirty-page semantics mean that the msync() isn't actually needed. > That's why this application is still running on 2.4. > > As I mentioned at the time, the SUS says: > (http://opengroup.org/onlinepubs/007908799/xsh/msync.html) > "When MS_ASYNC is specified, msync() returns immediately once all the > write operations are initiated or queued for servicing." > > You can argue that putting it on the dirty list constitutes "queued for > servicing", but the intent seems pretty clear to me: MS_ASYNC is supposed > to start the I/O. Although strict standards-ese parsing says that > either branch of an or is acceptable, it is a common English language > convention that the first alternative is preferred and the second > is a fallback. > > It makes sense in this case: start the write or, if that's not possible > (the disk is already busy), queue it for service as soon as the disk > is available. > > They perhaps didn't mandate it this strictly, but that's clearly the > intent. We can fix your application, and we'll break someone else's. I don't think it's solveable, really - the range of applications is so broad, and the "standard" is so vague as to be useless. This is why we've been extending these things with linux-specific goodies which permit applications to actually tell the kernel what they want to be done in a more finely-grained fashion. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> * MS_ASYNC does not start I/O (it used to, up to 2.5.67). Yes, I noticed. See http://www.ussg.iu.edu/hypermail/linux/kernel/0602.1/0450.html for a bug report on the subject February 2006. That's why this application is still running on 2.4. As I mentioned at the time, the SUS says: (http://opengroup.org/onlinepubs/007908799/xsh/msync.html) "When MS_ASYNC is specified, msync() returns immediately once all the write operations are initiated or queued for servicing." You can argue that putting it on the dirty list constitutes "queued for servicing", but the intent seems pretty clear to me: MS_ASYNC is supposed to start the I/O. Although strict standards-ese parsing says that either branch of an or is acceptable, it is a common English language convention that the first alternative is preferred and the second is a fallback. It makes sense in this case: start the write or, if that's not possible (the disk is already busy), queue it for service as soon as the disk is available. They perhaps didn't mandate it this strictly, but that's clearly the intent. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Tue, 27 Mar 2007 20:55:51 +0200 Miklos Szeredi <[EMAIL PROTECTED]> wrote: > It's actually wrong about FADV_DONTNEED, which I think doesn't start > writeout either. case POSIX_FADV_DONTNEED: if (!bdi_write_congested(mapping->backing_dev_info)) filemap_flush(mapping); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> > I agree that msync(MS_ASYNC) has no semantics if time is ignored. > > But it's a useful way to tell the OS that the page is not going > > to be dirtied again. > > Just to clarify, here's the header comment for sys_msync(): > > /* > * MS_SYNC syncs the entire file - including mappings. > * > * MS_ASYNC does not start I/O (it used to, up to 2.5.67). > * Nor does it marks the relevant pages dirty (it used to up to 2.6.17). > * Now it doesn't do anything, since dirty pages are properly tracked. > * > * The application may now run fsync() to > * write out the dirty pages and wait on the writeout and check the result. > * Or the application may run fadvise(FADV_DONTNEED) against the fd to start > * async writeout immediately. > * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to > * applications. > */ > > It's actually wrong about FADV_DONTNEED, which I think doesn't start > writeout either. So there you have it ;) Sorry, FADV_DONTNEED _does_ seem to start writeout, but it does it indiscriminately, not just on the specified range. That should be easy to fix though. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> > Yes, this will make msync(MS_ASYNC) more heavyweight again. But if an > > application doesn't want to update the timestamps, it should just omit > > this call, since it does nothing else. > > Er... FWIW, I have an application that makes heavy use of msync(MS_ASYNC) > and doesn't care about timestamps. (In fact, sometimes it's configured > to write to a raw device and there are no timestamps.) > > It's used as a poor man's portable async I/O. The application logs > data to disk, and sometimes needs to sync it to disk to ensure it has > all been written. > > To reduce long pauses when doing msync(MS_SYNC), it does msync(MS_ASYNC) > as soon as a page is filled up to prompt asynchronous writeback. > "I'm done writing this page and don't intend to write it again. > Please start committing it to stable storage, but don't block me." > > Then, occasionally, there's an msync(MS_SYNC) call to be sure the data > is synced to disk. This caused annoying hiccups before the MS_ASYNC > calls were added. > > > I agree that msync(MS_ASYNC) has no semantics if time is ignored. > But it's a useful way to tell the OS that the page is not going > to be dirtied again. Just to clarify, here's the header comment for sys_msync(): /* * MS_SYNC syncs the entire file - including mappings. * * MS_ASYNC does not start I/O (it used to, up to 2.5.67). * Nor does it marks the relevant pages dirty (it used to up to 2.6.17). * Now it doesn't do anything, since dirty pages are properly tracked. * * The application may now run fsync() to * write out the dirty pages and wait on the writeout and check the result. * Or the application may run fadvise(FADV_DONTNEED) against the fd to start * async writeout immediately. * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to * applications. */ It's actually wrong about FADV_DONTNEED, which I think doesn't start writeout either. So there you have it ;) Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> Yes, this will make msync(MS_ASYNC) more heavyweight again. But if an > application doesn't want to update the timestamps, it should just omit > this call, since it does nothing else. Er... FWIW, I have an application that makes heavy use of msync(MS_ASYNC) and doesn't care about timestamps. (In fact, sometimes it's configured to write to a raw device and there are no timestamps.) It's used as a poor man's portable async I/O. The application logs data to disk, and sometimes needs to sync it to disk to ensure it has all been written. To reduce long pauses when doing msync(MS_SYNC), it does msync(MS_ASYNC) as soon as a page is filled up to prompt asynchronous writeback. "I'm done writing this page and don't intend to write it again. Please start committing it to stable storage, but don't block me." Then, occasionally, there's an msync(MS_SYNC) call to be sure the data is synced to disk. This caused annoying hiccups before the MS_ASYNC calls were added. I agree that msync(MS_ASYNC) has no semantics if time is ignored. But it's a useful way to tell the OS that the page is not going to be dirtied again. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> that's simpler ;) Is it correct enough though? The place where it will > become inaccurate is for repeated modification via an established map. ie: > > p = mmap(..., MAP_SHARED); > for ( ; ; ) > *p = 1; > > in which case I think the timestamp will only get updated once per > writeback interval (ie: 30 seconds). Which is perfectly OK, we really can't do any better in any sane way. My concern is only about MS_ASYNC, it would be really nice to know what other OSs do. I've checked on a Solaris 5.7 (not exactly a modern OS) and that is as good as current Linux. If someone has access to others pls. find my test program at the end. The logical way to handle msync is IMO: write to memory + msync(MS_ASYNC) == write() write to memory + msync(MS_SYNC) == write() + fdatasync() Yes, it would add some overhead for MS_ASYNC, but that's what the user wants isn't it? If the user doesn't want correctly updated timestamps, it should not call msync(). Miklos === msync_time.c == #include #include #include #include #include #include #include static const char *filename; static void print_times(const char *msg) { struct stat stbuf; stat(filename, ); printf("%s\t%li\t%li\t%li\n", msg, stbuf.st_ctime, stbuf.st_mtime, stbuf.st_atime); } static void usage(const char *progname) { fprintf(stderr, "usage: %s filename [-s]\n", progname); exit(1); } int main(int argc, char *argv[]) { int res; char *addr; int msync_flag = MS_ASYNC; int fd; if (argc < 2) usage(argv[0]); filename = argv[1]; if (argc > 2) { if (argc > 3) usage(argv[0]); if (strcmp(argv[2], "-s") == 0) msync_flag = MS_SYNC; else usage(argv[0]); } fd = open(filename, O_RDWR | O_TRUNC | O_CREAT, 0666); if (fd == -1) { perror(filename); return 1; } print_times("begin"); sleep(1); write(fd, "\n", 4); print_times("write"); sleep(1); addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); if (addr == (char *) -1) { perror("mmap"); return 1; } print_times("mmap"); sleep(1); addr[1] = 'b'; print_times("b"); sleep(1); res = msync(addr, 4, msync_flag); if (res == -1) { perror("msync"); return 1; } print_times("msync b"); sleep(1); addr[2] = 'c'; print_times("c"); sleep(1); res = msync(addr, 4, msync_flag); if (res == -1) { perror("msync"); return 1; } print_times("msync c"); sleep(1); addr[3] = 'd'; print_times("d"); sleep(1); res = munmap(addr, 4); if (res == -1) { perror("munmap"); return 1; } print_times("munmap"); sleep(1); res = close(fd); if (res == -1) { perror("close"); return 1; } print_times("end"); return 0; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Tue, 27 Mar 2007 11:23:06 +0200 Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > > > > But Peter Staubach says a RH custumer has files written thorugh mmap, > > > > > which are not being backed up. > > > > > > > > Yes, I expect the backup problem is the major real-world hurt arising > > > > from > > > > this bug. > > > > > > > > But I expect we could adequately plug that problem at munmap()-time. > > > > Or, > > > > better, do_wp_page(). As I said - half-assed. > > > > > > > > It's a question if whether the backup problem is the only thing which > > > > is hurting > > > > in the real-world, or if people have other problems. > > > > > > > > (In fact, what's wrong with doing it in do_wp_page()? > > > > > > It's rather more expensive, than just toggling a bit. > > > > It shouldn't be, especially for filesystems which have one-second timestamp > > granularity. > > > > Filesystems which have s_time_gran=1 might hurt a bit, but no more than > > they will with write(). > > > > Actually, no - we'd only update the mctime once per page per writeback > > period (30 seconds by default) so the load will be small. > > Why? For each faulted page the times will be updated, no? Yes, but only at pagefault-time. And - the faults are already "slow": we need to pull the page contents in from disk, or memset or cow the page - we need to take the trap compared to which, the cost of the timestamp update will (we hope) be relatively low. > Maybe it's acceptable, I don't really know the cost of > file_update_time(). > > Tried this patch, and it seems to work. It will even randomly update > the time for tmpfs files (on initial fault, and on swapins). > > Miklos > > Index: linux/mm/memory.c > === > --- linux.orig/mm/memory.c2007-03-27 11:04:40.0 +0200 > +++ linux/mm/memory.c 2007-03-27 11:08:19.0 +0200 > @@ -1664,6 +1664,8 @@ gotten: > unlock: > pte_unmap_unlock(page_table, ptl); > if (dirty_page) { > + if (vma->vm_file) > + file_update_time(vma->vm_file); > set_page_dirty_balance(dirty_page); > put_page(dirty_page); > } > @@ -2316,6 +2318,8 @@ retry: > unlock: > pte_unmap_unlock(page_table, ptl); > if (dirty_page) { > + if (vma->vm_file) > + file_update_time(vma->vm_file); > set_page_dirty_balance(dirty_page); > put_page(dirty_page); > } that's simpler ;) Is it correct enough though? The place where it will become inaccurate is for repeated modification via an established map. ie: p = mmap(..., MAP_SHARED); for ( ; ; ) *p = 1; in which case I think the timestamp will only get updated once per writeback interval (ie: 30 seconds). tmpfs files have an s_time_gran of 1, so benchmarking some workload on tmpfs with this patch will tell us the worst-case overhead of the change. I guess we should arrange for multiple CPUs to perform write faults against multiple pages of the same file in parallel. Of course, that'd be a pretty darn short benchmark because it'll run out of RAM. Which reveals why we probably won't have a performance problem in there. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> > > > But Peter Staubach says a RH custumer has files written thorugh mmap, > > > > which are not being backed up. > > > > > > Yes, I expect the backup problem is the major real-world hurt arising from > > > this bug. > > > > > > But I expect we could adequately plug that problem at munmap()-time. Or, > > > better, do_wp_page(). As I said - half-assed. > > > > > > It's a question if whether the backup problem is the only thing which is > > > hurting > > > in the real-world, or if people have other problems. > > > > > > (In fact, what's wrong with doing it in do_wp_page()? > > > > It's rather more expensive, than just toggling a bit. > > It shouldn't be, especially for filesystems which have one-second timestamp > granularity. > > Filesystems which have s_time_gran=1 might hurt a bit, but no more than > they will with write(). > > Actually, no - we'd only update the mctime once per page per writeback > period (30 seconds by default) so the load will be small. Why? For each faulted page the times will be updated, no? Maybe it's acceptable, I don't really know the cost of file_update_time(). Tried this patch, and it seems to work. It will even randomly update the time for tmpfs files (on initial fault, and on swapins). Miklos Index: linux/mm/memory.c === --- linux.orig/mm/memory.c 2007-03-27 11:04:40.0 +0200 +++ linux/mm/memory.c 2007-03-27 11:08:19.0 +0200 @@ -1664,6 +1664,8 @@ gotten: unlock: pte_unmap_unlock(page_table, ptl); if (dirty_page) { + if (vma->vm_file) + file_update_time(vma->vm_file); set_page_dirty_balance(dirty_page); put_page(dirty_page); } @@ -2316,6 +2318,8 @@ retry: unlock: pte_unmap_unlock(page_table, ptl); if (dirty_page) { + if (vma->vm_file) + file_update_time(vma->vm_file); set_page_dirty_balance(dirty_page); put_page(dirty_page); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Tue, 27 Mar 2007 10:28:16 +0200 Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > > But Peter Staubach says a RH custumer has files written thorugh mmap, > > > which are not being backed up. > > > > Yes, I expect the backup problem is the major real-world hurt arising from > > this bug. > > > > But I expect we could adequately plug that problem at munmap()-time. Or, > > better, do_wp_page(). As I said - half-assed. > > > > It's a question if whether the backup problem is the only thing which is > > hurting > > in the real-world, or if people have other problems. > > > > (In fact, what's wrong with doing it in do_wp_page()? > > It's rather more expensive, than just toggling a bit. It shouldn't be, especially for filesystems which have one-second timestamp granularity. Filesystems which have s_time_gran=1 might hurt a bit, but no more than they will with write(). Actually, no - we'd only update the mctime once per page per writeback period (30 seconds by default) so the load will be small. It'll be more often if the user is doing a lot of pte-cleaning via msync() or fsync(), but then the m/ctime writes will be the least of their problems. I'd have thought there were more substantial problems with something that crude? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> > But Peter Staubach says a RH custumer has files written thorugh mmap, > > which are not being backed up. > > Yes, I expect the backup problem is the major real-world hurt arising from > this bug. > > But I expect we could adequately plug that problem at munmap()-time. Or, > better, do_wp_page(). As I said - half-assed. > > It's a question if whether the backup problem is the only thing which is > hurting > in the real-world, or if people have other problems. > > (In fact, what's wrong with doing it in do_wp_page()? It's rather more expensive, than just toggling a bit. Let me work on it a bit more. I think I can make the current patch more palatable. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Tue, 27 Mar 2007 10:03:41 +0200 Miklos Szeredi <[EMAIL PROTECTED]> wrote: > But Peter Staubach says a RH custumer has files written thorugh mmap, > which are not being backed up. Yes, I expect the backup problem is the major real-world hurt arising from this bug. But I expect we could adequately plug that problem at munmap()-time. Or, better, do_wp_page(). As I said - half-assed. It's a question if whether the backup problem is the only thing which is hurting in the real-world, or if people have other problems. (In fact, what's wrong with doing it in do_wp_page()? The timestamp could be up to 30 seconds too early, but that's heaps better than what we have now..) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> > > There is surely no need to duplicate all that. > > > > Yeah, we could teach generic_writepages() to conditionally not submit > > for io just test/clear pte dirtyness. > > > > Maybe that would be somewhat cleaner, dunno. > > > > Then there are the ram backed filesystems, which don't have dirty > > accounting and radix trees, and for which this pte walking is still > > needed to provide semantics consistent with normal filesystems. > > hm. > > I don't know how important all this is, really - we've had this bug for > ever and presumably we've already trained everyone to work around it. > > What usage scenarios are people actually hurting from? Is there anything > interesting in the mysterious Novell Bugzilla #206431? That's just a failing LTP testcase, not quite real life ;) But Peter Staubach says a RH custumer has files written thorugh mmap, which are not being backed up. > Perhaps we can get away with doing something half-assed which covers most > requirements... OK. At least I can split the patch into two half asses. The big question is tmpfs and friends. Those won't get any timestamp update without additional page table walking. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Tue, 27 Mar 2007 09:36:50 +0200 Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > There is surely no need to duplicate all that. > > Yeah, we could teach generic_writepages() to conditionally not submit > for io just test/clear pte dirtyness. > > Maybe that would be somewhat cleaner, dunno. > > Then there are the ram backed filesystems, which don't have dirty > accounting and radix trees, and for which this pte walking is still > needed to provide semantics consistent with normal filesystems. hm. I don't know how important all this is, really - we've had this bug for ever and presumably we've already trained everyone to work around it. What usage scenarios are people actually hurting from? Is there anything interesting in the mysterious Novell Bugzilla #206431? Perhaps we can get away with doing something half-assed which covers most requirements... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> > > clear_page_dirty_for_io() already does that. > > > > > > So we should be able to test PageDirtiedByWrite() after running > > > clear_page_dirty_for_io() to discover whether this page was dirtied via > > > MAP_SHARED, and then update the inode times if so. > > > > What do you need the page flag for? > > To work out whether the page was dirtied via write (in which case the > timestamps were updated) or via mmap (in which case they were not). > > > The "modified through mmap" info > > is there in the ptes. > > It might not be there any more - the ptes could have got taken down by, for > example, reclaim. Yes, but then the "modified through mmap" is transferred to the per-address_space flag. All this is already done by this patch. > I dunno - I'm not trying very hard. I'm trying to encourage you to come up > with something less costly and less complex than this thing, but you appear > to be resisting. No, I'm just arguing that your suggestion is actually a complication, not a simplification ;) > > And from the ptes it can be transfered to a > > per-address_space flag. Nobody is interested through which page was > > the file modified. > > > > Anyway, that's just MS_SYNC. MS_ASYNC doesn't walk the pages, yet it > > should update the timestamp. That's the difficult one. > > > > We can treat MS_ASYNC as we treat MS_SYNC. Then, MS_ASYNC *does* walk the > pages. Is does it in generic_writepages(). It also even walks the ptes > for you, in clear_page_dirty_for_io(). Yes. But that's not very useful semantic for MS_ASYNC vs. file time update. It would basically say: "if you cann MS_ASYNC, and the file was modified then sometime in the future you will get an updated c/mtime". But this is not what POSIX says, and it's not what applications want. For example "make" would want to know if a file was modified or not, and with your suggestion only msync(MS_SYNC) would reliably provide that info. But msync(MS_SYNC) is unnecessary in many cases. > There is surely no need to duplicate all that. Yeah, we could teach generic_writepages() to conditionally not submit for io just test/clear pte dirtyness. Maybe that would be somewhat cleaner, dunno. Then there are the ram backed filesystems, which don't have dirty accounting and radix trees, and for which this pte walking is still needed to provide semantics consistent with normal filesystems. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Tue, 27 Mar 2007 08:55:40 +0200 Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > > > > This patch makes writing to shared memory mappings update st_ctime and > > > > > st_mtime as defined by SUSv3: > > > > > > > > Boy this is complicated. > > > > > > You tell me? > > > > > > > Is there a simpler way of doing all this? Say, we define a new page > > > > flag > > > > PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and > > > > ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into > > > > page-dirtiness. Then, when performing writeback we look to see if any > > > > of > > > > the dirty pages are !PageDirtiedByWrite() and, if so, we update > > > > [mc]time to > > > > current-time. > > > > > > I don't think a page flag gains anything over the address_space flag > > > that this patch already has. > > > > > > The complexity is not about keeping track of the "data modified > > > through mmap" state, but about msync() guarantees, that POSIX wants. > > > > > > And these requirements do in fact make some sense: msync() basically > > > means: > > > > > > "I want the data written through mmaps to be visible to the world" > > > > > > And that obviously includes updating the timestamps. > > > > > > So how do we know if the data was modified between two msync() > > > invocations? The only sane way I can think of is to walk the page > > > tables in msync() and test/clear the pte dirty bit. > > > > clear_page_dirty_for_io() already does that. > > > > So we should be able to test PageDirtiedByWrite() after running > > clear_page_dirty_for_io() to discover whether this page was dirtied via > > MAP_SHARED, and then update the inode times if so. > > What do you need the page flag for? To work out whether the page was dirtied via write (in which case the timestamps were updated) or via mmap (in which case they were not). > The "modified through mmap" info > is there in the ptes. It might not be there any more - the ptes could have got taken down by, for example, reclaim. I dunno - I'm not trying very hard. I'm trying to encourage you to come up with something less costly and less complex than this thing, but you appear to be resisting. > And from the ptes it can be transfered to a > per-address_space flag. Nobody is interested through which page was > the file modified. > > Anyway, that's just MS_SYNC. MS_ASYNC doesn't walk the pages, yet it > should update the timestamp. That's the difficult one. > We can treat MS_ASYNC as we treat MS_SYNC. Then, MS_ASYNC *does* walk the pages. Is does it in generic_writepages(). It also even walks the ptes for you, in clear_page_dirty_for_io(). There is surely no need to duplicate all that. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Tue, 27 Mar 2007 08:55:40 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: This patch makes writing to shared memory mappings update st_ctime and st_mtime as defined by SUSv3: Boy this is complicated. You tell me? Is there a simpler way of doing all this? Say, we define a new page flag PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into page-dirtiness. Then, when performing writeback we look to see if any of the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time to current-time. I don't think a page flag gains anything over the address_space flag that this patch already has. The complexity is not about keeping track of the data modified through mmap state, but about msync() guarantees, that POSIX wants. And these requirements do in fact make some sense: msync() basically means: I want the data written through mmaps to be visible to the world And that obviously includes updating the timestamps. So how do we know if the data was modified between two msync() invocations? The only sane way I can think of is to walk the page tables in msync() and test/clear the pte dirty bit. clear_page_dirty_for_io() already does that. So we should be able to test PageDirtiedByWrite() after running clear_page_dirty_for_io() to discover whether this page was dirtied via MAP_SHARED, and then update the inode times if so. What do you need the page flag for? To work out whether the page was dirtied via write (in which case the timestamps were updated) or via mmap (in which case they were not). The modified through mmap info is there in the ptes. It might not be there any more - the ptes could have got taken down by, for example, reclaim. I dunno - I'm not trying very hard. I'm trying to encourage you to come up with something less costly and less complex than this thing, but you appear to be resisting. And from the ptes it can be transfered to a per-address_space flag. Nobody is interested through which page was the file modified. Anyway, that's just MS_SYNC. MS_ASYNC doesn't walk the pages, yet it should update the timestamp. That's the difficult one. We can treat MS_ASYNC as we treat MS_SYNC. Then, MS_ASYNC *does* walk the pages. Is does it in generic_writepages(). It also even walks the ptes for you, in clear_page_dirty_for_io(). There is surely no need to duplicate all that. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
clear_page_dirty_for_io() already does that. So we should be able to test PageDirtiedByWrite() after running clear_page_dirty_for_io() to discover whether this page was dirtied via MAP_SHARED, and then update the inode times if so. What do you need the page flag for? To work out whether the page was dirtied via write (in which case the timestamps were updated) or via mmap (in which case they were not). The modified through mmap info is there in the ptes. It might not be there any more - the ptes could have got taken down by, for example, reclaim. Yes, but then the modified through mmap is transferred to the per-address_space flag. All this is already done by this patch. I dunno - I'm not trying very hard. I'm trying to encourage you to come up with something less costly and less complex than this thing, but you appear to be resisting. No, I'm just arguing that your suggestion is actually a complication, not a simplification ;) And from the ptes it can be transfered to a per-address_space flag. Nobody is interested through which page was the file modified. Anyway, that's just MS_SYNC. MS_ASYNC doesn't walk the pages, yet it should update the timestamp. That's the difficult one. We can treat MS_ASYNC as we treat MS_SYNC. Then, MS_ASYNC *does* walk the pages. Is does it in generic_writepages(). It also even walks the ptes for you, in clear_page_dirty_for_io(). Yes. But that's not very useful semantic for MS_ASYNC vs. file time update. It would basically say: if you cann MS_ASYNC, and the file was modified then sometime in the future you will get an updated c/mtime. But this is not what POSIX says, and it's not what applications want. For example make would want to know if a file was modified or not, and with your suggestion only msync(MS_SYNC) would reliably provide that info. But msync(MS_SYNC) is unnecessary in many cases. There is surely no need to duplicate all that. Yeah, we could teach generic_writepages() to conditionally not submit for io just test/clear pte dirtyness. Maybe that would be somewhat cleaner, dunno. Then there are the ram backed filesystems, which don't have dirty accounting and radix trees, and for which this pte walking is still needed to provide semantics consistent with normal filesystems. Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Tue, 27 Mar 2007 09:36:50 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: There is surely no need to duplicate all that. Yeah, we could teach generic_writepages() to conditionally not submit for io just test/clear pte dirtyness. Maybe that would be somewhat cleaner, dunno. Then there are the ram backed filesystems, which don't have dirty accounting and radix trees, and for which this pte walking is still needed to provide semantics consistent with normal filesystems. hm. I don't know how important all this is, really - we've had this bug for ever and presumably we've already trained everyone to work around it. What usage scenarios are people actually hurting from? Is there anything interesting in the mysterious Novell Bugzilla #206431? Perhaps we can get away with doing something half-assed which covers most requirements... - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
There is surely no need to duplicate all that. Yeah, we could teach generic_writepages() to conditionally not submit for io just test/clear pte dirtyness. Maybe that would be somewhat cleaner, dunno. Then there are the ram backed filesystems, which don't have dirty accounting and radix trees, and for which this pte walking is still needed to provide semantics consistent with normal filesystems. hm. I don't know how important all this is, really - we've had this bug for ever and presumably we've already trained everyone to work around it. What usage scenarios are people actually hurting from? Is there anything interesting in the mysterious Novell Bugzilla #206431? That's just a failing LTP testcase, not quite real life ;) But Peter Staubach says a RH custumer has files written thorugh mmap, which are not being backed up. Perhaps we can get away with doing something half-assed which covers most requirements... OK. At least I can split the patch into two half asses. The big question is tmpfs and friends. Those won't get any timestamp update without additional page table walking. Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Tue, 27 Mar 2007 10:03:41 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: But Peter Staubach says a RH custumer has files written thorugh mmap, which are not being backed up. Yes, I expect the backup problem is the major real-world hurt arising from this bug. But I expect we could adequately plug that problem at munmap()-time. Or, better, do_wp_page(). As I said - half-assed. It's a question if whether the backup problem is the only thing which is hurting in the real-world, or if people have other problems. (In fact, what's wrong with doing it in do_wp_page()? The timestamp could be up to 30 seconds too early, but that's heaps better than what we have now..) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
But Peter Staubach says a RH custumer has files written thorugh mmap, which are not being backed up. Yes, I expect the backup problem is the major real-world hurt arising from this bug. But I expect we could adequately plug that problem at munmap()-time. Or, better, do_wp_page(). As I said - half-assed. It's a question if whether the backup problem is the only thing which is hurting in the real-world, or if people have other problems. (In fact, what's wrong with doing it in do_wp_page()? It's rather more expensive, than just toggling a bit. Let me work on it a bit more. I think I can make the current patch more palatable. Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Tue, 27 Mar 2007 10:28:16 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: But Peter Staubach says a RH custumer has files written thorugh mmap, which are not being backed up. Yes, I expect the backup problem is the major real-world hurt arising from this bug. But I expect we could adequately plug that problem at munmap()-time. Or, better, do_wp_page(). As I said - half-assed. It's a question if whether the backup problem is the only thing which is hurting in the real-world, or if people have other problems. (In fact, what's wrong with doing it in do_wp_page()? It's rather more expensive, than just toggling a bit. It shouldn't be, especially for filesystems which have one-second timestamp granularity. Filesystems which have s_time_gran=1 might hurt a bit, but no more than they will with write(). Actually, no - we'd only update the mctime once per page per writeback period (30 seconds by default) so the load will be small. It'll be more often if the user is doing a lot of pte-cleaning via msync() or fsync(), but then the m/ctime writes will be the least of their problems. I'd have thought there were more substantial problems with something that crude? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
But Peter Staubach says a RH custumer has files written thorugh mmap, which are not being backed up. Yes, I expect the backup problem is the major real-world hurt arising from this bug. But I expect we could adequately plug that problem at munmap()-time. Or, better, do_wp_page(). As I said - half-assed. It's a question if whether the backup problem is the only thing which is hurting in the real-world, or if people have other problems. (In fact, what's wrong with doing it in do_wp_page()? It's rather more expensive, than just toggling a bit. It shouldn't be, especially for filesystems which have one-second timestamp granularity. Filesystems which have s_time_gran=1 might hurt a bit, but no more than they will with write(). Actually, no - we'd only update the mctime once per page per writeback period (30 seconds by default) so the load will be small. Why? For each faulted page the times will be updated, no? Maybe it's acceptable, I don't really know the cost of file_update_time(). Tried this patch, and it seems to work. It will even randomly update the time for tmpfs files (on initial fault, and on swapins). Miklos Index: linux/mm/memory.c === --- linux.orig/mm/memory.c 2007-03-27 11:04:40.0 +0200 +++ linux/mm/memory.c 2007-03-27 11:08:19.0 +0200 @@ -1664,6 +1664,8 @@ gotten: unlock: pte_unmap_unlock(page_table, ptl); if (dirty_page) { + if (vma-vm_file) + file_update_time(vma-vm_file); set_page_dirty_balance(dirty_page); put_page(dirty_page); } @@ -2316,6 +2318,8 @@ retry: unlock: pte_unmap_unlock(page_table, ptl); if (dirty_page) { + if (vma-vm_file) + file_update_time(vma-vm_file); set_page_dirty_balance(dirty_page); put_page(dirty_page); } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Tue, 27 Mar 2007 11:23:06 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: But Peter Staubach says a RH custumer has files written thorugh mmap, which are not being backed up. Yes, I expect the backup problem is the major real-world hurt arising from this bug. But I expect we could adequately plug that problem at munmap()-time. Or, better, do_wp_page(). As I said - half-assed. It's a question if whether the backup problem is the only thing which is hurting in the real-world, or if people have other problems. (In fact, what's wrong with doing it in do_wp_page()? It's rather more expensive, than just toggling a bit. It shouldn't be, especially for filesystems which have one-second timestamp granularity. Filesystems which have s_time_gran=1 might hurt a bit, but no more than they will with write(). Actually, no - we'd only update the mctime once per page per writeback period (30 seconds by default) so the load will be small. Why? For each faulted page the times will be updated, no? Yes, but only at pagefault-time. And - the faults are already slow: we need to pull the page contents in from disk, or memset or cow the page - we need to take the trap compared to which, the cost of the timestamp update will (we hope) be relatively low. Maybe it's acceptable, I don't really know the cost of file_update_time(). Tried this patch, and it seems to work. It will even randomly update the time for tmpfs files (on initial fault, and on swapins). Miklos Index: linux/mm/memory.c === --- linux.orig/mm/memory.c2007-03-27 11:04:40.0 +0200 +++ linux/mm/memory.c 2007-03-27 11:08:19.0 +0200 @@ -1664,6 +1664,8 @@ gotten: unlock: pte_unmap_unlock(page_table, ptl); if (dirty_page) { + if (vma-vm_file) + file_update_time(vma-vm_file); set_page_dirty_balance(dirty_page); put_page(dirty_page); } @@ -2316,6 +2318,8 @@ retry: unlock: pte_unmap_unlock(page_table, ptl); if (dirty_page) { + if (vma-vm_file) + file_update_time(vma-vm_file); set_page_dirty_balance(dirty_page); put_page(dirty_page); } that's simpler ;) Is it correct enough though? The place where it will become inaccurate is for repeated modification via an established map. ie: p = mmap(..., MAP_SHARED); for ( ; ; ) *p = 1; in which case I think the timestamp will only get updated once per writeback interval (ie: 30 seconds). tmpfs files have an s_time_gran of 1, so benchmarking some workload on tmpfs with this patch will tell us the worst-case overhead of the change. I guess we should arrange for multiple CPUs to perform write faults against multiple pages of the same file in parallel. Of course, that'd be a pretty darn short benchmark because it'll run out of RAM. Which reveals why we probably won't have a performance problem in there. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
that's simpler ;) Is it correct enough though? The place where it will become inaccurate is for repeated modification via an established map. ie: p = mmap(..., MAP_SHARED); for ( ; ; ) *p = 1; in which case I think the timestamp will only get updated once per writeback interval (ie: 30 seconds). Which is perfectly OK, we really can't do any better in any sane way. My concern is only about MS_ASYNC, it would be really nice to know what other OSs do. I've checked on a Solaris 5.7 (not exactly a modern OS) and that is as good as current Linux. If someone has access to others pls. find my test program at the end. The logical way to handle msync is IMO: write to memory + msync(MS_ASYNC) == write() write to memory + msync(MS_SYNC) == write() + fdatasync() Yes, it would add some overhead for MS_ASYNC, but that's what the user wants isn't it? If the user doesn't want correctly updated timestamps, it should not call msync(). Miklos === msync_time.c == #include stdio.h #include stdlib.h #include unistd.h #include string.h #include fcntl.h #include sys/mman.h #include sys/stat.h static const char *filename; static void print_times(const char *msg) { struct stat stbuf; stat(filename, stbuf); printf(%s\t%li\t%li\t%li\n, msg, stbuf.st_ctime, stbuf.st_mtime, stbuf.st_atime); } static void usage(const char *progname) { fprintf(stderr, usage: %s filename [-s]\n, progname); exit(1); } int main(int argc, char *argv[]) { int res; char *addr; int msync_flag = MS_ASYNC; int fd; if (argc 2) usage(argv[0]); filename = argv[1]; if (argc 2) { if (argc 3) usage(argv[0]); if (strcmp(argv[2], -s) == 0) msync_flag = MS_SYNC; else usage(argv[0]); } fd = open(filename, O_RDWR | O_TRUNC | O_CREAT, 0666); if (fd == -1) { perror(filename); return 1; } print_times(begin); sleep(1); write(fd, \n, 4); print_times(write); sleep(1); addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); if (addr == (char *) -1) { perror(mmap); return 1; } print_times(mmap); sleep(1); addr[1] = 'b'; print_times(b); sleep(1); res = msync(addr, 4, msync_flag); if (res == -1) { perror(msync); return 1; } print_times(msync b); sleep(1); addr[2] = 'c'; print_times(c); sleep(1); res = msync(addr, 4, msync_flag); if (res == -1) { perror(msync); return 1; } print_times(msync c); sleep(1); addr[3] = 'd'; print_times(d); sleep(1); res = munmap(addr, 4); if (res == -1) { perror(munmap); return 1; } print_times(munmap); sleep(1); res = close(fd); if (res == -1) { perror(close); return 1; } print_times(end); return 0; } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
Yes, this will make msync(MS_ASYNC) more heavyweight again. But if an application doesn't want to update the timestamps, it should just omit this call, since it does nothing else. Er... FWIW, I have an application that makes heavy use of msync(MS_ASYNC) and doesn't care about timestamps. (In fact, sometimes it's configured to write to a raw device and there are no timestamps.) It's used as a poor man's portable async I/O. The application logs data to disk, and sometimes needs to sync it to disk to ensure it has all been written. To reduce long pauses when doing msync(MS_SYNC), it does msync(MS_ASYNC) as soon as a page is filled up to prompt asynchronous writeback. I'm done writing this page and don't intend to write it again. Please start committing it to stable storage, but don't block me. Then, occasionally, there's an msync(MS_SYNC) call to be sure the data is synced to disk. This caused annoying hiccups before the MS_ASYNC calls were added. I agree that msync(MS_ASYNC) has no semantics if time is ignored. But it's a useful way to tell the OS that the page is not going to be dirtied again. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
Yes, this will make msync(MS_ASYNC) more heavyweight again. But if an application doesn't want to update the timestamps, it should just omit this call, since it does nothing else. Er... FWIW, I have an application that makes heavy use of msync(MS_ASYNC) and doesn't care about timestamps. (In fact, sometimes it's configured to write to a raw device and there are no timestamps.) It's used as a poor man's portable async I/O. The application logs data to disk, and sometimes needs to sync it to disk to ensure it has all been written. To reduce long pauses when doing msync(MS_SYNC), it does msync(MS_ASYNC) as soon as a page is filled up to prompt asynchronous writeback. I'm done writing this page and don't intend to write it again. Please start committing it to stable storage, but don't block me. Then, occasionally, there's an msync(MS_SYNC) call to be sure the data is synced to disk. This caused annoying hiccups before the MS_ASYNC calls were added. I agree that msync(MS_ASYNC) has no semantics if time is ignored. But it's a useful way to tell the OS that the page is not going to be dirtied again. Just to clarify, here's the header comment for sys_msync(): /* * MS_SYNC syncs the entire file - including mappings. * * MS_ASYNC does not start I/O (it used to, up to 2.5.67). * Nor does it marks the relevant pages dirty (it used to up to 2.6.17). * Now it doesn't do anything, since dirty pages are properly tracked. * * The application may now run fsync() to * write out the dirty pages and wait on the writeout and check the result. * Or the application may run fadvise(FADV_DONTNEED) against the fd to start * async writeout immediately. * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to * applications. */ It's actually wrong about FADV_DONTNEED, which I think doesn't start writeout either. So there you have it ;) Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
I agree that msync(MS_ASYNC) has no semantics if time is ignored. But it's a useful way to tell the OS that the page is not going to be dirtied again. Just to clarify, here's the header comment for sys_msync(): /* * MS_SYNC syncs the entire file - including mappings. * * MS_ASYNC does not start I/O (it used to, up to 2.5.67). * Nor does it marks the relevant pages dirty (it used to up to 2.6.17). * Now it doesn't do anything, since dirty pages are properly tracked. * * The application may now run fsync() to * write out the dirty pages and wait on the writeout and check the result. * Or the application may run fadvise(FADV_DONTNEED) against the fd to start * async writeout immediately. * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to * applications. */ It's actually wrong about FADV_DONTNEED, which I think doesn't start writeout either. So there you have it ;) Sorry, FADV_DONTNEED _does_ seem to start writeout, but it does it indiscriminately, not just on the specified range. That should be easy to fix though. Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Tue, 27 Mar 2007 20:55:51 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: It's actually wrong about FADV_DONTNEED, which I think doesn't start writeout either. case POSIX_FADV_DONTNEED: if (!bdi_write_congested(mapping-backing_dev_info)) filemap_flush(mapping); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
* MS_ASYNC does not start I/O (it used to, up to 2.5.67). Yes, I noticed. See http://www.ussg.iu.edu/hypermail/linux/kernel/0602.1/0450.html for a bug report on the subject February 2006. That's why this application is still running on 2.4. As I mentioned at the time, the SUS says: (http://opengroup.org/onlinepubs/007908799/xsh/msync.html) When MS_ASYNC is specified, msync() returns immediately once all the write operations are initiated or queued for servicing. You can argue that putting it on the dirty list constitutes queued for servicing, but the intent seems pretty clear to me: MS_ASYNC is supposed to start the I/O. Although strict standards-ese parsing says that either branch of an or is acceptable, it is a common English language convention that the first alternative is preferred and the second is a fallback. It makes sense in this case: start the write or, if that's not possible (the disk is already busy), queue it for service as soon as the disk is available. They perhaps didn't mandate it this strictly, but that's clearly the intent. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On 27 Mar 2007 15:24:52 -0400 [EMAIL PROTECTED] wrote: * MS_ASYNC does not start I/O (it used to, up to 2.5.67). Yes, I noticed. See http://www.ussg.iu.edu/hypermail/linux/kernel/0602.1/0450.html for a bug report on the subject February 2006. Suggest you use msync(MS_ASYNC), then sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE). The new (post 2.6.17) MAP_SHARED dirty-page semantics mean that the msync() isn't actually needed. That's why this application is still running on 2.4. As I mentioned at the time, the SUS says: (http://opengroup.org/onlinepubs/007908799/xsh/msync.html) When MS_ASYNC is specified, msync() returns immediately once all the write operations are initiated or queued for servicing. You can argue that putting it on the dirty list constitutes queued for servicing, but the intent seems pretty clear to me: MS_ASYNC is supposed to start the I/O. Although strict standards-ese parsing says that either branch of an or is acceptable, it is a common English language convention that the first alternative is preferred and the second is a fallback. It makes sense in this case: start the write or, if that's not possible (the disk is already busy), queue it for service as soon as the disk is available. They perhaps didn't mandate it this strictly, but that's clearly the intent. We can fix your application, and we'll break someone else's. I don't think it's solveable, really - the range of applications is so broad, and the standard is so vague as to be useless. This is why we've been extending these things with linux-specific goodies which permit applications to actually tell the kernel what they want to be done in a more finely-grained fashion. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
Suggest you use msync(MS_ASYNC), then sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE). Thank you; I didn't know about that. And I can handle -ENOSYS by falling back to the old behavior. We can fix your application, and we'll break someone else's. If you can point to an application that it'll break, I'd be a lot more understanding. Nobody did, last year. I don't think it's solveable, really - the range of applications is so broad, and the standard is so vague as to be useless. I agree that standards are sometimes vague, but that one seemed about as clear as it's possible to be without imposing unreasonably on the file system and device driver layers. What part of The msync() function writes all modified data to permanent storage locations [...] For mappings to files, the msync() function ensures that all write operations are completed as defined for synchronised I/O data integrity completion. suggests that it's not supposed to do disk I/O? How is that uselessly vague? It says to me that msync's raison d'ĂȘtre is to write data from RAM to stable storage. If an application calls it too often, that's the application's fault just as if it called sync(2) too often. This is why we've been extending these things with linux-specific goodies which permit applications to actually tell the kernel what they want to be done in a more finely-grained fashion. Well, I still think the current Linux behavior is a bug, but there's a usable (and run-time compatible) workaround that doesn't unreasonably complicate the code, and that's good enough. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
Suggest you use msync(MS_ASYNC), then sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE). Thank you; I didn't know about that. And I can handle -ENOSYS by falling back to the old behavior. We can fix your application, and we'll break someone else's. If you can point to an application that it'll break, I'd be a lot more understanding. Nobody did, last year. I don't think it's solveable, really - the range of applications is so broad, and the standard is so vague as to be useless. I agree that standards are sometimes vague, but that one seemed about as clear as it's possible to be without imposing unreasonably on the file system and device driver layers. What part of The msync() function writes all modified data to permanent storage locations [...] For mappings to files, the msync() function ensures that all write operations are completed as defined for synchronised I/O data integrity completion. suggests that it's not supposed to do disk I/O? How is that uselessly vague? Linux _will_ write all modified data to permanent storage locations. Since 2.6.17 it will do this regardless of msync(). Before 2.6.17 you do need msync() to enable data to be written back. But it will not start I/O immediately, which is not a requirement in the standard, or at least it's pretty vague about that. Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On 27 Mar 2007 16:09:33 -0400 [EMAIL PROTECTED] wrote: What part of The msync() function writes all modified data to permanent storage locations [...] For mappings to files, the msync() function ensures that all write operations are completed as defined for synchronised I/O data integrity completion. suggests that it's not supposed to do disk I/O? How is that uselessly vague? Because for MS_ASYNC, msync() shall return immediately once all the write operations are initiated or queued for servicing. ie: the writes can complete one millisecond or one week later. We chose 30 seconds. And this is not completely fatuous - before 2.6.17, MAP_SHARED pages could float about in memory in a dirty state for arbitrarily long periods - potentially for the entire application lifetime. It was quite reasonable for our MS_ASYNC implementation to do what it did: tell the VM about the dirtiness of these pages so they get written back soon. Post-2.6.17 we preserved that behaviour. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
Linux _will_ write all modified data to permanent storage locations. Since 2.6.17 it will do this regardless of msync(). Before 2.6.17 you do need msync() to enable data to be written back. But it will not start I/O immediately, which is not a requirement in the standard, or at least it's pretty vague about that. As I've said before, I disagree, but I'm not going to start a major flame war about it. The most relevant paragraph is: # When MS_ASYNC is specified, msync() returns immediately once all the # write operations are initiated or queued for servicing; when MS_SYNC is # specified, msync() will not return until all write operations are # completed as defined for synchronised I/O data integrity completion. # Either MS_ASYNC or MS_SYNC is specified, but not both. Note two things: 1) In the paragraphs before, what msync does is defined independently of the MS_* flags. Only the time of the return to user space varies. Thus, whatever the delay between calling msync() and the data being written, it should be the same whether MS_SYNC or MS_ASYNC is used. The implementation intended is: - Start all I/O - If MS_SYNC, wait for I/O to complete - Return to user space 2) all the write operations are initiated or queued for servicing. It is a common convention in English (and most languages, I expect) that in the or is a preference for the first alternative. The second is a permitted alternative if the first is not possible. And queued for servicing, especially initiated or queued for servicing, to me imples queuing waiting for some resource. To have the resource being waited for be a timer expiry seems like rather a cheat to me. It's perhaps doesn't break the letter of the standard, but definitely bends it. It feels like a fiddle. Still, the basic hint function of msync(MS_ASYNC) *is* being accomplished: I don't expect to write this page any more, so now would be a good time to clean it. It would just make my life easier if the kernel procrastinated less. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> > > > This patch makes writing to shared memory mappings update st_ctime and > > > > st_mtime as defined by SUSv3: > > > > > > Boy this is complicated. > > > > You tell me? > > > > > Is there a simpler way of doing all this? Say, we define a new page flag > > > PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and > > > ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into > > > page-dirtiness. Then, when performing writeback we look to see if any of > > > the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time > > > to > > > current-time. > > > > I don't think a page flag gains anything over the address_space flag > > that this patch already has. > > > > The complexity is not about keeping track of the "data modified > > through mmap" state, but about msync() guarantees, that POSIX wants. > > > > And these requirements do in fact make some sense: msync() basically > > means: > > > > "I want the data written through mmaps to be visible to the world" > > > > And that obviously includes updating the timestamps. > > > > So how do we know if the data was modified between two msync() > > invocations? The only sane way I can think of is to walk the page > > tables in msync() and test/clear the pte dirty bit. > > clear_page_dirty_for_io() already does that. > > So we should be able to test PageDirtiedByWrite() after running > clear_page_dirty_for_io() to discover whether this page was dirtied via > MAP_SHARED, and then update the inode times if so. What do you need the page flag for? The "modified through mmap" info is there in the ptes. And from the ptes it can be transfered to a per-address_space flag. Nobody is interested through which page was the file modified. Anyway, that's just MS_SYNC. MS_ASYNC doesn't walk the pages, yet it should update the timestamp. That's the difficult one. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Mon, 26 Mar 2007 23:43:08 +0200 Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > > This patch makes writing to shared memory mappings update st_ctime and > > > st_mtime as defined by SUSv3: > > > > Boy this is complicated. > > You tell me? > > > Is there a simpler way of doing all this? Say, we define a new page flag > > PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and > > ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into > > page-dirtiness. Then, when performing writeback we look to see if any of > > the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time to > > current-time. > > I don't think a page flag gains anything over the address_space flag > that this patch already has. > > The complexity is not about keeping track of the "data modified > through mmap" state, but about msync() guarantees, that POSIX wants. > > And these requirements do in fact make some sense: msync() basically > means: > > "I want the data written through mmaps to be visible to the world" > > And that obviously includes updating the timestamps. > > So how do we know if the data was modified between two msync() > invocations? The only sane way I can think of is to walk the page > tables in msync() and test/clear the pte dirty bit. clear_page_dirty_for_io() already does that. So we should be able to test PageDirtiedByWrite() after running clear_page_dirty_for_io() to discover whether this page was dirtied via MAP_SHARED, and then update the inode times if so. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Mon, 26 Mar 2007 16:10:09 -0500 Matt Mackall <[EMAIL PROTECTED]> wrote: > On Mon, Mar 26, 2007 at 02:00:36PM -0700, Andrew Morton wrote: > > On Sun, 25 Mar 2007 23:10:21 +0200 > > Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > > > > This patch makes writing to shared memory mappings update st_ctime and > > > st_mtime as defined by SUSv3: > > > > Boy this is complicated. > > > > Is there a simpler way of doing all this? Say, we define a new page flag > > PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and > > ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into > > page-dirtiness. Then, when performing writeback we look to see if any of > > the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time to > > current-time. > > > > Or something like that - I'm just thinking out loud and picking holes in > > the above doesn't shut me up ;) We're adding complexity and some overhead > > and we're losing our recent msync() simplifications and this all hurts. Is > > there some other way? I think burning a page flag to avoid this additional > > complexity would be worthwhile. > > Aren't we basically out of those? Rafael has liberated a couple of the flags which swsusp was using. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
> > This patch makes writing to shared memory mappings update st_ctime and > > st_mtime as defined by SUSv3: > > Boy this is complicated. You tell me? > Is there a simpler way of doing all this? Say, we define a new page flag > PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and > ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into > page-dirtiness. Then, when performing writeback we look to see if any of > the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time to > current-time. I don't think a page flag gains anything over the address_space flag that this patch already has. The complexity is not about keeping track of the "data modified through mmap" state, but about msync() guarantees, that POSIX wants. And these requirements do in fact make some sense: msync() basically means: "I want the data written through mmaps to be visible to the world" And that obviously includes updating the timestamps. So how do we know if the data was modified between two msync() invocations? The only sane way I can think of is to walk the page tables in msync() and test/clear the pte dirty bit. Yes, this will make msync(MS_ASYNC) more heavyweight again. But if an application doesn't want to update the timestamps, it should just omit this call, since it does nothing else. There shouldn't be any other side effect. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Mon, Mar 26, 2007 at 02:00:36PM -0700, Andrew Morton wrote: > On Sun, 25 Mar 2007 23:10:21 +0200 > Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > > This patch makes writing to shared memory mappings update st_ctime and > > st_mtime as defined by SUSv3: > > Boy this is complicated. > > Is there a simpler way of doing all this? Say, we define a new page flag > PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and > ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into > page-dirtiness. Then, when performing writeback we look to see if any of > the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time to > current-time. > > Or something like that - I'm just thinking out loud and picking holes in > the above doesn't shut me up ;) We're adding complexity and some overhead > and we're losing our recent msync() simplifications and this all hurts. Is > there some other way? I think burning a page flag to avoid this additional > complexity would be worthwhile. Aren't we basically out of those? -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Sun, 25 Mar 2007 23:10:21 +0200 Miklos Szeredi <[EMAIL PROTECTED]> wrote: > This patch makes writing to shared memory mappings update st_ctime and > st_mtime as defined by SUSv3: Boy this is complicated. Is there a simpler way of doing all this? Say, we define a new page flag PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into page-dirtiness. Then, when performing writeback we look to see if any of the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time to current-time. Or something like that - I'm just thinking out loud and picking holes in the above doesn't shut me up ;) We're adding complexity and some overhead and we're losing our recent msync() simplifications and this all hurts. Is there some other way? I think burning a page flag to avoid this additional complexity would be worthwhile. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Sun, 25 Mar 2007 23:10:21 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: This patch makes writing to shared memory mappings update st_ctime and st_mtime as defined by SUSv3: Boy this is complicated. Is there a simpler way of doing all this? Say, we define a new page flag PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into page-dirtiness. Then, when performing writeback we look to see if any of the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time to current-time. Or something like that - I'm just thinking out loud and picking holes in the above doesn't shut me up ;) We're adding complexity and some overhead and we're losing our recent msync() simplifications and this all hurts. Is there some other way? I think burning a page flag to avoid this additional complexity would be worthwhile. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Mon, Mar 26, 2007 at 02:00:36PM -0700, Andrew Morton wrote: On Sun, 25 Mar 2007 23:10:21 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: This patch makes writing to shared memory mappings update st_ctime and st_mtime as defined by SUSv3: Boy this is complicated. Is there a simpler way of doing all this? Say, we define a new page flag PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into page-dirtiness. Then, when performing writeback we look to see if any of the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time to current-time. Or something like that - I'm just thinking out loud and picking holes in the above doesn't shut me up ;) We're adding complexity and some overhead and we're losing our recent msync() simplifications and this all hurts. Is there some other way? I think burning a page flag to avoid this additional complexity would be worthwhile. Aren't we basically out of those? -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
This patch makes writing to shared memory mappings update st_ctime and st_mtime as defined by SUSv3: Boy this is complicated. You tell me? Is there a simpler way of doing all this? Say, we define a new page flag PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into page-dirtiness. Then, when performing writeback we look to see if any of the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time to current-time. I don't think a page flag gains anything over the address_space flag that this patch already has. The complexity is not about keeping track of the data modified through mmap state, but about msync() guarantees, that POSIX wants. And these requirements do in fact make some sense: msync() basically means: I want the data written through mmaps to be visible to the world And that obviously includes updating the timestamps. So how do we know if the data was modified between two msync() invocations? The only sane way I can think of is to walk the page tables in msync() and test/clear the pte dirty bit. Yes, this will make msync(MS_ASYNC) more heavyweight again. But if an application doesn't want to update the timestamps, it should just omit this call, since it does nothing else. There shouldn't be any other side effect. Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Mon, 26 Mar 2007 16:10:09 -0500 Matt Mackall [EMAIL PROTECTED] wrote: On Mon, Mar 26, 2007 at 02:00:36PM -0700, Andrew Morton wrote: On Sun, 25 Mar 2007 23:10:21 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: This patch makes writing to shared memory mappings update st_ctime and st_mtime as defined by SUSv3: Boy this is complicated. Is there a simpler way of doing all this? Say, we define a new page flag PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into page-dirtiness. Then, when performing writeback we look to see if any of the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time to current-time. Or something like that - I'm just thinking out loud and picking holes in the above doesn't shut me up ;) We're adding complexity and some overhead and we're losing our recent msync() simplifications and this all hurts. Is there some other way? I think burning a page flag to avoid this additional complexity would be worthwhile. Aren't we basically out of those? Rafael has liberated a couple of the flags which swsusp was using. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
On Mon, 26 Mar 2007 23:43:08 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: This patch makes writing to shared memory mappings update st_ctime and st_mtime as defined by SUSv3: Boy this is complicated. You tell me? Is there a simpler way of doing all this? Say, we define a new page flag PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into page-dirtiness. Then, when performing writeback we look to see if any of the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time to current-time. I don't think a page flag gains anything over the address_space flag that this patch already has. The complexity is not about keeping track of the data modified through mmap state, but about msync() guarantees, that POSIX wants. And these requirements do in fact make some sense: msync() basically means: I want the data written through mmaps to be visible to the world And that obviously includes updating the timestamps. So how do we know if the data was modified between two msync() invocations? The only sane way I can think of is to walk the page tables in msync() and test/clear the pte dirty bit. clear_page_dirty_for_io() already does that. So we should be able to test PageDirtiedByWrite() after running clear_page_dirty_for_io() to discover whether this page was dirtied via MAP_SHARED, and then update the inode times if so. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch resend v4] update ctime and mtime for mmaped write
This patch makes writing to shared memory mappings update st_ctime and st_mtime as defined by SUSv3: Boy this is complicated. You tell me? Is there a simpler way of doing all this? Say, we define a new page flag PG_dirtiedbywrite and we do SetPageDirtiedByWrite() inside write() and ClearPageDirtiedByWrite() whenever we propagate pte-dirtiness into page-dirtiness. Then, when performing writeback we look to see if any of the dirty pages are !PageDirtiedByWrite() and, if so, we update [mc]time to current-time. I don't think a page flag gains anything over the address_space flag that this patch already has. The complexity is not about keeping track of the data modified through mmap state, but about msync() guarantees, that POSIX wants. And these requirements do in fact make some sense: msync() basically means: I want the data written through mmaps to be visible to the world And that obviously includes updating the timestamps. So how do we know if the data was modified between two msync() invocations? The only sane way I can think of is to walk the page tables in msync() and test/clear the pte dirty bit. clear_page_dirty_for_io() already does that. So we should be able to test PageDirtiedByWrite() after running clear_page_dirty_for_io() to discover whether this page was dirtied via MAP_SHARED, and then update the inode times if so. What do you need the page flag for? The modified through mmap info is there in the ptes. And from the ptes it can be transfered to a per-address_space flag. Nobody is interested through which page was the file modified. Anyway, that's just MS_SYNC. MS_ASYNC doesn't walk the pages, yet it should update the timestamp. That's the difficult one. Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch resend v4] update ctime and mtime for mmaped write
From: Miklos Szeredi <[EMAIL PROTECTED]> Changes: v4: o rename test_clear_page_modified to page_mkclean_noprot o clean up page_mkclean_noprot o don't set AS_CMTIME from fault handler, since that also sets the PTE dirty o only update c/mtime in munmap, if file is not mapped any more o cleanups v3: o rename is_page_modified to test_clear_page_modified v2: o set AS_CMTIME flag in clear_page_dirty_for_io() too o don't clear AS_CMTIME in file_update_time() o check the dirty bit in the page tables v1: o moved check from __fput() to remove_vma(), which is more logical o changed set_page_dirty() to set_page_dirty_mapping in hugetlb.c o cleaned up #ifdef CONFIG_BLOCK mess This patch makes writing to shared memory mappings update st_ctime and st_mtime as defined by SUSv3: The st_ctime and st_mtime fields of a file that is mapped with MAP_SHARED and PROT_WRITE shall be marked for update at some point in the interval between a write reference to the mapped region and the next call to msync() with MS_ASYNC or MS_SYNC for that portion of the file by any process. If there is no such call and if the underlying file is modified as a result of a write reference, then these fields shall be marked for update at some time after the write reference. A new address_space flag is introduced: AS_CMTIME. This is set each time dirtyness from the page table is transferred to the page or if the page is dirtied without the page table being set dirty. Note, the flag is set unconditionally, even if the page is already dirty. This is important, because the page might have been dirtied earlier by a non-mmap write. Msync checks this flag and also dirty bit in the page tables, because the data might change again after an msync(MS_ASYNC), while the page is already dirty and read-write. This also makes the time updating work for memory backed filesystems such as tmpfs. This implementation walks the pages in the synced range, and uses rmap to find all the ptes for each page. Non-linear vmas are ignored, since the ptes can only be found by scanning the whole vma, which is very inefficient. As an optimization, if dirty pages are accounted, then only walk the dirty pages, since the clean pages necessarily have clean ptes. This doesn't work for memory backed filesystems, where no dirty accounting is done. An alternative implementation could check for all intersecting vmas in the mapping and walk the page tables for each. This would probably be more efficient for memory backed filesystems and if the number of dirty pages is near the total number of pages in the range. In munmap if there are no more memory mappings of this file, and the AS_CMTIME flag has been set, the file times are updated. Fixes Novell Bugzilla #206431. Inspired by Peter Staubach's patch and the resulting comments. Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> --- Index: linux-2.6.21-rc4-mm1/include/linux/pagemap.h === --- linux-2.6.21-rc4-mm1.orig/include/linux/pagemap.h 2007-03-25 19:00:06.0 +0200 +++ linux-2.6.21-rc4-mm1/include/linux/pagemap.h2007-03-25 19:00:35.0 +0200 @@ -19,6 +19,7 @@ */ #defineAS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ #define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */ +#define AS_CMTIME (__GFP_BITS_SHIFT + 2) /* ctime/mtime update needed */ static inline void mapping_set_error(struct address_space *mapping, int error) { Index: linux-2.6.21-rc4-mm1/include/linux/mm.h === --- linux-2.6.21-rc4-mm1.orig/include/linux/mm.h2007-03-25 19:00:06.0 +0200 +++ linux-2.6.21-rc4-mm1/include/linux/mm.h 2007-03-25 19:00:36.0 +0200 @@ -808,6 +808,7 @@ int redirty_page_for_writepage(struct wr struct page *page); int FASTCALL(set_page_dirty(struct page *page)); int set_page_dirty_lock(struct page *page); +int set_page_dirty_mapping(struct page *page); int clear_page_dirty_for_io(struct page *page); extern unsigned long do_mremap(unsigned long addr, Index: linux-2.6.21-rc4-mm1/mm/memory.c === --- linux-2.6.21-rc4-mm1.orig/mm/memory.c 2007-03-25 19:00:06.0 +0200 +++ linux-2.6.21-rc4-mm1/mm/memory.c2007-03-25 19:00:36.0 +0200 @@ -676,7 +676,7 @@ static unsigned long zap_pte_range(struc anon_rss--; else { if (pte_dirty(ptent)) - set_page_dirty(page); + set_page_dirty_mapping(page); if (pte_young(ptent)) SetPageReferenced(page); file_rss--; @@ -954,7 +954,7 @@ struct page
[patch resend v4] update ctime and mtime for mmaped write
From: Miklos Szeredi [EMAIL PROTECTED] Changes: v4: o rename test_clear_page_modified to page_mkclean_noprot o clean up page_mkclean_noprot o don't set AS_CMTIME from fault handler, since that also sets the PTE dirty o only update c/mtime in munmap, if file is not mapped any more o cleanups v3: o rename is_page_modified to test_clear_page_modified v2: o set AS_CMTIME flag in clear_page_dirty_for_io() too o don't clear AS_CMTIME in file_update_time() o check the dirty bit in the page tables v1: o moved check from __fput() to remove_vma(), which is more logical o changed set_page_dirty() to set_page_dirty_mapping in hugetlb.c o cleaned up #ifdef CONFIG_BLOCK mess This patch makes writing to shared memory mappings update st_ctime and st_mtime as defined by SUSv3: The st_ctime and st_mtime fields of a file that is mapped with MAP_SHARED and PROT_WRITE shall be marked for update at some point in the interval between a write reference to the mapped region and the next call to msync() with MS_ASYNC or MS_SYNC for that portion of the file by any process. If there is no such call and if the underlying file is modified as a result of a write reference, then these fields shall be marked for update at some time after the write reference. A new address_space flag is introduced: AS_CMTIME. This is set each time dirtyness from the page table is transferred to the page or if the page is dirtied without the page table being set dirty. Note, the flag is set unconditionally, even if the page is already dirty. This is important, because the page might have been dirtied earlier by a non-mmap write. Msync checks this flag and also dirty bit in the page tables, because the data might change again after an msync(MS_ASYNC), while the page is already dirty and read-write. This also makes the time updating work for memory backed filesystems such as tmpfs. This implementation walks the pages in the synced range, and uses rmap to find all the ptes for each page. Non-linear vmas are ignored, since the ptes can only be found by scanning the whole vma, which is very inefficient. As an optimization, if dirty pages are accounted, then only walk the dirty pages, since the clean pages necessarily have clean ptes. This doesn't work for memory backed filesystems, where no dirty accounting is done. An alternative implementation could check for all intersecting vmas in the mapping and walk the page tables for each. This would probably be more efficient for memory backed filesystems and if the number of dirty pages is near the total number of pages in the range. In munmap if there are no more memory mappings of this file, and the AS_CMTIME flag has been set, the file times are updated. Fixes Novell Bugzilla #206431. Inspired by Peter Staubach's patch and the resulting comments. Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux-2.6.21-rc4-mm1/include/linux/pagemap.h === --- linux-2.6.21-rc4-mm1.orig/include/linux/pagemap.h 2007-03-25 19:00:06.0 +0200 +++ linux-2.6.21-rc4-mm1/include/linux/pagemap.h2007-03-25 19:00:35.0 +0200 @@ -19,6 +19,7 @@ */ #defineAS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ #define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */ +#define AS_CMTIME (__GFP_BITS_SHIFT + 2) /* ctime/mtime update needed */ static inline void mapping_set_error(struct address_space *mapping, int error) { Index: linux-2.6.21-rc4-mm1/include/linux/mm.h === --- linux-2.6.21-rc4-mm1.orig/include/linux/mm.h2007-03-25 19:00:06.0 +0200 +++ linux-2.6.21-rc4-mm1/include/linux/mm.h 2007-03-25 19:00:36.0 +0200 @@ -808,6 +808,7 @@ int redirty_page_for_writepage(struct wr struct page *page); int FASTCALL(set_page_dirty(struct page *page)); int set_page_dirty_lock(struct page *page); +int set_page_dirty_mapping(struct page *page); int clear_page_dirty_for_io(struct page *page); extern unsigned long do_mremap(unsigned long addr, Index: linux-2.6.21-rc4-mm1/mm/memory.c === --- linux-2.6.21-rc4-mm1.orig/mm/memory.c 2007-03-25 19:00:06.0 +0200 +++ linux-2.6.21-rc4-mm1/mm/memory.c2007-03-25 19:00:36.0 +0200 @@ -676,7 +676,7 @@ static unsigned long zap_pte_range(struc anon_rss--; else { if (pte_dirty(ptent)) - set_page_dirty(page); + set_page_dirty_mapping(page); if (pte_young(ptent)) SetPageReferenced(page); file_rss--; @@ -954,7 +954,7 @@ struct page