Re: fishy ->put_inode usage in ntfs
On Thu, Oct 14, 2004 at 02:26:45PM +0100, Anton Altaparmakov wrote:
> > I don't like filesystems doing things like this in ->put_inode at all,
> > and indeed the plan is to get rid of ->put_inode completely.  Why do
> > you need to hold an additional reference anyway?  What's so special
> > about the relation of these two inodes?
>
> The bmp_ino is a virtual inode.  It doesn't exist on disk as an inode.
> It is an NTFS attribute of the base inode.  It cannot exist without the
> base inode there.  You could neither read from nor write to this inode
> without its base inode being there, and you couldn't even clear_inode()
> this inode without the base inode being there.  The reference is
> essential, I am afraid.  If ->put_inode is removed then I will have to
> switch to using ntfs_attr_iget() each time, or I will have to attach
> the inode in some other, much hackier way that doesn't use i_count and
> uses my NTFS private counter instead.

Coming back to this issue.  Why do you need to refcount bmp_ino at all?
Can someone ever grab a reference separately from its master inode?
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: fishy ->put_inode usage in ntfs
On Thu, Feb 10, 2005 at 02:48:26PM +0000, Anton Altaparmakov wrote:
> If the igrab() were not done, it would be possible for clear_inode to
> be called on the 'parent' inode whilst at the same time one or more
> attr inodes (belonging to this 'parent') are in use, and Bad Things(TM)
> would happen...

What bad thing specifically?  If there's shared information we should
probably refcount it separately.
Re: fishy ->put_inode usage in ntfs
On Thu, 2005-02-10 at 14:48 +0000, Anton Altaparmakov wrote:
> On Thu, 2005-02-10 at 15:42 +0100, Christoph Hellwig wrote:
> > On Thu, Feb 10, 2005 at 02:40:39PM +0000, Anton Altaparmakov wrote:
> > > I am not sure what you mean.  The VFS layer does reference counting
> > > on inodes.  I have no choice in the matter.
> > >
> > > > Can someone ever grab a reference separately from its master inode?
> > >
> > > Again, not sure what you mean.  Could you elaborate?
> >
> > ntfs_read_locked_attr_inode() does igrab on the 'parent' inode
> > currently.  What do you need this for exactly - the attr inode goes
> > away anyway when clear_inode is called on that 'parent' inode (in my
> > scheme).
>
> If the igrab() were not done, it would be possible for clear_inode to
> be called on the 'parent' inode whilst at the same time one or more
> attr inodes (belonging to this 'parent') are in use, and Bad Things(TM)
> would happen...

The igrab() effectively guarantees that iput() is called on all attr
inodes before clear_inode on the 'parent' can be invoked.

Best regards,
Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ http://www-stu.christs.cam.ac.uk/~aia21/
Re: fishy ->put_inode usage in ntfs
On Thu, Feb 10, 2005 at 02:50:02PM +0000, Anton Altaparmakov wrote:
> If the igrab() were not done, it would be possible for clear_inode to
> be called on the 'parent' inode whilst at the same time one or more
> attr inodes (belonging to this 'parent') are in use, and Bad Things(TM)
> would happen...  The igrab() effectively guarantees that iput() is
> called on all attr inodes before clear_inode on the 'parent' can be
> invoked.

Yes, but why exactly is this important?  It looks like you're abusing
the refcount on the 'parent' inode for some shared data?
Re: fishy ->put_inode usage in ntfs
On Thu, 2005-02-10 at 15:50 +0100, Christoph Hellwig wrote:
> On Thu, Feb 10, 2005 at 02:48:26PM +0000, Anton Altaparmakov wrote:
> > If the igrab() were not done, it would be possible for clear_inode to
> > be called on the 'parent' inode whilst at the same time one or more
> > attr inodes (belonging to this 'parent') are in use, and Bad
> > Things(TM) would happen...
>
> What bad thing specifically?  If there's shared information we should
> probably refcount it separately.

Each attr inode stores a pointer to its parent inode in
NTFS_I(struct inode *vi)->ext.base_ntfs_ino.  This pointer would point
to random memory if clear_inode is called on the parent whilst the attr
inode is still in use.

Best regards,
Anton
Re: ext3 writepages ?
On Wed, 2005-02-09 at 18:05, Bryan Henderson wrote:
> > I see much larger IO chunks and better throughput.  So, I guess it's
> > worth doing it.
>
> I hate to see something like this go ahead based on empirical results
> without theory.  It might make things worse somewhere else.  Do you
> have an explanation for why the IO chunks are larger?  Is the I/O
> scheduler not building large I/Os out of small requests?  Is the queue
> running dry while the device is actually busy?

Bryan, I would like to find out what theory you are looking for.  Don't
you think filesystems submitting the biggest chunks of IO possible is
better than submitting 1k-4k chunks and hoping that the IO schedulers do
a perfect job?  BTW, writepages() is being used by other filesystems
like JFS.  We all learnt through the 2.4 RAW code about the overhead of
doing 512-byte IO and making the elevator merge all the pieces
together.  That's one reason why the 2.6 DIO/RAW code was completely
rewritten from scratch to submit the biggest possible IO chunks.

Well, I agree that we should have theory behind the results.  We are
just playing with prototypes for now.

Thanks,
Badari
[PATCH] block new writers on frozen filesystems
When the lockfs patches went in, an important bit got lost: the call in
generic_file_write to put newly incoming writers to sleep when a
filesystem is frozen.  Nathan added back the call in the now separate
XFS write patch, and the patch for the generic code is below:

Index: mm/filemap.c
===================================================================
RCS file: /cvs/linux-2.6-xfs/mm/filemap.c,v
retrieving revision 1.14
diff -u -p -r1.14 filemap.c
--- mm/filemap.c	5 Jan 2005 14:17:31 -0000	1.14
+++ mm/filemap.c	4 Feb 2005 21:35:53 -0000
@@ -2046,6 +2046,8 @@ __generic_file_aio_write_nolock(struct k
 	count = ocount;
 	pos = *ppos;
 
+	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
+
 	/* We can write back this queue in page reclaim */
 	current->backing_dev_info = mapping->backing_dev_info;
 	written = 0;
Re: ext3 writepages ?
> I am inferring this using iostat, which shows that average device
> utilization fluctuates between 83 and 99 percent and the average
> request size is around 650 sectors (going to the device) without
> writepages.  With writepages, device utilization never drops below 95
> percent and is usually about 98 percent, and the average request size
> to the device is around 1000 sectors.

Well, that blows away the only two ways I know that this effect can
happen.  The first has to do with certain code being more efficient
than other code at assembling I/Os, but the fact that the CPU
utilization is the same in both cases pretty much eliminates that.  The
other is where the interactivity of the I/O generator doesn't match the
buffering in the device, so that the device ends up 100% busy
processing small I/Os that were sent to it because it said all the
while that it needed more work.  But in the small-I/O case, we don't
see a 100% busy device.

So why would the device be up to 17% idle, since the writepages case
makes it apparent that the I/O generator is capable of generating much
more work?  Is there some queue plugging (the I/O scheduler delaying
sending I/O to the device even though the device is idle) going on?

--
Bryan Henderson                          IBM Almaden Research Center
San Jose CA                              Filesystems
Re: ext3 writepages ?
> Don't you think filesystems submitting the biggest chunks of IO
> possible is better than submitting 1k-4k chunks and hoping that the IO
> schedulers do a perfect job?

No, I don't see why it would be better.  In fact, intuitively, I think
the I/O scheduler, being closer to the device, should do a better job
of deciding in what packages I/O should go to the device.  After all,
there exist block devices that don't process big chunks any faster than
small ones.  So this starts to look like something where you withhold
data from the I/O scheduler in order to prevent it from scheduling the
I/O wrongly, because you (the pager/filesystem driver) know better.
That shouldn't be the architecture.  So I'd still like to see a theory
that explains why submitting the I/O a little at a time (i.e. including
the bio_submit() in the loop that assembles the I/O) causes the device
to be idle more.

> We all learnt through the 2.4 RAW code about the overhead of doing
> 512-byte IO and making the elevator merge all the pieces together.

That was CPU time, right?  In the present case, the numbers say it
takes the same amount of CPU time to assemble the I/O above the I/O
scheduler as inside it.

--
Bryan Henderson
Re: ext3 writepages ?
On Thu, 2005-02-10 at 10:00, Bryan Henderson wrote:
> No, I don't see why it would be better.  In fact, intuitively, I think
> the I/O scheduler, being closer to the device, should do a better job
> of deciding in what packages I/O should go to the device.
> [...]
> That was CPU time, right?  In the present case, the numbers say it
> takes the same amount of CPU time to assemble the I/O above the I/O
> scheduler as inside it.

One clear distinction between submitting smaller chunks vs larger ones
is the number of callbacks we get and the processing we need to do.  I
don't think we have enough numbers here to get to the bottom of this.
CPU utilization remaining the same in both cases doesn't mean that the
test took exactly the same amount of time.  I don't even think that we
are doing a fixed number of IOs.  It's possible that by doing larger
IOs we save CPU and use that CPU to push more data?

Thanks,
Badari
Re: [PATCH] Allow kernel-only mount interfaces...
On Feb 10, 2005  13:41 -0500, Trond Myklebust wrote:
> +struct vfsmount *
> +do_kern_mount(const char *fstype, int flags, const char *name, void *data)
> +{
> +	struct file_system_type *type = get_fs_type(fstype);
> +	struct vfsmount *mnt = vfs_kern_mount(type, flags, name, data);
> +	put_filesystem(type);
> +	return mnt;
> +}

This will OOPS if fstype is bad, since you unconditionally
put_filesystem() on a possible PTR_ERR() type.  You need an extra

	if (!IS_ERR(type))

before the put_filesystem(type).

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/             http://members.shaw.ca/golinux/
Re: [PATCH] Allow kernel-only mount interfaces...
On Thu, 10.02.2005 at 12:01 (-0700), Andreas Dilger wrote:
> This will OOPS if fstype is bad, since you unconditionally
> put_filesystem() on a possible PTR_ERR() type.  You need an extra
>	if (!IS_ERR(type))
> before the put_filesystem(type).

Agreed.  That was not a final patch, but just a first untested draft in
order to test the waters.  I mainly want to hear whether or not anyone
has major objections (Al?) against the new function itself.  Here's an
update, though ;-)

Cheers,
Trond

VFS: Add GPL_EXPORTED function vfs_kern_mount()

do_kern_mount() does not allow the kernel to use private mount
interfaces without exposing the same interfaces to userland.  The
problem is that the filesystem is referenced by name, which means that
it and its mount interface must be registered in the global filesystem
list.  vfs_kern_mount() passes the struct file_system_type as an
explicit parameter in order to overcome this limitation.

Signed-off-by: Trond Myklebust [EMAIL PROTECTED]

 super.c |   22 +++++++++++++++-------
 1 files changed, 15 insertions(+), 7 deletions(-)

Index: linux-2.6.11-rc3/fs/super.c
===================================================================
--- linux-2.6.11-rc3.orig/fs/super.c
+++ linux-2.6.11-rc3/fs/super.c
@@ -794,17 +794,13 @@ struct super_block *get_sb_single(struct
 EXPORT_SYMBOL(get_sb_single);
 
 struct vfsmount *
-do_kern_mount(const char *fstype, int flags, const char *name, void *data)
+vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void *data)
 {
-	struct file_system_type *type = get_fs_type(fstype);
 	struct super_block *sb = ERR_PTR(-ENOMEM);
 	struct vfsmount *mnt;
 	int error;
 	char *secdata = NULL;
 
-	if (!type)
-		return ERR_PTR(-ENODEV);
-
 	mnt = alloc_vfsmnt(name);
 	if (!mnt)
 		goto out;
@@ -835,7 +831,6 @@ do_kern_mount(const char *fstype, int fl
 	mnt->mnt_parent = mnt;
 	mnt->mnt_namespace = current->namespace;
 	up_write(&sb->s_umount);
-	put_filesystem(type);
 	return mnt;
 out_sb:
 	up_write(&sb->s_umount);
@@ -846,10 +841,23 @@ out_free_secdata:
 out_mnt:
 	free_vfsmnt(mnt);
 out:
-	put_filesystem(type);
 	return (struct vfsmount *)sb;
 }
+EXPORT_SYMBOL_GPL(vfs_kern_mount);
+
+struct vfsmount *
+do_kern_mount(const char *fstype, int flags, const char *name, void *data)
+{
+	struct file_system_type *type = get_fs_type(fstype);
+	struct vfsmount *mnt;
+
+	if (!type)
+		return ERR_PTR(-ENODEV);
+	mnt = vfs_kern_mount(type, flags, name, data);
+	put_filesystem(type);
+	return mnt;
+}
+
 EXPORT_SYMBOL_GPL(do_kern_mount);
 
 struct vfsmount *kern_mount(struct file_system_type *type)
--
Trond Myklebust [EMAIL PROTECTED]
Re: journal start/stop in ext3_writeback_writepage()
Badari Pulavarty [EMAIL PROTECTED] wrote:
> But I still don't understand why this can't happen through the
> original code:
>
>	journal_destroy()
>		iput(journal inode)
>			do_writepages()
>				generic_writepages()
>					ext3_writeback_writepage()
>						journal_start()
>
> What am I missing?

Presumably there are never any dirty pages or inodes when we run
journal_destroy().
Re: ext3 writepages ?
> It's possible that by doing larger IOs we save CPU and use that CPU to
> push more data?

This is absolutely right; my mistake -- the relevant number is CPU
seconds per megabyte moved, not CPU seconds per elapsed second.  But I
don't think we're close enough to 100% CPU utilization that this
explains much.  In fact, the curious thing here is that neither the
disk nor the CPU seems to be a bottleneck in the slow case.  Maybe
there's some serialization I'm not seeing that allows less parallelism
between I/O and execution.  Is this a single thread doing writes and
syncs to a single file?

--
Bryan Henderson
Re: ext3 writepages ?
On Thu, Feb 10, 2005 at 12:30:23PM -0800, Bryan Henderson wrote:
> Maybe there's some serialization I'm not seeing that allows less
> parallelism between I/O and execution.  Is this a single thread doing
> writes and syncs to a single file?

From what I've seen, without writepages the application thread itself
tends to do the writing by falling into balance_dirty_pages() during
its write call, while in the writepages case a pdflush thread seems to
do more of the writeback.  This also depends somewhat on processor
speed (and number) and the amount of RAM.  To try to isolate this more,
I've limited the RAM (1GB) and number of CPUs (1) on my testing setup.
So yes, there could be better parallelism in the writepages case, but
again this behavior could be a symptom and not a cause.  I'm not sure
how to figure that out -- any suggestions?

Sonny
Re: [PATCH] block new writers on frozen filesystems
Christoph Hellwig [EMAIL PROTECTED] wrote:
> When the lockfs patches went in, an important bit got lost: the call
> in generic_file_write to put newly incoming writers to sleep when a
> filesystem is frozen.  Nathan added back the call in the now separate
> XFS write patch, and the patch for the generic code is below:
>
> +	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);

Hm, I didn't pay much attention to this stuff.  Shouldn't the direct-io
code be waiting as well?  Are all paths which can write to the bdev
supposed to be blocked?  kjournald?
Re: journal start/stop in ext3_writeback_writepage()
Hi,

On Thu, 2005-02-10 at 20:21, Andrew Morton wrote:
> > But I still don't understand why this can't happen through the
> > original code ... what am I missing?
>
> Presumably there are never any dirty pages or inodes when we run
> journal_destroy().

I assume so, yes.  If there is no a_ops->writepages(), then we default
to generic_writepages(), which is a no-op if there are no dirty pages.
If your new ext3-specific writepages code tries to do a journal_start()
in that case, then yes, it is likely to blow up spectacularly during
journal_destroy!

--Stephen
Re: [Ext2-devel] Re: journal start/stop in ext3_writeback_writepage()
On Thu, 2005-02-10 at 15:12, Stephen C. Tweedie wrote:
> I assume so, yes.  If there is no a_ops->writepages(), then we default
> to generic_writepages(), which is a no-op if there are no dirty pages.
> If your new ext3-specific writepages code tries to do a journal_start()
> in that case, then yes, it is likely to blow up spectacularly during
> journal_destroy!

Yep.  I found out the hard way that that's exactly what's happening.
generic_writepages() is clever enough to do nothing if there are no
dirty pages, but I am being stupid in my writepages().  I need to teach
my writepages() to do nothing in the case of no dirty pages.  Is there
an easy way, like checking a count somewhere, rather than doing all the
stuff mpage_writepages() does to figure this out, like:

	while (!done && (index <= end) &&
	       (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
			PAGECACHE_TAG_DIRTY,
			min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) {
		...

Thanks,
Badari
Re: ext3 writepages ?
I went back and looked more closely and see that you did more than add
a ->writepages method.  You replaced the ->prepare_write with one that
doesn't involve the buffer cache, right?  And from your answer to
Badari's question about that, I believe you said this is not an
integral part of having ->writepages, but an additional enhancement.

Well, that could explain a lot.  First of all, there's a significant
amount of CPU time involved in managing buffer heads.  In the profile
you posted, it's one of the differences in CPU time between the
writepages and non-writepages cases.  But it also changes the whole way
the file cache is managed, doesn't it?  That might account for the fact
that in one case you see cache cleaning happening via
balance_dirty_pages() (i.e. memory fills up), but in the other it
happens via pdflush.  I'm not really up on the buffer cache; I haven't
used it in my own studies for years.

I also saw that while you originally said CPU utilization was 73% in
both cases, in one of the profiles I add up at least 77% for the
writepages case, so I'm not sure we're really comparing straight
across.

To investigate these effects further, I think you should monitor
/proc/meminfo, and/or make more isolated changes to the code.

> So yes, there could be better parallelism in the writepages case, but
> again this behavior could be a symptom and not a cause.

I'm not really suggesting that there's better parallelism in the
writepages case.  I'm suggesting that there's poor parallelism
(compared to what I expect) in both cases, which means that adding CPU
time directly affects throughput.  If the CPU time were in parallel
with the I/O time, adding an extra 1.8ms per megabyte to the CPU time
(which is what one of my calculations from your data gave) wouldn't
affect throughput.

But I believe we've at least established doubt that submitting an
entire file's cache in one bio is faster than submitting a bio for each
page, and that smaller I/Os (to the device) cause lower throughput in
the non-writepages case (it seems more likely that the lower throughput
causes the smaller I/Os).

--
Bryan Henderson
Re: [Ext2-devel] Re: journal start/stop in ext3_writeback_writepage()
Badari Pulavarty [EMAIL PROTECTED] wrote:
> I need to teach my writepages() to do nothing in the case of no dirty
> pages.  Is there an easy way, like checking a count somewhere, rather
> than doing all the stuff mpage_writepages() does to figure this out?

	if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
		return 0;
[RFC] ext3 writepages for writeback mode
Hi,

Here is my first cut at adding writepages() support for ext3 writeback
mode.  I have not done any performance analysis on the patch, so try it
at your own risk.  Please let me know if I am completely off or it's a
stupid idea.

Thanks,
Badari

--- linux-2.6.10.org/fs/ext3/inode.c	2004-12-06 11:45:49.000000000 -0800
+++ linux-2.6.10/fs/ext3/inode.c	2005-02-10 18:14:17.987263744 -0800
@@ -856,6 +856,12 @@
 	return ret;
 }
 
+static int ext3_writepages_get_block(struct inode *inode, sector_t iblock,
+			struct buffer_head *bh, int create)
+{
+	return ext3_direct_io_get_blocks(inode, iblock, 1, bh, create);
+}
+
 /*
  * `handle' can be NULL if create is zero
  */
@@ -1321,6 +1327,37 @@
 	return ret;
 }
 
+static int
+ext3_writeback_writepages(struct address_space *mapping,
+			struct writeback_control *wbc)
+{
+	struct inode *inode = mapping->host;
+	handle_t *handle = NULL;
+	int err, ret = 0;
+
+	if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
+		return ret;
+
+	handle = ext3_journal_start(inode, ext3_writepage_trans_blocks(inode));
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		return ret;
+	}
+
+	ret = mpage_writepages(mapping, wbc, ext3_writepages_get_block);
+
+	/*
+	 * Need to reacquire the handle since ext3_writepages_get_block()
+	 * can restart the handle
+	 */
+	handle = journal_current_handle();
+
+	err = ext3_journal_stop(handle);
+	if (!ret)
+		ret = err;
+	return ret;
+}
+
 static int ext3_writeback_writepage(struct page *page,
 				struct writeback_control *wbc)
 {
@@ -1552,6 +1589,7 @@
 	.readpage	= ext3_readpage,
 	.readpages	= ext3_readpages,
 	.writepage	= ext3_writeback_writepage,
+	.writepages	= ext3_writeback_writepages,
 	.sync_page	= block_sync_page,
 	.prepare_write	= ext3_prepare_write,
 	.commit_write	= ext3_writeback_commit_write,
Re: [RFC] ext3 writepages for writeback mode
Badari Pulavarty [EMAIL PROTECTED] wrote:
> Here is my first cut at adding writepages() support for ext3 writeback
> mode.

Looks sane from a brief scan.

> I have not done any performance analysis on the patch,

Please do ;)

> +static int ext3_writepages_get_block(struct inode *inode, sector_t iblock,
> +			struct buffer_head *bh, int create)
> +{
> +	return ext3_direct_io_get_blocks(inode, iblock, 1, bh, create);
> +}

yup.

> +	ret = mpage_writepages(mapping, wbc, ext3_writepages_get_block);

Funny whitespace.  What is it with you IBM guys? ;)

> +	/*
> +	 * Need to reacquire the handle since ext3_writepages_get_block()
> +	 * can restart the handle
> +	 */
> +	handle = journal_current_handle();
> +
> +	err = ext3_journal_stop(handle);
> +	if (!ret)
> +		ret = err;
> +	return ret;
> +}
Re: [RFC] ext3 writepages for writeback mode
Andrew Morton wrote:
> Badari Pulavarty [EMAIL PROTECTED] wrote:
> > Here is my first cut at adding writepages() support for ext3
> > writeback mode.
>
> Looks sane from a brief scan.

Well, not really.  mpage_writepages() could end up calling
ext3_writeback_writepage() in the confused case through:

	*ret = page->mapping->a_ops->writepage(page, wbc);

which ends up doing nothing and leaves the page dirty, since there is a
journal handle already started :(

	if (ext3_journal_current_handle())
		goto out_fail;

Ideas?

Thanks,
Badari
about commit sector concept in journal
I am reading the scheme by which the journal works.  I have learnt that
after every transaction written to the log file, a 512-byte sector is
written back to the disk.  This is treated as a commit block, and there
is a sequence number in there that matches that of all the preceding
transaction blocks.  But how this is possible I can't understand.
Please tell me about the proper scheme of the commit block.

somenath