Re: [PATCH] ext34: ensure do_split leaves enough free space in both blocks

2007-09-18 Thread Stephen C. Tweedie
Hi, On Mon, 2007-09-17 at 11:06 -0500, Eric Sandeen wrote: > The do_split() function for htree dir blocks is intended to split a > leaf block to make room for a new entry. It sorts the entries in the > original block by hash value, then moves the last half of the entries to > the new block -
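
The split described in the quoted text can be sketched in a few lines. This is a toy model, not the kernel's do_split(): the (hash, rec_len) tuples, the split_by_count helper, and the sample sizes are all hypothetical. It sorts the entries by hash, moves the last half (by count) to a new block, and reports the bytes used on each side.

```python
# Toy model of a count-based htree leaf split; NOT the kernel's do_split().
# Each entry is a hypothetical (hash, rec_len) pair; rec_len is bytes used.

def split_by_count(entries):
    """Sort by hash and move the last half of the entries to a new block."""
    ordered = sorted(entries, key=lambda e: e[0])
    keep = len(ordered) - len(ordered) // 2  # entries staying in the old block
    return ordered[:keep], ordered[keep:]

def bytes_used(block):
    """Total bytes consumed by the entries in one block."""
    return sum(rec_len for _, rec_len in block)

# Hypothetical leaf: short names happen to hash low, long names hash high.
entries = [(0x40, 264), (0x10, 12), (0x60, 264),
           (0x20, 12), (0x50, 264), (0x30, 12)]

old_block, new_block = split_by_count(entries)
print(len(old_block), bytes_used(old_block))  # 3 36
print(len(new_block), bytes_used(new_block))  # 3 792
```

In this made-up case both halves hold three entries, yet the new block carries nearly all of the bytes, so an even split by count may still leave one side short of free space.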

Re: [PATCH] [188/2many] MAINTAINERS - EXT4 FILE SYSTEM

2007-08-13 Thread Stephen C. Tweedie
Hi, On Mon, 2007-08-13 at 03:01 -0600, Andreas Dilger wrote: > To be honest, Stephen and Andrew haven't been directly involved in > the ext4 development. It probably makes more sense to have e.g. > Eric Sandeen, Ted Ts'o, and MingMing Cao in their place. Works for me. --Stephen - To

Re: ext3fs: umount+sync not enough to guarantee metadata-on-disk

2007-06-12 Thread Stephen C. Tweedie
Hi, On Sun, 2007-06-10 at 18:27 +, Pavel Machek wrote: > > Once a f/s is read-only, there should be NO writing to > > it. Right? > > Linux happily writes to filesystems mounted read-only. It will replay > journal on them. Only at mount time, not on unmount; and it does check whether the

Re: ext3fs: umount+sync not enough to guarantee metadata-on-disk

2007-06-07 Thread Stephen C. Tweedie
Hi, On Thu, 2007-06-07 at 12:01 -0400, Mark Lord wrote: > >>mount /var/lib/mythtv -oremount,ro > >>sync > >>umount /var/lib/mythtv > > > > Did this succeed? If the application is still truncating that file, the > > umount should have failed. > > Actually, what I expect to happen

Re: EXT3-fs error (device hda8): ext3_free_blocks: Freeing blocks not in datazone

2005-09-05 Thread Stephen C. Tweedie
Hi, On Mon, 2005-09-05 at 17:24, Riccardo Castellani wrote: > I'm using FC3 with Kernel 2.6.12-1.1376. > After few hours file system on /dev/hda8 EXT3 partition has a problem so it > remounted in only read mode. > Sep 5 17:34:40 mrtg kernel: EXT3-fs error (device hda8): ext3_free_blocks: >

Re: [Linux-cluster] Re: GFS, what's remaining

2005-09-05 Thread Stephen C. Tweedie
Hi, On Sun, 2005-09-04 at 21:33, Pavel Machek wrote: > > - read-only mount > > - "specatator" mount (like ro but no journal allocated for the mount, > > no fencing needed for failed node that was mounted as specatator) > > I'd call it "real-read-only", and yes, that's very usefull > mount.

Re: [PATCH] Ext3 online resizing locking issue

2005-08-31 Thread Stephen C. Tweedie
Hi, On Wed, 2005-08-31 at 12:35, Glauber de Oliveira Costa wrote: > At a first look, i thought about locking gdt-related data. But in a > closer one, it seemed to me that we're in fact modifying a little bit > more than that in the resize code. But all these modifications seem to > be somehow

Re: [PATCH] Ext3 online resizing locking issue

2005-08-30 Thread Stephen C. Tweedie
Hi, On Thu, 2005-08-25 at 21:43, Glauber de Oliveira Costa wrote: > Just a question here. With s_lock held by the remount code, we're > altering the struct super_block, and believing we're safe. We try to > acquire it inside the resize functions, because we're trying to modify > this same data.

Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01

2005-08-30 Thread Stephen C. Tweedie
Hi, On Fri, 2005-08-26 at 12:20, Steven Rostedt wrote: > > could you try a), how clean does it get? Personally i'm much more in > > favor of cleanliness. On the vanilla kernel a spinlock is zero bytes on > > UP [the most RAM-sensitive platform], and it's a word on typical SMP. It's a word,

Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01

2005-08-30 Thread Stephen C. Tweedie
Hi, On Fri, 2005-08-26 at 05:24, Steven Rostedt wrote: > Well, I just spent several hours trying to use the b_update_lock in > implementing something to replace the bit spinlocks for RT. It's > getting really ugly and I just hit a stone wall. > > The problem is that I have two locks to work

Re: [PATCH] Ext3 online resizing locking issue

2005-08-25 Thread Stephen C. Tweedie
Hi, On Wed, 2005-08-24 at 22:03, Glauber de Oliveira Costa wrote: > This simple patch provides a fix for a locking issue found in the online > resizing code. The problem actually happened while trying to resize the > filesystem trough the resize=xxx option in a remount. NAK, this is wrong: >

Re: [-mm PATCH 2/32] fs: fix-up schedule_timeout() usage

2005-08-15 Thread Stephen C. Tweedie
Hi, On Mon, 2005-08-15 at 19:08, Nishanth Aravamudan wrote: > Description: Use schedule_timeout_{,un}interruptible() instead of > set_current_state()/schedule_timeout() to reduce kernel size. > +++ 2.6.13-rc5-mm1-dev/fs/jbd/transaction.c 2005-08-10 15:03:33.0 > -0700 > @@ -1340,8

Inotify patch missed arch/x86_64/ia32/sys_ia32.c

2005-07-15 Thread Stephen C. Tweedie
Hi, The inotify patch just added a line + fsnotify_open(f->f_dentry); to sys_open, but it missed the x86_64 compatibility sys32_open() equivalent in arch/x86_64/ia32/sys_ia32.c. Andi, perhaps it's time to factor out the guts of sys_open from the flag munging to

Re: [PATCH] kjournald() missing JFS_UNMOUNT check

2005-07-12 Thread Stephen C. Tweedie
Hi, On Mon, 2005-07-11 at 22:19, Mark Fasheh wrote: > Can we please merge this patch? I sent it to ext2-devel for comments > last week and haven't hear anything back. It seems trivially correct and is > testing fine - famous last words, I know :) ACK --- looks fine to me. --Stephen - To

Re: [PATCH] Fix race in do_get_write_access()

2005-07-11 Thread Stephen C. Tweedie
Hi, On Mon, 2005-07-11 at 17:10, Jan Kara wrote: > attached patch should fix the following race: ... > and we have sent wrong data to disk... We now clean the dirty buffer > flag under buffer lock in all cases and hence we know that whenever a buffer > is starting to be journaled we either

Re: [PATCH] Fix JBD race in t_forget list handling

2005-07-11 Thread Stephen C. Tweedie
Hi, On Mon, 2005-07-11 at 16:34, Jan Kara wrote: > attached patch should close the possible race between > journal_commit_transaction() and journal_unmap_buffer() (which adds > buffers to committing transaction's t_forget list) that could leave > some buffers on transaction's t_forget list

Re: ext3 allocate-with-reservation latencies

2005-04-18 Thread Stephen C. Tweedie
Hi, On Fri, 2005-04-15 at 21:32, Mingming Cao wrote: > Sorry for the delaying. I was not in office these days. No problem. > > > Also I am concerned about the possible > > > starvation on writers. > > In what way? > I was worried about the rw lock case.:) OK, so we're both on the same track

Re: Linux 2.4.30-rc3 md/ext3 problems (ext3 gurus : please check)

2005-04-13 Thread Stephen C. Tweedie
Hi, On Wed, 2005-04-06 at 11:01, Hifumi Hisashi wrote: > I have measured the bh refcount before the buffer_uptodate() for a few days. > I found out that the bh refcount sometimes reached to 0 . > So, I think following modifications are effective. > > diff -Nru 2.4.30-rc3/fs/jbd/commit.c

Re: ext3 allocate-with-reservation latencies

2005-04-13 Thread Stephen C. Tweedie
Hi, On Wed, 2005-04-13 at 00:27, Mingming Cao wrote: > > I wonder if there's not a simple solution for this --- mark the window > > as "provisional", and if any other task tries to allocate in the space > > immediately following such a window, it needs to block until that window > > is released.

Re: Problem in log_do_checkpoint()?

2005-04-12 Thread Stephen C. Tweedie
Hi, On Mon, 2005-04-11 at 12:36, Jan Kara wrote: > > The prevention of multiple writes in this case should also improve > > performance a little. > > > > That ought to be pretty straightforward, I think. The existing cases > > where we remove buffers from a checkpoint shouldn't have to care

Re: [Ext2-devel] Re: OOM problems on 2.6.12-rc1 with many fsx tests

2005-04-12 Thread Stephen C. Tweedie
Hi, On Wed, 2005-04-06 at 02:23, Andrew Morton wrote: > Nobody has noticed the now-fixed leak since 2.6.6 and this one appears to > be 100x slower. Which is fortunate because this one is going to take a > long time to fix. I'll poke at it some more. OK, I'm now at the stage where I can kick

Re: ext3 allocate-with-reservation latencies

2005-04-12 Thread Stephen C. Tweedie
Hi, On Tue, 2005-04-12 at 07:41, Mingming Cao wrote: > > Note that this may improve average case latencies, but it's not likely > > to improve worst-case ones. We still need a write lock to install a new > > window, and that's going to have to wait for us to finish finding a free > > bit even

Re: Linux 2.4.30-rc3 md/ext3 problems (ext3 gurus : please check)

2005-04-11 Thread Stephen C. Tweedie
Hi, On Mon, 2005-04-11 at 21:46, Andrew Morton wrote: > "Stephen C. Tweedie" <[EMAIL PROTECTED]> wrote: > > > > Andrew, what was the exact illegal state of the pages you were seeing > > when fixing that recent leak? It looks like it's nothing more complex > > than dirty buffers on an anon page.

Re: ext3 allocate-with-reservation latencies

2005-04-11 Thread Stephen C. Tweedie
Hi, On Mon, 2005-04-11 at 19:38, Mingming Cao wrote: > I agree. We should not skip the home block group of the file. I guess > what I was suggesting is, if allocation from the home group failed and > we continuing the linear search the rest of block groups, we could > probably try to skip the

Re: Linux 2.4.30-rc3 md/ext3 problems (ext3 gurus : please check)

2005-04-11 Thread Stephen C. Tweedie
Hi, On Thu, 2005-04-07 at 16:51, Stephen C. Tweedie wrote: > I'm currently running with the buffer-trace debug patch, on 2.4, with a > custom patch to put every buffer jbd ever sees onto a per-superblock > list, and remove it only when the bh is destroyed in > put_unused_buffer_head().

Re: ext3 allocate-with-reservation latencies

2005-04-11 Thread Stephen C. Tweedie
Hi, On Fri, 2005-04-08 at 19:10, Mingming Cao wrote: > > It still needs to be done under locking to prevent us from expanding > > over the next window, though. And having to take and drop a spinlock a > > dozen times or more just to find out that there are no usable free > > blocks in the

Re: Problem in log_do_checkpoint()?

2005-04-11 Thread Stephen C. Tweedie
Hi, On Fri, 2005-04-08 at 18:14, Badari Pulavarty wrote: > I get OOPs in log_do_checkpoint() while using ext3 quotas. > Is this anyway related to what you are working on ? > > Unable to handle kernel NULL pointer dereference at virtual address > Doesn't look like it, no. If we

Re: ext3 allocate-with-reservation latencies

2005-04-08 Thread Stephen C. Tweedie
Hi, On Fri, 2005-04-08 at 00:37, Mingming Cao wrote: > Actually, we do not have to do an rbtree link and unlink for every > window we search. If the reserved window(old) has no free bit and the > new reservable window's is right after the old one, no need to unlink > the old window from the

Re: Problem in log_do_checkpoint()?

2005-04-08 Thread Stephen C. Tweedie
Hi, On Mon, 2005-04-04 at 10:04, Jan Kara wrote: > In log_do_checkpoint() we go through the t_checkpoint_list of a > transaction and call __flush_buffer() on each buffer. Suppose there is > just one buffer on the list and it is dirty. __flush_buffer() sees it and > puts it to an array of

Re: Linux 2.4.30-rc3 md/ext3 problems (ext3 gurus : please check)

2005-04-07 Thread Stephen C. Tweedie
Hi, On Wed, 2005-04-06 at 21:10, Stephen C. Tweedie wrote: > However, 2.6 is suspected of still having leaks in ext3. To be certain > that we're not just backporting one of those to 2.4, we need to > understand who exactly is going to clean up these bh's if they are in > fact unused once we

Re: ext3 allocate-with-reservation latencies

2005-04-07 Thread Stephen C. Tweedie
Hi, On Thu, 2005-04-07 at 09:14, Ingo Molnar wrote: > doesnt the first option also allow searches to be in parallel? In terms of CPU usage, yes. But either we use large windows, in which case we *can't* search remotely near areas of the disk in parallel; or we use small windows, in which case

Re: Linux 2.4.30-rc3 md/ext3 problems (ext3 gurus : please check)

2005-04-06 Thread Stephen C. Tweedie
Hi, On Wed, 2005-04-06 at 11:01, Hifumi Hisashi wrote: > I have measured the bh refcount before the buffer_uptodate() for a few days. > I found out that the bh refcount sometimes reached to 0 . > So, I think following modifications are effective. Thanks --- it certainly looks like this should

Re: ext3 allocate-with-reservation latencies

2005-04-06 Thread Stephen C. Tweedie
Hi, On Wed, 2005-04-06 at 17:53, Mingming Cao wrote: > > Possible, but not necessarily nice. If you've got a nearly-full disk, > > most bits will be already allocated. As you scan the bitmaps, it may > > take quite a while to find a free bit; do you really want to (a) lock > > the whole block

Re: Linux 2.4.30-rc3 md/ext3 problems (ext3 gurus : please check)

2005-04-06 Thread Stephen C. Tweedie
Hi, On Wed, 2005-04-06 at 11:01, Hifumi Hisashi wrote: > >Certainly it's normal for a short read/write to imply either error or > >EOF, without the error necessarily needing to be returned explicitly. > >I'm not convinced that the Singleunix language actually requires that, > >but it seems

Re: ext3 allocate-with-reservation latencies

2005-04-06 Thread Stephen C. Tweedie
Hi, On Wed, 2005-04-06 at 06:35, Mingming Cao wrote: > It seems we are holding the rsv_block while searching the bitmap for a > free bit. Probably something to avoid! > In alloc_new_reservation(), we first find a available to > create a reservation window, then we check the bitmap to see if

Re: Linux 2.4.30-rc3 md/ext3 problems (ext3 gurus : please check)

2005-04-05 Thread Stephen C. Tweedie
Hi, On Wed, 2005-03-30 at 12:59, Marcelo Tosatti wrote: > > I'm not certain that this is right, but it seems possible and would > > explain the symptoms. Maybe Stephen or Andrew could comments? > > Andrew, Stephen? Sorry, was offline for a week last week; I'll try to look at this more closely

Re: OOM problems on 2.6.12-rc1 with many fsx tests

2005-04-05 Thread Stephen C. Tweedie
Hi, On Mon, 2005-04-04 at 02:35, Andrew Morton wrote: > Without the below patch it's possible to make ext3 leak at around a > megabyte per minute by arranging for the fs to run a commit every 50 > milliseconds, btw. Ouch! > (Stephen, please review...) Doing so now. > The patch teaches

Re: ext3 journalling BUG on full filesystem

2005-03-24 Thread Stephen C. Tweedie
Hi, On Thu, 2005-03-24 at 19:38, Chris Wright wrote: > OK, good to know. When I last checked you were working on a higher risk > yet more complete fix, and I thought we'd wait for that one to stabilize. > Looks like the one Jan attached is the better -stable candidate? Definitely; it's the one

Re: ext3 journalling BUG on full filesystem

2005-03-24 Thread Stephen C. Tweedie
Hi, On Thu, 2005-03-24 at 10:39, Jan Kara wrote: > Actually the patch you atached showed in the end as not covering all > the cases and so Stephen agreed to stay with the first try (attached) > which should cover all known cases (although it's not so nice). Right. The later patch is getting

Re: [CHECKER] ext3 bug in ftruncate() with O_SYNC?

2005-03-23 Thread Stephen C. Tweedie
Hi, On Tue, 2005-03-22 at 03:51, Andrew Morton wrote: > The spec says "Write I/O operations on the file descriptor shall complete > as defined by synchronized I/O file integrity completion". > > Is ftruncate a "write I/O operation"? No. SUS seems to be pretty clear on this. The syscall

e2fsprogs bug [was Re: ext2/3 file limits to avoid overflowing i_blocks]

2005-03-17 Thread Stephen C. Tweedie
Hi, On Thu, 2005-03-17 at 17:23, Stephen C. Tweedie wrote: > I wrote a small program to calculate the total indirect tree overhead > for any given file size, and 0x1ff7fffe000 turned out to be the largest > file we can get without the total i_blocks overflowing 2^32. > >
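
A calculation of this kind can be sketched as follows. This is not the program referred to in the message, only a minimal reconstruction under stated assumptions: 4 KiB blocks, 4-byte block pointers (1024 per indirect block), 12 direct blocks, i_blocks counted in 512-byte sectors, and no EA/ACL blocks. The function names are hypothetical.

```python
# Sketch: sectors charged to i_blocks (data + indirect tree) for a file.
# Assumptions (not from the original mail): 4 KiB blocks, 4-byte block
# pointers, 12 direct blocks, 512-byte sectors, no EA/ACL blocks.

BLOCK_SIZE = 4096
PTRS = BLOCK_SIZE // 4                  # pointers per indirect block
NDIRECT = 12                            # direct block pointers in the inode
SECTORS_PER_BLOCK = BLOCK_SIZE // 512

def ceil_div(a, b):
    return -(-a // b)

def indirect_blocks(data_blocks):
    """Number of indirect-tree blocks needed to map data_blocks."""
    left = max(0, data_blocks - NDIRECT)
    meta = 0
    for level in (1, 2, 3):             # single, double, triple indirect
        if left == 0:
            break
        mapped = min(left, PTRS ** level)
        # One block per populated node at each depth of this subtree.
        meta += sum(ceil_div(mapped, PTRS ** k) for k in range(1, level + 1))
        left -= mapped
    if left:
        raise ValueError("file too large for triple-indirect mapping")
    return meta

def i_blocks_sectors(size_bytes):
    """512-byte sectors used by the data blocks plus the indirect tree."""
    data = ceil_div(size_bytes, BLOCK_SIZE)
    return (data + indirect_blocks(data)) * SECTORS_PER_BLOCK

limit = 0x1ff7fffe000
print(hex(i_blocks_sectors(limit)))                   # 0xfffffff8
print(i_blocks_sectors(limit) < 2**32)                # True
print(i_blocks_sectors(limit + BLOCK_SIZE) < 2**32)   # False
```

Under these assumptions the quoted size is the last one whose sector count stays below 2^32; one more 4 KiB block brings the total to exactly 2^32, which would wrap a 32-bit counter to zero.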

[Patch] ext2/3 file limits to avoid overflowing i_blocks

2005-03-17 Thread Stephen C. Tweedie
Hi all, As discussed before, we can overflow i_blocks in ext2/ext3 inodes by growing a file up to 2TB. That gives us 2^32 sectors of data in the file; but once you add on the indirect tree and possible EA/ACL metadata, i_blocks will wrap beyond 2^32. Consensus seemed to be that the best way to

Re: Devices/Partitions over 2TB

2005-03-16 Thread Stephen C. Tweedie
Hi, On Tue, 2005-03-15 at 04:54, jmerkey wrote: > Good Question. Where are the standard tools in FC2 and FC3 for these types? For LVM, the lvm2 package contains all the necessary tools. I know Alasdair did some kernel fixes for lvm2 striping on >2TB partitions recently, though, so older

Re: [RFC] ext3/jbd race: releasing in-use journal_heads

2005-03-09 Thread Stephen C. Tweedie
Hi, On Wed, 2005-03-09 at 13:28, Jan Kara wrote: > Hmm. I see for example a place at jbd/commit.c, line 287 (which you > did not change in your patch) which does this and doesn't seem to be > protected against journal_unmap_buffer() (but maybe I miss something). > Not that I'd find that race

Re: [RFC] ext3/jbd race: releasing in-use journal_heads

2005-03-09 Thread Stephen C. Tweedie
Hi, On Tue, 2005-03-08 at 15:12, Jan Kara wrote: > Isn't also the following scenario dangerous? > > __journal_unfile_buffer(jh); > journal_remove_journal_head(bh); It depends. I think the biggest problem here is that there's really no written rule protecting this stuff universally. But

Re: [RFC] ext3/jbd race: releasing in-use journal_heads

2005-03-08 Thread Stephen C. Tweedie
Hi, On Mon, 2005-03-07 at 21:08, Stephen C. Tweedie wrote: > Right, that was what I was thinking might be possible. But for now I've > just done the simple patch --- make sure we don't clear > jh->b_transaction when we're just refiling buffers from one list to > another. That should have

Re: Linux 2.6.11-ac1

2005-03-08 Thread Stephen C. Tweedie
Hi, On Tue, 2005-03-08 at 06:49, Chris Wright wrote: > Yes, we are intending to pick up bits from -ac (you might have missed > that in another thread). There's actually a successor patch to that which I'm just about to get feedback on here and on ext2-devel. It's higher-risk than the one Alan

[PATCH] invalidate/o_direct livelock {was Re: [RFC] ext3/jbd race: releasing in-use journal_heads}

2005-03-08 Thread Stephen C. Tweedie
Hi, On Tue, 2005-03-08 at 09:28, Stephen C. Tweedie wrote: > I think it should be OK just to move the page->mapping != mapping test > above the page->index > end test. Sure, if all the pages have been > stolen by the time we see them, then we'll repeat without advancing > "next"; but we're still making

Re: [RFC] ext3/jbd race: releasing in-use journal_heads

2005-03-08 Thread Stephen C. Tweedie
Hi, On Mon, 2005-03-07 at 23:50, Andrew Morton wrote: > truncate_inode_pages_range() seems to dtrt here. Can we do it in the same > manner in invalidate_inode_pages2_range()? > > > Something like: > - if (page->mapping != mapping || page->index > end) { > +

Re: [RFC] ext3/jbd race: releasing in-use journal_heads

2005-03-07 Thread Stephen C. Tweedie
Hi, On Mon, 2005-03-07 at 21:22, Stephen C. Tweedie wrote: > altgr-scrlck is showing a range of EIPs all in ext3_direct_IO-> > invalidate_inode_pages2_range(). I'm seeing > > invalidate_inode_pages2_range()->pagevec_lookup()->find_get_pages() In invalidate_inode_pages2_

Re: [RFC] ext3/jbd race: releasing in-use journal_heads

2005-03-07 Thread Stephen C. Tweedie
Hi, On Mon, 2005-03-07 at 21:11, Andrew Morton wrote: > > I'm having trouble testing it, though --- I seem to be getting livelocks > > in O_DIRECT running 400 fsstress processes in parallel; ring any bells? > > Nope. I dont think anyone has been that cruel to ext3 for a while. > I assume

Re: [RFC] ext3/jbd race: releasing in-use journal_heads

2005-03-07 Thread Stephen C. Tweedie
Hi, On Mon, 2005-03-07 at 20:31, Andrew Morton wrote: > jbd_lock_bh_journal_head() is supposed to be a > finegrained innermost lock whose mandate is purely for atomicity of adding > and removing the journal_head and the b_jcount refcounting. I don't recall > there being any deeper meaning than
