Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Thread Ric Wheeler
Jeff Garzik wrote: Jamie Lokier wrote: By durable, I mean that fsync() should actually commit writes to physical stable storage, Yes, it should. I was surprised that fsync() doesn't do this already. There was a lot of effort put into block I/O write barriers during 2.5, so that
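
For readers following the thread, here is a minimal sketch of the application-side pattern under discussion: write the data, call fsync() on the file, then fsync() the containing directory so the new name is persisted too. The helper name durable_write is hypothetical, and a Linux-style system is assumed. Whether the data actually reaches physical stable storage still depends on the kernel, the file system, and the drive's write cache, which is the point the proposal raises.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then ask the kernel to flush file and directory.

    Hypothetical helper for illustration only; fsync() durability depends on
    the kernel, file system, and drive cache behaviour.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file data and metadata
    finally:
        os.close(fd)

    # Flush the directory entry as well (assumes directories can be opened
    # read-only and fsync'ed, as on Linux).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

os.fdatasync(fd) could be substituted for os.fsync(fd) on the file descriptor when only the data, and the metadata needed to read it back, must be flushed.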

Re: [RFD] Incremental fsck

2008-01-14 Thread Ric Wheeler
Pavel Machek wrote: On Sat 2008-01-12 09:51:40, Theodore Tso wrote: On Wed, Jan 09, 2008 at 02:52:14PM +0300, Al Boldi wrote: Ok, but let's look at this a bit more opportunistic / optimistic. Even after a black-out shutdown, the corruption is pretty minimal, using ext3fs at least. After a

Re: [patch 1/1] Drop CAP_SYS_RAWIO requirement for FIBMAP

2007-11-01 Thread Ric Wheeler
Pavel Machek wrote: Hi! Remove the need for having CAP_SYS_RAWIO when doing a FIBMAP call on an open file descriptor. It would be nice to allow users to have permission to see where their data is landing on disk, and there really isn't a good reason to keep them from getting at this

Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-31 Thread Ric Wheeler
Zach Brown wrote: Can you clarify what you mean above with an example? I don't really follow. Sure, take 'tar' as an example. It'll read files in the order that their names are returned from directory listing. This can produce bad IO patterns because the order in which the file names are
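
As a rough illustration of the 'tar' example, the sketch below lists the regular files in a directory sorted by inode number instead of in directory-listing order. Inode order is only a heuristic stand-in for on-disk order and is an assumption made here; the thread itself is about using FIBMAP to learn the real block layout. The function name files_in_inode_order is hypothetical.

```python
import os


def files_in_inode_order(directory: str) -> list[str]:
    """Return paths of regular files in directory, sorted by inode number.

    Heuristic only: inode order approximates on-disk order on some file
    systems; it does not consult the actual block layout.
    """
    with os.scandir(directory) as it:
        entries = [e for e in it if e.is_file(follow_symlinks=False)]
    entries.sort(key=lambda e: e.inode())
    return [e.path for e in entries]
```

A backup tool could read files in this order rather than in the order returned by the directory listing.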

Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-31 Thread Ric Wheeler
Zach Brown wrote: The second use case is to look at the physical layout of blocks on disk for a specific file, use Mark Lord's write_long patches to inject a disk error and then read that file to make sure that we are handling disk IO errors correctly. A bit obscure, but really quite useful.

Re: batching support for transactions

2007-10-03 Thread Ric Wheeler
Andreas Dilger wrote: On Oct 02, 2007 08:57 -0400, Ric Wheeler wrote: One thing that jumps out is that the way we currently batch synchronous work loads into transactions does really horrible things to performance for storage devices which have really low latency. For example, on a mid

Re: batching support for transactions

2007-10-03 Thread Ric Wheeler
Andreas Dilger wrote: On Oct 03, 2007 06:42 -0400, Ric Wheeler wrote: With 2 threads writing to the same directory, we instantly drop down to 234 files/sec. Is this with HZ=250? Yes - I assume that with HZ=1000 the batching would start to work again since the penalty for batching would only

batching support for transactions

2007-10-02 Thread Ric Wheeler
After several years of helping tune file systems for normal (ATA/S-ATA) drives, we have been doing some performance work on ext3 and reiserfs on disk arrays. One thing that jumps out is that the way we currently batch synchronous work loads into transactions does really horrible things to

Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-07-13 Thread Ric Wheeler
On Wed, 11 Jul 2007 18:44:21 EDT, Ric Wheeler said: [EMAIL

Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-07-11 Thread Ric Wheeler
[EMAIL PROTECTED] wrote: On Tue, 10 Jul 2007 14:39:41 EDT, Ric Wheeler said: All of the high end arrays have non-volatile cache (read, on power loss, it is a promise that it will get all of your data out to permanent storage). You don't need to ask this kind of array to drain the cache

Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-07-10 Thread Ric Wheeler
Tejun Heo wrote: [ cc'ing Ric Wheeler for storage array thingie. Hi, whole thread is at http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/3344 ] I am actually on the list, just really, really far behind in the thread ;-) Hello, [EMAIL PROTECTED] wrote: but when you

Re: Testing framework

2007-04-23 Thread Ric Wheeler
Avishay Traeger wrote: On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote: For some time I had been working on this file system test framework. Now I have an implementation for the same and below is the explanation. Any comments are welcome. [snip] You may want to check out the paper

Linux 2007 File System & IO Workshop notes & talks

2007-04-10 Thread Ric Wheeler
We have some of the material reviewed and posted now from the IO/FS workshop. USENIX has posted the talks at: http://www.usenix.org/events/lsf07/tech/tech.html A write-up of the workshop went out on LWN and prompted a healthy discussion: http://lwn.net/Articles/226351/ At that LWN

Re: impact of 4k sector size on the IO FS stack

2007-03-12 Thread Ric Wheeler
Alan Cox wrote: First generation of 1K sector drives will continue to use the same 512-byte ATA sector size you are familiar with. A single 512-byte write will cause the drive to perform a read-modify-write cycle. This configuration is physical 1K sector, logical 512b sector. The problem
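
The read-modify-write case described above comes down to simple alignment arithmetic. The sketch below, with hypothetical function names, assumes the 1K physical / 512-byte logical configuration quoted in the message and reports which physical sectors a write touches and whether any of them is only partially covered.

```python
def physical_sectors_touched(offset: int, length: int, physical: int = 1024) -> range:
    """Return the range of physical sector numbers covered by a write.

    offset and length are in bytes; physical is the physical sector size.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // physical
    last = (offset + length - 1) // physical
    return range(first, last + 1)


def needs_read_modify_write(offset: int, length: int, physical: int = 1024) -> bool:
    """True if the write covers any physical sector only partially.

    A partially covered sector has to be read, patched, and rewritten by
    the drive.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return offset % physical != 0 or (offset + length) % physical != 0
```

Under these assumptions a single 512-byte write at offset 0 touches one physical sector and needs a read-modify-write, while an aligned 1024-byte write does not.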

impact of 4k sector size on the IO FS stack

2007-03-11 Thread Ric Wheeler
During the recent IO/FS workshop, we spoke briefly about the coming change to a 4k sector size for disks on linux. If I recall correctly, the general feeling was that the impact was not significant since we already do most file system IO in 4k page sizes and should be fine as long as we

Re: end to end error recovery musings

2007-02-26 Thread Ric Wheeler
Alan wrote: the new location. I believe this should be always true, so presumably with all modern disk drives a write error should mean something very serious has happened. Not quite that simple. I think that write errors are normally quite serious, but there are exceptions which might

Re: end to end error recovery musings

2007-02-26 Thread Ric Wheeler
Jeff Garzik wrote: Theodore Tso wrote: Can someone with knowledge of current disk drive behavior confirm that for all drives that support bad block sparing, if an attempt to write to a particular spot on disk results in an error due to bad media at that spot, the disk drive will automatically

end to end error recovery musings

2007-02-23 Thread Ric Wheeler
In the IO/FS workshop, one idea we kicked around is the need to provide better and more specific error messages between the IO stack and the file system layer. My group has been working to stabilize a relatively up to date libata + MD based box, so I can try to lay out at least one appliance

Re: XFS corruption during power-blackout

2005-07-11 Thread Ric Wheeler
Jens Axboe wrote: On Fri, Jul 01 2005, Bryan Henderson wrote: Wouldn't a commercial class drive that ignores explicit flushes be infamous? I'm ready to accept that there are SCSI drives that cache writes in volatile storage by default (but frankly, I'm still skeptical), but I'm not