Re: [PATCH 00/37] Permit filesystem local caching

2008-02-22 Thread Chris Mason
On Thursday 21 February 2008, David Howells wrote: David Howells [EMAIL PROTECTED] wrote: Have you got before/after benchmark results? See attached. Attached here are results using BTRFS (patched so that it'll work at all) rather than Ext3 on the client on the partition backing the

[ANNOUNCE] Btrfs v0.13

2008-02-21 Thread Chris Mason
Hello everyone, Btrfs v0.13 is now available for download from: http://oss.oracle.com/projects/btrfs/ We took another short break from the multi-device code to make the minor mods required to compile on 2.6.25, fix some problematic bugs and do more tuning. The most important fix is for file

Re: [ANNOUNCE] Btrfs v0.13

2008-02-21 Thread Chris Mason
On Thursday 21 February 2008, Chris Mason wrote: Hello everyone, Btrfs v0.13 is now available for download from: http://oss.oracle.com/projects/btrfs/ We took another short break from the multi-device code to make the minor mods required to compile on 2.6.25, fix some problematic bugs

Re: very poor ext3 write performance on big filesystems?

2008-02-19 Thread Chris Mason
On Tuesday 19 February 2008, Tomasz Chmielewski wrote: Theodore Tso wrote: (...) The following ld_preload can help in some cases. Mutt has this hack encoded in for maildir directories, which helps. It doesn't work very reliably for me. For some reason, it hangs for me sometimes

Re: very poor ext3 write performance on big filesystems?

2008-02-19 Thread Chris Mason
On Tuesday 19 February 2008, Tomasz Chmielewski wrote: Chris Mason wrote: On Tuesday 19 February 2008, Tomasz Chmielewski wrote: Theodore Tso wrote: (...) The following ld_preload can help in some cases. Mutt has this hack encoded in for maildir directories, which helps

Re: BTRFS only works with PAGE_SIZE = 4K

2008-02-12 Thread Chris Mason
On Tuesday 12 February 2008, David Miller wrote: From: Chris Mason [EMAIL PROTECTED] Date: Wed, 6 Feb 2008 12:00:13 -0500 So, here's v0.12. Any page size larger than 4K will not work with btrfs. All of the extent stuff assumes that PAGE_SIZE = sectorsize. Yeah, there is definitely clean

Re: BTRFS partition usage...

2008-02-12 Thread Chris Mason
On Tuesday 12 February 2008, Jan Engelhardt wrote: On Feb 12 2008 09:08, Chris Mason wrote: So, if Btrfs starts zeroing at 1k, will that be acceptable for you? Something looks wrong here. Why would btrfs need to zero at all? Superblock at 0, and done. Just like xfs. (Yes, I had xfs

Re: BTRFS partition usage...

2008-02-12 Thread Chris Mason
On Tuesday 12 February 2008, Jan Engelhardt wrote: On Feb 12 2008 08:49, Chris Mason wrote: This is a real issue on sparc where the default sun disk labels created use an initial partition where block zero aliases the disk label. It took me a few iterations before I figured out why

Re: BTRFS partition usage...

2008-02-12 Thread Chris Mason
On Tuesday 12 February 2008, David Miller wrote: From: David Miller [EMAIL PROTECTED] Date: Mon, 11 Feb 2008 23:21:39 -0800 (PST) Filesystems like ext2 put their superblock 1 block into the partition in order to avoid overwriting disk labels and other uglies. UFS does this too, as do

[ANNOUNCE] Btrfs v0.12 released

2008-02-06 Thread Chris Mason
Hello everyone, I wasn't planning on releasing v0.12 yet, and it was supposed to have some initial support for multiple devices. But, I have made a number of performance fixes and small bug fixes, and I wanted to get them out there before the (destabilizing) work on multiple-devices took

Re: [RFC] ext3: per-process soft-syncing data=ordered mode

2008-01-31 Thread Chris Mason
On Thursday 31 January 2008, Jan Kara wrote: On Thu 31-01-08 11:56:01, Chris Mason wrote: On Thursday 31 January 2008, Al Boldi wrote: Andreas Dilger wrote: On Wednesday 30 January 2008, Al Boldi wrote: And, a quick test of successive 1sec delayed syncs shows no hangs until

Re: [RFC] ext3: per-process soft-syncing data=ordered mode

2008-01-31 Thread Chris Mason
On Thursday 31 January 2008, Al Boldi wrote: Andreas Dilger wrote: On Wednesday 30 January 2008, Al Boldi wrote: And, a quick test of successive 1sec delayed syncs shows no hangs until about 1 minute (~180mb) of db-writeout activity, when the sync abruptly hangs for minutes on end, and

Re: lockdep warning with LTP dio test (v2.6.24-rc6-125-g5356f66)

2008-01-25 Thread Chris Mason
On Friday 25 January 2008, Jan Kara wrote: If ext3's DIO code only touches transactions in get_block, then it can violate data=ordered rules. Basically the transaction that allocates the blocks might commit before the DIO code gets around to writing them. A crash in the wrong place

Re: konqueror deadlocks on 2.6.22

2008-01-22 Thread Chris Mason
On Tuesday 22 January 2008, Al Boldi wrote: Ingo Molnar wrote: * Oliver Pinter (Pintér Olivér) [EMAIL PROTECTED] wrote: and then please update to CFS-v24.1 http://people.redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.6.22.15-v24.1.patch Yes with CFSv20.4, as in the log.

Re: konqueror deadlocks on 2.6.22

2008-01-22 Thread Chris Mason
On Tuesday 22 January 2008, Al Boldi wrote: Chris Mason wrote: Running fsync in data=ordered means that all of the dirty blocks on the FS will get written before fsync returns. Hm, that's strange, I expected this kind of behaviour from data=journal. data=writeback should return immediately

Re: [Btrfs-devel] [ANNOUNCE] Btrfs v0.10 (online growing/shrinking, ext3 conversion, and more)

2008-01-18 Thread Chris mason
On Thursday 17 January 2008, Christian Hesse wrote: On Thursday 17 January 2008, Chris mason wrote: So, I've put v0.11 out there. Ok, back to the suspend problem I mentioned: [ oopsen ] I get this after a suspend/resume cycle with mounted btrfs. Looks like metadata corruption. How

Re: [Btrfs-devel] [ANNOUNCE] Btrfs v0.10 (online growing/shrinking, ext3 conversion, and more)

2008-01-17 Thread Chris mason
On Tuesday 15 January 2008, Chris Mason wrote: Hello everyone, Btrfs v0.10 is now available for download from: http://oss.oracle.com/projects/btrfs/ Well, it turns out this release had a few small problems: * data=ordered deadlock on older kernels (including 2.6.23) * Compile problems when

Re: [Btrfs-devel] [ANNOUNCE] Btrfs v0.10 (online growing/shrinking, ext3 conversion, and more)

2008-01-17 Thread Chris mason
On Thursday 17 January 2008, Daniel Phillips wrote: On Jan 17, 2008 1:25 PM, Chris mason [EMAIL PROTECTED] wrote: So, I've put v0.11 out there. It fixes those two problems and will also compile on older (2.6.18) enterprise kernels. v0.11 does not have any disk format changes. Hi Chris

[ANNOUNCE] Btrfs v0.10 (online growing/shrinking, ext3 conversion, and more)

2008-01-15 Thread Chris Mason
Hello everyone, Btrfs v0.10 is now available for download from: http://oss.oracle.com/projects/btrfs/ Btrfs is still in an early alpha state, and the disk format is not finalized. v0.10 introduces a new disk format, and is not compatible with v0.9. The core of this release is explicit back

Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)

2008-01-15 Thread Chris Mason
On Tue, 15 Jan 2008 20:24:27 -0500 Daniel Phillips [EMAIL PROTECTED] wrote: On Jan 15, 2008 7:15 PM, Alan Cox [EMAIL PROTECTED] wrote: Writeback cache on disk in itself is not bad, it only gets bad if the disk is not engineered to save all its dirty cache on power loss, using the disk

Re: lockdep warning with LTP dio test (v2.6.24-rc6-125-g5356f66)

2008-01-14 Thread Chris Mason
On Mon, 14 Jan 2008 18:06:09 +0100 Jan Kara [EMAIL PROTECTED] wrote: On Wed 02-01-08 12:42:19, Zach Brown wrote: Erez Zadok wrote: Setting: ltp-full-20071031, dio01 test on ext3 with Linus's latest tree. Kernel w/ SMP, preemption, and lockdep configured. This is a real lock ordering

Re: [PATCH][RFC] fast file mapping for loop

2008-01-11 Thread Chris Mason
On Fri, 11 Jan 2008 10:01:18 +1100 Neil Brown [EMAIL PROTECTED] wrote: On Thursday January 10, [EMAIL PROTECTED] wrote: On Thu, Jan 10 2008, Chris Mason wrote: On Thu, 10 Jan 2008 09:31:31 +0100 Jens Axboe [EMAIL PROTECTED] wrote: On Wed, Jan 09 2008, Alasdair G Kergon wrote

Re: [PATCH][RFC] fast file mapping for loop

2008-01-10 Thread Chris Mason
On Thu, 10 Jan 2008 09:31:31 +0100 Jens Axboe [EMAIL PROTECTED] wrote: On Wed, Jan 09 2008, Alasdair G Kergon wrote: Here's the latest version of dm-loop, for comparison. To try it out, ln -s dmsetup dmlosetup and supply similar basic parameters to losetup. (using dmsetup version

Re: [PATCH][RFC] fast file mapping for loop

2008-01-10 Thread Chris Mason
On Thu, 10 Jan 2008 08:54:59 + Christoph Hellwig [EMAIL PROTECTED] wrote: On Thu, Jan 10, 2008 at 09:44:57AM +0100, Jens Axboe wrote: IMHO this shouldn't be done in the loop driver anyway. Filesystems have their own efficient extent lookup trees (well, at least xfs and btrfs do),

Re: [PATCH][RFC] fast file mapping for loop

2008-01-10 Thread Chris Mason
On Thu, 10 Jan 2008 14:03:24 +0100 Jens Axboe [EMAIL PROTECTED] wrote: On Thu, Jan 10 2008, Chris Mason wrote: On Thu, 10 Jan 2008 08:54:59 + Christoph Hellwig [EMAIL PROTECTED] wrote: On Thu, Jan 10, 2008 at 09:44:57AM +0100, Jens Axboe wrote: IMHO this shouldn't be done

Re: [PATCH][RFC] fast file mapping for loop

2008-01-09 Thread Chris Mason
On Wed, 9 Jan 2008 10:43:21 +0100 Jens Axboe [EMAIL PROTECTED] wrote: On Wed, Jan 09 2008, Christoph Hellwig wrote: On Wed, Jan 09, 2008 at 09:52:32AM +0100, Jens Axboe wrote: - The file block mappings must not change while loop is using the file. This means that we have to ensure

[ANNOUNCE] Btrfs v0.9

2007-12-04 Thread Chris Mason
Hello everyone, I've just tagged and released Btrfs v0.9. Special thanks to Yan Zheng and Josef Bacik for their work. This release includes a number of disk format changes from v0.8 and also a small change from recent btrfs-unstable HG trees. So, if you have existing Btrfs filesystems, you

Reminder: Last day for submissions to the Storage and Filesystem Workshop.

2007-12-03 Thread Chris Mason
Hello everyone, The deadline for position statements to the Linux Storage and Filesystem Workshop is here. Submitting a position statement is an easy way for you to tell the organizers that you would like to attend, and which topics you are most interested in. You can find all the details

Reminder: Linux Storage and Filesystem Workshop

2007-11-26 Thread Chris Mason
Hello everyone, The deadline for position statements to the Linux Storage and Filesystem Workshop is quickly approaching. The position statements are an easy way for you to tell the organizers that you would like to attend, and which topics you are most interested in. You can find all the

Re: migratepage failures on reiserfs

2007-11-05 Thread Chris Mason
On Mon, 5 Nov 2007 10:23:35 + [EMAIL PROTECTED] (Mel Gorman) wrote: On (01/11/07 10:10), Badari Pulavarty didst pronounce: Hmpf, my first reply had a paragraph about the block device inode pages, I noticed the phrase file data pages and deleted it ;) But, for the metadata

2008 Linux Storage and Filesystem Workshop

2007-11-05 Thread Chris Mason
Hello everyone, The position statement submission system for the 2008 storage and filesystem workshop is now online. This is how you let us know you're interested in attending and what topics are most important for discussion. For all the details, please see:

Re: migratepage failures on reiserfs

2007-11-01 Thread Chris Mason
On Thu, 01 Nov 2007 08:38:57 -0800 Badari Pulavarty [EMAIL PROTECTED] wrote: On Wed, 2007-10-31 at 13:40 -0400, Chris Mason wrote: On Wed, 31 Oct 2007 08:14:21 -0800 Badari Pulavarty [EMAIL PROTECTED] wrote: I tried data=writeback mode and it didn't help :( Ouch, so much

Re: migratepage failures on reiserfs

2007-10-31 Thread Chris Mason
On Wed, 31 Oct 2007 08:14:21 -0800 Badari Pulavarty [EMAIL PROTECTED] wrote: I tried data=writeback mode and it didn't help :( Ouch, so much for the easy way out. unable to release the page 262070 bh c000211b9408 flags 110029 count 1 private 0 unable to release the page 262098 bh

Re: migratepage failures on reiserfs

2007-10-30 Thread Chris Mason
On Tue, 30 Oct 2007 10:27:04 -0800 Badari Pulavarty [EMAIL PROTECTED] wrote: Hi, While testing hotplug memory remove, I ran into this issue. Given a range of pages hotplug memory remove tries to migrate those pages. migrate_pages() keeps failing to migrate pages containing pagecache

Re: migratepage failures on reiserfs

2007-10-30 Thread Chris Mason
On Tue, 30 Oct 2007 13:54:05 -0800 Badari Pulavarty [EMAIL PROTECTED] wrote: On Tue, 2007-10-30 at 13:54 -0400, Chris Mason wrote: On Tue, 30 Oct 2007 10:27:04 -0800 Badari Pulavarty [EMAIL PROTECTED] wrote: Hi, While testing hotplug memory remove, I ran into this issue. Given

Re: [patch 4/6][RFC] Attempt to plug race with truncate

2007-10-29 Thread Chris Mason
On Fri, 26 Oct 2007 16:37:36 -0700 Mike Waychison [EMAIL PROTECTED] wrote: Attempt to deal with races with truncate paths. I'm not really sure on the locking here, but these seem to be taken by the truncate path. BKL is left as some filesystem may(?) still require it. Signed-off-by:

Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Chris Mason
On Sat, 27 Oct 2007 18:57:06 +0100 Anton Altaparmakov [EMAIL PROTECTED] wrote: Hi, ->bmap is ugly and horrible! If you have to do this at the very least please cause ->bmap64 to be able to return error values in case the file system failed to get the information or indeed such information

Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Chris Mason
On Mon, 29 Oct 2007 12:18:22 -0700 Mike Waychison [EMAIL PROTECTED] wrote: Zach Brown wrote: And another of my pet peeves with ->bmap is that it uses 0 to mean sparse which causes a conflict on NTFS at least as block zero is part of the $Boot system file so it is a real, valid block...

[CFP] 2008 Linux Storage and Filesystem Workshop

2007-10-24 Thread Chris Mason
Hello everyone, We are organizing another filesystem and storage workshop in San Jose next Feb 25 and 26. You can find some great writeups of last year's conference on LWN: http://lwn.net/Articles/226351/ This year we're trying to concentrate on more problem solving sessions, short term

Re: [PATCH] reiserfs: don't drop PG_dirty when releasing sub-page-sized dirty file

2007-10-23 Thread Chris Mason
On Tue, 23 Oct 2007 19:56:20 +0800 Fengguang Wu [EMAIL PROTECTED] wrote: On Tue, Oct 23, 2007 at 12:07:07PM +0200, Peter Zijlstra wrote: [ adding reiserfs devs to the CC ] Thank you. This fix is kind of crude - even when it fixed Maxim's problem, and survived my stress testing of a lot

Re: More Large blocksize benchmarks

2007-10-16 Thread Chris Mason
On Tue, 2007-10-16 at 12:36 +1000, David Chinner wrote: On Mon, Oct 15, 2007 at 08:22:31PM -0400, Chris Mason wrote: Hello everyone, I'm stealing the cc list and reviving an old thread because I've finally got some numbers to go along with the Btrfs variable blocksize feature

More Large blocksize benchmarks

2007-10-15 Thread Chris Mason
Hello everyone, I'm stealing the cc list and reviving an old thread because I've finally got some numbers to go along with the Btrfs variable blocksize feature. The basic idea is to create a read/write interface to map a range of bytes on the address space, and use it in Btrfs for all metadata

Correct behavior on O_DIRECT sparse file writes

2007-10-12 Thread Chris Mason
Hello everyone, The test below creates a sparse file and then fills a hole with O_DIRECT. As far as I can tell from reading generic_osync_inode, the filesystem metadata is only forced to disk if i_size changes during the file write. I've tested ext3, xfs and reiserfs and they all skip the
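The original test program is not included in this excerpt. As a rough illustration only, the scenario described (create a sparse file, then fill a hole through O_DIRECT) might be sketched in Python as follows. The block size, file size, hole offset and the function names are assumptions chosen for the sketch, and it falls back to a buffered write where the filesystem rejects O_DIRECT; it does not inspect whether metadata actually reached disk.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; the real requirement depends on the device


def _write_block(path, buf, offset, flags):
    """Open `path` with `flags`, write `buf` at `offset`, then fsync."""
    fd = os.open(path, flags)
    try:
        written = os.pwrite(fd, buf, offset)
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


def fill_hole_direct(path, file_size=16 * BLOCK, hole_offset=4 * BLOCK):
    """Create a sparse file and write one aligned block into its hole."""
    # Extending the file with truncate allocates no data blocks on most
    # filesystems, so the whole file starts out as a hole.
    with open(path, "wb") as f:
        f.truncate(file_size)

    # O_DIRECT needs an aligned buffer; anonymous mmap memory is page-aligned.
    buf = mmap.mmap(-1, BLOCK)
    buf.write(b"\xab" * BLOCK)
    try:
        direct = os.O_WRONLY | getattr(os, "O_DIRECT", 0)
        try:
            return _write_block(path, buf, hole_offset, direct)
        except OSError:
            # Hypothetical fallback: some filesystems do not support O_DIRECT.
            return _write_block(path, buf, hole_offset, os.O_WRONLY)
    finally:
        buf.close()
```

The file size is unchanged by the write, which is the case the message is asking about: the hole is filled without i_size moving.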

Re: [PATCH 0/6] writeback time order/delay fixes take 3

2007-08-28 Thread Chris Mason
:23:14PM -0400, Chris Mason wrote: Notes: (1) I'm not sure inode number is correlated to disk location in filesystems other than ext2/3/4. Or parent dir? They correspond to the exact location on disk on XFS. But, XFS has its own inode clustering (see xfs_iflush

Re: [PATCH 0/6] writeback time order/delay fixes take 3

2007-08-28 Thread Chris Mason
On Wed, 29 Aug 2007 02:33:08 +1000 David Chinner [EMAIL PROTECTED] wrote: On Tue, Aug 28, 2007 at 11:08:20AM -0400, Chris Mason wrote: I wonder if XFS can benefit any more from the general writeback clustering. How large would be a typical XFS cluster? Depends on inode size

Re: [PATCH 0/6] writeback time order/delay fixes take 3

2007-08-23 Thread Chris Mason
On Thu, 23 Aug 2007 12:47:23 +1000 David Chinner [EMAIL PROTECTED] wrote: On Wed, Aug 22, 2007 at 08:42:01AM -0400, Chris Mason wrote: I think we should assume a full scan of s_dirty is impossible in the presence of concurrent writers. We want to be able to pick a start time (right now

[ANNOUNCE] seekwatcher v0.3 IO graphing an animation

2007-07-27 Thread Chris Mason
Hello everyone, I've tossed out seekwatcher v0.3. The major changes are using rolling averages to smooth out the seek and throughput graphs, and it can generate mpgs of the IO done by a given trace. Here's a sample of the smoother graphs (creating 20 kernel trees):
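How seekwatcher computes its rolling averages is not shown in this excerpt. A minimal sketch of the general technique, assuming a simple trailing window over evenly spaced samples (the function name and the default window size are hypothetical), could look like this:

```python
from collections import deque


def rolling_average(samples, window=5):
    """Return the trailing mean of up to `window` samples for each input."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds the samples currently in the window
    total = 0.0
    smoothed = []
    for value in samples:
        if len(recent) == window:
            # The oldest sample is about to be evicted by append().
            total -= recent[0]
        recent.append(value)
        total += value
        smoothed.append(total / len(recent))
    return smoothed
```

A larger window gives a smoother curve at the cost of hiding short bursts of seeks or throughput.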

Re: [PATCH RFC] extent mapped page cache

2007-07-26 Thread Chris Mason
On Thu, 26 Jul 2007 04:36:39 +0200 Nick Piggin [EMAIL PROTECTED] wrote: [ are state trees a good idea? ] One thing it gains us is finding the start of the cluster. Even if called by kswapd, the state tree allows writepage to find the start of the cluster and send down a big bio (provided

Re: [PATCH RFC] extent mapped page cache

2007-07-25 Thread Chris Mason
On Wed, 25 Jul 2007 04:32:17 +0200 Nick Piggin [EMAIL PROTECTED] wrote: On Tue, Jul 24, 2007 at 07:25:09PM -0400, Chris Mason wrote: On Tue, 24 Jul 2007 23:25:43 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: The tree is a critical part of the patch, but it is also the easiest to rip

Re: [PATCH RFC] extent mapped page cache

2007-07-25 Thread Chris Mason
On Thu, 26 Jul 2007 03:37:28 +0200 Nick Piggin [EMAIL PROTECTED] wrote: One advantage to the state tree is that it separates the state from the memory being described, allowing a simple kmap style interface that covers subpages, highmem and superpages. I suppose so, although we should

[PATCH RFC] extent mapped page cache

2007-07-24 Thread Chris Mason
On Tue, 10 Jul 2007 17:03:26 -0400 Chris Mason [EMAIL PROTECTED] wrote: This patch aims to demonstrate one way to replace buffer heads with a few extent trees. Buffer heads provide a few different features: 1) Mapping of logical file offset to blocks on disk 2) Recording state (dirty

[PATCH RFC] extent mapped page cache main code

2007-07-24 Thread Chris Mason
Core Extentmap implementation diff -r 126111346f94 -r 53cabea328f7 fs/Makefile --- a/fs/Makefile Mon Jul 09 10:53:57 2007 -0400 +++ b/fs/Makefile Tue Jul 24 15:40:27 2007 -0400 @@ -11,7 +11,7 @@ obj-y := open.o read_write.o file_table. attr.o bad_inode.o file.o

[PATCH RFC] ext2 extentmap support

2007-07-24 Thread Chris Mason
mount -o extentmap to use the new stuff diff -r 126111346f94 -r 53cabea328f7 fs/ext2/ext2.h --- a/fs/ext2/ext2.h Mon Jul 09 10:53:57 2007 -0400 +++ b/fs/ext2/ext2.h Tue Jul 24 15:40:27 2007 -0400 @@ -1,5 +1,6 @@ #include <linux/fs.h> #include <linux/ext2_fs.h> +#include <linux/extent_map.h>

Re: [PATCH RFC] extent mapped page cache

2007-07-24 Thread Chris Mason
On Tue, 24 Jul 2007 23:25:43 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: On Tue, 2007-07-24 at 16:13 -0400, Trond Myklebust wrote: On Tue, 2007-07-24 at 16:00 -0400, Chris Mason wrote: On Tue, 10 Jul 2007 17:03:26 -0400 Chris Mason [EMAIL PROTECTED] wrote: This patch aims

[ANNOUNCE] seekwatcher IO graphing v0.2

2007-07-23 Thread Chris Mason
Hello everyone, Since doing the initial Btrfs benchmarks, I've made my blktrace graphing utility a little more generic and tossed it out on oss.oracle.com. This new version can easily graph two different runs, and has a few other tweaks that make the graphs look nicer. Docs, examples and other

Re: [PATCH RFC] extent mapped page cache

2007-07-18 Thread Chris Mason
On Thu, 12 Jul 2007 00:00:28 -0700 Daniel Phillips [EMAIL PROTECTED] wrote: On Tuesday 10 July 2007 14:03, Chris Mason wrote: This patch aims to demonstrate one way to replace buffer heads with a few extent trees... Hi Chris, Quite terse commentary on algorithms and data structures

[PATCH RFC] extent mapped page cache

2007-07-10 Thread Chris Mason
This patch aims to demonstrate one way to replace buffer heads with a few extent trees. Buffer heads provide a few different features: 1) Mapping of logical file offset to blocks on disk 2) Recording state (dirty, locked etc) 3) Providing a mechanism to access sub-page sized blocks. This patch

Re: vm/fs meetup details

2007-07-06 Thread Chris Mason
On Fri, 6 Jul 2007 23:42:01 +1000 David Chinner [EMAIL PROTECTED] wrote: On Fri, Jul 06, 2007 at 12:26:23PM +0200, Jörn Engel wrote: On Fri, 6 July 2007 20:01:10 +1000, David Chinner wrote: On Fri, Jul 06, 2007 at 04:26:51AM +0200, Nick Piggin wrote: But, surprisingly enough, the

Re: Versioning file system

2007-07-05 Thread Chris Mason
On Thu, 5 Jul 2007 09:57:40 -0400 John Stoffel [EMAIL PROTECTED] wrote: Erik == Erik Mouw [EMAIL PROTECTED] writes: Erik (sorry for the late reply, just got back from holiday) Erik On Mon, Jun 18, 2007 at 01:29:56PM -0400, Theodore Tso wrote: As I mentioned in my Linux.conf.au

Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Chris Mason
On Tue, 3 Jul 2007 01:28:57 -0400 Xin Zhao [EMAIL PROTECTED] wrote: Hi, If a file is already opened when snapshot command is issued, the file itself could be in an inconsistent state already. Before the file is closed, maybe part of the file contains old data, the rest contains new

Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Chris Mason
On Tue, 3 Jul 2007 12:31:49 -0400 Xin Zhao [EMAIL PROTECTED] wrote: That's a good point! But this sounds hopeless to take a real consistent snapshot from app perspective unless you shutdown the computer. Right? Many different applications support some form of pausing in order to facilitate

Re: how do versioning filesystems take snapshot of opened files?

2007-07-03 Thread Chris Mason
On Tue, 3 Jul 2007 13:15:06 -0400 Xin Zhao [EMAIL PROTECTED] wrote: OK. From discussion above, can we reach a conclusion: from the application perspective, it is very hard, if not impossible, to take a transactional consistent snapshot without the help from applications? You definitely need

Re: [RFC] fsblock

2007-06-28 Thread Chris Mason
On Thu, Jun 28, 2007 at 04:44:43AM +0200, Nick Piggin wrote: On Thu, Jun 28, 2007 at 08:35:48AM +1000, David Chinner wrote: On Wed, Jun 27, 2007 at 07:50:56AM -0400, Chris Mason wrote: Let's look at a typical example of how IO actually gets done today, starting with sys_write

Re: [RFC] fsblock

2007-06-27 Thread Chris Mason
On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote: On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote: On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote: On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote: [ ... fsblocks vs extent range mapping

Re: [patch 1/3] add the fsblock layer

2007-06-26 Thread Chris Mason
On Tue, Jun 26, 2007 at 01:07:43PM +1000, Nick Piggin wrote: Neil Brown wrote: On Tuesday June 26, [EMAIL PROTECTED] wrote: Chris Mason wrote: The block device pagecache isn't special, and certainly isn't that much code. I would suggest keeping it buffer head specific and making

Re: [RFC] fsblock

2007-06-26 Thread Chris Mason
On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote: On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote: [ ... fsblocks vs extent range mapping ] iomaps can double as range locks simply because iomaps are expressions of ranges within the file. Seeing as you can only

Re: vm/fs meetup in september?

2007-06-26 Thread Chris Mason
On Tue, Jun 26, 2007 at 12:35:09PM +1000, Nick Piggin wrote: Christoph Hellwig wrote: On Sun, Jun 24, 2007 at 06:23:45AM +0200, Nick Piggin wrote: I'd just like to take the chance also to ask about a VM/FS meetup some time around kernel summit (maybe take a bit of time during UKUUG or so).

Re: [RFC] fsblock

2007-06-25 Thread Chris Mason
On Mon, Jun 25, 2007 at 04:58:48PM +1000, Nick Piggin wrote: Using buffer heads instead allows the FS to send file data down inside the transaction code, without taking the page lock. So, locking wrt data=ordered is definitely going to be tricky. The best long term option may be making

Re: [patch 1/3] add the fsblock layer

2007-06-25 Thread Chris Mason
On Mon, Jun 25, 2007 at 05:41:58PM +1000, Nick Piggin wrote: Neil Brown wrote: On Sunday June 24, [EMAIL PROTECTED] wrote: +#define PG_blocks 20 /* Page has block mappings */ + I've only had a very quick look, but this line looks *very* wrong. You should be using

Re: [patch 1/3] add the fsblock layer

2007-06-25 Thread Chris Mason
On Sun, Jun 24, 2007 at 03:46:13AM +0200, Nick Piggin wrote: Rewrite the buffer layer. Overall, I like the basic concepts, but it is hard to track the locking rules. Could you please write them up? I like the way you split out the assoc_buffers from the main fsblock code, but the list setup is

Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Chris Mason
On Thu, Jun 21, 2007 at 09:06:40PM -0400, James Morris wrote: On Thu, 21 Jun 2007, Chris Mason wrote: The incomplete mediation flows from the design, since the pathname-based mediation doesn't generalize to cover all objects unlike label- or attribute-based mediation. And the use

Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Chris Mason
On Fri, Jun 22, 2007 at 10:23:03AM -0400, James Morris wrote: On Fri, 22 Jun 2007, Chris Mason wrote: But, this is a completely different discussion than if AA is solving problems in the wild for its intended audience, or if the code is somehow flawed and breaking other parts

Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-21 Thread Chris Mason
On Thu, Jun 21, 2007 at 04:59:54PM -0400, Stephen Smalley wrote: On Thu, 2007-06-21 at 21:54 +0200, Lars Marowsky-Bree wrote: On 2007-06-21T15:42:28, James Morris [EMAIL PROTECTED] wrote: A veto is not a technical argument. All technical arguments (except for path name is ugly, yuk

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-19 Thread Chris Mason
On Tue, Jun 19, 2007 at 10:11:13AM +0100, Pádraig Brady wrote: Vladislav Bolkhovitin wrote: I would also suggest one more feature: support for block level de-duplication. I mean: 1. Ability for Btrfs to have blocks in several files to point to the same block on disk 2. Support

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-18 Thread Chris Mason
On Sat, Jun 16, 2007 at 11:31:47AM +0200, Florian D. wrote: Chris Mason wrote: Strange, these numbers are not quite what I was expecting ;) Could you please post your fio job files? Also, how much ram does the machine have? Only writing doesn't seem like enough to fill the ram

Re: Versioning file system

2007-06-18 Thread Chris Mason
On Mon, Jun 18, 2007 at 03:45:24AM -0600, Andreas Dilger wrote: Too bad everyone is spending time on 10 similar-but-slightly-different filesystems. This will likely end up with a bunch of filesystems that implement some easy subset of features, but will not get polished for users or have a

Updated Btrfs project site online

2007-06-18 Thread Chris Mason
Hello everyone, I've moved the Btrfs pages here: http://oss.oracle.com/projects/btrfs Which gives us a bugzilla, mailing lists, and a somewhat more orderly file download area. There are links to my HG trees for sources as well. The oss project area automagically creates a few different

Re: Updated Btrfs project site online -git repo?

2007-06-18 Thread Chris Mason
On Mon, Jun 18, 2007 at 09:53:39PM +0200, Maria Domenica Bertolucci wrote: Would it be possible to have a git repo as well so as to keep in sync with all git kernel projects? It also helps standardize things. Sorry, the repos will stay Mercurial based for now. These are small repos and not

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-15 Thread Chris Mason
On Fri, Jun 15, 2007 at 09:08:38PM +0200, Florian D. wrote: Chris Mason wrote: is it possible to test it on top of LVM2 on RAID at this stage? Yes, I haven't done much multi-spindle testing yet, so I'm definitely interested in these numbers. -chris I did not get very far

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-15 Thread Chris Mason
On Fri, Jun 15, 2007 at 10:46:04PM +0200, Florian D. wrote: Chris Mason wrote: # umount /mnt/temp/ [ 457.980372] [ cut here ] [ 457.980377] kernel BUG at fs/buffer.c:2644! Whoops. Please try this: [ bad patch ] sorry, with the patch applied

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-15 Thread Chris Mason
On Sat, Jun 16, 2007 at 12:03:06AM +0200, Florian D. wrote: Chris Mason wrote: Well, apparently I get the silly stuff wrong an infinite number of times. Sorry, let's try again: diff -r 38b36731 disk-io.c --- a/disk-io.c Fri Jun 15 13:50:20 2007 -0400 +++ b/disk-io.c

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-14 Thread Chris Mason
On Thu, Jun 14, 2007 at 08:29:10PM +0200, Florian D. wrote: Chris Mason wrote: The basic list of features looks like this: [amazing stuff snipped] The current status is a very early alpha state, and the kernel code weighs in at a sparsely commented 10,547 lines. I'm releasing now

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-13 Thread Chris Mason
On Wed, Jun 13, 2007 at 04:08:30AM +0100, Christoph Hellwig wrote: On Tue, Jun 12, 2007 at 04:14:39PM -0400, Chris Mason wrote: Aside from folding snapshot history into the origin's namespace... It could be possible to have a mount.btrfs that allows subvolumes and/or snapshot volumes

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-13 Thread Chris Mason
On Tue, Jun 12, 2007 at 11:46:20PM -0400, John Stoffel wrote: Chris == Chris Mason [EMAIL PROTECTED] writes: Chris After the last FS summit, I started working on a new filesystem Chris that maintains checksums of all file data and metadata. Many Chris thanks to Zach Brown for his ideas

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-13 Thread Chris Mason
On Wed, Jun 13, 2007 at 01:45:28AM -0400, Albert Cahalan wrote: Neat! It's great to see somebody else waking up to the idea that storage media is NOT to be trusted. Judging by the design paper, it looks like your structs have some alignment problems. Actual defs are all packed, but I may

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-13 Thread Chris Mason
On Wed, Jun 13, 2007 at 10:00:56AM -0400, John Stoffel wrote: Chris == Chris Mason [EMAIL PROTECTED] writes: As a user of Netapps, having quotas (if only for reporting purposes) and some way to migrate non-used files to slower/cheaper storage would be great. Chris So far, I'm

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-13 Thread Chris Mason
On Wed, Jun 13, 2007 at 12:12:23PM -0400, John Stoffel wrote: Chris == Chris Mason [EMAIL PROTECTED] writes: [ nod ] Also, I think you're wrong here when you state that making a snapshot (sub-volume?) RO just requires you to set the quota to 1 block. What is to stop me from writing 1 block

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-13 Thread Chris Mason
On Wed, Jun 13, 2007 at 12:14:40PM -0400, Albert Cahalan wrote: On 6/13/07, Chris Mason [EMAIL PROTECTED] wrote: On Wed, Jun 13, 2007 at 01:45:28AM -0400, Albert Cahalan wrote: The usual wishlist: * inode-to-pathnames mapping This one I'll code, it will help with inode link count

Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-12 Thread Chris Mason
On Tue, Jun 12, 2007 at 03:53:03PM -0400, Mike Snitzer wrote: On 6/12/07, Chris Mason [EMAIL PROTECTED] wrote: Hello everyone, After the last FS summit, I started working on a new filesystem that maintains checksums of all file data and metadata. Many thanks to Zach Brown for his ideas

Re: [PATCH 1 of 2] block_page_mkwrite() Implementation V2

2007-05-16 Thread Chris Mason
On Wed, May 16, 2007 at 08:09:19PM +0800, David Woodhouse wrote: On Wed, 2007-05-16 at 11:19 +0100, David Howells wrote: The start and end points passed to block_prepare_write() delimit the region of the page that is going to be modified. This means that prepare_write() doesn't need to

Re: [PATCH 1 of 2] block_page_mkwrite() Implementation V2

2007-05-16 Thread Chris Mason
On Wed, May 16, 2007 at 11:04:11PM +1000, Nick Piggin wrote: Chris Mason wrote: On Wed, May 16, 2007 at 08:09:19PM +0800, David Woodhouse wrote: On Wed, 2007-05-16 at 11:19 +0100, David Howells wrote: The start and end points passed to block_prepare_write() delimit the region

Re: [PATCH 4 of 8] Add flags to control direct IO helpers

2007-02-08 Thread Chris Mason
On Thu, Feb 08, 2007 at 09:33:05AM +0530, Suparna Bhattacharya wrote: On Wed, Feb 07, 2007 at 01:05:44PM -0500, Chris Mason wrote: On Wed, Feb 07, 2007 at 10:38:45PM +0530, Suparna Bhattacharya wrote: + * The flags parameter is a bitmask of: + * + * DIO_PLACEHOLDERS (use placeholder

Re: [PATCH 1 of 2] Implement generic block_page_mkwrite() functionality

2007-02-08 Thread Chris Mason
On Thu, Feb 08, 2007 at 09:50:13AM +1100, David Chinner wrote: You don't need to lock out all truncation, but you do need to lock out truncation of the page in question. Instead of your i_size checks, check page->mapping isn't NULL after the lock_page? Yes, that can be done, but we still

[PATCH 7 of 8] Adapt XFS to the new blockdev_direct_IO calls

2007-02-07 Thread Chris Mason
XFS is changed to use blockdev_direct_IO flags instead of DIO_OWN_LOCKING. Signed-off-by: Chris Mason [EMAIL PROTECTED] diff -r 1ab8a2112a7d -r f53fd3802dc9 fs/xfs/linux-2.6/xfs_aops.c --- a/fs/xfs/linux-2.6/xfs_aops.c Tue Feb 06 20:02:56 2007 -0500 +++ b/fs/xfs/linux-2.6/xfs_aops.c

[PATCH 8 of 8] Avoid too many boundary buffers in DIO

2007-02-07 Thread Chris Mason
. DIO can't tell which part of the big region was a boundary, and so it may not be a good idea to trust the hint. This patch just clears the boundary bit after using it once. It is 10% faster for a streaming DIO write w/blocksize of 512k on my sata drive. Signed-off-by: Chris Mason [EMAIL PROTECTED

[PATCH 5 of 8] Make ext3 safe for the new DIO locking rules

2007-02-07 Thread Chris Mason
page locks). Signed-off-by: Chris Mason [EMAIL PROTECTED] diff -r 04dd7ddd593e -r 42596f5254ca fs/ext3/inode.c --- a/fs/ext3/inode.c Tue Feb 06 20:02:56 2007 -0500 +++ b/fs/ext3/inode.c Tue Feb 06 20:02:56 2007 -0500 @@ -1673,6 +1673,30 @@ static int ext3_releasepage(struct page return

[PATCH 1 of 8] Introduce a place holder page for the pagecache

2007-02-07 Thread Chris Mason
/filemap.c finds that bit set, searches for an index in the pagecache look forward to find any placeholders that index may intersect. Signed-off-by: Chris Mason [EMAIL PROTECTED] diff -r fc2d683623bb -r 7819e6e3f674 drivers/mtd/devices/block2mtd.c --- a/drivers/mtd/devices/block2mtd.c Sun Feb 04

[PATCH 6 of 8] Make reiserfs safe for new DIO locking rules

2007-02-07 Thread Chris Mason
reiserfs is changed to use a version of reiserfs_get_block that is safe for filling holes without i_mutex held. Signed-off-by: Chris Mason [EMAIL PROTECTED] diff -r 42596f5254ca -r 1ab8a2112a7d fs/reiserfs/inode.c --- a/fs/reiserfs/inode.c Tue Feb 06 20:02:56 2007 -0500 +++ b/fs/reiserfs

[PATCH 4 of 8] Add flags to control direct IO helpers

2007-02-07 Thread Chris Mason
. Filesystems that want to be special can pull out the bits of blockdev_direct_IO_flags they care about and then call direct_io_worker directly. Signed-off-by: Chris Mason [EMAIL PROTECTED] diff -r 1a7105ab9c19 -r 04dd7ddd593e fs/direct-io.c --- a/fs/direct-io.c Tue Feb 06 20:02:55 2007 -0500 +++ b

Re: [PATCH 4 of 8] Add flags to control direct IO helpers

2007-02-07 Thread Chris Mason
On Wed, Feb 07, 2007 at 10:38:45PM +0530, Suparna Bhattacharya wrote: + * The flags parameter is a bitmask of: + * + * DIO_PLACEHOLDERS (use placeholder pages for locking) + * DIO_CREATE (pass create=1 to get_block for filling holes or extending) A little more explanation about why
