Re: [PATCH] E2fsprogs: add missing usage for No_COW

2012-06-13 Thread Ted Ts'o
On Wed, Jun 13, 2012 at 03:47:13PM +0800, Liu Bo wrote: > Add the missing usage for No_COW since we've supported No_COW flag. > > Signed-off-by: Liu Bo Applied, thanks. - Ted

Re: [PATCH v2] E2fsprogs: add missing usage for No_COW

2012-06-13 Thread Ted Ts'o
On Wed, Jun 13, 2012 at 04:56:42PM +0800, Liu Bo wrote: > Add the missing usage for No_COW since we've supported No_COW flag. > > Signed-off-by: Liu Bo Applied, although I changed the commit description to read: chattr: add the -C option to the usage message

Re: Btrfs and data nocow per inode basis

2012-06-12 Thread Ted Ts'o
... and e2fsprogs 1.42.4 has been released, with the No_COW lsattr and chattr support. It's in all of the usual places: ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/v1.42.4 and http://prdownloads.sourceforge.net/e2fsprogs/e2fsprogs-1.42.4.tar.gz ... and I've uploaded a release

Re: Btrfs and data nocow per inode basis

2012-06-12 Thread Ted Ts'o
On Tue, Jun 12, 2012 at 04:44:23PM -0400, Chris Mason wrote: > On Tue, Jun 12, 2012 at 01:15:27PM -0600, Ted Ts'o wrote: > > It appears the NOCOW_FL flag is currently a no-op in the 3.2 kernel? > > It's not a noop, but it is only setting the NODATACOW flag. It needs to

Re: Btrfs and data nocow per inode basis

2012-06-12 Thread Ted Ts'o
It appears the NOCOW_FL flag is currently a no-op in the 3.2 kernel? {/mnt} 2062# grep /mnt /proc/mounts /dev/mapper/funarg-btrfs /mnt btrfs rw,relatime,space_cache 0 0 {/mnt} 2063# sync ; filefrag -v a Filesystem type is: 9123683e File size of a is 32768 (8 blocks, blocksize 4096) ext log

Re: Btrfs and data nocow per inode basis

2012-06-12 Thread Ted Ts'o
On Tue, Jun 12, 2012 at 07:41:25PM +0200, Goffredo Baroncelli wrote: > > After a bit of googling I found Liu Bo's patches, which add the ability > to set the NOCOW flag on a btrfs file.[1] > > However it seems that it was not present in the current (v1.42.3) > e2fsprogs suite. > > There is any r

Re: [PATCH] fs: make i_generation a u64

2012-04-12 Thread Ted Ts'o
On Wed, Apr 11, 2012 at 04:42:48PM -0400, Josef Bacik wrote: > Btrfs stores generation numbers as 64bit numbers, which means we have to > carry around a u64 in our incore inode in addition to setting i_generation. > So convert to a u64 so btrfs can kill its incore generation. Thanks, > > Signed-

Re: getdents - ext4 vs btrfs performance

2012-03-18 Thread Ted Ts'o
On Thu, Mar 15, 2012 at 11:42:24AM +0100, Jacek Luczak wrote: > > That was not a SVN server. It was a build host having checkouts of SVN > projects. > > The many files/dirs case is common for VCS and the SVN is not the only > that would be affected here. Well, with SVN it's 2x or 3x the number

Re: getdents - ext4 vs btrfs performance

2012-03-14 Thread Ted Ts'o
On Wed, Mar 14, 2012 at 03:34:13PM +0100, Lukas Czerner wrote: > > > > You can make it be a RO_COMPAT change instead of an INCOMPAT change, > > yes. > > Does it have to be RO_COMPAT change though ? Since this would be both > forward and backward compatible. The challenge is how do you notice if

Re: getdents - ext4 vs btrfs performance

2012-03-14 Thread Ted Ts'o
On Wed, Mar 14, 2012 at 10:28:20AM -0400, Phillip Susi wrote: > > Do you really think it is that much easier? Even if it is easier, > it is still an ugly kludge. It would be much better to fix the > underlying problem rather than try to paper over it. I don't think the choice is obvious. A sol

Re: getdents - ext4 vs btrfs performance

2012-03-14 Thread Ted Ts'o
On Wed, Mar 14, 2012 at 10:17:37AM -0400, Zach Brown wrote: > > >We could do this if we have two b-trees, one indexed by filename and > >one indexed by inode number, which is what JFS (and I believe btrfs) > >does. > > Typically the inode number of the destination inode isn't used to index > entr

Re: getdents - ext4 vs btrfs performance

2012-03-14 Thread Ted Ts'o
On Wed, Mar 14, 2012 at 09:12:02AM +0100, Lukas Czerner wrote: > I kind of like the idea about having the separate btree with inode > numbers for the directory reading, just because it does not affect > allocation policy nor the write performance which is a good thing. Also > it has been done befor

Re: getdents - ext4 vs btrfs performance

2012-03-13 Thread Ted Ts'o
On Wed, Mar 14, 2012 at 10:48:17AM +0800, Yongqiang Yang wrote: > What if we use inode number as the hash value? Does it work? The whole point of using the tree structure is to accelerate filename -> inode number lookups. So the namei lookup doesn't have the inode number; the whole point is to u

Re: getdents - ext4 vs btrfs performance

2012-03-13 Thread Ted Ts'o
On Tue, Mar 13, 2012 at 04:22:52PM -0400, Phillip Susi wrote: > > I think a format change would be preferable to runtime sorting. Are you volunteering to spearhead the design and coding of such a thing? Run-time sorting is backwards compatible, and a heck of a lot easier to code and test... The
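
The run-time sorting mentioned above can be illustrated from user space. The following is only a minimal sketch under stated assumptions, not the in-kernel change discussed in the thread: it reads a directory, sorts the entries by inode number, and only then calls stat() on each one, so that inodes are visited in ascending order. The function names `entries_in_inode_order` and `stat_all` are hypothetical and chosen for illustration.

```python
import os


def entries_in_inode_order(path):
    """Return the names in a directory, sorted by inode number."""
    with os.scandir(path) as it:
        # DirEntry.inode() is taken from the directory entry itself,
        # so on Linux no extra stat() call is needed to obtain it.
        pairs = [(entry.inode(), entry.name) for entry in it]
    pairs.sort()
    return [name for _, name in pairs]


def stat_all(path):
    """stat() every entry, visiting inodes in ascending order.

    Returns a dict mapping each name to its size in bytes.
    """
    sizes = {}
    for name in entries_in_inode_order(path):
        full = os.path.join(path, name)
        sizes[name] = os.stat(full, follow_symlinks=False).st_size
    return sizes
```

A caller would use `stat_all("/some/dir")` in place of a plain readdir-then-stat loop. Whether this helps on a given file system depends on how closely inode numbers track on-disk inode placement, which is an assumption of this sketch.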

Re: getdents - ext4 vs btrfs performance

2012-03-13 Thread Ted Ts'o
On Tue, Mar 13, 2012 at 03:05:59PM -0400, Phillip Susi wrote: > Why not just separate the hash table from the conventional, mostly > in inode order directory entries? For instance, the first 200k of > the directory could be the normal entries that would tend to be in > inode order ( and e2fsck -D

Re: getdents - ext4 vs btrfs performance

2012-03-11 Thread Ted Ts'o
On Sun, Mar 11, 2012 at 04:30:37AM -0600, Andreas Dilger wrote: > > if the userspace process could > > feed us the exact set of filenames that will be used in the directory, > > plus the exact file sizes for each of the file names... > > Except POSIX doesn't allow anything close to this at all. S

Re: getdents - ext4 vs btrfs performance

2012-03-09 Thread Ted Ts'o
On Fri, Mar 09, 2012 at 04:09:43PM -0800, Andreas Dilger wrote: > > I have also run the correlation.py from Phillip Susi on directory with > > 10 4k files and indeed the name to block correlation in ext4 is pretty > > much random :) > > Just reading this on the plane, so I can't find the exact

Re: getdents - ext4 vs btrfs performance

2012-03-09 Thread Ted Ts'o
Hey Jacek, I'm curious about the parameters of the set of directories on your production server. On an ext4 file system, assuming you've copied the directories over, what are the results of this command pipeline when you are cd'ed into the top of the directory hierarchy of interest (your svn tree, as I reca

Re: getdents - ext4 vs btrfs performance

2012-03-02 Thread Ted Ts'o
On Fri, Mar 02, 2012 at 09:26:51AM -0500, Chris Mason wrote: > > filefrag will tell you how many extents each file has, any file with > more than one extent is interesting. (The ext4 crowd may have better > suggestions on measuring fragmentation). You can get a *huge* amount of information (prob

Re: getdents - ext4 vs btrfs performance

2012-03-01 Thread Ted Ts'o
On Thu, Mar 01, 2012 at 03:43:41PM +0100, Jacek Luczak wrote: > > Yep, ext4 is close to my wife's closet. > Were all of the file systems freshly laid down, or was this an aged ext4 file system? Also you should beware that if you have a workload which is heavy parallel I/O, with lots of random,

Re: A Plumber's Wish List for Linux

2011-10-14 Thread Ted Ts'o
On Thu, Oct 13, 2011 at 11:28:39AM +1100, Dave Chinner wrote: > Yup. xfs_admin already provides an interface for offline > modification of the UUID for XFS filesystems. I.e. clone the > filesystem using xfs_copy, then run xfs_admin -U generate to > generate a new uuid in the cloned copy before you m

Re: [PATCH 0/8] remove i_alloc_sem

2011-06-22 Thread Ted Ts'o
On Wed, Jun 22, 2011 at 01:54:25AM +0200, Jan Kara wrote: > ext4_page_mkwrite()... Ted, what happened to that patch. Should I resend > it? So assuming I fix the refcounting issue in fs/ext4/page_io.c (which I will do by not dropping the page's refcount until after the workqueue finishes its job), doe

Re: [PATCH] Check for immutable flag in fallocate path

2011-02-27 Thread Ted Ts'o
On Mon, Feb 21, 2011 at 05:50:21PM +0100, Marco Stornelli wrote: > 2011/2/21 Christoph Hellwig : > > On Mon, Feb 21, 2011 at 09:26:32AM +0100, Marco Stornelli wrote: > >> From: Marco Stornelli > >> > >> All fs must check for the immutable flag in their fallocate callback. > >> It's possible to hav

Re: [PATCH 1/6] fs: add hole punching to fallocate

2011-01-11 Thread Ted Ts'o
On Tue, Jan 11, 2011 at 04:13:42PM -0500, Lawrence Greenfield wrote: > > IOWs, all they want to do is avoid the unwritten extent conversion > > overhead. Time has shown that a bad security/performance tradeoff > > decision was made 13 years ago in XFS, so I see little reason to > > repeat it for ex

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-12 Thread Ted Ts'o
On Sun, Dec 12, 2010 at 07:11:28AM -0600, Jon Nelson wrote: > I'm glad you've been able to reproduce the problem! If you should need > any further assistance, please do not hesitate to ask. This patch seems to fix the problem for me. (Unless the partition is mounted with mblk_io_submit.) Could y

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-12 Thread Ted Ts'o
On Sun, Dec 12, 2010 at 04:18:29AM -0600, Jon Nelson wrote: > > I have one CPU configured in the environment, 512MB of memory. > > I have not done any memory-constriction tests whatsoever. I've finally been able to reproduce it myself, on real hardware. SMP is not necessary to reproduce it, altho

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-11 Thread Ted Ts'o
One experiment --- can you try this with the file system mounted with data=writeback, and see if the problem reproduces in that journalling mode? I want to rule out (if possible) journal_submit_inode_data_buffers() racing with mpage_da_submit_io(). I don't think that's the issue, but I'd prefer t

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-11 Thread Ted Ts'o
On Fri, Dec 10, 2010 at 08:14:56PM -0600, Jon Nelson wrote: > > Barring false negatives, bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc > > appears to be the culprit (according to git bisect). > > I will test bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc again, confirm > > the behavior, and work backwards to

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Ted Ts'o
On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote: > > Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179 > > from the tests I've done that one showed the least or no corruption if > you count the empty /etc/env.d/03opengl as an artefact Yes, that's a good test. Also try commit bd2

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Ted Ts'o
On Thu, Dec 09, 2010 at 12:10:58PM -0600, Jon Nelson wrote: > > You should be OK, there. Are you using encryption or no? > I had difficulty replicating the issue without encryption. Yes, I'm using encryption. LUKS with aes-xts-plain-sha256, and then LVM on top of LUKS. > > If you can point out

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Ted Ts'o
On Tue, Dec 07, 2010 at 09:37:20PM -0600, Jon Nelson wrote: > One difference is the location of the transaction logs (pg_xlog). In > my case, /var/lib/pgsql/data *is* mountpoint for the test volume > (actually, it's a symlink to the mount point). In your case, that is > not so. Perhaps that makes a

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Ted Ts'o
On Tue, Dec 07, 2010 at 01:22:43PM -0500, Mike Snitzer wrote: > > 1. create a database (from bash): > > > > createdb test > > > > 2. place the following contents in a file (I used 't.sql'): > > > > begin; > > create temporary table foo as select x as a, ARRAY[x] as b FROM > > generate_series(1,

Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Ted Ts'o
On Sun, Dec 05, 2010 at 02:44:14PM +0100, Matt wrote: > gcc version 4.5.1 (Gentoo Hardened 4.5.1-r1 p1.4, pie-0.4.5) This is probably just me being paranoid, but it might be worth trying using a gcc 4.4.x compiler and see if that makes any difference. There have been some other gcc 4.5-caused prob

Re: [patch] fix up lock order reversal in writeback

2010-11-17 Thread Ted Ts'o
On Wed, Nov 17, 2010 at 05:10:57PM +1100, Nick Piggin wrote: > On Tue, Nov 16, 2010 at 11:05:52PM -0600, Eric Sandeen wrote: > > On 11/16/10 10:38 PM, Nick Piggin wrote: > > >> as for the locking problems ... sorry about that! > > > > > > That's no problem. So is that an ack? :) > > > > > > > I'

Re: [PATCH 4/6] Ext4: fail if we try to use hole punch

2010-11-16 Thread Ted Ts'o
> >There is no simple way to test if a filesystem supports hole punching or not > >so > >the check has to be done per fs. Thanks, > > Could put a flag word in superblock_operations. Filesystems which > support punching (or other features) can enable it there. No, it couldn't be in super_operat

Re: [PATCH 1/6] fs: add hole punching to fallocate

2010-11-09 Thread Ted Ts'o
On Tue, Nov 09, 2010 at 03:42:42PM +1100, Dave Chinner wrote: > Implementation is up to the filesystem. However, XFS does (b) > because: > > 1) it was extremely simple to implement (one of the > advantages of having an exceedingly complex allocation > interface to begin wit

Re: [PATCH 1/6] fs: add hole punching to fallocate

2010-11-08 Thread Ted Ts'o
On Tue, Nov 09, 2010 at 12:12:22PM +1100, Dave Chinner wrote: > Hole punching was not included originally in fallocate() for a > variety of reasons. IIRC, they were along the lines of: > > 1 de-allocating of blocks in an allocation syscall is wrong. > People wanted a new syscall for

Re: BTRFS: Unbelievably slow with kvm/qemu

2010-09-02 Thread Ted Ts'o
On Tue, Aug 31, 2010 at 02:58:44PM -0700, K. Richard Pixley wrote: > On 20100831 14:46, Mike Fedyk wrote: > >There is little reason not to use duplicate metadata. Only small > >files (less than 2kb) get stored in the tree, so there should be no > >worries about images being duplicated without dat

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Ted Ts'o
On Wed, Aug 25, 2010 at 05:30:42PM -0700, David Rientjes wrote: > > We certainly hope that nobody will reimplement the same function without > the __deprecated warning, especially for order < PAGE_ALLOC_COSTLY_ORDER > where there's no looping at a higher level. So perhaps the best > alternativ

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Ted Ts'o
On Wed, Aug 25, 2010 at 04:11:38PM -0700, David Rientjes wrote: > > I'll repropose the patchset with __deprecated as you suggested. Thanks! And what Dave and I are saying is that we'll either need to do our own loop to avoid the deprecation warning, or the use of the deprecated function will prob

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Ted Ts'o
On Wed, Aug 25, 2010 at 03:35:42PM +0200, Peter Zijlstra wrote: > > While I appreciate that it might be somewhat (a lot) harder for a > filesystem to provide that guarantee, I'd be deeply worried about your > claim that it's impossible. > > It would render a system without swap very prone to deadl

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Ted Ts'o
On Wed, Aug 25, 2010 at 01:35:32PM +0200, Peter Zijlstra wrote: > On Wed, 2010-08-25 at 07:24 -0400, Ted Ts'o wrote: > > Part of the problem is that we have a few places in the kernel where > > failure is really not an option --- or rather, if we're going to fail > >

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Ted Ts'o
On Tue, Aug 24, 2010 at 01:11:26PM -0700, David Rientjes wrote: > On Tue, 24 Aug 2010, Jens Axboe wrote: > > > Should be possible to warn at build time for anyone using __GFP_NOFAIL > > without wrapping it in a function. > > > > We could make this __deprecated functions as Peter suggested if you