Re: [PATCH] xfstests 255: add a seek_data/seek_hole tester

2011-06-28 Thread Dave Chinner
On Tue, Jun 28, 2011 at 11:33:19AM -0400, Josef Bacik wrote:
> This is a test to make sure seek_data/seek_hole is acting like it does on
> Solaris.  It will check to see if the fs supports finding a hole or not and
> will adjust as necessary.

So I just looked at this with an eye to validating an XFS
implementation, and I came up with this list of stuff that the test
does not cover that I'd need to test in some way:

- files with clean unwritten extents. Are they a hole or
  data? What's SEEK_DATA supposed to return on a layout like
  hole-unwritten-data? i.e. the test needs to add fallocate
  to the picture...

- files with dirty unwritten extents (i.e. dirty in memory,
  not on disk). They are most definitely data, and most
  filesystems will need a separate lookup path to detect
  dirty unwritten ranges because the state is kept
  separately (page cache vs extent cache).  Plenty of scope
  for filesystem specific bugs here so needs a robust test.

- cold cache behaviour - all dirty data ranges the test
  creates are hot in cache and not even forced to disk, so
  it is not testing the no-page-cache-over-the-data-range
  case. i.e. it tests delalloc state tracking but not
  data-extent-already exists lookups during a seek.

- assumes that allocation size is the block size and that
  holes follow block size alignment. We already know that
  ext4 does not follow that rule when doing small sparse
  writes close together in a file, and XFS is also known to
  fill holes when doing sparse writes past EOF.

- only tests single block data extents so doesn't cover
  corner cases like skipping over multiple fragmented data
  extents to the next hole.
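
A minimal sketch of the primitive all of the cases above exercise: walking a
file's data segments with lseek(2). This is not the seek-tester from the
patch. print_data_segments() is a name made up for illustration, and the
fallback values 3 and 4 for SEEK_DATA/SEEK_HOLE are the Linux ones, used only
if the headers do not provide them. On a filesystem without a real
implementation the kernel may simply report the whole file as one data
segment.

```c
#define _GNU_SOURCE		/* exposes SEEK_DATA/SEEK_HOLE on glibc */
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for old headers; 3 and 4 are the Linux values. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/*
 * Print every data segment [start, end) of fd.
 * Returns the number of segments found, or -1 on error
 * (e.g. EINVAL on a kernel without SEEK_DATA support).
 */
int print_data_segments(int fd)
{
	off_t pos = 0;
	int count = 0;

	for (;;) {
		off_t data = lseek(fd, pos, SEEK_DATA);

		if (data < 0)
			return errno == ENXIO ? count : -1; /* ENXIO: no more data */

		/* There is always an implicit hole at EOF, so this should succeed. */
		off_t hole = lseek(fd, data, SEEK_HOLE);

		if (hole <= data)
			return -1;	/* error, or the file changed under us */

		printf("data [%lld, %lld)\n", (long long)data, (long long)hole);
		count++;
		pos = hole;
	}
}
```

Running this once with a hot page cache and once after the data has been
forced to disk and the cache dropped would be one way to separate the
delalloc case from the extent lookup case.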

Some more comments in line

> +_cleanup()
> +{
> +rm -f $tmp.*
> +}
> +
> +trap "_cleanup ; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and checks
> +. ./common.rc
> +. ./common.filter
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +
> +testfile=$TEST_DIR/seek_test.$$
> +logfile=$TEST_DIR/seek_test.$$.log

The log file is usually named $seq.full, and doesn't get placed in
the filesystem being tested. It gets saved in the xfstests directory
alongside $seq.out.bad for analysis when the test fails...

> +[ -x $here/src/seek-tester ] || _notrun "seek-tester not built"
> +
> +_cleanup()
> +{
> + rm -f $testfile
> + rm -f $logfile
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +echo "Silence is golden"
> +$here/src/seek-tester -q $testfile 2>&1 | tee -a $logfile

Personally I'd prefer the test to be a bit noisy about what it is
running, especially when a single invocation runs so many subtests.
It makes no difference to the run time of the test, or to the output
when something fails, but it at least allows you to run the test
manually and easily see what it is doing...

> +
> +if grep -q "SEEK_HOLE is not supported" $logfile; then
> + _notrun "SEEK_HOLE/SEEK_DATA not supported by this kernel"
> +fi
> +
> +rm -f $logfile
> +rm -f $testfile
> +
> +status=0 ; exit
> diff --git a/255.out b/255.out
> new file mode 100644
> index 000..7eefb82
> --- /dev/null
> +++ b/255.out
> @@ -0,0 +1,2 @@
> +QA output created by 255
> +Silence is golden
> diff --git a/group b/group
> index 1f86075..c045e70 100644
> --- a/group
> +++ b/group
> @@ -368,3 +368,4 @@ deprecated
>  252 auto quick prealloc
>  253 auto quick
>  254 auto quick
> +255 auto quick

I'd suggest that the rw and prealloc (once unwritten extent
testing is added) groups should also be defined for this test.

Otherwise, the test code looks ok, if a bit over-engineered.

> +struct testrec {
> + int test_num;
> + int (*test_func)(int fd, int testnum);
> + char*test_desc;
> +};
> +
> +struct testrec seek_tests[] = {
> + {  1, test01, "Test basic support" },
> + {  2, test02, "Test an empty file" },
> + {  3, test03, "Test a full file" },
> + {  4, test04, "Test file hole at beg, data at end" },
> + {  5, test05, "Test file data at beg, hole at end" },
> + {  6, test06, "Test file hole data hole data" },

So, borrowing from the hole punch test matrix: it covers a bunch more
file state transitions and cases that are just as relevant to
SEEK_HOLE/SEEK_DATA. Those cases are:

#   1. into a hole
#   2. into allocated space
#   3. into unwritten space
#   4. hole -> data
#   5. hole -> unwritten
#   6. data -> hole
#   7. data -> unwritten
#   8. unwritten -> hole
#   9. unwritten -> data
#   10. hole -> data -> hole
#   11. data -> hole -> data
#   12. unwritten -> data -> unwritten
#   13. data -> unwritten -> data
#   14. data -> hole @ EOF
#   15. data -> hole @ 0
#   16. data -> cache cold -> hole
#   17. data -> hole in single block file
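
To make the "unwritten" rows of this matrix reproducible, a test needs a way
to lay out preallocated space next to holes and data. A rough sketch under
assumptions: make_hole_unwritten_data() is a hypothetical helper, not
xfstests code, and it uses posix_fallocate(3) for portability where a
Linux-only test would more likely call fallocate(2) directly. Whether block 1
really ends up as an unwritten extent depends on the filesystem; glibc may
fall back to writing zeros when preallocation is not supported.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create `path` as a three-block file laid out as
 *   block 0: hole | block 1: preallocated, never written | block 2: data
 * Returns an open fd on success, or a negative errno value on failure.
 */
int make_hole_unwritten_data(const char *path, off_t blk)
{
	const char byte = 'x';
	ssize_t n;
	int fd, err;

	fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
	if (fd < 0)
		return -errno;

	/* Extending an empty file leaves all three blocks as one hole. */
	if (ftruncate(fd, 3 * blk) < 0) {
		err = errno;
		goto fail;
	}

	/* posix_fallocate() returns the error number directly, not via errno. */
	err = posix_fallocate(fd, blk, blk);
	if (err)
		goto fail;

	/* One written byte is enough to turn block 2 into data. */
	n = pwrite(fd, &byte, 1, 2 * blk);
	if (n != 1) {
		err = n < 0 ? errno : EIO;
		goto fail;
	}
	return fd;

fail:
	close(fd);
	return -err;
}
```

The other transitions in the list can be built the same way by reordering the
three steps; what SEEK_DATA should report for block 1 is exactly the open
question raised above.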

I

Re: Snapshot reconciliation

2011-06-28 Thread Li Zefan
João Eduardo Luís wrote:
> Hello.
> 
> Can anyone think of a simple way to copy a set of pages from a given file 
> (which may or may not be scattered throughout multiple extents) from a 
> snapshot to correct pages within another file on another snapshot?
> 
> This might sound silly, but the whole purpose is to create some sort of 
> reconciliation method between divergent snapshots taken from the same 
> original subvolume.
> 

How about the file clone ioctl? It won't copy data, but it makes the dest file
point to the same extents as the source file.
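
A sketch of what calling the range variant of that ioctl could look like from
userspace. Assumptions: the definitions are taken from <linux/btrfs.h>, which
newer kernel headers ship (at the time of this thread they lived in the btrfs
ioctl.h header instead), and clone_range() is a hypothetical wrapper name.
Offsets and length generally have to be block aligned, and whether cloning
between two snapshots is permitted depends on the kernel version.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>	/* BTRFS_IOC_CLONE_RANGE and its args struct */

/*
 * Make [dst_off, dst_off + len) of dst_fd share the extents backing
 * [src_off, src_off + len) of src_fd. No data is copied.
 * Returns 0 on success or a negative errno value.
 */
int clone_range(int dst_fd, int src_fd, unsigned long long src_off,
		unsigned long long len, unsigned long long dst_off)
{
	struct btrfs_ioctl_clone_range_args args = {
		.src_fd = src_fd,
		.src_offset = src_off,
		.src_length = len,
		.dest_offset = dst_off,
	};

	/* The ioctl is issued on the destination file. */
	if (ioctl(dst_fd, BTRFS_IOC_CLONE_RANGE, &args) < 0)
		return -errno;
	return 0;
}
```

For the reconciliation case, the caller would first have to work out which
block ranges differ between the two copies, then clone only those ranges.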
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 2/8] btrfs: Cancel filesystem balance

2011-06-28 Thread Li Zefan
Hugo Mills wrote:
> This patch adds an ioctl for cancelling a btrfs balance operation
> mid-flight. The ioctl simply sets a flag, and the operation terminates
> after the current block group move has completed.
> 
> Signed-off-by: Hugo Mills 
> ---
>  fs/btrfs/ctree.h   |1 +
>  fs/btrfs/ioctl.c   |   28 
>  fs/btrfs/ioctl.h   |1 +
>  fs/btrfs/volumes.c |7 ++-
>  4 files changed, 36 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 25aa3cf..5031085 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -876,6 +876,7 @@ struct btrfs_block_group_cache {
>  struct btrfs_balance_info {
>   u32 expected;
>   u32 completed;
> + int cancel_pending;
>  };
>  
>  struct reloc_control;
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 5ddf816..d4458d0 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2868,6 +2868,32 @@ error:
>   return ret;
>  }
>  
> +/*
> + * Cancel a running balance operation
> + */
> +long btrfs_ioctl_balance_cancel(struct btrfs_fs_info *fs_info)
> +{
> + int err = 0;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
> +
> + spin_lock(&fs_info->balance_info_lock);
> + if (!fs_info->balance_info) {
> + err = -EINVAL;
> + goto error;
> + }
> + if (fs_info->balance_info->cancel_pending) {
> + err = -ECANCELED;
> + goto error;
> + }
> + fs_info->balance_info->cancel_pending = 1;
> +
> +error:
> + spin_unlock(&fs_info->balance_info_lock);
> + return err;
> +}
> +
>  long btrfs_ioctl(struct file *file, unsigned int
>   cmd, unsigned long arg)
>  {
> @@ -2915,6 +2941,8 @@ long btrfs_ioctl(struct file *file, unsigned int
>   return btrfs_balance(root->fs_info->dev_root);
>   case BTRFS_IOC_BALANCE_PROGRESS:
>   return btrfs_ioctl_balance_progress(root->fs_info, argp);
> + case BTRFS_IOC_BALANCE_CANCEL:
> + return btrfs_ioctl_balance_cancel(root->fs_info);
>   case BTRFS_IOC_CLONE:
>   return btrfs_ioctl_clone(file, arg, 0, 0, 0);
>   case BTRFS_IOC_CLONE_RANGE:
> diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
> index 575b25f..edcbe61 100644
> --- a/fs/btrfs/ioctl.h
> +++ b/fs/btrfs/ioctl.h
> @@ -255,4 +255,5 @@ struct btrfs_ioctl_balance_progress {
>  struct btrfs_ioctl_fs_info_args)
>  #define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 32, \
> struct btrfs_ioctl_balance_progress)
> +#define BTRFS_IOC_BALANCE_CANCEL _IO(BTRFS_IOCTL_MAGIC, 33)
>  #endif
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 4c0a386..f38b231 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2049,6 +2049,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
>   bal_info->expected = -1; /* One less than actually counted,
>   because chunk 0 is special */
>   bal_info->completed = 0;
> + bal_info->cancel_pending = 0;
>   spin_unlock(&dev_root->fs_info->balance_info_lock);
>  
>   /* step one make some room on all the devices */
> @@ -2109,7 +2110,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
>   key.offset = (u64)-1;
>   key.type = BTRFS_CHUNK_ITEM_KEY;
>  
> - while (1) {
> + while (!bal_info->cancel_pending) {
>   ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
>   if (ret < 0)
>   goto error;
> @@ -2149,6 +2150,10 @@ int btrfs_balance(struct btrfs_root *dev_root)
>  bal_info->completed, bal_info->expected);
>   }
>   ret = 0;
> + if (bal_info->cancel_pending) {
> + printk(KERN_INFO "btrfs: balance cancelled\n");
> + ret = -EINTR;
> + }

Why not check for a pending signal in the while loop? Then
we could just use Ctrl+C to cancel the balance.

>  error:
>   btrfs_free_path(path);
>   spin_lock(&dev_root->fs_info->balance_info_lock);


[PATCH v3] Btrfs: fix error check of btrfs_lookup_dentry()

2011-06-28 Thread Tsutomu Itoh
Check the return value of btrfs_lookup_dentry() so that a panic
such as an illegal address reference cannot occur.

Signed-off-by: Tsutomu Itoh 
---
V1->V2: removed an unnecessary BUG_ON
V2->V3: btrfs_lookup_dentry() now returns ERR_PTR(-ENOENT) instead of
NULL when no entry is found.

 fs/btrfs/inode.c |   10 +++---
 fs/btrfs/ioctl.c |   10 --
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 447612d..9210c60 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4025,7 +4025,7 @@ struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry)
return ERR_PTR(ret);
 
if (location.objectid == 0)
-   return NULL;
+   return ERR_PTR(-ENOENT);
 
if (location.type == BTRFS_INODE_ITEM_KEY) {
inode = btrfs_iget(dir->i_sb, &location, root, NULL);
@@ -4080,8 +4080,12 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
struct inode *inode;
 
inode = btrfs_lookup_dentry(dir, dentry);
-   if (IS_ERR(inode))
-   return ERR_CAST(inode);
+   if (IS_ERR(inode)) {
+   if (PTR_ERR(inode) == -ENOENT)
+   inode = NULL;
+   else
+   return ERR_CAST(inode);
+   }
 
return d_splice_alias(inode, dentry);
 }
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3c4751..981084d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -325,6 +325,7 @@ static noinline int create_subvol(struct btrfs_root *root,
struct btrfs_root *new_root;
struct dentry *parent = dget_parent(dentry);
struct inode *dir;
+   struct inode *inode;
int ret;
int err;
u64 objectid;
@@ -437,7 +438,13 @@ static noinline int create_subvol(struct btrfs_root *root,
 
BUG_ON(ret);
 
-   d_instantiate(dentry, btrfs_lookup_dentry(dir, dentry));
+   inode = btrfs_lookup_dentry(dir, dentry);
+   if (IS_ERR(inode)) {
+   ret = PTR_ERR(inode);
+   goto fail;
+   }
+
+   d_instantiate(dentry, inode);
 fail:
dput(parent);
if (async_transid) {
@@ -511,7 +518,6 @@ static int create_snapshot(struct btrfs_root *root, struct dentry *dentry,
ret = PTR_ERR(inode);
goto fail;
}
-   BUG_ON(!inode);
d_instantiate(dentry, inode);
ret = 0;
 fail:
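
For readers less familiar with the error-pointer convention the patch relies
on, here is a userspace model. Assumptions: ERR_PTR(), PTR_ERR() and IS_ERR()
are re-implemented to mirror the helpers in the kernel's linux/err.h, and the
two lookup functions are simplified stand-ins for btrfs_lookup_dentry() and
btrfs_lookup(), not the real code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in the top MAX_ERRNO addresses, never valid pointers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct inode_model {
	int ino;
};

/* Like the patched btrfs_lookup_dentry(): never returns NULL. */
struct inode_model *lookup_dentry_model(struct inode_model *table, size_t n,
					int ino)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (table[i].ino == ino)
			return &table[i];
	return ERR_PTR(-ENOENT);
}

/*
 * Like the patched btrfs_lookup(): a missing entry is not an error for
 * this caller, so -ENOENT is mapped back to NULL; any other error
 * pointer is passed through unchanged.
 */
struct inode_model *lookup_model(struct inode_model *table, size_t n, int ino)
{
	struct inode_model *inode = lookup_dentry_model(table, n, ino);

	if (IS_ERR(inode) && PTR_ERR(inode) == -ENOENT)
		inode = NULL;
	return inode;
}
```

Note that IS_ERR(NULL) is false, which is why callers that only test IS_ERR()
could be handed a NULL inode before this change.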




Re: [PATCH] Btrfs: fix error check of btrfs_lookup_dentry()

2011-06-28 Thread Tsutomu Itoh
(2011/06/28 23:22), Josef Bacik wrote:
> On 06/27/2011 11:34 PM, Tsutomu Itoh wrote:
>> The return value of btrfs_lookup_dentry is checked so that
>> the panic such as illegal address reference should not occur.
>>
>> Signed-off-by: Tsutomu Itoh 
> 
> Nack, please fix btrfs_lookup_dentry to return ERR_PTR(-ENOENT) if it
> doesn't find something.  Thanks,

OK, I will repost soon.

Thanks,
Tsutomu



[PATCH] btrfs: Remove BUG_ON's from btrfs_update_root

2011-06-28 Thread Mark Fasheh
Instead, have it pass those errors back to the callers. Most callers right
now actually BUG_ON the return code anyway so behavior hasn't really changed
in those cases. Others (such as btrfs_sync_log()) try to handle the error
returned. btrfs_ioctl_subvol_setflags() ignores the error today. In order to
maintain behavior I placed a BUG_ON clause there - at least though it's now
at a higher level in the code.

Signed-off-by: Mark Fasheh 
---
 fs/btrfs/ioctl.c |1 +
 fs/btrfs/root-tree.c |6 --
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3c4751..6ebd282 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1437,6 +1437,7 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
 
ret = btrfs_update_root(trans, root->fs_info->tree_root,
&root->root_key, &root->root_item);
+   BUG_ON(ret);
 
btrfs_commit_transaction(trans, root);
 out_reset:
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index ebe4544..ea96ab8 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -94,7 +94,9 @@ int btrfs_update_root(struct btrfs_trans_handle *trans, struct btrfs_root
unsigned long ptr;
 
path = btrfs_alloc_path();
-   BUG_ON(!path);
+   if (!path)
+   return -ENOMEM;
+
ret = btrfs_search_slot(trans, root, key, path, 0, 1);
if (ret < 0)
goto out;
@@ -104,7 +106,7 @@ int btrfs_update_root(struct btrfs_trans_handle *trans, struct btrfs_root
printk(KERN_CRIT "unable to update root key %llu %u %llu\n",
   (unsigned long long)key->objectid, key->type,
   (unsigned long long)key->offset);
-   BUG_ON(1);
+   goto out;
}
 
l = path->nodes[0];
-- 
1.7.5.3



Re: [PATCH v8 7/8] btrfs: Replication-type information

2011-06-28 Thread Ilya Dryomov
On Tue, Jun 28, 2011 at 08:26:43PM +0100, Hugo Mills wrote:
> On Tue, Jun 28, 2011 at 06:32:43PM +0200, David Sterba wrote:
> > On Sun, Jun 26, 2011 at 09:36:54PM +0100, Hugo Mills wrote:
> > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > > index 828aa34..fb11550 100644
> > > --- a/fs/btrfs/volumes.c
> > > +++ b/fs/btrfs/volumes.c
> > > @@ -117,6 +117,52 @@ static void requeue_list(struct btrfs_pending_bios 
> > > *pending_bios,
> > >   pending_bios->tail = tail;
> > >  }
> > >  
> > > +void btrfs_get_replication_info(struct btrfs_replication_info *info,
> > > + u64 type)
> > > +{
> > > + info->sub_stripes = 1;
> > > + info->dev_stripes = 1;
> > > + info->devs_increment = 1;
> > > + info->num_copies = 1;
> > > + info->devs_max = 0; /* 0 == as many as possible */
> > > + info->devs_min = 1;
> > > +
> > > + if (type & BTRFS_BLOCK_GROUP_DUP) {
> > > + info->dev_stripes = 2;
> > > + info->num_copies = 2;
> > > + info->devs_max = 1;
> > > + } else if (type & BTRFS_BLOCK_GROUP_RAID0) {
> > > + info->devs_min = 2;
> > > + } else if (type & BTRFS_BLOCK_GROUP_RAID1) {
> > > + info->devs_increment = 2;
> > > + info->num_copies = 2;
> > > + info->devs_max = 2;
> > > + info->devs_min = 2;
> > > + } else if (type & BTRFS_BLOCK_GROUP_RAID10) {
> > > + info->sub_stripes = 2;
> > > + info->devs_increment = 2;
> > > + info->num_copies = 2;
> > > + info->devs_min = 4;
> > > + }
> > > +
> > > + if (type & BTRFS_BLOCK_GROUP_DATA) {
> > > + info->max_stripe_size = 1024 * 1024 * 1024;
> > > + info->min_stripe_size = 64 * 1024 * 1024;
> > > + info->max_chunk_size = 10 * info->max_stripe_size;
> > > + } else if (type & BTRFS_BLOCK_GROUP_METADATA) {
> > > + info->max_stripe_size = 256 * 1024 * 1024;
> > > + info->min_stripe_size = 32 * 1024 * 1024;
> > > + info->max_chunk_size = info->max_stripe_size;
> > > + } else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
> > > + info->max_stripe_size = 8 * 1024 * 1024;
> > > + info->min_stripe_size = 1 * 1024 * 1024;
> > > + info->max_chunk_size = 2 * info->max_stripe_size;
> > > + } else {
> > > > + printk(KERN_ERR "Block group is of an unknown usage type: not data, metadata or system.\n");
> > > + BUG_ON(1);
> 
> From inspection, this looks like it's a viable solution:
> 
> +   info->max_stripe_size = 0;
> +   info->min_stripe_size = -1ULL;
> +   info->max_chunk_size = 0;
> 
> We only run into problems if a user of this function passes a
> RAID-only block group type and then tries to use the size parameters
> from it. There's only three users of the function currently, and this
> case is the only one that doesn't pass a "real" block group type flag.
> 
> I'll run a quick test of dev rm and see what happens...

[ I didn't apply or run this series, take this with a grain of salt ]

The problem seems to be that Hugo's function expects an on-disk chunk
type as its input.  However avail_{data,metadata,system}_alloc_bits (of
which all_avail is comprised) are in-memory fields, and by design they
don't have BTRFS_BLOCK_GROUP_{DATA,METADATA,SYSTEM} set.  There are three
fields:

avail_data_alloc_bits
avail_metadata_alloc_bits
avail_system_alloc_bits

so we don't need BTRFS_BLOCK_GROUP_{DATA,METADATA,SYSTEM} set to
differentiate between data and metadata profiles.

I'd say that the BUG_ON should be dropped and those three lines above
added, or maybe a special switch added for this particular case that
leaves info partially uninitialized, since we only need devs_min here.
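
A small userspace sketch of why type = 24 reaches the final else branch, and
of what the proposed three-line fallback would do. Assumptions: the flag
values are reproduced from the btrfs headers as I remember them (they agree
with the 8 and 16 in David's log), and struct size_limits and
get_size_limits() are hypothetical stand-ins for the size half of
btrfs_get_replication_info().

```c
#include <stdint.h>

/* Block group flag bits: usage type first, then RAID profile. */
#define BTRFS_BLOCK_GROUP_DATA		(1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM	(1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA	(1ULL << 2)
#define BTRFS_BLOCK_GROUP_RAID0		(1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1		(1ULL << 4)

struct size_limits {
	uint64_t max_stripe_size;
	uint64_t min_stripe_size;
	uint64_t max_chunk_size;
};

/*
 * Returns 1 if `type` carries a usage type bit, 0 for a profile-only
 * mask. In the latter case the sizes get the placeholder values from
 * Hugo's proposal instead of triggering a BUG_ON.
 */
int get_size_limits(uint64_t type, struct size_limits *l)
{
	if (type & BTRFS_BLOCK_GROUP_DATA) {
		l->max_stripe_size = 1024 * 1024 * 1024;
		l->min_stripe_size = 64 * 1024 * 1024;
		l->max_chunk_size = 10 * l->max_stripe_size;
	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
		l->max_stripe_size = 256 * 1024 * 1024;
		l->min_stripe_size = 32 * 1024 * 1024;
		l->max_chunk_size = l->max_stripe_size;
	} else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
		l->max_stripe_size = 8 * 1024 * 1024;
		l->min_stripe_size = 1 * 1024 * 1024;
		l->max_chunk_size = 2 * l->max_stripe_size;
	} else {
		/* Profile-only mask, e.g. all_avail in btrfs_rm_device(). */
		l->max_stripe_size = 0;
		l->min_stripe_size = -1ULL;
		l->max_chunk_size = 0;
		return 0;
	}
	return 1;
}
```

With the logged values, 8 | 16 | 16 gives 24, which is RAID0 | RAID1 with no
usage bit set, so the call lands in the last branch.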

Thanks,

Ilya

> > I'm hitting this BUG_ON with 'btrfs device delete', type = 24 which is
> > BTRFS_BLOCK_GROUP_RAID0 + BTRFS_BLOCK_GROUP_RAID1 .
> > 
> > in btrfs_rm_device:
> > 
> > 1277 all_avail = root->fs_info->avail_data_alloc_bits |
> > 1278 root->fs_info->avail_system_alloc_bits |
> > 1279 root->fs_info->avail_metadata_alloc_bits;
> > 
> > the values before the call are:
> > 
> > [  105.107074] D: all_avail 24
> > [  105.111844] D: root->fs_info->avail_data_alloc_bits 8
> > [  105.118858] D: root->fs_info->avail_system_alloc_bits 16
> > [  105.126110] D: root->fs_info->avail_metadata_alloc_bits 16
> > 
> > 
> > there are 5 devices, sdb5 - sdb9, i'm removing sdb9, after clean
> > mount.
> > 
> > 
> > david
> 
>Hugo.
> 
> -- 
> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
>   PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
>--- vi vi vi:  the Editor of the Beast. ---   




Re: [PATCH v8 7/8] btrfs: Replication-type information

2011-06-28 Thread Hugo Mills
On Tue, Jun 28, 2011 at 06:32:43PM +0200, David Sterba wrote:
> On Sun, Jun 26, 2011 at 09:36:54PM +0100, Hugo Mills wrote:
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 828aa34..fb11550 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -117,6 +117,52 @@ static void requeue_list(struct btrfs_pending_bios 
> > *pending_bios,
> > pending_bios->tail = tail;
> >  }
> >  
> > +void btrfs_get_replication_info(struct btrfs_replication_info *info,
> > +   u64 type)
> > +{
> > +   info->sub_stripes = 1;
> > +   info->dev_stripes = 1;
> > +   info->devs_increment = 1;
> > +   info->num_copies = 1;
> > +   info->devs_max = 0; /* 0 == as many as possible */
> > +   info->devs_min = 1;
> > +
> > +   if (type & BTRFS_BLOCK_GROUP_DUP) {
> > +   info->dev_stripes = 2;
> > +   info->num_copies = 2;
> > +   info->devs_max = 1;
> > +   } else if (type & BTRFS_BLOCK_GROUP_RAID0) {
> > +   info->devs_min = 2;
> > +   } else if (type & BTRFS_BLOCK_GROUP_RAID1) {
> > +   info->devs_increment = 2;
> > +   info->num_copies = 2;
> > +   info->devs_max = 2;
> > +   info->devs_min = 2;
> > +   } else if (type & BTRFS_BLOCK_GROUP_RAID10) {
> > +   info->sub_stripes = 2;
> > +   info->devs_increment = 2;
> > +   info->num_copies = 2;
> > +   info->devs_min = 4;
> > +   }
> > +
> > +   if (type & BTRFS_BLOCK_GROUP_DATA) {
> > +   info->max_stripe_size = 1024 * 1024 * 1024;
> > +   info->min_stripe_size = 64 * 1024 * 1024;
> > +   info->max_chunk_size = 10 * info->max_stripe_size;
> > +   } else if (type & BTRFS_BLOCK_GROUP_METADATA) {
> > +   info->max_stripe_size = 256 * 1024 * 1024;
> > +   info->min_stripe_size = 32 * 1024 * 1024;
> > +   info->max_chunk_size = info->max_stripe_size;
> > +   } else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
> > +   info->max_stripe_size = 8 * 1024 * 1024;
> > +   info->min_stripe_size = 1 * 1024 * 1024;
> > +   info->max_chunk_size = 2 * info->max_stripe_size;
> > +   } else {
> > > +   printk(KERN_ERR "Block group is of an unknown usage type: not data, metadata or system.\n");
> > +   BUG_ON(1);

   From inspection, this looks like it's a viable solution:

+   info->max_stripe_size = 0;
+   info->min_stripe_size = -1ULL;
+   info->max_chunk_size = 0;

We only run into problems if a user of this function passes a
RAID-only block group type and then tries to use the size parameters
from it. There's only three users of the function currently, and this
case is the only one that doesn't pass a "real" block group type flag.

   I'll run a quick test of dev rm and see what happens...

> I'm hitting this BUG_ON with 'btrfs device delete', type = 24 which is
> BTRFS_BLOCK_GROUP_RAID0 + BTRFS_BLOCK_GROUP_RAID1 .
> 
> in btrfs_rm_device:
> 
> 1277 all_avail = root->fs_info->avail_data_alloc_bits |
> 1278 root->fs_info->avail_system_alloc_bits |
> 1279 root->fs_info->avail_metadata_alloc_bits;
> 
> the values before the call are:
> 
> [  105.107074] D: all_avail 24
> [  105.111844] D: root->fs_info->avail_data_alloc_bits 8
> [  105.118858] D: root->fs_info->avail_system_alloc_bits 16
> [  105.126110] D: root->fs_info->avail_metadata_alloc_bits 16
> 
> 
> there are 5 devices, sdb5 - sdb9, i'm removing sdb9, after clean
> mount.
> 
> 
> david

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- vi vi vi:  the Editor of the Beast. ---   




[Patch] make btrfs cross compilation friendly

2011-06-28 Thread Kamble, Nitin A
Attached is a patch to fix a cross compilation issue I observed with
btrfs-tools.

Signed-Off-By: Nitin A Kamble 

Nitin A Kamble
Yocto Project 
www.yoctoproject.org



Avoid these kinds of errors while doing cross build:

| ccache i586-poky-linux-gcc -march=i586 --sysroot=/disk0/pokybuild/build0/tmp/sysroots/qemux86 -Wp,-MMD,./.btrfsctl.o.d,-MT,btrfsctl.o -Wall -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -pipe -g -feliminate-unused-debug-types -c btrfsctl.c
| gcc -O2 -pipe -g -feliminate-unused-debug-types -o btrfsctl btrfsctl.o ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o root-tree.o dir-item.o file-item.o inode-item.o inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o volumes.o utils.o btrfs-list.o -Wl,-O1 -Wl,--as-needed -luuid
| /usr/bin/ld: i386 architecture of input file `btrfsctl.o' is incompatible with i386:x86-64 output
| /usr/bin/ld: i386 architecture of input file `ctree.o' is incompatible with i386:x86-64 output

Index: git/Makefile
===
--- git.orig/Makefile
+++ git/Makefile
@@ -38,53 +38,53 @@ version:
bash version.sh
 
 btrfs: $(objects) btrfs.o btrfs_cmds.o
-   gcc $(CFLAGS) -o btrfs btrfs.o btrfs_cmds.o \
+   $(CC) $(CFLAGS) -o btrfs btrfs.o btrfs_cmds.o \
$(objects) $(LDFLAGS) $(LIBS)
 
 btrfsctl: $(objects) btrfsctl.o
-   gcc $(CFLAGS) -o btrfsctl btrfsctl.o $(objects) $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o btrfsctl btrfsctl.o $(objects) $(LDFLAGS) $(LIBS)
 
 btrfs-vol: $(objects) btrfs-vol.o
-   gcc $(CFLAGS) -o btrfs-vol btrfs-vol.o $(objects) $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o btrfs-vol btrfs-vol.o $(objects) $(LDFLAGS) $(LIBS)
 
 btrfs-show: $(objects) btrfs-show.o
-   gcc $(CFLAGS) -o btrfs-show btrfs-show.o $(objects) $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o btrfs-show btrfs-show.o $(objects) $(LDFLAGS) $(LIBS)
 
 btrfsck: $(objects) btrfsck.o
-   gcc $(CFLAGS) -o btrfsck btrfsck.o $(objects) $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o btrfsck btrfsck.o $(objects) $(LDFLAGS) $(LIBS)
 
 mkfs.btrfs: $(objects) mkfs.o
-   gcc $(CFLAGS) -o mkfs.btrfs $(objects) mkfs.o $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o mkfs.btrfs $(objects) mkfs.o $(LDFLAGS) $(LIBS)
 
 btrfs-debug-tree: $(objects) debug-tree.o
-   gcc $(CFLAGS) -o btrfs-debug-tree $(objects) debug-tree.o $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o btrfs-debug-tree $(objects) debug-tree.o $(LDFLAGS) $(LIBS)
 
 btrfs-zero-log: $(objects) btrfs-zero-log.o
-   gcc $(CFLAGS) -o btrfs-zero-log $(objects) btrfs-zero-log.o $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o btrfs-zero-log $(objects) btrfs-zero-log.o $(LDFLAGS) $(LIBS)
 
 btrfs-select-super: $(objects) btrfs-select-super.o
-   gcc $(CFLAGS) -o btrfs-select-super $(objects) btrfs-select-super.o $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o btrfs-select-super $(objects) btrfs-select-super.o $(LDFLAGS) $(LIBS)
 
 btrfstune: $(objects) btrfstune.o
-   gcc $(CFLAGS) -o btrfstune $(objects) btrfstune.o $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o btrfstune $(objects) btrfstune.o $(LDFLAGS) $(LIBS)
 
 btrfs-map-logical: $(objects) btrfs-map-logical.o
-   gcc $(CFLAGS) -o btrfs-map-logical $(objects) btrfs-map-logical.o $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o btrfs-map-logical $(objects) btrfs-map-logical.o $(LDFLAGS) $(LIBS)
 
 btrfs-image: $(objects) btrfs-image.o
-   gcc $(CFLAGS) -o btrfs-image $(objects) btrfs-image.o -lpthread -lz $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o btrfs-image $(objects) btrfs-image.o -lpthread -lz $(LDFLAGS) $(LIBS)
 
 dir-test: $(objects) dir-test.o
-   gcc $(CFLAGS) -o dir-test $(objects) dir-test.o $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o dir-test $(objects) dir-test.o $(LDFLAGS) $(LIBS)
 
 quick-test: $(objects) quick-test.o
-   gcc $(CFLAGS) -o quick-test $(objects) quick-test.o $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o quick-test $(objects) quick-test.o $(LDFLAGS) $(LIBS)
 
 convert: $(objects) convert.o
-   gcc $(CFLAGS) -o btrfs-convert $(objects) convert.o -lext2fs -lcom_err $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o btrfs-convert $(objects) convert.o -lext2fs -lcom_err $(LDFLAGS) $(LIBS)
 
 ioctl-test: $(objects) ioctl-test.o
-   gcc $(CFLAGS) -o ioctl-test $(objects) ioctl-test.o $(LDFLAGS) $(LIBS)
+   $(CC) $(CFLAGS) -o ioctl-test $(objects) ioctl-test.o $(LDFLAGS) $(LIBS)
 
 manpages:
cd man; make


fix_use_of_gcc.patch
Description: fix_use_of_gcc.patch


Re: Snapshot reconciliation

2011-06-28 Thread João Eduardo Luís
On Jun 28, 2011, at 4:07 PM, C Anthony Risinger wrote:

> 2011/6/28 João Eduardo Luís :
>> Hello.
>> 
>> Can anyone think of a simple way to copy a set of pages from a given file 
>> (which may or may not be scattered throughout multiple extents) from a 
>> snapshot to correct pages within another file on another snapshot?
>> 
>> This might sound silly, but the whole purpose is to create some sort of 
>> reconciliation method between divergent snapshots taken from the same 
>> original subvolume.
> 
> generic deduplication?
> 

I'm not sure if deduplication is what I'm looking for.

What I actually want to achieve is to reconstruct a file's data from two 
diverging files. I.e., two snapshots are taken from the same subvolume and, in 
each snapshot, a given file A is written to. Assuming different blocks were 
written on, and no expected semantics are violated, what I aim to achieve is 
the correct reconciliation of file A in one of the snapshots.

Maybe this could be achieved by using deduplication. I'll look into those
patches. Even if they are not completely useful, they may very well contain
some neat concept that could be used to solve this little puzzle of mine. :-)

Thanks.

---
João Eduardo Luís
gpg key: 477C26E5 from pool.keyserver.eu 



Re: parent transid verify failures on 2.6.39

2011-06-28 Thread Daniel Witzel
Thanks for the reply. I copied the patch from the "diff" line onwards and
applied it against fresh 2.6.39-r1 and r2 kernel trees, with the same result:

localhost linux # patch --dry-run --verbose -p1 < disk-io.patch 
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--
|diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
|index c650a1d..53e330e 100644
|--- a/fs/btrfs/disk-io.c
|+++ b/fs/btrfs/disk-io.c
--
Patching file fs/btrfs/disk-io.c using Plan A...
Hunk #1 succeeded at 281.
Hunk #2 succeeded at 296.
Hunk #3 succeeded at 328.
Hunk #4 succeeded at 338.
Hunk #5 succeeded at 360.
Hunk #6 succeeded at 2012.
Hunk #7 succeeded at 2631.
done


Same problem. Any other ideas would be great.

Dan Witzel





Re: parent transid verify failures on 2.6.39

2011-06-28 Thread Daniel Witzel
Thanks for the reply. I copied the patch from the "diff" line onward,
set up a fresh kernel tree, and got the following (same on 2.6.39-r1 and r2):

localhost linux# patch -p1 --dry-run --verbose < disk-io.patch
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--
|diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
|index c650a1d..53e330e 100644
|--- a/fs/btrfs/disk-io.c
|+++ b/fs/btrfs/disk-io.c
--
Patching file fs/btrfs/disk-io.c using Plan A...
Hunk #1 succeeded at 281.
Hunk #2 succeeded at 296.
Hunk #3 succeeded at 328.
Hunk #4 succeeded at 338.
Hunk #5 succeeded at 360.
Hunk #6 succeeded at 2012.
Hunk #7 succeeded at 2631.
done


A perfect patch job, if I say so :)

Any other ideas are welcome.

Dan Witzel





Re: parent transid verify failures on 2.6.39

2011-06-28 Thread Mitch Harder
On Tue, Jun 28, 2011 at 10:46 AM, Daniel Witzel  wrote:
> Earlier I tried the read only patch with no result. Josef said I must be
> applying it wrong because the error I get is not possible with the patch
> applied. I tried again with no luck and posted my steps for review. Well
> here I am a few days later with the following questions:
>
> 1) If my steps are correct what else could be the problem
> 2) if my steps are wrong what do i need to do to get it right
>
> Any help would be awesome
>
> Thanks
> Dan Witzel
>

I just used this patch yesterday to help with a slightly different corruption.

I know the patch didn't apply cleanly for me, and I had to massage it.

You may want to manually audit disk-io.c to make sure the entire patch
is applied.

I know if I try to apply this patch to my 2.6.39.1 kernel, it fails.

# patch -p1 --dry-run < /mnt/local/local/dontpanic/parent-transid-verify-failures-on-2.6.39.patch
patching file fs/btrfs/disk-io.c
Hunk #2 FAILED at 296.
Hunk #3 succeeded at 321 (offset -2 lines).
Hunk #4 succeeded at 331 (offset -2 lines).
Hunk #5 succeeded at 353 (offset -2 lines).
Hunk #6 succeeded at 1993 (offset -14 lines).
Hunk #7 succeeded at 2629 (offset 3 lines).
1 out of 7 hunks FAILED -- saving rejects to file fs/btrfs/disk-io.c.rej


Re: [PATCH v8 7/8] btrfs: Replication-type information

2011-06-28 Thread David Sterba
On Sun, Jun 26, 2011 at 09:36:54PM +0100, Hugo Mills wrote:
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 828aa34..fb11550 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -117,6 +117,52 @@ static void requeue_list(struct btrfs_pending_bios *pending_bios,
>   pending_bios->tail = tail;
>  }
>  
> +void btrfs_get_replication_info(struct btrfs_replication_info *info,
> + u64 type)
> +{
> + info->sub_stripes = 1;
> + info->dev_stripes = 1;
> + info->devs_increment = 1;
> + info->num_copies = 1;
> + info->devs_max = 0; /* 0 == as many as possible */
> + info->devs_min = 1;
> +
> + if (type & BTRFS_BLOCK_GROUP_DUP) {
> + info->dev_stripes = 2;
> + info->num_copies = 2;
> + info->devs_max = 1;
> + } else if (type & BTRFS_BLOCK_GROUP_RAID0) {
> + info->devs_min = 2;
> + } else if (type & BTRFS_BLOCK_GROUP_RAID1) {
> + info->devs_increment = 2;
> + info->num_copies = 2;
> + info->devs_max = 2;
> + info->devs_min = 2;
> + } else if (type & BTRFS_BLOCK_GROUP_RAID10) {
> + info->sub_stripes = 2;
> + info->devs_increment = 2;
> + info->num_copies = 2;
> + info->devs_min = 4;
> + }
> +
> + if (type & BTRFS_BLOCK_GROUP_DATA) {
> + info->max_stripe_size = 1024 * 1024 * 1024;
> + info->min_stripe_size = 64 * 1024 * 1024;
> + info->max_chunk_size = 10 * info->max_stripe_size;
> + } else if (type & BTRFS_BLOCK_GROUP_METADATA) {
> + info->max_stripe_size = 256 * 1024 * 1024;
> + info->min_stripe_size = 32 * 1024 * 1024;
> + info->max_chunk_size = info->max_stripe_size;
> + } else if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
> + info->max_stripe_size = 8 * 1024 * 1024;
> + info->min_stripe_size = 1 * 1024 * 1024;
> + info->max_chunk_size = 2 * info->max_stripe_size;
> + } else {
> + printk(KERN_ERR "Block group is of an unknown usage type: not data, metadata or system.\n");
> + BUG_ON(1);

I'm hitting this BUG_ON with 'btrfs device delete', type = 24, which is
BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 (8 | 16), with no data,
metadata or system bit set.

in btrfs_rm_device:

1277 all_avail = root->fs_info->avail_data_alloc_bits |
1278 root->fs_info->avail_system_alloc_bits |
1279 root->fs_info->avail_metadata_alloc_bits;

the values before the call are:

[  105.107074] D: all_avail 24
[  105.111844] D: root->fs_info->avail_data_alloc_bits 8
[  105.118858] D: root->fs_info->avail_system_alloc_bits 16
[  105.126110] D: root->fs_info->avail_metadata_alloc_bits 16


There are 5 devices, sdb5 - sdb9; I'm removing sdb9 after a clean
mount.


david
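
To make the arithmetic in the report concrete, here is a minimal userspace sketch, not kernel code. The flag values are the ones I understand fs/btrfs/ctree.h to define (they agree with the 8 and 16 in the log above); combine_avail() and has_usage_type() are hypothetical helpers named only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Block group flag bits, assumed to match fs/btrfs/ctree.h. */
#define BTRFS_BLOCK_GROUP_DATA     (1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM   (1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA (1ULL << 2)
#define BTRFS_BLOCK_GROUP_RAID0    (1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1    (1ULL << 4)

/* Hypothetical helper: mirrors the all_avail computation in btrfs_rm_device. */
uint64_t combine_avail(uint64_t data, uint64_t system, uint64_t metadata)
{
	return data | system | metadata;
}

/*
 * Hypothetical helper: true if one of the usage bits tested by the
 * second if/else chain in btrfs_get_replication_info() is set.
 */
int has_usage_type(uint64_t type)
{
	return (type & (BTRFS_BLOCK_GROUP_DATA |
			BTRFS_BLOCK_GROUP_METADATA |
			BTRFS_BLOCK_GROUP_SYSTEM)) != 0;
}
```

With the logged values, combine_avail(8, 16, 16) gives 24, and has_usage_type(24) is false, so the final else branch with the BUG_ON is the one taken.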


Re: parent transid verify failures on 2.6.39

2011-06-28 Thread Daniel Witzel
Earlier I tried the read only patch with no result. Josef said I must be 
applying it wrong because the error I get is not possible with the patch 
applied.
I tried again with no luck and posted my steps for review. Well here I am a few 
days later with the following questions:

1) If my steps are correct what else could be the problem
2) if my steps are wrong what do i need to do to get it right

Any help would be awesome

Thanks
Dan Witzel





[PATCH] xfstests 255: add a seek_data/seek_hole tester

2011-06-28 Thread Josef Bacik
This is a test to make sure seek_data/seek_hole is acting like it does on
Solaris.  It will check to see if the fs supports finding a hole or not and will
adjust as necessary.

Signed-off-by: Josef Bacik 
---
 255   |   71 
 255.out   |2 +
 group |1 +
 src/Makefile  |2 +-
 src/seek-tester.c |  475 +
 5 files changed, 550 insertions(+), 1 deletions(-)
 create mode 100755 255
 create mode 100644 255.out
 create mode 100644 src/seek-tester.c

diff --git a/255 b/255
new file mode 100755
index 000..4bb4d0b
--- /dev/null
+++ b/255
@@ -0,0 +1,71 @@
+#! /bin/bash
+# FS QA Test No. 255
+#
+# Test SEEK_DATA and SEEK_HOLE
+#
+#---
+# Copyright (c) 2011 Red Hat.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#
+#---
+#
+# creator
+owner=jo...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+rm -f $tmp.*
+}
+
+trap "_cleanup ; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+
+testfile=$TEST_DIR/seek_test.$$
+logfile=$TEST_DIR/seek_test.$$.log
+
+[ -x $here/src/seek-tester ] || _notrun "seek-tester not built"
+
+_cleanup()
+{
+   rm -f $testfile
+   rm -f $logfile
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+echo "Silence is golden"
+$here/src/seek-tester -q $testfile 2>&1 | tee -a $logfile
+
+if grep -q "SEEK_HOLE is not supported" $logfile; then
+   _notrun "SEEK_HOLE/SEEK_DATA not supported by this kernel"
+fi
+
+rm -f $logfile
+rm -f $testfile
+
+status=0 ; exit
diff --git a/255.out b/255.out
new file mode 100644
index 000..7eefb82
--- /dev/null
+++ b/255.out
@@ -0,0 +1,2 @@
+QA output created by 255
+Silence is golden
diff --git a/group b/group
index 1f86075..c045e70 100644
--- a/group
+++ b/group
@@ -368,3 +368,4 @@ deprecated
 252 auto quick prealloc
 253 auto quick
 254 auto quick
+255 auto quick
diff --git a/src/Makefile b/src/Makefile
index 91088bf..ccdaeec 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -17,7 +17,7 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
preallo_rw_pattern_writer ftrunc trunc fs_perms testx looptest \
locktest unwritten_mmap bulkstat_unlink_test t_stripealign \
bulkstat_unlink_test_modified t_dir_offset t_futimens t_immutable \
-   stale_handle pwrite_mmap_blocked fstrim
+   stale_handle pwrite_mmap_blocked fstrim seek-tester
 
 SUBDIRS =
 
diff --git a/src/seek-tester.c b/src/seek-tester.c
new file mode 100644
index 000..5141b45
--- /dev/null
+++ b/src/seek-tester.c
@@ -0,0 +1,475 @@
+/*
+ * Copyright (C) 2011 Oracle.  All rights reserved.
+ * Copyright (C) 2011 Red Hat.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#define _XOPEN_SOURCE 500
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifndef SEEK_DATA
+#define SEEK_DATA  3
+#define SEEK_HOLE  4
+#endif
+
+#define FS_NO_HOLES    (1 << 0)
+#define QUIET  (1 << 1)
+
+static blksize_t alloc_size;
+static unsigned flags = 0;
+
+static int get_io_sizes(int fd)
+{
+   struct stat buf;
+   int ret;
+
+   ret = fstat(fd, &buf);
+   if (ret)
+   fprintf(stderr, "  ERROR %d: Failed to find io blocksize\n",
+   errno);
+
+   /* st_blksize is typically also the allocation size */
+   alloc_size = buf.st_blksize;
+
+  

[PATCH 4/4] fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek

2011-06-28 Thread Josef Bacik
This converts everybody to handle SEEK_HOLE/SEEK_DATA properly.  In some cases
we just return -EINVAL, in others we do the normal generic thing, and in others
we're simply making sure that the proper due diligence is done.  For example
in NFS/CIFS we need to make sure the file size is updated properly for the
SEEK_HOLE and SEEK_DATA case, but since it calls the generic llseek stuff itself
that is all we have to do.  Thanks,
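
As a rough illustration of the NFS/CIFS point, the check "does this origin depend on an up-to-date file size" could be sketched in userspace as below. origin_needs_fresh_size() is a hypothetical name, and the fallback SEEK_DATA/SEEK_HOLE values are the ones this series defines. Note that the two comparisons have to be joined with &&; written with ||, the expression is true for every origin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

#ifndef SEEK_DATA
#define SEEK_DATA	3
#define SEEK_HOLE	4
#endif

/*
 * Hypothetical helper: SEEK_END, SEEK_DATA and SEEK_HOLE all depend on
 * the current file size, so a network filesystem would revalidate its
 * cached length first.  SEEK_SET and SEEK_CUR do not.
 */
bool origin_needs_fresh_size(int origin)
{
	return origin != SEEK_SET && origin != SEEK_CUR;
}
```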

Signed-off-by: Josef Bacik 
---
 fs/block_dev.c   |   11 ---
 fs/ceph/dir.c|8 +++-
 fs/ceph/file.c   |   20 ++--
 fs/cifs/cifsfs.c |7 +--
 fs/fuse/file.c   |   21 +++--
 fs/hpfs/dir.c|4 
 fs/nfs/file.c|7 +--
 7 files changed, 66 insertions(+), 12 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 610e8e0..966617a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -355,20 +355,25 @@ static loff_t block_llseek(struct file *file, loff_t offset, int origin)
mutex_lock(&bd_inode->i_mutex);
size = i_size_read(bd_inode);
 
+   retval = -EINVAL;
switch (origin) {
-   case 2:
+   case SEEK_END:
offset += size;
break;
-   case 1:
+   case SEEK_CUR:
offset += file->f_pos;
+   case SEEK_SET:
+   break;
+   default:
+   goto out;
}
-   retval = -EINVAL;
if (offset >= 0 && offset <= size) {
if (offset != file->f_pos) {
file->f_pos = offset;
}
retval = offset;
}
+out:
mutex_unlock(&bd_inode->i_mutex);
return retval;
 }
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index ef8f08c..79cd77c 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -446,14 +446,19 @@ static loff_t ceph_dir_llseek(struct file *file, loff_t offset, int origin)
loff_t retval;
 
mutex_lock(&inode->i_mutex);
+   retval = -EINVAL;
switch (origin) {
case SEEK_END:
offset += inode->i_size + 2;   /* FIXME */
break;
case SEEK_CUR:
offset += file->f_pos;
+   case SEEK_SET:
+   break;
+   default:
+   goto out;
}
-   retval = -EINVAL;
+
if (offset >= 0 && offset <= inode->i_sb->s_maxbytes) {
if (offset != file->f_pos) {
file->f_pos = offset;
@@ -477,6 +482,7 @@ static loff_t ceph_dir_llseek(struct file *file, loff_t offset, int origin)
if (offset > old_offset)
fi->dir_release_count--;
}
+out:
mutex_unlock(&inode->i_mutex);
return retval;
 }
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 9542f07..774feb1 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -770,13 +770,16 @@ static loff_t ceph_llseek(struct file *file, loff_t offset, int origin)
 
mutex_lock(&inode->i_mutex);
__ceph_do_pending_vmtruncate(inode);
-   switch (origin) {
-   case SEEK_END:
+   if (origin != SEEK_CUR && origin != SEEK_SET) {
ret = ceph_do_getattr(inode, CEPH_STAT_CAP_SIZE);
if (ret < 0) {
offset = ret;
goto out;
}
+   }
+
+   switch (origin) {
+   case SEEK_END:
offset += inode->i_size;
break;
case SEEK_CUR:
@@ -792,6 +795,19 @@ static loff_t ceph_llseek(struct file *file, loff_t offset, int origin)
}
offset += file->f_pos;
break;
+   case SEEK_DATA:
+   if (offset >= inode->i_size) {
+   ret = -ENXIO;
+   goto out;
+   }
+   break;
+   case SEEK_HOLE:
+   if (offset >= inode->i_size) {
+   ret = -ENXIO;
+   goto out;
+   }
+   offset = inode->i_size;
+   break;
}
 
if (offset < 0 || offset > inode->i_sb->s_maxbytes) {
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 35f9154..5feb6bb 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -746,8 +746,11 @@ static ssize_t cifs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 
 static loff_t cifs_llseek(struct file *file, loff_t offset, int origin)
 {
-   /* origin == SEEK_END => we must revalidate the cached file length */
-   if (origin == SEEK_END) {
+   /*
+* origin == SEEK_END || SEEK_DATA || SEEK_HOLE => we must revalidate
+* the cached file length
+*/
+   if (origin != SEEK_SET && origin != SEEK_CUR) {
int rc;
struct inode *inode = file->f_path.dentry->d_inode;
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 82a6646..73b89df 100644
--- a/fs/f

[PATCH 2/4] Btrfs: implement our own ->llseek

2011-06-28 Thread Josef Bacik
In order to handle SEEK_HOLE/SEEK_DATA we need to implement our own llseek.
Basically for the normal SEEK_*'s we will just defer to the generic helper, and
for SEEK_HOLE/SEEK_DATA we will use our fiemap helper to figure out the nearest
hole or data.  Currently this helper doesn't check for delalloc bytes for
prealloc space, so for now treat prealloc as data until that is fixed.  Thanks,
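
From the userspace side, the behaviour this is aiming for can be exercised by alternating the two new origins. The following is only a sketch under assumptions: count_data_segments() is a hypothetical helper, not part of this patch or of xfstests, and the fallback defines use the values from patch 1/4.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA	3
#define SEEK_HOLE	4
#endif

/*
 * Hypothetical helper: walk a file by alternating SEEK_DATA and
 * SEEK_HOLE and count the data segments reported.  Returns -1 on an
 * unexpected error (for example EINVAL on a kernel without support).
 */
int count_data_segments(int fd)
{
	off_t pos = 0;
	int segments = 0;

	for (;;) {
		off_t data = lseek(fd, pos, SEEK_DATA);
		if (data < 0)
			/* ENXIO: no data at or after pos, so we are done */
			return errno == ENXIO ? segments : -1;

		/* data exists here, so a hole (at worst EOF) must follow */
		off_t hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;

		segments++;
		pos = hole;
	}
}
```

On a filesystem that only has the generic fallback, any non-empty file should report exactly one segment; a filesystem that tracks real holes may report more for a sparse file.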

Signed-off-by: Josef Bacik 
---
 fs/btrfs/ctree.h |3 +
 fs/btrfs/file.c  |  148 +-
 2 files changed, 150 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f30ac05..32be5e0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2505,6 +2505,9 @@ int btrfs_csum_truncate(struct btrfs_trans_handle *trans,
 int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
 struct list_head *list, int search_commit);
 /* inode.c */
+struct extent_map *btrfs_get_extent_fiemap(struct inode *inode, struct page *page,
+  size_t pg_offset, u64 start, u64 len,
+  int create);
 
 /* RHEL and EL kernels have a patch that renames PG_checked to FsMisc */
 #if defined(ClearPageFsMisc) && !defined(ClearPageChecked)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fa4ef18..bd4d061 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1664,8 +1664,154 @@ out:
return ret;
 }
 
+static int find_desired_extent(struct inode *inode, loff_t *offset, int origin)
+{
+   struct btrfs_root *root = BTRFS_I(inode)->root;
+   struct extent_map *em;
+   struct extent_state *cached_state = NULL;
+   u64 lockstart = *offset;
+   u64 lockend = i_size_read(inode);
+   u64 start = *offset;
+   u64 orig_start = *offset;
+   u64 len = i_size_read(inode);
+   u64 last_end = 0;
+   int ret = 0;
+
+   lockend = max_t(u64, root->sectorsize, lockend);
+   if (lockend <= lockstart)
+   lockend = lockstart + root->sectorsize;
+
+   len = lockend - lockstart + 1;
+
+   len = max_t(u64, len, root->sectorsize);
+   if (inode->i_size == 0)
+   return -ENXIO;
+
+   lock_extent_bits(&BTRFS_I(inode)->io_tree, lockstart, lockend, 0,
+&cached_state, GFP_NOFS);
+
+   /*
+* Delalloc is such a pain.  If we have a hole and we have pending
+* delalloc for a portion of the hole we will get back a hole that
+* exists for the entire range since it hasn't been actually written
+* yet.  So to take care of this case we need to look for an extent just
+* before the position we want in case there is outstanding delalloc
+* going on here.
+*/
+   if (origin == SEEK_HOLE && start != 0) {
+   if (start <= root->sectorsize)
+   em = btrfs_get_extent_fiemap(inode, NULL, 0, 0,
+root->sectorsize, 0);
+   else
+   em = btrfs_get_extent_fiemap(inode, NULL, 0,
+start - root->sectorsize,
+root->sectorsize, 0);
+   if (IS_ERR(em)) {
+   ret = -ENXIO;
+   goto out;
+   }
+   last_end = em->start + em->len;
+   if (em->block_start == EXTENT_MAP_DELALLOC)
+   last_end = min_t(u64, last_end, inode->i_size);
+   free_extent_map(em);
+   }
+
+   while (1) {
+   em = btrfs_get_extent_fiemap(inode, NULL, 0, start, len, 0);
+   if (IS_ERR(em)) {
+   ret = -ENXIO;
+   break;
+   }
+
+   if (em->block_start == EXTENT_MAP_HOLE) {
+   if (test_bit(EXTENT_FLAG_VACANCY, &em->flags)) {
+   if (last_end <= orig_start) {
+   free_extent_map(em);
+   ret = -ENXIO;
+   break;
+   }
+   }
+
+   if (origin == SEEK_HOLE) {
+   *offset = start;
+   free_extent_map(em);
+   break;
+   }
+   } else {
+   if (origin == SEEK_DATA) {
+   if (em->block_start == EXTENT_MAP_DELALLOC) {
+   if (start >= inode->i_size) {
+   free_extent_map(em);
+   ret = -ENXIO;
+   break;
+   }
+   }
+
+

[PATCH 3/4] Ext4: handle SEEK_HOLE/SEEK_DATA generically

2011-06-28 Thread Josef Bacik
Since Ext4 has its own lseek we need to make sure it handles
SEEK_HOLE/SEEK_DATA.  For now just do the same thing that is done in the generic
case, somebody else can come along and make it do fancy things later.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/ext4/file.c |   21 +
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 2c09723..ce766f9 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -236,6 +236,27 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int origin)
}
offset += file->f_pos;
break;
+   case SEEK_DATA:
+   /*
+* In the generic case the entire file is data, so as long as
+* offset isn't at the end of the file then the offset is data.
+*/
+   if (offset >= inode->i_size) {
+   mutex_unlock(&inode->i_mutex);
+   return -ENXIO;
+   }
+   break;
+   case SEEK_HOLE:
+   /*
+* There is a virtual hole at the end of the file, so as long as
+* offset isn't i_size or larger, return i_size.
+*/
+   if (offset >= inode->i_size) {
+   mutex_unlock(&inode->i_mutex);
+   return -ENXIO;
+   }
+   offset = inode->i_size;
+   break;
}
 
if (offset < 0 || offset > maxbytes) {
-- 
1.7.5.2



[PATCH 1/4] fs: add SEEK_HOLE and SEEK_DATA flags

2011-06-28 Thread Josef Bacik
This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags.  Turns out
using fiemap in things like cp causes more problems than it solves, so let's try
and give userspace an interface that doesn't suck.  We need to match Solaris
here, and the definitions are

*o* If /whence/ is SEEK_HOLE, the offset of the start of the
next hole greater than or equal to the supplied offset
is returned. The definition of a hole is provided near
the end of the DESCRIPTION.

*o* If /whence/ is SEEK_DATA, the file pointer is set to the
start of the next non-hole file region greater than or
equal to the supplied offset.

So in the generic case the entire file is data and there is a virtual hole at
the end.  That means we will just return i_size for SEEK_HOLE and will return
the same offset for SEEK_DATA.  This is how Solaris does it so we have to do it
the same way.
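
For reference, the generic rule can be modelled in a few lines of userspace C. This is a sketch only: generic_seek_hole_data() is a hypothetical stand-in for the two new switch cases, with i_size passed in as a plain number and errors returned as negative errno values the way the kernel does.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#ifndef SEEK_DATA
#define SEEK_DATA	3
#define SEEK_HOLE	4
#endif

/*
 * Hypothetical model of the generic case: the whole file is data and
 * there is one virtual hole at i_size.
 */
long long generic_seek_hole_data(long long offset, long long i_size, int origin)
{
	if (origin != SEEK_DATA && origin != SEEK_HOLE)
		return -EINVAL;
	if (offset < 0)
		return -EINVAL;
	if (offset >= i_size)
		return -ENXIO;		/* nothing at or past EOF */
	if (origin == SEEK_DATA)
		return offset;		/* already sitting on data */
	return i_size;			/* SEEK_HOLE: the virtual hole at EOF */
}
```

So for a 4096 byte file, SEEK_DATA at 100 stays at 100, SEEK_HOLE at 100 lands on 4096, and either origin at 4096 or beyond fails with ENXIO.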

Thanks,

Signed-off-by: Josef Bacik 
---
 fs/read_write.c|   44 +---
 include/linux/fs.h |4 +++-
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5520f8a..5907b49 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -64,6 +64,23 @@ generic_file_llseek_unlocked(struct file *file, loff_t offset, int origin)
return file->f_pos;
offset += file->f_pos;
break;
+   case SEEK_DATA:
+   /*
+* In the generic case the entire file is data, so as long as
+* offset isn't at the end of the file then the offset is data.
+*/
+   if (offset >= inode->i_size)
+   return -ENXIO;
+   break;
+   case SEEK_HOLE:
+   /*
+* There is a virtual hole at the end of the file, so as long as
+* offset isn't i_size or larger, return i_size.
+*/
+   if (offset >= inode->i_size)
+   return -ENXIO;
+   offset = inode->i_size;
+   break;
}
 
if (offset < 0 && !unsigned_offsets(file))
@@ -128,12 +145,13 @@ EXPORT_SYMBOL(no_llseek);
 
 loff_t default_llseek(struct file *file, loff_t offset, int origin)
 {
+   struct inode *inode = file->f_path.dentry->d_inode;
loff_t retval;
 
-   mutex_lock(&file->f_dentry->d_inode->i_mutex);
+   mutex_lock(&inode->i_mutex);
switch (origin) {
case SEEK_END:
-   offset += i_size_read(file->f_path.dentry->d_inode);
+   offset += i_size_read(inode);
break;
case SEEK_CUR:
if (offset == 0) {
@@ -141,6 +159,26 @@ loff_t default_llseek(struct file *file, loff_t offset, int origin)
goto out;
}
offset += file->f_pos;
+   break;
+   case SEEK_DATA:
+   /*
+* In the generic case the entire file is data, so as
+* long as offset isn't at the end of the file then the
+* offset is data.
+*/
+   if (offset >= inode->i_size)
+   return -ENXIO;
+   break;
+   case SEEK_HOLE:
+   /*
+* There is a virtual hole at the end of the file, so
+* as long as offset isn't i_size or larger, return
+* i_size.
+*/
+   if (offset >= inode->i_size)
+   return -ENXIO;
+   offset = inode->i_size;
+   break;
}
retval = -EINVAL;
if (offset >= 0 || unsigned_offsets(file)) {
@@ -151,7 +189,7 @@ loff_t default_llseek(struct file *file, loff_t offset, int origin)
retval = offset;
}
 out:
-   mutex_unlock(&file->f_dentry->d_inode->i_mutex);
+   mutex_unlock(&inode->i_mutex);
return retval;
 }
 EXPORT_SYMBOL(default_llseek);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b5b9792..c9156f3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -32,7 +32,9 @@
 #define SEEK_SET   0   /* seek relative to beginning of file */
 #define SEEK_CUR   1   /* seek relative to current file position */
 #define SEEK_END   2   /* seek relative to end of file */
-#define SEEK_MAX   SEEK_END
+#define SEEK_DATA  3   /* seek to the next data */
+#define SEEK_HOLE  4   /* seek to the next hole */
+#define SEEK_MAX   SEEK_HOLE
 
 struct fstrim_range {
__u64 start;
-- 
1.7.5.2


[PATCH for -rc6] btrfs: add missing options displayed in mount output

2011-06-28 Thread David Sterba
Hi Chris,

I think this patch should go into the -rc6 series; it fixes a user-visible bug.
I was confused when I did not see the inode_cache in mount output although I
was sure (and verified in my script) that I had passed it to mount.

thanks,
david

---
From: David Sterba 

There are three user-settable mount options which are not
currently displayed in mount output.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h |5 +
 fs/btrfs/super.c |6 ++
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3006287..4c840b5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1336,6 +1336,11 @@ struct btrfs_ioctl_defrag_range_args {
  */
 #define BTRFS_STRING_ITEM_KEY  253
 
+/*
+ * Flags for mount options.
+ *
+ * Note: don't forget to add new options to btrfs_show_options()
+ */
 #define BTRFS_MOUNT_NODATASUM  (1 << 0)
 #define BTRFS_MOUNT_NODATACOW  (1 << 1)
 #define BTRFS_MOUNT_NOBARRIER  (1 << 2)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 0bb4ebb..15634d4 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -723,6 +723,12 @@ static int btrfs_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_puts(seq, ",clear_cache");
if (btrfs_test_opt(root, USER_SUBVOL_RM_ALLOWED))
seq_puts(seq, ",user_subvol_rm_allowed");
+   if (btrfs_test_opt(root, ENOSPC_DEBUG))
+   seq_puts(seq, ",enospc_debug");
+   if (btrfs_test_opt(root, AUTO_DEFRAG))
+   seq_puts(seq, ",autodefrag");
+   if (btrfs_test_opt(root, INODE_MAP_CACHE))
+   seq_puts(seq, ",inode_cache");
return 0;
 }
 
-- 
1.7.5.2.353.g5df3e



Re: Snapshot reconciliation

2011-06-28 Thread C Anthony Risinger
2011/6/28 João Eduardo Luís :
> Hello.
>
> Can anyone think of a simple way to copy a set of pages from a given file 
> (which may or may not be scattered throughout multiple extents) from a 
> snapshot to correct pages within another file on another snapshot?
>
> This might sound silly, but the whole purpose is to create some sort of 
> reconciliation method between divergent snapshots taken from the same 
> original subvolume.

generic deduplication?

Josef posted some patches back in Jan:

http://www.spinics.net/lists/linux-btrfs/msg07818.html
http://www.spinics.net/lists/linux-btrfs/msg07819.html
(etc)

... if that's what you're looking for.  i don't know what all is needed to
make it work at this point, i.e. whether you only need the patch to
btrfs-progs or some combination.  there could be more recent patches
but i don't recall anyone talking about any.

C Anthony


Re: [PATCH] Btrfs: fix error check of btrfs_lookup_dentry()

2011-06-28 Thread Josef Bacik
On 06/27/2011 11:34 PM, Tsutomu Itoh wrote:
> The return value of btrfs_lookup_dentry is checked so that a panic
> such as an illegal address reference does not occur.
> 
> Signed-off-by: Tsutomu Itoh 

Nack, please fix btrfs_lookup_dentry to return ERR_PTR(-ENOENT) if it
doesn't find something.  Thanks,

Josef
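
For readers unfamiliar with the convention being asked for, here is a minimal userspace imitation of the kernel's ERR_PTR idiom. The three macros-as-functions mimic include/linux/err.h; struct entry, the table and lookup_entry() are hypothetical and only stand in for a lookup that encodes "not found" in the returned pointer instead of returning NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace imitation of the helpers in include/linux/err.h. */
#define MAX_ERRNO	4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* the top 4095 addresses are reserved for encoded errno values */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical lookup table, for illustration only. */
struct entry { const char *name; int ino; };
static struct entry table[] = { { "a", 1 }, { "b", 2 } };

struct entry *lookup_entry(const char *name)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(table[i].name, name) == 0)
			return &table[i];
	/* not found: report it in-band rather than returning NULL */
	return ERR_PTR(-ENOENT);
}
```

A caller then needs a single IS_ERR() check and can pass PTR_ERR() straight up, rather than guessing what a NULL return was supposed to mean.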


Snapshot reconciliation

2011-06-28 Thread João Eduardo Luís
Hello.

Can anyone think of a simple way to copy a set of pages from a given file 
(which may or may not be scattered throughout multiple extents) from a snapshot 
to correct pages within another file on another snapshot?

This might sound silly, but the whole purpose is to create some sort of 
reconciliation method between divergent snapshots taken from the same original 
subvolume.

---
João Eduardo Luís
gpg key: 477C26E5 from pool.keyserver.eu 









Re: How to handle badblocks with btrfs?

2011-06-28 Thread Arne Jansen
On 28.06.2011 01:36, Marco L. Crociani wrote:
> # smartctl -d ata -l selftest /dev/sda
> smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure       90%       171         494581664
> 
> 
> What should I do to repair the disk?

Scrub can repair bad blocks as long as there's a good copy. To use scrub
you'll need the most recent rc-kernel and a btrfs utility that supports
scrub, e.g. from the integration branch of Hugo Mills' git tree; see his
recent mail for this.

> Is it possible to know which file is affected by the badblock?

There's a patch from Jan Schmidt for this, but it's not yet integrated.

> 
> I found http://smartmontools.sourceforge.net/badblockhowto.html but it is
> about ext2/3/4 filesystems.
> 
> I am concerned about the absence of fsck tool. Should I run badblocks
> on the unmounted fs?
> Thanks,
> 
