Re: System completely unresponsive after `btrfs balance start -dconvert=raid0 /` and `btrfs fi show /`

2015-10-13 Thread Duncan
Carmine Paolino posted on Tue, 13 Oct 2015 23:21:49 +0200 as excerpted:

> I have a home server with 3 hard drives that I added to the same btrfs
> filesystem. Several hours ago I ran `btrfs balance start -dconvert=raid0
> /` and as soon as I ran `btrfs fi show /` I lost my ssh connection to
> the machine. The machine is still on, but it doesn’t even respond to
> ping[. ...]
> 
> (I have a 250gb internal hard drive, a 120gb usb 2.0 one and a 2TB usb
> 2.0 one so the transfer speeds are pretty low)

I won't attempt to answer the primary question[1] directly, but can point 
out that in many cases, USB-connected devices simply don't have a stable 
enough connection to work reliably in a multi-device btrfs.  There are
several possibilities for failure, including flaky connections (sometimes
assisted by cats or kids), unstable USB host port drivers, and unstable
USB/ATA translators.  A number of folks have reported problems with
multi-device filesystems on USB-connected devices, problems that simply
disappear when they direct-connect the exact same devices to a proper
SATA port.  The problem seems to be /dramatically/ worse with
USB-connected devices than it is with, for instance, PCIE-based SATA
expansion cards.

Single-device btrfs with USB-attached devices seems to work rather better,
because at least in that case, if the connection is flaky, the entire
filesystem appears and disappears at once, and btrfs' COW, atomic-commit
and data-integrity features kick in to help deal with the connection's
instability.

Arguably, a two-device raid1 (both data/metadata, with metadata including
system) should work reasonably well too, because in that case all data
appears on both devices, as long as scrubs are done after reconnection
when there's trouble with one of the pair.  Single and raid0 modes,
however, are likely to have severe issues in that sort of environment,
because even temporary disconnection of a single device means loss of
access to some data/metadata on the filesystem.  Raid10, 3+-device raid1,
and raid5/6 are more complex situations.  They should survive loss of at
least one device, but keeping the filesystem healthy in the presence of
unstable connections is... complex enough I'd hate to be the one having
to deal with it, which means I can't recommend it to others, either.
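
The profile comparison above can be sketched in a few lines of userspace
C.  This is only an illustration of what the post says: the helper name
and the profile strings are assumptions made for the example, not a
btrfs API, and it says nothing about how healthy the filesystem stays
afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not a btrfs API: encodes only what the post says. */
static bool survives_one_missing_device(const char *profile)
{
    assert(profile != NULL);

    /* single and raid0 keep one copy, so a missing device loses access */
    if (strcmp(profile, "single") == 0 || strcmp(profile, "raid0") == 0)
        return false;

    /* per the post, these should survive the loss of at least one device */
    return strcmp(profile, "raid1") == 0 || strcmp(profile, "raid10") == 0 ||
           strcmp(profile, "raid5") == 0 || strcmp(profile, "raid6") == 0;
}
```

Surviving the loss is the easy half; the post's point is that recovering
cleanly after a flaky reconnect is the hard part.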

So I'd recommend either connecting all devices internally if possible, or 
setting up the USB-connected devices with separate filesystems, if 
internal direct-connection isn't possible.

---
[1] Sysadmin's rule of backups.  If the data isn't backed up, by 
definition it is of less value than the resource and hassle cost of 
backup.  No exceptions -- post-loss claims to the contrary simply put the 
lie to the claims, as actions spoke louder than words and they defined 
the cost of the backup as more expensive than the data that would have 
been backed up.  Worst-case is then loss of data that was by definition 
of less value than the cost of backup, and the more valuable resource and 
hassle cost of the backup was avoided, so the comparatively lower value 
data loss is no big deal.

So in a case like this, I'd simply power down and take my chances of 
filesystem loss, strictly limiting the time and resources I'd devote to 
any further attempt at recovery, because the data is by definition either 
backed up, or of such low value that a backup was considered too 
expensive to do, meaning there's a very real possibility of spending more 
time in a recovery attempt that's iffy at best, than the data on the 
filesystem is actually worth, either because there are backups, or 
because it's throw-away data in the first place.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: State of Dedup / Defrag

2015-10-13 Thread Zygo Blaxell
On Tue, Oct 13, 2015 at 02:59:59PM -0400, Rich Freeman wrote:
> What is the current state of Dedup and Defrag in btrfs?  I seem to
> recall there having been problems a few months ago and I've stopped
> using it, but I haven't seen much news since.

It has been 1 day since a kernel bug leading to data loss was fixed in the
ioctl calls for dedup (commit 6e685a1e3e9054d43fac58f2bc0cd070df915079
from fdmanana yesterday); however, to hit that particular bug you'd
need to be doing something unusual with the ioctls--in particular, a
thing that makes no sense for dedup, and that dedup userspace programs
intentionally avoid doing.  There was another bug for defrag 68 days ago.

I wouldn't try to use dedup on a kernel older than v4.1 because of these
fixes in 4.1 and later:

- allow dedup of the ends of files when they are not aligned
to 4K.  Before this was fixed, up to 1GB of space could be wasted
per file.

- no mtime update on extent-same.  With the update, rsync
and backup programs think all the deduped files are modified.
The next rsync after dedup would immediately un-dedup (redup?) all
the deduped files.

- fixes for deadlocks.  If dedup is running at the same time as
other readers of files (e.g. deduping /usr or a tree on a busy
file server), a deadlock was inevitable.

IMHO these fixes really made dedup usable for the first time.
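
To make the first fix concrete, here is a rough sketch of the alignment
arithmetic, assuming the 4K block size mentioned above.  The function
names are made up for the example and nothing here calls the dedup
ioctl; it only shows which part of a file is block-aligned and which
part is the unaligned end that older kernels could not dedup.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 4096u  /* assumed from the "4K" in the post */

/* Length of the block-aligned prefix of a file of `size` bytes. */
static uint64_t aligned_prefix(uint64_t size)
{
    return size - (size % BLOCK_SIZE);
}

/* Bytes in the unaligned end of the file. */
static uint64_t unaligned_tail(uint64_t size)
{
    uint64_t tail = size % BLOCK_SIZE;

    /* prefix and tail must add up to the whole file */
    assert(aligned_prefix(size) + tail == size);
    return tail;
}
```

A file whose size is an exact multiple of 4K has no tail, so it was not
affected by the alignment restriction.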

There are some other fixes that appeared after v4.1, but they should
not impact cases where mostly static data is deduped without concurrent
modifications.  Do dedup a photo or video file collection.  Don't dedup
a live database server on a filesystem with compression enabled...yet.

Using dedup and defrag at the same time is still a bad idea.  The features
work against each other:  autodefrag skips anything that has been deduped,
while manual defrag un-dedups everything it touches.  The effect of
defrag on dedup depends on the choice of dedup userspace strategy,
so defrag can either be helpful or harmful.

Autodefrag in my experience pushes write latencies up to insane levels.
Data ends up making multiple round-trips to the disk _with_ extra
constraints on the allocator on the second and later passes, and while
this is happening any other writes on the filesystem block an absurdly
long time.  It can easily cost more I/O time than it saves.  That said,
there are some kernel patches floating around to fix the allocator,
so at least we can hope autodefrag will be less bad someday.





[PATCH v2] btrfs: compress: put variables defined per compress type in struct to make cache friendly

2015-10-13 Thread Byongho Lee
The variables below are defined per compress type:
 - struct list_head comp_idle_workspace[BTRFS_COMPRESS_TYPES]
 - spinlock_t comp_workspace_lock[BTRFS_COMPRESS_TYPES]
 - int comp_num_workspace[BTRFS_COMPRESS_TYPES]
 - atomic_t comp_alloc_workspace[BTRFS_COMPRESS_TYPES]
 - wait_queue_head_t comp_workspace_wait[BTRFS_COMPRESS_TYPES]

With this layout, the neighbours of one compress type's entry in each
array are the entries of the other compress types, so the variables that
belong to a single type are scattered across memory.
So this patch puts these variables in a struct to make them cache friendly.

Signed-off-by: Byongho Lee 
---
 V2: Apply David's review comment.
 Rename struct comp to btrfs_comp_ws and trim its members to 'ws'
 instead of 'workspace'.

 fs/btrfs/compression.c | 94 ++
 1 file changed, 48 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ce62324c78e7..8e94ae5fe732 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -744,11 +744,13 @@ out:
return ret;
 }
 
-static struct list_head comp_idle_workspace[BTRFS_COMPRESS_TYPES];
-static spinlock_t comp_workspace_lock[BTRFS_COMPRESS_TYPES];
-static int comp_num_workspace[BTRFS_COMPRESS_TYPES];
-static atomic_t comp_alloc_workspace[BTRFS_COMPRESS_TYPES];
-static wait_queue_head_t comp_workspace_wait[BTRFS_COMPRESS_TYPES];
+static struct {
+   struct list_head idle_ws;
+   spinlock_t ws_lock;
+   int num_ws;
+   atomic_t alloc_ws;
+   wait_queue_head_t ws_wait;
+} btrfs_comp_ws[BTRFS_COMPRESS_TYPES];
 
 static const struct btrfs_compress_op * const btrfs_compress_op[] = {
&btrfs_zlib_compress,
@@ -760,10 +762,10 @@ void __init btrfs_init_compress(void)
int i;
 
for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) {
-   INIT_LIST_HEAD(&comp_idle_workspace[i]);
-   spin_lock_init(&comp_workspace_lock[i]);
-   atomic_set(&comp_alloc_workspace[i], 0);
-   init_waitqueue_head(&comp_workspace_wait[i]);
+   INIT_LIST_HEAD(&btrfs_comp_ws[i].idle_ws);
+   spin_lock_init(&btrfs_comp_ws[i].ws_lock);
+   atomic_set(&btrfs_comp_ws[i].alloc_ws, 0);
+   init_waitqueue_head(&btrfs_comp_ws[i].ws_wait);
}
 }
 
@@ -777,38 +779,38 @@ static struct list_head *find_workspace(int type)
int cpus = num_online_cpus();
int idx = type - 1;
 
-   struct list_head *idle_workspace= &comp_idle_workspace[idx];
-   spinlock_t *workspace_lock  = &comp_workspace_lock[idx];
-   atomic_t *alloc_workspace   = &comp_alloc_workspace[idx];
-   wait_queue_head_t *workspace_wait   = &comp_workspace_wait[idx];
-   int *num_workspace  = &comp_num_workspace[idx];
+   struct list_head *idle_ws   = &btrfs_comp_ws[idx].idle_ws;
+   spinlock_t *ws_lock = &btrfs_comp_ws[idx].ws_lock;
+   atomic_t *alloc_ws  = &btrfs_comp_ws[idx].alloc_ws;
+   wait_queue_head_t *ws_wait  = &btrfs_comp_ws[idx].ws_wait;
+   int *num_ws = &btrfs_comp_ws[idx].num_ws;
 again:
-   spin_lock(workspace_lock);
-   if (!list_empty(idle_workspace)) {
-   workspace = idle_workspace->next;
+   spin_lock(ws_lock);
+   if (!list_empty(idle_ws)) {
+   workspace = idle_ws->next;
list_del(workspace);
-   (*num_workspace)--;
-   spin_unlock(workspace_lock);
+   (*num_ws)--;
+   spin_unlock(ws_lock);
return workspace;
 
}
-   if (atomic_read(alloc_workspace) > cpus) {
+   if (atomic_read(alloc_ws) > cpus) {
DEFINE_WAIT(wait);
 
-   spin_unlock(workspace_lock);
-   prepare_to_wait(workspace_wait, &wait, TASK_UNINTERRUPTIBLE);
-   if (atomic_read(alloc_workspace) > cpus && !*num_workspace)
+   spin_unlock(ws_lock);
+   prepare_to_wait(ws_wait, &wait, TASK_UNINTERRUPTIBLE);
+   if (atomic_read(alloc_ws) > cpus && !*num_ws)
schedule();
-   finish_wait(workspace_wait, &wait);
+   finish_wait(ws_wait, &wait);
goto again;
}
-   atomic_inc(alloc_workspace);
-   spin_unlock(workspace_lock);
+   atomic_inc(alloc_ws);
+   spin_unlock(ws_lock);
 
workspace = btrfs_compress_op[idx]->alloc_workspace();
if (IS_ERR(workspace)) {
-   atomic_dec(alloc_workspace);
-   wake_up(workspace_wait);
+   atomic_dec(alloc_ws);
+   wake_up(ws_wait);
}
return workspace;
 }
@@ -820,27 +822,27 @@ again:
 static void free_workspace(int type, struct list_head *workspace)
 {
int idx = type - 1;
-   struct list_head *idle_workspace= &comp_idle_workspace[idx];
-   spinlock_t *workspace_lock  = &comp_workspace_lock[idx];
-   atomic_t *alloc_workspace   = &comp_alloc_workspace[idx];
-   wait_queue_head_t 
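
The layout change in this patch can be illustrated with a small
userspace sketch.  Everything here is a stand-in: NTYPES replaces
BTRFS_COMPRESS_TYPES, plain int/long replace the kernel types, and the
table and helper names are made up.  The point is only that with one
struct per compress type, all state for that type sits in one
contiguous object instead of being spread over five separate arrays.

```c
#include <assert.h>
#include <stddef.h>

#define NTYPES 2  /* stand-in for BTRFS_COMPRESS_TYPES */

/* Userspace stand-in for the per-type state grouped by the patch. */
struct comp_ws {
    int  num_ws;    /* idle workspace count */
    long alloc_ws;  /* plain long here; atomic_t in the kernel */
};

static struct comp_ws comp_ws_table[NTYPES];

/* Look up all state for one compress type; types are numbered from 1. */
static struct comp_ws *ws_for_type(int type)
{
    assert(type >= 1 && type <= NTYPES);
    return &comp_ws_table[type - 1];
}
```

Consecutive types are exactly sizeof(struct comp_ws) apart, so touching
one field of a type tends to pull its sibling fields into the same
cache line.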

Re: [PATCH] btrfs: fix use after free iterating extrefs

2015-10-13 Thread Chris Mason
On Tue, Oct 13, 2015 at 12:17:55PM -0700, Mark Fasheh wrote:
> On Tue, Oct 13, 2015 at 02:06:48PM -0400, Chris Mason wrote:
> > The code for btrfs inode-resolve has never worked properly for
> > files with enough hard links to trigger extrefs.  It was trying to
> > get the leaf out of a path after freeing the path:
> > 
> > btrfs_release_path(path);
> > leaf = path->nodes[0];
> > item_size = btrfs_item_size_nr(leaf, slot);
> > 
> > The fix here is to use the extent buffer we cloned just a little higher
> > up to avoid deadlocks caused by using the leaf in the path.
> > 
> > Signed-off-by: Chris Mason 
> > cc: sta...@vger.kernel.org # v3.7+
> > cc: Mark Fasheh 
> Reviewed-by: Mark Fasheh 

Thanks Mark and Filipe, I've tested this and queued it up.

-chris
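
For readers following along, the bug pattern and the shape of the fix
can be shown with a userspace analogy.  The types and helpers below
(struct leaf, struct path, clone_leaf, release_path) are hypothetical
stand-ins, not the btrfs ones; the sketch only demonstrates reading
from a private copy after the path has been released, rather than
reaching back through the path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for an extent buffer and a btrfs path. */
struct leaf { int item_size; };
struct path { struct leaf *node; };

static struct path make_path(int item_size)
{
    struct path p;

    p.node = malloc(sizeof(*p.node));
    assert(p.node != NULL);
    p.node->item_size = item_size;
    return p;
}

/* Copy the leaf so it stays valid after the path is released. */
static struct leaf *clone_leaf(const struct leaf *src)
{
    struct leaf *copy = malloc(sizeof(*copy));

    if (copy)
        memcpy(copy, src, sizeof(*copy));
    return copy;
}

static void release_path(struct path *p)
{
    free(p->node);
    p->node = NULL;
}

/* Clone first, release the path, then read only from the clone. */
static int item_size_after_release(struct path *p)
{
    struct leaf *eb = clone_leaf(p->node);
    int size;

    assert(eb != NULL);
    release_path(p);
    size = eb->item_size;  /* not p->node->item_size: p->node is gone */
    free(eb);
    return size;
}
```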


Re: [PATCH v5 8/9] vfs: Add vfs_copy_file_range() support for pagecache copies

2015-10-13 Thread Darrick J. Wong
On Mon, Oct 12, 2015 at 11:36:31PM -0400, Trond Myklebust wrote:
> On Mon, Oct 12, 2015 at 7:17 PM, Darrick J. Wong
>  wrote:
> > On Sun, Oct 11, 2015 at 07:22:03AM -0700, Christoph Hellwig wrote:
> >> On Wed, Sep 30, 2015 at 01:26:52PM -0400, Anna Schumaker wrote:
> >> > This allows us to have an in-kernel copy mechanism that avoids frequent
> >> > switches between kernel and user space.  This is especially useful so
> >> > NFSD can support server-side copies.
> >> >
> >> > I make pagecache copies configurable by adding three new (exclusive)
> >> > flags:
> >> > - COPY_FR_REFLINK tells vfs_copy_file_range() to only create a reflink.
> >> > - COPY_FR_COPY does a full data copy, but may be filesystem accelerated.
> >> > - COPY_FR_DEDUP creates a reflink, but only if the contents of both
> >> >   ranges are identical.
> >>
> >> All but FR_COPY really should be a separate system call.  Clones (and
> >> dedup as a special case of clones) are really a separate beast from file
> >> copies.
> >>
> >> If I want to clone a file I either want it clone fully or fail, not copy
> >> a certain amount.  That means that a) we need to return an error not
> >> short "write", and b) locking implementations are important - we need to
> >> prevent other applications from racing with our clone even if it is
> >> large, while to get these semantics for the possible short returning
> >> file copy will require a proper userland locking protocol. Last but not
> >> least file copies need to be interruptible while clones should be not.
> >> All this is already important for local file systems and even more
> >> important for NFS exporting.
> >>
> >> So I'd suggest to drop this patch and just let your syscall handle
> >> actual copies with all their horrors.  We can go with Peng's patches
> >> to generalize the btrfs ioctls for clones for now which is what everyone
> >> already uses anyway, and then add a separate sys_file_clone later.
> >
> > Hm.  Peng's patches only generalize the CLONE and CLONE_RANGE ioctls from
> > btrfs, however they don't port over the (vastly different) EXTENT_SAME 
> > ioctl.
> >
> > What does everyone think about generalizing EXTENT_SAME?  The interface
> > enables one to ask the kernel to dedupe multiple file ranges in a single
> > call.  That's more complex than what I was proposing with
> > COPY_FR_DEDUP(E), but I'm assuming that the extra complexity buys us the
> > ability to ... multi-dedupe at the same time, with locks held on the
> > source file?
> 
> How is this supposed to be implemented on something like NFS without
> protocol changes?

Quite frankly, I'm not sure.  Assuming NFS doesn't already have some sort of
deduplication primitive (I could be totally wrong about that) I'd probably just
leave the appropriate ops function pointer set to NULL and return -EOPNOTSUPP
to userspace.  Trying to fake it by comparing contents on the client and
issuing a reflink might be doable with hard locks but if I had to guess I'd say
that's even less palatable than simply bailing out. :)

IOW: I was only considering the filesystems that already support dedupe, which
is basically btrfs and future-XFS.

--D
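
A minimal sketch of the "leave the pointer NULL and bail out" idea,
written as plain userspace C.  The ops table, its member name and the
two example tables are assumptions for illustration; the real VFS
structures and the eventual interface may look quite different.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical ops table; the real VFS structures differ. */
struct fs_ops {
    int (*dedupe_range)(int src_fd, int dst_fd, long len);
};

/* Dispatch to the filesystem, or report that dedupe is unsupported. */
static int try_dedupe(const struct fs_ops *ops, int src_fd, int dst_fd,
                      long len)
{
    assert(ops != NULL);
    if (!ops->dedupe_range)
        return -EOPNOTSUPP;  /* filesystem has no dedupe primitive */
    return ops->dedupe_range(src_fd, dst_fd, len);
}

/* Dummy implementation standing in for a filesystem that supports it. */
static int fake_dedupe(int src_fd, int dst_fd, long len)
{
    (void)src_fd;
    (void)dst_fd;
    (void)len;
    return 0;
}

static const struct fs_ops nfs_like_ops = { .dedupe_range = NULL };
static const struct fs_ops btrfs_like_ops = { .dedupe_range = fake_dedupe };
```

Userspace then sees a clean error from filesystems without the
primitive instead of a slow emulation.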

> 
> Trond
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 8/9] vfs: Add vfs_copy_file_range() support for pagecache copies

2015-10-13 Thread Christoph Hellwig
On Mon, Oct 12, 2015 at 04:17:49PM -0700, Darrick J. Wong wrote:
> Hm.  Peng's patches only generalize the CLONE and CLONE_RANGE ioctls from
> btrfs, however they don't port over the (vastly different) EXTENT_SAME ioctl.
> 
> What does everyone think about generalizing EXTENT_SAME?  The interface
> enables one to ask the kernel to dedupe multiple file ranges in a single
> call.  That's more complex than what I was proposing with COPY_FR_DEDUP(E),
> but I'm assuming that the extra complexity buys us the ability to ...
> multi-dedupe at the same time, with locks held on the source file?
> 
> I'm happy to generalize the existing EXTENT_SAME, but please yell if you
> really hate the interface.

It's not pretty, but if the btrfs folks have a good reason for it I
don't see a reason to diverge.


Re: [PATCH v5 8/9] vfs: Add vfs_copy_file_range() support for pagecache copies

2015-10-13 Thread Christoph Hellwig
On Mon, Oct 12, 2015 at 11:36:31PM -0400, Trond Myklebust wrote:
> How is this supposed to be implemented on something like NFS without
> protocol changes?

Explicit dedup has no chance of working over NFS or other network
protocols without protocol changes.


[PATCH 1/2] btrfs: extend balance filter limit to take minimum and maximum

2015-10-13 Thread David Sterba
The 'limit' filter is underdesigned; it should have been a range
[min,max], with some relaxed semantics when one of the bounds is
missing. Besides that, using a full u64 for a single value is a waste of
bytes.

Let's fix both by extending the use of the u64 bytes for the [min,max]
range. This can be done in a backward compatible way: the range will be
interpreted only if the appropriate flag is set
(BTRFS_BALANCE_ARGS_LIMITS).

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h   | 14 --
 fs/btrfs/volumes.c | 14 ++
 fs/btrfs/volumes.h |  1 +
 include/uapi/linux/btrfs.h | 13 -
 4 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 938efe33be80..7d2e1b6d0ac1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -846,8 +846,18 @@ struct btrfs_disk_balance_args {
/* BTRFS_BALANCE_ARGS_* */
__le64 flags;
 
-   /* BTRFS_BALANCE_ARGS_LIMIT value */
-   __le64 limit;
+   /*
+* BTRFS_BALANCE_ARGS_LIMIT with value 'limit'
+* BTRFS_BALANCE_ARGS_LIMITS - the extend version can use minimum and
+* maximum
+*/
+   union {
+   __le64 limit;
+   struct {
+   __le32 limit_min;
+   __le32 limit_max;
+   };
+   };
 
__le64 unused[7];
 } __attribute__ ((__packed__));
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6fc735869c18..0693e974f1c0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3250,6 +3250,15 @@ static int should_balance_chunk(struct btrfs_root *root,
return 0;
else
bargs->limit--;
+   } else if ((bargs->flags & BTRFS_BALANCE_ARGS_LIMITS)) {
+   if (bargs->limit_min < bargs->limit_max) {
+   bargs->limit_max--;
+   } else if (bargs->limit_min == bargs->limit_max) {
+   bargs->limit_min = UINT_MAX;
+   bargs->limit_max = 0;
+   } else {
+   return 0;
+   }
}
 
return 1;
@@ -3274,6 +3283,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
int ret;
int enospc_errors = 0;
bool counting = true;
+   /* The single value limit and min/max limits use the same bytes in the */
u64 limit_data = bctl->data.limit;
u64 limit_meta = bctl->meta.limit;
u64 limit_sys = bctl->sys.limit;
@@ -3317,6 +3327,10 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
spin_unlock(&fs_info->balance_lock);
 again:
if (!counting) {
+   /*
+* The single value limit and min/max limits use the same bytes
+* in the
+*/
bctl->data.limit = limit_data;
bctl->meta.limit = limit_meta;
bctl->sys.limit = limit_sys;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2ca784a14e84..1c9d8edd7d57 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -375,6 +375,7 @@ struct map_lookup {
 #define BTRFS_BALANCE_ARGS_DRANGE  (1ULL << 3)
 #define BTRFS_BALANCE_ARGS_VRANGE  (1ULL << 4)
 #define BTRFS_BALANCE_ARGS_LIMIT   (1ULL << 5)
+#define BTRFS_BALANCE_ARGS_LIMITS  (1ULL << 6)
 
 /*
  * Profile changing flags.  When SOFT is set we won't relocate chunk if
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index b6dec05c7196..264ecea5ecfc 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -217,7 +217,18 @@ struct btrfs_balance_args {
 
__u64 flags;
 
-   __u64 limit;/* limit number of processed chunks */
+   /*
+* BTRFS_BALANCE_ARGS_LIMIT with value 'limit'
+* BTRFS_BALANCE_ARGS_LIMITS - the extend version can use minimum and
+* maximum
+*/
+   union {
+   __u64 limit;/* limit number of processed chunks */
+   struct {
+   __u32 limit_min;
+   __u32 limit_max;
+   };
+   };
__u64 unused[7];
 } __attribute__ ((__packed__));
 
-- 
2.6.1
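
The union trick and the new branch in should_balance_chunk() can be
tried out in userspace with the sketch below.  It is an approximation
under stated assumptions: fixed-width stdint types replace the kernel
ones, the inner struct is given a name (range) for portability where
the patch uses an anonymous struct, UINT32_MAX replaces UINT_MAX, and
the helper name is made up.  The branch logic is copied from the hunk
as posted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same 8 bytes viewed either as one limit or as a min/max pair. */
union limit_arg {
    uint64_t limit;
    struct {
        uint32_t limit_min;
        uint32_t limit_max;
    } range;
};

/* Returns 1 if the chunk passes the range filter, 0 if it is skipped. */
static int apply_limit_range(union limit_arg *a)
{
    assert(a != NULL);

    if (a->range.limit_min < a->range.limit_max) {
        a->range.limit_max--;
    } else if (a->range.limit_min == a->range.limit_max) {
        /* mark the range as exhausted: min > max from now on */
        a->range.limit_min = UINT32_MAX;
        a->range.limit_max = 0;
    } else {
        return 0;
    }
    return 1;
}
```

Because both views share storage, the existing save/restore of
bctl->*.limit in __btrfs_balance() also covers the min/max pair.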



[PATCH 0/2] Balance filters: stripes, enhanced limit

2015-10-13 Thread David Sterba
Update to balance filters, intended for 4.4:

* new 'stripes=' - process only stripes that cross given number of
  devices, specified by a range

* updated 'limit=' - previously a single number was accepted, now it's a
  range, so we can also specify a minimum number of chunks to process

There will be more documentation about the use in the btrfs-progs patches, the
kernel side just applies the ranges. The update to 'limit' is backward
compatible, reuses the previous struct member.

Can be pulled from

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git dev/balance-filters

I'm finalizing the progs patches and haven't tested them extensively.


David Sterba (1):
  btrfs: extend balance filter limit to take minimum and maximum

Gabríel Arthúr Pétursson (1):
  btrfs: add balance filter for stripes

 fs/btrfs/ctree.h   | 23 ---
 fs/btrfs/volumes.c | 33 +
 fs/btrfs/volumes.h |  2 ++
 include/uapi/linux/btrfs.h | 23 +--
 4 files changed, 76 insertions(+), 5 deletions(-)

-- 
2.6.1



[PATCH 2/2] btrfs: add balance filter for stripes

2015-10-13 Thread David Sterba
From: Gabríel Arthúr Pétursson 

Balance block groups which have the given number of stripes, defined by
a range min..max. This is useful to selectively rebalance only chunks
that do not span enough devices, applies to RAID0/10/5/6.

Signed-off-by: Gabríel Arthúr Pétursson 
[ renamed bargs members, added to the UAPI, wrote the changelog ]
Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h   |  9 -
 fs/btrfs/volumes.c | 19 +++
 fs/btrfs/volumes.h |  1 +
 include/uapi/linux/btrfs.h | 10 +-
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7d2e1b6d0ac1..e2eefa222999 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -859,7 +859,14 @@ struct btrfs_disk_balance_args {
};
};
 
-   __le64 unused[7];
+   /*
+* Process chunks that cross stripes_min..stripes_max devices,
+* BTRFS_BALANCE_ARGS_STRIPES
+*/
+   __le32 stripes_min;
+   __le32 stripes_max;
+
+   __le64 unused[6];
 } __attribute__ ((__packed__));
 
 /*
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0693e974f1c0..51c0e5b219a3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3170,6 +3170,19 @@ static int chunk_vrange_filter(struct extent_buffer *leaf,
return 1;
 }
 
+static int chunk_stripes_filter(struct extent_buffer *leaf,
+  struct btrfs_chunk *chunk,
+  struct btrfs_balance_args *bargs)
+{
+   int num_stripes = btrfs_chunk_num_stripes(leaf, chunk);
+
+   if (bargs->stripes_min <= num_stripes
+   && num_stripes <= bargs->stripes_max)
+   return 0;
+
+   return 1;
+}
+
 static int chunk_soft_convert_filter(u64 chunk_type,
 struct btrfs_balance_args *bargs)
 {
@@ -3236,6 +3249,12 @@ static int should_balance_chunk(struct btrfs_root *root,
return 0;
}
 
+   /* stripes filter */
+   if ((bargs->flags & BTRFS_BALANCE_ARGS_STRIPES) &&
+   chunk_stripes_filter(leaf, chunk, bargs)) {
+   return 0;
+   }
+
/* soft profile changing mode */
if ((bargs->flags & BTRFS_BALANCE_ARGS_SOFT) &&
chunk_soft_convert_filter(chunk_type, bargs)) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1c9d8edd7d57..a87d96d75d07 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -376,6 +376,7 @@ struct map_lookup {
 #define BTRFS_BALANCE_ARGS_VRANGE  (1ULL << 4)
 #define BTRFS_BALANCE_ARGS_LIMIT   (1ULL << 5)
 #define BTRFS_BALANCE_ARGS_LIMITS  (1ULL << 6)
+#define BTRFS_BALANCE_ARGS_STRIPES (1ULL << 7)
 
 /*
  * Profile changing flags.  When SOFT is set we won't relocate chunk if
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 264ecea5ecfc..ab720200d0f7 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -229,7 +229,15 @@ struct btrfs_balance_args {
__u32 limit_max;
};
};
-   __u64 unused[7];
+
+   /*
+* Process chunks that cross stripes_min..stripes_max devices,
+* BTRFS_BALANCE_ARGS_STRIPES
+*/
+   __le32 stripes_min;
+   __le32 stripes_max;
+
+   __u64 unused[6];
 } __attribute__ ((__packed__));
 
 /* report balance progress to userspace */
-- 
2.6.1
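
As a quick way to reason about the new filter, here is a userspace
restatement of the range check in chunk_stripes_filter().  The function
name and the plain integer arguments are assumptions for the example;
like the kernel helper, it returns 1 when the chunk should be skipped
and 0 when it falls inside min..max and should be balanced.

```c
#include <assert.h>
#include <stdint.h>

/* 1 = chunk is filtered out, 0 = chunk is within min..max. */
static int stripes_filtered_out(uint32_t num_stripes, uint32_t min,
                                uint32_t max)
{
    assert(min <= max);
    return !(min <= num_stripes && num_stripes <= max);
}
```

For example, after growing a two-device RAID0 to three devices, a range
of 1..2 would select the older chunks that only span two devices and
leave three-stripe chunks alone.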



Re: [PATCH v5 9/9] btrfs: btrfs_copy_file_range() only supports reflinks

2015-10-13 Thread Christoph Hellwig
On Mon, Oct 12, 2015 at 04:41:06PM -0700, Darrick J. Wong wrote:
> One of the patches in last week's XFS reflink patchbomb adds FALLOC_FL_UNSHARE
> flag; at the moment it _only_ forces copy-on-write of shared blocks, and it
> leaves holes alone.

Yes, I've seen the implementation. 

> Obviously we haven't yet figured out what are peoples' preferences in terms of
> "fill the holes and unshare the shared" vs. "only unshare the shared" vs.
> "only fill the holes".  It isn't that hard to add a
> FALLOC_FL_UNSHARE_FILL_HOLES flag that fills the holes while unsharing is
> going on.
> 
> Personally I suspect that the most interest is in filling holes and unsharing,
> because they don't want to pay for allocation at a critical stage for anywhere
> in the file.  But I could be wrong, so allowing both goals to be expressed via
> mode allows flexibility.

Exactly.  And a normal falloc should do just that - fill holes and
ensure that we don't need to COW already allocated blocks.  So I don't
think we need a new fallocate interface for that.  The question is if we
want a copy interface that gives you the same semantics as if you also
called an fallocate on the destination range.  For that case we'd
usually want to avoid doing the clone and instead do an in-kernel or
hardware assisted copy and then fill the holes with unwritten extents.


[PATCH] btrfs-progs: mkfs: Enable -d dup for single device

2015-10-13 Thread Zhao Lei
The current code doesn't support the dup profile for data on a single
device, except in mixed mode, for the following reasons:
1: On some SSDs with a deduplication function, it has no effect.
2: On a physical device, if the entire disk breaks, -d dup cannot
   help.
3: Half the performance compared with the single profile.
4: We have a workaround: create multiple partitions on a single device,
   and btrfs will treat them as multiple devices.

Instead of refusing -d dup, we have a better choice:
give the user the chance to select it, and output a warning to point
out the problems above.

Signed-off-by: Zhao Lei 
---
 mkfs.c  |  2 +-
 utils.c | 10 --
 utils.h |  2 +-
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index ecd6fbf..7fa7cfc 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1578,7 +1578,7 @@ int main(int ac, char **av)
}
}
ret = test_num_disk_vs_raid(metadata_profile, data_profile,
-   dev_cnt, mixed);
+   dev_cnt, mixed, ssd);
if (ret)
exit(1);
 
diff --git a/utils.c b/utils.c
index f1e3248..d81c2d9 100644
--- a/utils.c
+++ b/utils.c
@@ -2425,7 +2425,7 @@ static int group_profile_devs_min(u64 flag)
 }
 
 int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
-   u64 dev_cnt, int mixed)
+   u64 dev_cnt, int mixed, int ssd)
 {
u64 allowed = 0;
 
@@ -2466,11 +2466,9 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
return 1;
}
 
-   if (!mixed && (data_profile & BTRFS_BLOCK_GROUP_DUP)) {
-   fprintf(stderr,
-   "ERROR: DUP for data is allowed only in mixed mode\n");
-   return 1;
-   }
+   warning_on(!mixed && (data_profile & BTRFS_BLOCK_GROUP_DUP) && ssd,
+  "DUP have no effect if your SSD have deduplication function");
+
return 0;
 }
 
diff --git a/utils.h b/utils.h
index 044ea15..b85f3fe 100644
--- a/utils.h
+++ b/utils.h
@@ -167,7 +167,7 @@ int test_dev_for_mkfs(char *file, int force_overwrite);
 int get_label_mounted(const char *mount_path, char *labelp);
 int get_label_unmounted(const char *dev, char *label);
 int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
-   u64 dev_cnt, int mixed);
+   u64 dev_cnt, int mixed, int ssd);
 int group_profile_max_safe_loss(u64 flags);
 int is_vol_small(char *file);
 int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf,
-- 
1.8.5.1
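
The behaviour change boils down to one condition: instead of rejecting
data DUP outside mixed mode, only warn when the device looks like an
SSD.  A rough userspace restatement follows; the function name is made
up and DATA_DUP is a placeholder bit for the example rather than a
value taken from the btrfs headers.

```c
#include <assert.h>
#include <stdint.h>

#define DATA_DUP (1u << 5)  /* placeholder bit for this sketch */

/* 1 = print the "DUP has no effect" warning, 0 = stay quiet. */
static int should_warn_dup_on_ssd(uint64_t data_profile, int mixed, int ssd)
{
    /* both flags are treated as booleans */
    assert((mixed == 0 || mixed == 1) && (ssd == 0 || ssd == 1));
    return !mixed && (data_profile & DATA_DUP) && ssd;
}
```

Note that in every case the function only decides about a warning; with
the patch applied, mkfs itself goes ahead either way.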



behavior of BTRFS in relation to inodes when moving/copying files between filesystems

2015-10-13 Thread Martin Steigerwald
Hi!

With BTRFS to XFS/Ext4 the inode number of the target file stays the same in
both the cp and the mv case (/mnt/zeit is a freshly created XFS in this example):

merkaba:~> ls -li foo /mnt/zeit/moo
6609270  foo
 99  /mnt/zeit/moo
merkaba:~> cp foo /mnt/zeit/moo
merkaba:~> ls -li foo /mnt/zeit/moo
6609270 8 foo
 99  /mnt/zeit/moo
merkaba:~> cp -p foo /mnt/zeit/moo  
merkaba:~> ls -li foo /mnt/zeit/moo
6609270 foo
 99 /mnt/zeit/moo
merkaba:~> mv foo /mnt/zeit/moo
merkaba:~> ls -lid /mnt/zeit/moo
99 -rw-r--r-- 1 root root 6 Okt 13 12:28 /mnt/zeit/moo


With BTRFS as target filesystem however in the mv case I get a new inode:

merkaba:~> ls -li foo /home/moo
 6609289 -rw-r--r-- 1 root root 6 Okt 13 12:34 foo
16476276 -rw-r--r-- 1 root root 6 Okt 13 12:34 /home/moo
merkaba:~> cp foo /home/moo
merkaba:~> ls -li foo /home/moo
 6609289 -rw-r--r-- 1 root root 6 Okt 13 12:34 foo
16476276 -rw-r--r-- 1 root root 6 Okt 13 12:34 /home/moo
merkaba:~> cp -p foo /home/moo 
merkaba:~> ls -li foo /home/moo
 6609289 -rw-r--r-- 1 root root 6 Okt 13 12:34 foo
16476276 -rw-r--r-- 1 root root 6 Okt 13 12:34 /home/moo
merkaba:~> mv foo /home/moo
merkaba:~> ls -li /home/moo 
16476280 -rw-r--r-- 1 root root 6 Okt 13 12:34 /home/moo


Is this intentional and/or somehow related to the copy on write specifics of 
the filesystem?

I think even with COW it can just overwrite the existing file instead of
removing the old one and creating a new one, but it wouldn't give much of a
benefit unless the target file is nocow.

(Also I thought only certain other utilities had supercow powers, but well 
BTRFS seems to have them as well :)
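
For anyone who wants to repeat the comparison from a program rather
than with ls -li, a small sketch using stat(2) is below.  The helper
name is made up; it just returns st_ino, which is the number shown in
the first column of the listings above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns the inode number of `path`, or 0 if stat() fails. */
static ino_t inode_of(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return 0;
    return st.st_ino;
}
```

Calling it on the target before and after the cp or mv shows whether
the target kept its inode or got a new one.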

Thanks,
-- 
Martin


RE: [PATCH] btrfs-progs: mkfs: Enable -d dup for single device

2015-10-13 Thread Zhao Lei
Hi, David Sterba

> -----Original Message-----
> From: David Sterba [mailto:dste...@suse.cz]
> Sent: Tuesday, October 13, 2015 8:36 PM
> To: Zhao Lei 
> Cc: dste...@suse.cz; linux-btrfs@vger.kernel.org; c...@fb.com
> Subject: Re: [PATCH] btrfs-progs: mkfs: Enable -d dup for single device
> 
> On Tue, Oct 13, 2015 at 07:40:30PM +0800, Zhao Lei wrote:
> > > What I remember from the comment is that "it's slightly offset that
> > > would lead to corruption".
> >
> > Before this patch, I had done git blame to search why the condition
> > was added, and hadn't found the exact reason.
> 
> found it: commit bc3f116fec194f1d7329b160c266fe16b9266a1e and it was not
> about data/dup but mixed bgs with sectorsize != nodesize:
> 
> +	nodesize = btrfs_super_nodesize(disk_super);
> +	leafsize = btrfs_super_leafsize(disk_super);
> +	sectorsize = btrfs_super_sectorsize(disk_super);
> +	stripesize = btrfs_super_stripesize(disk_super);
> +
> +	/*
> +	 * mixed block groups end up with duplicate but slightly offset
> +	 * extent buffers for the same range.  It leads to corruptions
> +	 */
> +	if ((features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
> +	    (sectorsize != leafsize)) {
> +		printk(KERN_WARNING "btrfs: unequal leaf/node/sector sizes "
> +		       "are not allowed for mixed block groups on %s\n",
> +		       sb->s_id);
> +		goto fail_alloc;
> +	}
> +
> 
Thanks for this information. I'll investigate whether a similar problem exists
in non-mixed mode with dup.

> > I will queue xfstests (btrfs/generic) on this profile with all mount
> > options, multiple times, to check whether something is wrong with it.
> 
> Thanks. We need to cover more: the balance conversion forbids data/dup
> profile, I'm not sure if scrub handles that properly, and the usual suspects
> in the rescue tools (fsck, restore, chunk-recover).
>
Agreed, a new profile may have potential problems because the existing code
has not been checked for whether it supports it.

IMHO, it is still necessary unless we can prove this feature should not exist.
But we'll need to do more work to confirm the potential problems above.

Thanks
Zhaolei




Re: [PATCH] btrfs-progs: mkfs: Enable -d dup for single device

2015-10-13 Thread David Sterba
On Tue, Oct 13, 2015 at 07:40:30PM +0800, Zhao Lei wrote:
> > What I remember from the comment is that "it's slightly offset that would
> > lead to corruption".
> 
> Before this patch, I had done git blame to search why the condition was added,
> and hadn't found the exact reason.

found it: commit bc3f116fec194f1d7329b160c266fe16b9266a1e and it was not
about data/dup but mixed bgs with sectorsize != nodesize:

+	nodesize = btrfs_super_nodesize(disk_super);
+	leafsize = btrfs_super_leafsize(disk_super);
+	sectorsize = btrfs_super_sectorsize(disk_super);
+	stripesize = btrfs_super_stripesize(disk_super);
+
+	/*
+	 * mixed block groups end up with duplicate but slightly offset
+	 * extent buffers for the same range.  It leads to corruptions
+	 */
+	if ((features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
+	    (sectorsize != leafsize)) {
+		printk(KERN_WARNING "btrfs: unequal leaf/node/sector sizes "
+		       "are not allowed for mixed block groups on %s\n",
+		       sb->s_id);
+		goto fail_alloc;
+	}
+

> I will queue xfstests (btrfs/generic) on this profile with all mount options,
> multiple times, to check whether something is wrong with it.

Thanks. We need to cover more: the balance conversion forbids data/dup profile,
I'm not sure if scrub handles that properly, and the usual suspects in the
rescue tools (fsck, restore, chunk-recover).


Re: [PATCH RESEND 0/3] btrfs-progs: Introduce device delete by devid

2015-10-13 Thread David Sterba
On Sat, Oct 10, 2015 at 10:30:55PM +0800, Anand Jain wrote:
> This is the btrfs-progs part of the kernel patch
>Btrfs: Introduce device delete by devid

Thanks, now in next/delete-by-id-v3. I made some changes, so please have
a look. Notably, I've dropped the BTRFS_VOL_ARG_V2_FLAGS mask; this
belongs to the kernel only (unless you need it in userspace, of course).


Re: [PATCH] btrfs-progs: mkfs: Enable -d dup for single device

2015-10-13 Thread David Sterba
On Tue, Oct 13, 2015 at 06:29:41PM +0800, Zhao Lei wrote:
> The current code doesn't support the dup profile on a single device,
> except in mixed mode, for the following reasons:
> 1: On some SSDs with a deduplication function, it has no effect.
> 2: For a physical device, if the entire disk is broken, -d dup cannot
>    help.
> 3: Half the performance compared with the single profile.
> 4: We have a workaround: create multiple partitions on a single device,
>    and btrfs will treat them as multiple devices.

While the above makes sense and is true, I'm not sure that DUP was disabled
for these reasons. I'm sure that I read a comment from Chris that dup
for data is intentionally disabled because this would lead to
corruption, the code for DUP for metadata would not work for data. And I
can't find the comment, but the doubt is there. So unless I find it or
get otherwise convinced that it's ok, I won't merge the patch. I hope
you understand that.

What I remember from the comment is that "it's slightly offset that would
lead to corruption".


RE: [PATCH] btrfs-progs: mkfs: Enable -d dup for single device

2015-10-13 Thread Zhao Lei
Hi, David Sterba

Thanks for review.

> -----Original Message-----
> From: David Sterba [mailto:dste...@suse.cz]
> Sent: Tuesday, October 13, 2015 7:29 PM
> To: Zhao Lei 
> Cc: linux-btrfs@vger.kernel.org; c...@fb.com
> Subject: Re: [PATCH] btrfs-progs: mkfs: Enable -d dup for single device
> 
> On Tue, Oct 13, 2015 at 06:29:41PM +0800, Zhao Lei wrote:
> > The current code doesn't support the dup profile on a single device,
> > except in mixed mode, for the following reasons:
> > 1: On some SSDs with a deduplication function, it has no effect.
> > 2: For a physical device, if the entire disk is broken, -d dup cannot
> >    help.
> > 3: Half the performance compared with the single profile.
> > 4: We have a workaround: create multiple partitions on a single device,
> >    and btrfs will treat them as multiple devices.
> 
> While the above makes sense and is true, I'm not sure that DUP was disabled for
> these reasons. I'm sure that I read a comment from Chris that dup for data is
> intentionally disabled because this would lead to corruption, the code for DUP
> for metadata would not work for data. And I can't find the comment, but the
> doubt is there. So unless I find it or get otherwise convinced that it's ok,
> I won't merge the patch. I hope you understand that.
> 
> What I remember from the comment is that "it's slightly offset that would lead
> to corruption".

Before this patch, I ran git blame to find out why the condition was added,
but did not find the exact reason.

I will queue xfstests (btrfs/generic) on this profile with all mount options,
multiple times, to check whether something is wrong with it.

Thanks
Zhaolei




[PATCH 0/3] btrfs-progs: mkfs: Fix different mixed type by argument sequence

2015-10-13 Thread Zhao Lei
Given a 200M vdd1 and a 1G vdd2:

With the current code,
 mkfs.btrfs -f /dev/vdd1 /dev/vdd2
 and
 mkfs.btrfs -f /dev/vdd2 /dev/vdd1
 create filesystems with different "mixed" settings.

See [PATCH 2/3] for details.

This patchset also includes some small fixes.

Zhao Lei (3):
  btrfs-progs: mkfs: Remove saved_optind in mkfs.btrfs
  btrfs-progs: mkfs: Fix different mixed type by argument sequence
  btrfs-progs: mkfs: Fix inaccurate mixed information

 mkfs.c | 43 ++-
 1 file changed, 22 insertions(+), 21 deletions(-)

-- 
1.8.5.1



[PATCH 1/3] btrfs-progs: mkfs: Remove saved_optind in mkfs.btrfs

2015-10-13 Thread Zhao Lei
There is no need for complex logic to iterate over the devices in mkfs.c,
such as backing up optind, incrementing/decrementing optind and resetting
dev_cnt.

A simple for() loop is enough for this purpose.

Signed-off-by: Zhao Lei 
---
 mkfs.c | 25 +++--
 1 file changed, 7 insertions(+), 18 deletions(-)
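
The idea behind the diff below can be shown in isolation. This is a minimal
sketch, not code from mkfs.c: collect_devices() is a hypothetical helper that
walks the device arguments with an index and leaves the starting position
untouched, so the same range can be walked again later without saving and
restoring anything.

```c
/*
 * Copy the device arguments av[first] .. av[ac - 1] into out[] without
 * modifying 'first', so the caller can iterate over the same range again.
 * Returns the number of entries stored (at most max).
 */
int collect_devices(int ac, char **av, int first, const char **out, int max)
{
	int n = 0;
	int i;

	for (i = first; i < ac && n < max; i++)
		out[n++] = av[i];
	return n;
}
```

In mkfs.c terms, 'first' plays the role of optind after option parsing, and
the return value corresponds to dev_cnt.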

diff --git a/mkfs.c b/mkfs.c
index 7fa7cfc..cdae94d 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1354,7 +1354,6 @@ int main(int ac, char **av)
u64 size_of_data = 0;
u64 source_dir_size = 0;
int dev_cnt = 0;
-   int saved_optind;
char fs_uuid[BTRFS_UUID_UNPARSED_SIZE] = { 0 };
u64 features = BTRFS_MKFS_DEFAULT_FEATURES;
struct mkfs_allocation allocation = { 0 };
@@ -1467,7 +1466,6 @@ int main(int ac, char **av)
}
 
sectorsize = max(sectorsize, (u32)sysconf(_SC_PAGESIZE));
-   saved_optind = optind;
dev_cnt = ac - optind;
if (dev_cnt == 0)
print_usage(1);
@@ -1490,18 +1488,15 @@ int main(int ac, char **av)
exit(1);
}
}
-   
-   while (dev_cnt-- > 0) {
-   file = av[optind++];
+
+   for (i = optind; i < optind + dev_cnt; i++) {
+   file = av[i];
if (is_block_device(file) == 1)
if (test_dev_for_mkfs(file, force_overwrite))
exit(1);
}
 
-   optind = saved_optind;
-   dev_cnt = ac - optind;
-
-   file = av[optind++];
+   file = av[optind];
ssd = is_ssd(file);
 
if (is_vol_small(file) || mixed) {
@@ -1557,7 +1552,7 @@ int main(int ac, char **av)
btrfs_min_dev_size(nodesize));
exit(1);
}
-   for (i = saved_optind; i < saved_optind + dev_cnt; i++) {
+   for (i = optind; i < optind + dev_cnt; i++) {
char *path;
 
path = av[i];
@@ -1588,8 +1583,6 @@ int main(int ac, char **av)
printf("See %s for more information.\n\n", PACKAGE_URL);
}
 
-   dev_cnt--;
-
if (!source_dir_set) {
/*
 * open without O_EXCL so that the problem should not
@@ -1720,13 +1713,10 @@ int main(int ac, char **av)
if (is_block_device(file) == 1)
btrfs_register_one_device(file);
 
-   if (dev_cnt == 0)
-   goto raid_groups;
-
-   while (dev_cnt-- > 0) {
+   for (i = optind + 1; i < optind + dev_cnt; i++) {
int old_mixed = mixed;
 
-   file = av[optind++];
+   file = av[i];
 
/*
 * open without O_EXCL so that the problem should not
@@ -1771,7 +1761,6 @@ int main(int ac, char **av)
btrfs_register_one_device(file);
}
 
-raid_groups:
if (!source_dir_set) {
ret = create_raid_groups(trans, root, data_profile,
 metadata_profile, mixed, &allocation);
-- 
1.8.5.1



[PATCH 2/3] btrfs-progs: mkfs: Fix different mixed type by argument sequence

2015-10-13 Thread Zhao Lei
Given a 200M vdd1 and a 1G vdd2:

In current code:
  # mkfs.btrfs -f /dev/vdd1 /dev/vdd2
  SMALL VOLUME: forcing mixed metadata/data groups
  btrfs-progs v4.1.2
  See http://btrfs.wiki.kernel.org for more information.

  Label:              (null)
  UUID:               7aa6fc75-ce23-4033-9d47-fd046afa2992
  Node size:          4096
  Sector size:        4096
  Filesystem size:    1.20GiB
  Block group profiles:
    Data+Metadata:    single            8.00MiB
    System:           single            4.00MiB
  SSD detected:       no
  Incompat features:  mixed-bg, extref, skinny-metadata
  Number of devices:  2
  Devices:
     ID        SIZE  PATH
      1   200.29MiB  /dev/vdd1
      2     1.00GiB  /dev/vdd2
  #
  # mkfs.btrfs -f /dev/vdd2 /dev/vdd1
  btrfs-progs v4.1.2
  See http://btrfs.wiki.kernel.org for more information.

  Label:              (null)
  UUID:               ac659809-66c1-427d-934d-bd4c209c91a8
  Node size:          16384
  Sector size:        4096
  Filesystem size:    1.20GiB
  Block group profiles:
    Data:             RAID0           136.00MiB
    Metadata:         RAID1            69.38MiB
    System:           RAID1            12.00MiB
  SSD detected:       no
  Incompat features:  extref, skinny-metadata
  Number of devices:  2
  Devices:
     ID        SIZE  PATH
      1     1.00GiB  /dev/vdd2
      2   200.29MiB  /dev/vdd1

We can see that:
 mkfs.btrfs -f /dev/vdd1 /dev/vdd2
 and
 mkfs.btrfs -f /dev/vdd2 /dev/vdd1
 end up with different "mixed" settings.

Reason:
 The current code decides whether to use mixed mode based only on
 the first device.

Fix:
 Use mixed mode only if all devices are small.

Signed-off-by: Zhao Lei 
---
 mkfs.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)
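
The difference between the old and the new decision can be sketched outside of
mkfs.c. This is an illustration only: the function names are hypothetical, and
the 1 GiB "small device" threshold is an assumption made for the example
rather than a value taken from this patch.

```c
#include <stdint.h>

/* Assumed threshold below which a device counts as "small" (1 GiB here). */
#define SMALL_VOLUME_BYTES (1024ULL * 1024 * 1024)

int dev_is_small(uint64_t size)
{
	return size < SMALL_VOLUME_BYTES;
}

/* Old behaviour: only the first device on the command line is checked. */
int mixed_by_first_device(const uint64_t *sizes, int n, int mixed_opt)
{
	if (n <= 0)
		return mixed_opt;
	return dev_is_small(sizes[0]) || mixed_opt;
}

/* New behaviour: mixed mode is forced only if no device is large. */
int mixed_if_all_small(const uint64_t *sizes, int n, int mixed_opt)
{
	int large_device_cnt = 0;
	int i;

	for (i = 0; i < n; i++)
		large_device_cnt += !dev_is_small(sizes[i]);
	return !large_device_cnt || mixed_opt;
}
```

For sizes {200 MiB, 1 GiB} and {1 GiB, 200 MiB}, the first function gives
different answers depending on the order, while the second gives the same
answer for both.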

diff --git a/mkfs.c b/mkfs.c
index cdae94d..29cab13 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1358,6 +1358,7 @@ int main(int ac, char **av)
u64 features = BTRFS_MKFS_DEFAULT_FEATURES;
struct mkfs_allocation allocation = { 0 };
struct btrfs_mkfs_config mkfs_cfg;
+   int large_device_cnt = 0;
 
while(1) {
int c;
@@ -1494,17 +1495,25 @@ int main(int ac, char **av)
if (is_block_device(file) == 1)
if (test_dev_for_mkfs(file, force_overwrite))
exit(1);
+   ret = is_vol_small(file);
+   if (ret < 0) {
+   error("Failed to check size for '%s': %s",
+ file, strerror(-ret));
+   exit(1);
+   }
+   large_device_cnt += (!ret);
+   ret = 0;
}
 
-   file = av[optind];
-   ssd = is_ssd(file);
-
-   if (is_vol_small(file) || mixed) {
+   if (!large_device_cnt || mixed) {
if (verbose)
printf("SMALL VOLUME: forcing mixed metadata/data groups\n");
mixed = 1;
}
 
+   file = av[optind];
+   ssd = is_ssd(file);
+
/*
* Set default profiles according to number of added devices.
* For mixed groups defaults are single/single.
-- 
1.8.5.1



[PATCH 3/3] btrfs-progs: mkfs: Fix inaccurate mixed information

2015-10-13 Thread Zhao Lei
In the current code, with a "BIG VOLUME" /dev/vdd2:
 # ./mkfs.btrfs -f -M /dev/vdd2
 SMALL VOLUME: forcing mixed metadata/data groups
 ...

This patch changes the above output to:
 Using mixed metadata/data groups

The "SMALL VOLUME" message is now printed only when a small volume
is actually what forces mixed mode.

Signed-off-by: Zhao Lei 
---
 mkfs.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mkfs.c b/mkfs.c
index 29cab13..0064a78 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1505,7 +1505,10 @@ int main(int ac, char **av)
ret = 0;
}
 
-   if (!large_device_cnt || mixed) {
+   if (mixed) {
+   if (verbose)
+   printf("Using mixed metadata/data groups\n");
+   } else if (!large_device_cnt) {
if (verbose)
printf("SMALL VOLUME: forcing mixed metadata/data groups\n");
mixed = 1;
-- 
1.8.5.1



[PATCH] Btrfs: fix file corruption and data loss after cloning inline extents

2015-10-13 Thread fdmanana
From: Filipe Manana 

Currently the clone ioctl allows cloning an inline extent from one file
to another that already has other (non-inlined) extents. This is a problem
because btrfs is not designed to deal with files having both inline and
regular extents: if a file has an inline extent then it must be the only
extent in the file and must start at file offset 0. Having a file with an
inline extent followed by regular extents results in EIO errors when doing
reads or writes against the first 4K of the file.

Also, the clone ioctl allows one to lose data if the source file consists
of a single inline extent, with a size of N bytes, and the destination
file consists of a single inline extent with a size of M bytes, where we
have M > N. In this case the clone operation removes the inline extent
from the destination file and then copies the inline extent from the
source file into the destination file - we lose the M - N bytes from the
destination file, a read operation will get the value 0x00 for any bytes
in the range [N, M) (the destination inode's i_size remained as M,
that's why we can read past N bytes).

So fix this by not allowing such destructive operations to happen and
return errno EOPNOTSUPP to user space.
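
The data loss case described above can be illustrated with a toy model. This
is a simplified sketch and not btrfs code: struct toy_file and all toy_*
functions are hypothetical, the model only tracks an inline payload plus
i_size, and the guard shown is just one of the conditions the real fix has to
consider.

```c
#include <errno.h>
#include <string.h>

#define INLINE_MAX 64	/* hypothetical cap for the toy inline extent */

struct toy_file {
	unsigned char data[INLINE_MAX];	/* inline extent payload */
	size_t extent_len;		/* bytes stored in the inline extent */
	size_t i_size;			/* file size seen by readers */
};

/* Create a file made of a single inline extent of len identical bytes. */
void toy_fill(struct toy_file *f, unsigned char byte, size_t len)
{
	if (len > INLINE_MAX)
		len = INLINE_MAX;
	memset(f->data, 0, sizeof(f->data));
	memset(f->data, byte, len);
	f->extent_len = len;
	f->i_size = len;
}

/* Read one byte; ranges inside i_size without extent data read as 0x00. */
int toy_read(const struct toy_file *f, size_t off)
{
	if (off >= f->i_size)
		return -1;
	return off < f->extent_len ? f->data[off] : 0x00;
}

/* Unchecked clone: drop dst's extent, copy src's, keep the larger i_size. */
void toy_clone_buggy(const struct toy_file *src, struct toy_file *dst)
{
	memset(dst->data, 0, sizeof(dst->data));
	memcpy(dst->data, src->data, src->extent_len);
	dst->extent_len = src->extent_len;
	if (dst->i_size < src->i_size)
		dst->i_size = src->i_size;
}

/* Guarded clone: refuse the destructive case and leave dst untouched. */
int toy_clone_checked(const struct toy_file *src, struct toy_file *dst)
{
	if (dst->extent_len > src->extent_len)
		return -EOPNOTSUPP;
	toy_clone_buggy(src, dst);
	return 0;
}
```

With N = 10 and M = 20, the unchecked variant leaves i_size at 20 while only
10 bytes of data remain, so offsets 10 to 19 read back as 0x00. The guarded
variant returns -EOPNOTSUPP and the destination keeps its original bytes.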

Currently the fstest btrfs/035 tests the data loss case but it totally
ignores this - i.e. expects the operation to succeed and does not check
that we got data loss.

The following test case for fstests exercises all these cases that result
in file corruption and data loss:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1  # failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
  rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter

  # real QA test starts here
  _need_to_be_root
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_cloner
  _require_btrfs_fs_feature "no_holes"
  _require_btrfs_mkfs_feature "no-holes"

  rm -f $seqres.full

  test_cloning_inline_extents()
  {
  local mkfs_opts=$1
  local mount_opts=$2

  _scratch_mkfs $mkfs_opts >>$seqres.full 2>&1
  _scratch_mount $mount_opts

  # File bar, the source for all the following clone operations, consists
  # of a single inline extent (50 bytes).
  $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 50" $SCRATCH_MNT/bar \
  | _filter_xfs_io

  # Test cloning into a file with an extent (non-inlined) where the
  # destination offset overlaps that extent. It should not be possible to
  # clone the inline extent from file bar into this file.
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 16K" $SCRATCH_MNT/foo \
  | _filter_xfs_io
  $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo

  # Doing IO against any range in the first 4K of the file should work.
  # Due to a past clone ioctl bug which allowed cloning the inline extent,
  # these operations resulted in EIO errors.
  echo "File foo data after clone operation:"
  # All bytes should have the value 0xaa (clone operation failed and did
  # not modify our file).
  od -t x1 $SCRATCH_MNT/foo
  $XFS_IO_PROG -c "pwrite -S 0xcc 0 100" $SCRATCH_MNT/foo | _filter_xfs_io

  # Test cloning the inline extent against a file which has a hole in its
  # first 4K followed by a non-inlined extent. It should not be possible
  # as well to clone the inline extent from file bar into this file.
  $XFS_IO_PROG -f -c "pwrite -S 0xdd 4K 12K" $SCRATCH_MNT/foo2 \
  | _filter_xfs_io
  $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo2

  # Doing IO against any range in the first 4K of the file should work.
  # Due to a past clone ioctl bug which allowed cloning the inline extent,
  # these operations resulted in EIO errors.
  echo "File foo2 data after clone operation:"
  # All bytes should have the value 0x00 (clone operation failed and did
  # not modify our file).
  od -t x1 $SCRATCH_MNT/foo2
  $XFS_IO_PROG -c "pwrite -S 0xee 0 90" $SCRATCH_MNT/foo2 | _filter_xfs_io

  # Test cloning the inline extent against a file which has a size of zero
  # but has a prealloc extent. It should not be possible as well to clone
  # the inline extent from file bar into this file.
  $XFS_IO_PROG -f -c "falloc -k 0 1M" $SCRATCH_MNT/foo3 | _filter_xfs_io
  $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo3

  # Doing IO against any range in the first 4K of the file should work.
  # Due to a past clone ioctl bug which allowed cloning the inline extent,
  # these operations resulted in EIO errors.
  echo "First 50 bytes of foo3 after clone operation:"
  # Should not be able to read any bytes, file has 0 bytes i_size (the
  # clone operation failed and did not modify our file).
  od -t x1 $SCRATCH_MNT/foo3
  $XFS_IO_PROG -c 

[PATCH 1/2] fstests: btrfs test for cloning of inline extents

2015-10-13 Thread fdmanana
From: Filipe Manana 

Test several cases of cloning inline extents that used to lead to file
corruption or data loss.

These file corruption and data loss cases are fixed by the linux kernel
patch titled:

  "Btrfs: fix file corruption and data loss after cloning inline extents"

Signed-off-by: Filipe Manana 
---
 tests/btrfs/110 | 199 
 tests/btrfs/110.out | 257 
 tests/btrfs/group   |   1 +
 3 files changed, 457 insertions(+)
 create mode 100755 tests/btrfs/110
 create mode 100644 tests/btrfs/110.out

diff --git a/tests/btrfs/110 b/tests/btrfs/110
new file mode 100755
index 0000000..327c8c0
--- /dev/null
+++ b/tests/btrfs/110
@@ -0,0 +1,199 @@
+#! /bin/bash
+# FSQA Test No. 110
+#
+# Test several cases of cloning inline extents that used to lead to file
+# corruption or data loss.
+#
+#---
+#
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_cloner
+_require_btrfs_fs_feature "no_holes"
+_require_btrfs_mkfs_feature "no-holes"
+
+rm -f $seqres.full
+
+test_cloning_inline_extents()
+{
+   local mkfs_opts=$1
+   local mount_opts=$2
+
+   _scratch_mkfs $mkfs_opts >>$seqres.full 2>&1
+   _scratch_mount $mount_opts
+
+   # File bar, the source for all the following clone operations, consists
+   # of a single inline extent (50 bytes).
+   $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 50" $SCRATCH_MNT/bar \
+   | _filter_xfs_io
+
+   # Test cloning into a file with an extent (non-inlined) where the
+   # destination offset overlaps that extent. It should not be possible to
+   # clone the inline extent from file bar into this file.
+   $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 16K" $SCRATCH_MNT/foo \
+   | _filter_xfs_io
+   $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo
+
+   # Doing IO against any range in the first 4K of the file should work.
+   # Due to a past clone ioctl bug which allowed cloning the inline extent,
+   # these operations resulted in EIO errors.
+   echo "File foo data after clone operation:"
+   # All bytes should have the value 0xaa (clone operation failed and did
+   # not modify our file).
+   od -t x1 $SCRATCH_MNT/foo
+   $XFS_IO_PROG -c "pwrite -S 0xcc 0 100" $SCRATCH_MNT/foo | _filter_xfs_io
+
+   # Test cloning the inline extent against a file which has a hole in its
+   # first 4K followed by a non-inlined extent. It should not be possible
+   # as well to clone the inline extent from file bar into this file.
+   $XFS_IO_PROG -f -c "pwrite -S 0xdd 4K 12K" $SCRATCH_MNT/foo2 \
+   | _filter_xfs_io
+   $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo2
+
+   # Doing IO against any range in the first 4K of the file should work.
+   # Due to a past clone ioctl bug which allowed cloning the inline extent,
+   # these operations resulted in EIO errors.
+   echo "File foo2 data after clone operation:"
+   # All bytes should have the value 0x00 (clone operation failed and did
+   # not modify our file).
+   od -t x1 $SCRATCH_MNT/foo2
+   $XFS_IO_PROG -c "pwrite -S 0xee 0 90" $SCRATCH_MNT/foo2 | _filter_xfs_io
+
+   # Test cloning the inline extent against a file which has a size of zero
+   # but has a prealloc extent. It should not be possible as well to clone
+   # the inline extent from file bar into this file.
+   $XFS_IO_PROG -f -c "falloc -k 0 1M" $SCRATCH_MNT/foo3 | _filter_xfs_io
+   $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo3
+
+   # Doing IO against any range in the 

[PATCH 2/2] fstests: update btrfs/035 to check for data loss

2015-10-13 Thread fdmanana
From: Filipe Manana 

The test currently verifies that cloning one file with an inline extent
with a size of 10 bytes into a file with an inline extent that has a size
of 20 bytes succeeds. But this results in data loss, because the btrfs
clone operation drops the 20 bytes inline extent from the destination
inode and then copies the 10 bytes inline extent from the source file
into the destination file, resulting in data loss of the last 10 bytes
of data that the destination file had.

Fixing btrfs to correctly operate for this case (not resulting in data
loss) is actually a lot of work and brings a lot of complexity, especially
considering that any of the inline extents can be compressed. For the
moment there's a fix to make the clone operation return the errno
EOPNOTSUPP and not touch any of the inodes. This is the same approach
we take for other cases involving operations against inline extents, so
this just adds one more case that should have never been allowed.
Cloning inline extents is a rare operation and pointless, since it
involves copying them and not doing any actual deduplication or saving
space.

The btrfs patch for the linux kernel that prevents this data loss,
and fixes some file corruption cases, is titled:

  "Btrfs: fix file corruption and data loss after cloning inline extents"

Signed-off-by: Filipe Manana 
---
 tests/btrfs/035 | 14 ++
 tests/btrfs/035.out |  9 +
 2 files changed, 23 insertions(+)

diff --git a/tests/btrfs/035 b/tests/btrfs/035
index 35ddfce..0f8a70d 100755
--- a/tests/btrfs/035
+++ b/tests/btrfs/035
@@ -67,9 +67,23 @@ echo "attempting ioctl (src.clone1 src)"
 $CLONER_PROG -s 0 -d 0 -l ${snap_src_sz} \
$SCRATCH_MNT/src.clone1 $SCRATCH_MNT/src
 
+# The clone operation should have failed. If it did not it meant we had data
+# loss, because file "src.clone1" has an inline extent which is 10 bytes long
+# while file "src" has an inline extent which is 20 bytes long. The clone
+# operation would remove the inline extent of "src" and then copy the inline
+# extent from "src.clone1" into "src", which means we would lose the last 10
+# bytes of data from "src" (on read we would get 0x00 as the value for any
+# of those 10 bytes, because the file's size remains as 20 bytes).
+echo "File src data after attempt to clone from src.clone1 into src:"
+od -t x1 $SCRATCH_MNT/src
+
 snap_src_sz=`ls -lah $SCRATCH_MNT/src.clone2 | awk '{print $5}'`
 echo "attempting ioctl (src.clone2 src)"
 $CLONER_PROG -s 0 -d 0 -l ${snap_src_sz} \
$SCRATCH_MNT/src.clone2 $SCRATCH_MNT/src
 
+# The clone operation should have succeeded.
+echo "File src data after attempt to clone from src.clone2 into src:"
+od -t x1 $SCRATCH_MNT/src
+
 status=0 ; exit
diff --git a/tests/btrfs/035.out b/tests/btrfs/035.out
index f86cadf..3ea7d77 100644
--- a/tests/btrfs/035.out
+++ b/tests/btrfs/035.out
@@ -1,3 +1,12 @@
 QA output created by 035
 attempting ioctl (src.clone1 src)
+clone failed: Operation not supported
+File src data after attempt to clone from src.clone1 into src:
+0000000 62 62 62 62 62 62 62 62 62 62 63 63 63 63 63 63
+0000020 63 63 63 63
+0000024
 attempting ioctl (src.clone2 src)
+File src data after attempt to clone from src.clone2 into src:
+0000000 62 62 62 62 62 62 62 62 62 62 63 63 63 63 63 63
+0000020 63 63 63 63
+0000024
-- 
2.1.3



Kernel error in extent-tree, forced readonly

2015-10-13 Thread Axel Burri
One of my backup disks hit a btrfs bug yesterday, leaving me with a
forced readonly filesystem (see kernel trace below). This error is
reproducible, and happens on first access after mounting. This disk
receives snapshots (incrementally "ssh btrfs send -p | btrfs receive")
from several hosts on a daily schedule, and deletes old ones.

- kernel-4.2.3 (no quota support, no acl support)
- btrfs-progs-4.2.2
- mount: noatime,autodefrag,compress=zlib,subvolid=0

My short analysis reveals that the backref 1044608417792 points to
3231, which is most probably the parent subvolume used by a "btrfs
receive" operation, which was then deleted directly after receive
success (using --commit-after).

I suspect the error could have been triggered by an unmount operation
run directly after several (~10) receive and (~10) delete operations.
I enabled this unmounting feature in my cron job two days ago, maybe
this wasn't a good idea after all? Before that, the filesystem was
always mounted, and I have been doing backups like this for about one
year without any problems.

Can someone please give me some hints on how I can get rid of the
broken backrefs? How do I find out which file is causing the trouble,
and which subvolume I need to delete? And how can I do this on a
readonly fs?

If you are interested, I could leave this disk untouched for some days
and help debugging.


# btrfs fi df /mnt/btr_backup
Data, single: total=487.29GiB, used=481.65GiB
System, DUP: total=32.00MiB, used=40.00KiB
Metadata, DUP: total=18.50GiB, used=7.33GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


Kernel trace:

[ cut here ]
WARNING: CPU: 0 PID: 17264 at fs/btrfs/extent-tree.c:6255
__btrfs_free_extent.isra.72+0xab0/0xce0()
Modules linked in: isofs sr_mod cdrom usblp f2fs usb_storage bridge stp
llc tun cpufreq_ondemand vfat fat mmc_block snd_hda_codec_hdmi
nvidia(PO) x86_pkg_temp_thermal iwldvm btusb kvm_intel btrtl btbcm
btintel dell_laptop snd_hda_codec_idt bluetooth kvm dcdbas
snd_hda_codec_generic psmouse dell_smm_hwmon iwlwifi sdhci_pci sdhci
mmc_core snd_hda_intel snd_hda_codec thermal snd_hwdep snd_hda_core
snd_pcm parport_pc snd_timer parport xhci_pci snd xhci_hcd acpi_cpufreq
soundcore dell_rbtn processor battery dell_smo8800 ac
CPU: 0 PID: 17264 Comm: btrfs-cleaner Tainted: P   O
4.2.3-gentoo #1
Hardware name: Dell Inc. Latitude E6430/0H3MT5, BIOS A16 08/19/2014
  81778e68 81618a50 
 810691e7 88021fad46c0 00f337835000 fffe
 1000 880138044000 811dcfc0 
Call Trace:
 [] ? dump_stack+0x47/0x67
 [] ? warn_slowpath_common+0x77/0xb0
 [] ? __btrfs_free_extent.isra.72+0xab0/0xce0
 [] ? __btrfs_run_delayed_refs+0x7a0/0xf80
 [] ? __percpu_counter_add+0x52/0x70
 [] ? btrfs_free_tree_block+0xe0/0x1e0
 [] ? btrfs_run_delayed_refs.part.78+0x6a/0x250
 [] ? walk_up_tree+0xe0/0x1d0
 [] ? btrfs_should_end_transaction+0x3e/0x60
 [] ? btrfs_drop_snapshot+0x41c/0x810
 [] ? btrfs_clean_one_deleted_snapshot+0x9e/0xd0
 [] ? cleaner_kthread+0x141/0x1d0
 [] ? btrfs_destroy_pinned_extent+0xa0/0xa0
 [] ? kthread+0xbc/0xe0
 [] ? kthread_create_on_node+0x170/0x170
 [] ? ret_from_fork+0x3f/0x70
 [] ? kthread_create_on_node+0x170/0x170
---[ end trace 937617c32053608b ]---
BTRFS info (device sdc1): leaf 1044023648256 total ptrs 55 free space 526
	item 0 key (1044608286720 168 4096) itemoff 3944 itemsize 51
		extent refs 1 gen 10950 flags 2
		tree block key (18446744073709551606 128 190953070592) level 0
		tree block backref root 7
	item 1 key (1044608290816 168 4096) itemoff 3893 itemsize 51
		extent refs 1 gen 11983 flags 258
		tree block key (3259 12 3215) level 0
		tree block backref root 3242
	item 2 key (1044608294912 168 4096) itemoff 3842 itemsize 51
		extent refs 1 gen 10950 flags 258
		tree block key (82282 108 0) level 0
		shared block backref parent 1045418885120
	item 3 key (1044608299008 168 4096) itemoff 3782 itemsize 60
		extent refs 2 gen 11983 flags 258
		tree block key (770034 12 3265) level 0
		tree block backref root 3242
		tree block backref root 3231
	item 4 key (1044608303104 168 4096) itemoff 3731 itemsize 51
		extent refs 1 gen 10950 flags 258
		tree block key (82306 1 0) level 0
		shared block backref parent 1045418885120
	item 5 key (1044608311296 168 4096) itemoff 3680 itemsize 51
		extent refs 1 gen 10950 flags 258
		tree block key (446163 1 0) level 0
		shared block backref parent 1059954139136
	item 6 key (1044608315392 168 4096) itemoff 3629 itemsize 51
		extent refs 1 gen 10950 flags 2
		tree block key (18446744073709551606 128 190953070592) level 0
		tree block backref root 7
	item 7 key (1044608319488 168 4096) itemoff 3578 itemsize 51
		extent refs 1 gen 11983 flags 258
		tree block key (3885 108 0) level 0
		tree block 

[PATCH 3/7] btrfs-progs: add helpers for parsing 32bit ranges

2015-10-13 Thread David Sterba
Signed-off-by: David Sterba 
---
 cmds-balance.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/cmds-balance.c b/cmds-balance.c
index 62bee3cc78b6..72714b23b45c 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -159,6 +159,37 @@ static int parse_range_strict(const char *range, u64 *start, u64 *end)
return 1;
 }
 
+/*
+ * Convert 64bit range to 32bit with boundary checks
+ */
+static int range_to_u32(u64 start, u64 end, u32 *start32, u32 *end32)
+{
+   if (start > (u32)-1)
+   return 1;
+
+   if (end != (u64)-1 && end > (u32)-1)
+   return 1;
+
+   *start32 = (u32)start;
+   *end32 = (u32)end;
+
+   return 0;
+}
+
+static int parse_range_u32(const char *range, u32 *start, u32 *end)
+{
+   u64 tmp_start;
+   u64 tmp_end;
+
+   if (parse_range(range, &tmp_start, &tmp_end))
+   return 1;
+
+   if (range_to_u32(tmp_start, tmp_end, start, end))
+   return 1;
+
+   return 0;
+}
+
 static int parse_filters(char *filters, struct btrfs_balance_args *args)
 {
char *this_char;
-- 
2.6.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
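
The narrowing rule in range_to_u32() can be restated as a small standalone sketch. This is only an illustration: it uses the stdint types rather than the btrfs-progs u64/u32 typedefs, and the sketch_ name is not part of the patch. The point it shows is that the 64bit "unbounded" marker is let through the bounds check and becomes the 32bit marker after the cast.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only, not the btrfs-progs code.
 * UINT64_MAX plays the role of (u64)-1, the "no upper bound" marker.
 */
static int sketch_range_to_u32(uint64_t start, uint64_t end,
			       uint32_t *start32, uint32_t *end32)
{
	assert(start32 && end32);

	/* A start that does not fit in 32 bits is always an error. */
	if (start > UINT32_MAX)
		return 1;

	/* An end that does not fit is an error unless it is the marker. */
	if (end != UINT64_MAX && end > UINT32_MAX)
		return 1;

	*start32 = (uint32_t)start;
	*end32 = (uint32_t)end;	/* UINT64_MAX truncates to UINT32_MAX */

	return 0;
}
```

A caller would pass the two values produced by the 64bit range parser and treat a non-zero return as "range does not fit".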


[PATCH 5/7] btrfs-progs: extend balance args to take min/max limit filter

2015-10-13 Thread David Sterba
Add the overlapping limit and [limit_min, limit_max] members to the
balance args. The min/max values are interpreted iff the corresponding
flag BTRFS_BALANCE_ARGS_LIMITS is set.

Note that the values are only 32bit, but this should be enough for the
foreseeable future.

Signed-off-by: David Sterba 
---
 Documentation/btrfs-balance.asciidoc |  4 
 cmds-balance.c                       |  4 
 ioctl.h                              | 13 -
 volumes.h                            |  1 +
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/btrfs-balance.asciidoc b/Documentation/btrfs-balance.asciidoc
index 6d2fd0c36086..61517461ca90 100644
--- a/Documentation/btrfs-balance.asciidoc
+++ b/Documentation/btrfs-balance.asciidoc
@@ -109,6 +109,10 @@ parameters.
 Process only given number of chunks, after all filters apply. This can be used
 to specifically target a chunk in connection with other filters (drange,
 vrange) or just simply limit the amount of work done by a single balance run.
++
+The argument may be a single value or a range. The single value *N* means *at
+most N chunks*, equivalent to *..N* range syntax. Kernels prior to 4.4 accept
+only the single value format.
 
 *soft*::
 Takes no parameters. Only has meaning when converting between profiles.
diff --git a/cmds-balance.c b/cmds-balance.c
index dba6613b1540..1eadba417abc 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -343,6 +343,10 @@ static void dump_balance_args(struct btrfs_balance_args *args)
   (unsigned long long)args->vend);
if (args->flags & BTRFS_BALANCE_ARGS_LIMIT)
printf(", limit=%llu", (unsigned long long)args->limit);
+   if (args->flags & BTRFS_BALANCE_ARGS_LIMITS) {
+   printf(", limit=");
+   print_range_u32(args->limit_min, args->limit_max);
+   }
 
printf("\n");
 }
diff --git a/ioctl.h b/ioctl.h
index dff015a52b43..ff7a1a0610a1 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -227,7 +227,18 @@ struct btrfs_balance_args {
 
__u64 flags;
 
-   __u64 limit;/* limit number of processed chunks */
+   /*
+* BTRFS_BALANCE_ARGS_LIMIT with value 'limit'
+* BTRFS_BALANCE_ARGS_LIMITS - the extended version can use minimum and
+* maximum
+*/
+   union {
+   __u64 limit;/* limit number of processed chunks */
+   struct {
+   __u32 limit_min;
+   __u32 limit_max;
+   };
+   };
__u64 unused[7];
 } __attribute__ ((__packed__));
 
diff --git a/volumes.h b/volumes.h
index 4ecb99314a0c..cb6f5752cdda 100644
--- a/volumes.h
+++ b/volumes.h
@@ -136,6 +136,7 @@ struct map_lookup {
 #define BTRFS_BALANCE_ARGS_DRANGE  (1ULL << 3)
 #define BTRFS_BALANCE_ARGS_VRANGE  (1ULL << 4)
 #define BTRFS_BALANCE_ARGS_LIMIT   (1ULL << 5)
+#define BTRFS_BALANCE_ARGS_LIMITS  (1ULL << 6)
 
 /*
  * Profile changing flags.  When SOFT is set we won't relocate chunk if
-- 
2.6.1
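
As an illustration of the overlapping members described above, here is a cut-down sketch of the idea. It is not the real struct btrfs_balance_args: the struct name, the SKETCH_ flag names and the helper are made up for this example, the packed attribute is left out, and only the limit-related fields are kept. It shows that the 64bit limit and the 32bit min/max pair share the same 8 bytes, and that the flag decides which view is valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit positions as in the patch, renamed to mark them as sketch-only. */
#define SKETCH_ARGS_LIMIT	(1ULL << 5)	/* 64bit 'limit' is valid */
#define SKETCH_ARGS_LIMITS	(1ULL << 6)	/* limit_min/limit_max are valid */

/* Hypothetical, cut-down stand-in for the limit part of the args struct. */
struct sketch_balance_args {
	uint64_t flags;
	union {
		uint64_t limit;
		struct {
			uint32_t limit_min;
			uint32_t limit_max;
		};
	};
};

/* Return the effective upper bound, choosing the union view by flag. */
static uint64_t sketch_limit_upper(const struct sketch_balance_args *args)
{
	assert(args);

	if (args->flags & SKETCH_ARGS_LIMITS)
		return args->limit_max;
	if (args->flags & SKETCH_ARGS_LIMIT)
		return args->limit;

	return UINT64_MAX;	/* no limit filter requested */
}
```

Because both views occupy the same storage, reading one view after writing the other gives values that depend on byte order, which is why the flag check comes first.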



[PATCH 4/7] btrfs-progs: add helpers to print ranges

2015-10-13 Thread David Sterba
Signed-off-by: David Sterba 
---
 cmds-balance.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/cmds-balance.c b/cmds-balance.c
index 72714b23b45c..dba6613b1540 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -190,6 +190,25 @@ static int parse_range_u32(const char *range, u32 *start, u32 *end)
return 0;
 }
 
+__attribute__ ((unused))
+static void print_range(u64 start, u64 end)
+{
+   if (start)
+   printf("%llu", (unsigned long long)start);
+   printf("..");
+   if (end != (u64)-1)
+   printf("%llu", (unsigned long long)end);
+}
+
+static void print_range_u32(u32 start, u32 end)
+{
+   if (start)
+   printf("%u", start);
+   printf("..");
+   if (end != (u32)-1)
+   printf("%u", end);
+}
+
 static int parse_filters(char *filters, struct btrfs_balance_args *args)
 {
char *this_char;
-- 
2.6.1
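
The output format of the print helpers can be tried out with a buffer-based variant. This is a sketch under assumptions, not the patch's code: the function name is invented, it writes into a caller-supplied buffer instead of calling printf(), and it returns -1 when the buffer is too small. The formatting rule is the one from the patch: a zero start and an all-ones end are omitted.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustration only: format "start..end" into buf.
 * Returns the number of characters written (without the NUL), or -1 if
 * the text does not fit.
 */
static int sketch_format_range_u32(char *buf, size_t size,
				   uint32_t start, uint32_t end)
{
	size_t used = 0;
	int ret;

	assert(buf && size > 0);
	buf[0] = '\0';

	/* A zero start is implicit and therefore not printed. */
	if (start) {
		ret = snprintf(buf, size, "%" PRIu32, start);
		if (ret < 0 || (size_t)ret >= size)
			return -1;
		used = (size_t)ret;
	}

	if (size - used < 3)	/* room for ".." plus the NUL */
		return -1;
	memcpy(buf + used, "..", 3);
	used += 2;

	/* UINT32_MAX means "unbounded" and is not printed either. */
	if (end != UINT32_MAX) {
		ret = snprintf(buf + used, size - used, "%" PRIu32, end);
		if (ret < 0 || (size_t)ret >= size - used)
			return -1;
		used += (size_t)ret;
	}

	return (int)used;
}
```

One consequence of the rule is that a range entered as 0..9 is shown as ..9.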



[PATCH 7/7] btrfs-progs: balance: add stripes filter

2015-10-13 Thread David Sterba
From: Gabríel Arthúr Pétursson 

Add new balance filter 'stripes=' to process only chunks that are
spread across given number of chunks.

The range must be specified with both values, but they can be the same
to denote exact number of stripes.

Signed-off-by: Gabríel Arthúr Pétursson 
[ reworked a bit to use the range helpers, dropped the single value
  for stripes ]
Signed-off-by: David Sterba 
---
 Documentation/btrfs-balance.asciidoc |  4 
 cmds-balance.c                       | 17 +
 ioctl.h                              |  4 +++-
 volumes.h                            |  1 +
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/Documentation/btrfs-balance.asciidoc b/Documentation/btrfs-balance.asciidoc
index 61517461ca90..6bb9fffdf188 100644
--- a/Documentation/btrfs-balance.asciidoc
+++ b/Documentation/btrfs-balance.asciidoc
@@ -114,6 +114,10 @@ The argument may be a single value or a range. The single value *N* means *at
 most N chunks*, equivalent to *..N* range syntax. Kernels prior to 4.4 accept
 only the single value format.
 
+*stripes*::
+Balances only block groups which have the given number of stripes. The
+parameter is a range specified as <start..end>.
+
 *soft*::
 Takes no parameters. Only has meaning when converting between profiles.
 When doing convert from one profile to another and soft mode is on,
diff --git a/cmds-balance.c b/cmds-balance.c
index 7aaf33d03630..a958e584eeb5 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -319,6 +319,19 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args)
args->flags &= ~BTRFS_BALANCE_ARGS_LIMITS;
args->flags |= BTRFS_BALANCE_ARGS_LIMIT;
}
+   args->flags |= BTRFS_BALANCE_ARGS_LIMIT;
+   } else if (!strcmp(this_char, "stripes")) {
+   if (!value || !*value) {
+   fprintf(stderr,
+   "the stripes filter requires an argument\n");
+   return 1;
+   }
+   if (parse_range_u32(value, &args->stripes_min,
+   &args->stripes_max)) {
+   fprintf(stderr, "Invalid stripes argument\n");
+   return 1;
+   }
+   args->flags |= BTRFS_BALANCE_ARGS_STRIPES;
} else {
fprintf(stderr, "Unrecognized balance option '%s'\n",
this_char);
@@ -359,6 +372,10 @@ static void dump_balance_args(struct btrfs_balance_args *args)
printf(", limit=");
print_range_u32(args->limit_min, args->limit_max);
}
+   if (args->flags & BTRFS_BALANCE_ARGS_STRIPES) {
+   printf(", stripes=");
+   print_range_u32(args->stripes_min, args->stripes_max);
+   }
 
printf("\n");
 }
diff --git a/ioctl.h b/ioctl.h
index ff7a1a0610a1..50f9e1485a30 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -239,7 +239,9 @@ struct btrfs_balance_args {
__u32 limit_max;
};
};
-   __u64 unused[7];
+   __u32 stripes_min;
+   __u32 stripes_max;
+   __u64 unused[6];
 } __attribute__ ((__packed__));
 
 /* report balance progress to userspace */
diff --git a/volumes.h b/volumes.h
index cb6f5752cdda..150ea7f31659 100644
--- a/volumes.h
+++ b/volumes.h
@@ -137,6 +137,7 @@ struct map_lookup {
 #define BTRFS_BALANCE_ARGS_VRANGE  (1ULL << 4)
 #define BTRFS_BALANCE_ARGS_LIMIT   (1ULL << 5)
 #define BTRFS_BALANCE_ARGS_LIMITS  (1ULL << 6)
+#define BTRFS_BALANCE_ARGS_STRIPES (1ULL << 7)
 
 /*
  * Profile changing flags.  When SOFT is set we won't relocate chunk if
-- 
2.6.1
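
The kernel side of the stripes filter is not part of this thread, so the following is only a guess at how the range would be applied to a chunk. It assumes both bounds are inclusive, which is what "they can be the same to denote exact number of stripes" suggests. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed rule: a chunk matches when min <= num_stripes <= max. */
static int sketch_stripes_match(uint32_t num_stripes,
				uint32_t min, uint32_t max)
{
	return num_stripes >= min && num_stripes <= max;
}

/* Count how many chunks (given by their stripe counts) the filter selects. */
static size_t sketch_count_matching(const uint32_t *stripes, size_t n,
				    uint32_t min, uint32_t max)
{
	size_t i;
	size_t hits = 0;

	assert(stripes || n == 0);

	for (i = 0; i < n; i++) {
		if (sketch_stripes_match(stripes[i], min, max))
			hits++;
	}

	return hits;
}
```

Under that assumption, stripes=3..3 would select only chunks with exactly three stripes, and stripes=1..2 would select chunks that are not yet spread over three or more.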



[PATCH 6/7] btrfs-progs: balance: enhance the limit filter with range

2015-10-13 Thread David Sterba
We can do more with the balance filter. Enhance it so we can also specify
the minimum number of block groups to process. The 'limit' filter now
accepts a range (a..b, can be partial) and needs kernel support.

The 'limit=value' filter is equivalent to 'limit=..value' but works on
older kernels as well.

The min/max values are 32bit, unlike the single-value limit which is
64bit.

Signed-off-by: David Sterba 
---
 cmds-balance.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/cmds-balance.c b/cmds-balance.c
index 1eadba417abc..7aaf33d03630 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -301,12 +301,24 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args)
"the limit filter requires an argument\n");
return 1;
}
-   if (parse_u64(value, &args->limit)) {
-   fprintf(stderr, "Invalid limit argument: %s\n",
-  value);
-   return 1;
+   /*
+* Try to parse the range first. A single value is not
+* a valid range
+*/
+   if (parse_range_u32(value, &args->limit_min,
+   &args->limit_max) == 0) {
+   args->flags &= ~BTRFS_BALANCE_ARGS_LIMIT;
+   args->flags |= BTRFS_BALANCE_ARGS_LIMITS;
+   } else {
+   if (parse_u64(value, &args->limit)) {
+   fprintf(stderr,
+   "Invalid limit argument: %s\n",
+  value);
+   return 1;
+   }
+   args->flags &= ~BTRFS_BALANCE_ARGS_LIMITS;
+   args->flags |= BTRFS_BALANCE_ARGS_LIMIT;
}
-   args->flags |= BTRFS_BALANCE_ARGS_LIMIT;
} else {
fprintf(stderr, "Unrecognized balance option '%s'\n",
this_char);
-- 
2.6.1



[PATCH 2/7] btrfs-progs: extend parse_range API to accept a relaxed range

2015-10-13 Thread David Sterba
In some cases we want to accept a range of type [a..a]. Add a new
function to do the 'a < b' check for the caller and use it.

Signed-off-by: David Sterba 
---
 cmds-balance.c | 30 +-
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/cmds-balance.c b/cmds-balance.c
index 798b533aa7d6..62bee3cc78b6 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -126,9 +126,10 @@ static int parse_range(const char *range, u64 *start, u64 *end)
return 1;
}
 
-   if (*start >= *end) {
-   fprintf(stderr, "Range %llu..%llu doesn't make "
-   "sense\n", (unsigned long long)*start,
+   if (*start > *end) {
+   fprintf(stderr,
+   "ERROR: range %llu..%llu doesn't make sense\n",
+   (unsigned long long)*start,
(unsigned long long)*end);
return 1;
}
@@ -139,6 +140,25 @@ static int parse_range(const char *range, u64 *start, u64 *end)
return 1;
 }
 
+/*
+ * Parse range and check if start < end
+ */
+static int parse_range_strict(const char *range, u64 *start, u64 *end)
+{
+   if (parse_range(range, start, end) == 0) {
+   if (*start >= *end) {
+   fprintf(stderr,
+   "ERROR: range %llu..%llu not allowed\n",
+   (unsigned long long)*start,
+   (unsigned long long)*end);
+   return 1;
+   }
+   return 0;
+   }
+
+   return 1;
+}
+
 static int parse_filters(char *filters, struct btrfs_balance_args *args)
 {
char *this_char;
@@ -196,7 +216,7 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args)
   "an argument\n");
return 1;
}
-   if (parse_range(value, &args->pstart, &args->pend)) {
+   if (parse_range_strict(value, &args->pstart, &args->pend)) {
fprintf(stderr, "Invalid drange argument\n");
return 1;
}
@@ -207,7 +227,7 @@ static int parse_filters(char *filters, struct btrfs_balance_args *args)
   "an argument\n");
return 1;
}
-   if (parse_range(value, &args->vstart, &args->vend)) {
+   if (parse_range_strict(value, &args->vstart, &args->vend)) {
fprintf(stderr, "Invalid vrange argument\n");
return 1;
}
-- 
2.6.1



[PATCH 1/7] btrfs-progs: cleanup and comment parse_range

2015-10-13 Thread David Sterba
Simplify a check and unindent some code.

Signed-off-by: David Sterba 
---
 cmds-balance.c | 63 ++
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/cmds-balance.c b/cmds-balance.c
index 9af218bbfa51..798b533aa7d6 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -88,43 +88,54 @@ static int parse_u64(const char *str, u64 *result)
return 0;
 }
 
+/*
+ * Parse range that's missing some part that can be implicit:
+ * a..b- exact range, a can be equal to b
+ * a.. - implicitly unbounded maximum (end == (u64)-1)
+ * ..b - implicitly starting at 0
+ * a   - invalid; unclear semantics, use parse_u64 instead
+ *
+ * Returned values are u64, value validation and interpretation should be done
+ * by the caller.
+ */
 static int parse_range(const char *range, u64 *start, u64 *end)
 {
char *dots;
+   const char *rest;
+   int skipped = 0;
 
dots = strstr(range, "..");
-   if (dots) {
-   const char *rest = dots + 2;
-   int skipped = 0;
-
-   *dots = 0;
+   if (!dots)
+   return 1;
 
-   if (!*rest) {
-   *end = (u64)-1;
-   skipped++;
-   } else {
-   if (parse_u64(rest, end))
-   return 1;
-   }
-   if (dots == range) {
-   *start = 0;
-   skipped++;
-   } else {
-   if (parse_u64(range, start))
-   return 1;
-   }
+   rest = dots + 2;
+   *dots = 0;
 
-   if (*start >= *end) {
-   fprintf(stderr, "Range %llu..%llu doesn't make "
-   "sense\n", (unsigned long long)*start,
-   (unsigned long long)*end);
+   if (!*rest) {
+   *end = (u64)-1;
+   skipped++;
+   } else {
+   if (parse_u64(rest, end))
return 1;
-   }
+   }
+   if (dots == range) {
+   *start = 0;
+   skipped++;
+   } else {
+   if (parse_u64(range, start))
+   return 1;
+   }
 
-   if (skipped <= 1)
-   return 0;
+   if (*start >= *end) {
+   fprintf(stderr, "Range %llu..%llu doesn't make "
+   "sense\n", (unsigned long long)*start,
+   (unsigned long long)*end);
+   return 1;
}
 
+   if (skipped <= 1)
+   return 0;
+
return 1;
 }
 
-- 
2.6.1



[PATCH 0/7] Btrfs-progs, balance filters: stripes, limits

2015-10-13 Thread David Sterba
Here's the userspace part that enables the use of stripes and enhanced limit
filters.

David Sterba (6):
  btrfs-progs: cleanup and comment parse_range
  btrfs-progs: extend parse_range API to accept a relaxed range
  btrfs-progs: add helpers for parsing 32bit ranges
  btrfs-progs: add helpers to print ranges
  btrfs-progs: extend balance args to take min/max limit filter
  btrfs-progs: balance: enhance the limit filter with range

Gabríel Arthúr Pétursson (1):
  btrfs-progs: balance: add stripes filter

 Documentation/btrfs-balance.asciidoc |   8 ++
 cmds-balance.c                       | 172 +--
 ioctl.h                              |  17 +++-
 volumes.h                            |   2 +
 4 files changed, 168 insertions(+), 31 deletions(-)

-- 
2.6.1



[PATCH] Btrfs: fix double range unlock of hole region when reading page

2015-10-13 Thread fdmanana
From: Filipe Manana 

If when reading a page we find a hole and our caller had already locked
the range (bio flags has the bit EXTENT_BIO_PARENT_LOCKED set), we end
up unlocking the hole's range and then later our caller unlocks it
again, which might have already been locked by some other task once
the first unlock happened.

Currently this can only happen during a call to the extent_same ioctl,
as it's the only caller of __do_readpage() that sets the bit
EXTENT_BIO_PARENT_LOCKED for bio flags.

Fix this by leaving the unlock exclusively to the caller.

Signed-off-by: Filipe Manana 
---
 fs/btrfs/extent_io.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ecb1204..6e6df34 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3070,8 +3070,12 @@ static int __do_readpage(struct extent_io_tree *tree,
 
set_extent_uptodate(tree, cur, cur + iosize - 1,
&cached, GFP_NOFS);
-   unlock_extent_cached(tree, cur, cur + iosize - 1,
-   &cached, GFP_NOFS);
+   if (parent_locked)
+   free_extent_state(cached);
+   else
+   unlock_extent_cached(tree, cur,
+cur + iosize - 1,
+   &cached, GFP_NOFS);
cur = cur + iosize;
pg_offset += iosize;
continue;
-- 
2.1.3
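
The ownership rule behind the fix can be shown with a toy model. This is not kernel code and none of these names exist in btrfs: a single flag stands in for the locked extent range, and the assert in the unlock helper is where a second unlock of the same range would be caught.

```c
#include <assert.h>

/* Toy stand-in for a locked extent range. */
struct sketch_range_lock {
	int held;		/* 1 while some task owns the range */
	int unlock_calls;	/* number of unlocks performed so far */
};

static void sketch_lock(struct sketch_range_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void sketch_unlock(struct sketch_range_lock *l)
{
	assert(l->held);	/* unlocking a range we do not hold is the bug */
	l->held = 0;
	l->unlock_calls++;
}

/*
 * Hole branch of a page read: unlock only when this function took the
 * lock itself. With parent_locked set, the caller keeps ownership.
 */
static void sketch_read_hole(struct sketch_range_lock *l, int parent_locked)
{
	/* ... mark the hole range uptodate ... */
	if (!parent_locked)
		sketch_unlock(l);
}
```

With parent_locked set, the range stays locked after the read and the caller's own unlock is the only one that runs.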



Re: [PATCH 00/11] btrfs-progs: Use btrfs_open_dir to avoid show error of ioctl or tree search

2015-10-13 Thread David Sterba
On Mon, Oct 12, 2015 at 09:22:53PM +0800, Zhao Lei wrote:
> Use btrfs_open_dir() instead of open_file_or_dir(), to show error before
> real action(in ioctl or tree search), to make the error message exact
> and unified.
> 
> Zhao Lei (11):
>   btrfs-progs: subvolume: use btrfs_open_dir for btrfs subvolume command
>   btrfs-progs: filesystem: use btrfs_open_dir for btrfs filesystem
> command
>   btrfs-progs: balance: use btrfs_open_dir for btrfs balance command
>   btrfs-progs: inspect: Bypass unnecessary clean function in open_error
>   btrfs-progs: inspect: set return value of error case
>   btrfs-progs: inspect: use btrfs_open_dir for btrfs inspect command
>   btrfs-progs: qgroup: use btrfs_open_dir for btrfs qgroup command
>   btrfs-progs: quota: use btrfs_open_dir for btrfs quota command
>   btrfs-progs: use btrfs_open_dir in open_path_or_dev_mnt
>   btrfs-progs: replace: use btrfs_open_dir for btrfs replace command
>   btrfs-progs: fragments: use btrfs_open_dir for btrfs-fragments command

All merged, thanks! I appreciate you took the time to test all the
changes and the patch separation made review easy.


[PATCH] btrfs: compress: put variables defined per compress type in struct to make cache friendly

2015-10-13 Thread Byongho Lee
The variables below are defined per compress type.
 - struct list_head comp_idle_workspace[BTRFS_COMPRESS_TYPES]
 - spinlock_t comp_workspace_lock[BTRFS_COMPRESS_TYPES]
 - int comp_num_workspace[BTRFS_COMPRESS_TYPES]
 - atomic_t comp_alloc_workspace[BTRFS_COMPRESS_TYPES]
 - wait_queue_head_t comp_workspace_wait[BTRFS_COMPRESS_TYPES]

When accessing these variables for one compress type, the neighbouring
addresses hold the same variable for the other compress types.
So this patch puts these variables in a struct to make them cache friendly.

Signed-off-by: Byongho Lee 
---
 fs/btrfs/compression.c | 46 --
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ce62324c78e7..85a80931ae3f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -744,11 +744,13 @@ out:
return ret;
 }
 
-static struct list_head comp_idle_workspace[BTRFS_COMPRESS_TYPES];
-static spinlock_t comp_workspace_lock[BTRFS_COMPRESS_TYPES];
-static int comp_num_workspace[BTRFS_COMPRESS_TYPES];
-static atomic_t comp_alloc_workspace[BTRFS_COMPRESS_TYPES];
-static wait_queue_head_t comp_workspace_wait[BTRFS_COMPRESS_TYPES];
+static struct {
+   struct list_head idle_workspace;
+   spinlock_t workspace_lock;
+   int num_workspace;
+   atomic_t alloc_workspace;
+   wait_queue_head_t workspace_wait;
+} comp[BTRFS_COMPRESS_TYPES];
 
 static const struct btrfs_compress_op * const btrfs_compress_op[] = {
&btrfs_zlib_compress,
@@ -760,10 +762,10 @@ void __init btrfs_init_compress(void)
int i;
 
for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) {
-   INIT_LIST_HEAD(&comp_idle_workspace[i]);
-   spin_lock_init(&comp_workspace_lock[i]);
-   atomic_set(&comp_alloc_workspace[i], 0);
-   init_waitqueue_head(&comp_workspace_wait[i]);
+   INIT_LIST_HEAD(&comp[i].idle_workspace);
+   spin_lock_init(&comp[i].workspace_lock);
+   atomic_set(&comp[i].alloc_workspace, 0);
+   init_waitqueue_head(&comp[i].workspace_wait);
}
 }
 
@@ -777,11 +779,11 @@ static struct list_head *find_workspace(int type)
int cpus = num_online_cpus();
int idx = type - 1;
 
-   struct list_head *idle_workspace= &comp_idle_workspace[idx];
-   spinlock_t *workspace_lock  = &comp_workspace_lock[idx];
-   atomic_t *alloc_workspace   = &comp_alloc_workspace[idx];
-   wait_queue_head_t *workspace_wait   = &comp_workspace_wait[idx];
-   int *num_workspace  = &comp_num_workspace[idx];
+   struct list_head *idle_workspace= &comp[idx].idle_workspace;
+   spinlock_t *workspace_lock  = &comp[idx].workspace_lock;
+   atomic_t *alloc_workspace   = &comp[idx].alloc_workspace;
+   wait_queue_head_t *workspace_wait   = &comp[idx].workspace_wait;
+   int *num_workspace  = &comp[idx].num_workspace;
 again:
spin_lock(workspace_lock);
if (!list_empty(idle_workspace)) {
@@ -820,11 +822,11 @@ again:
 static void free_workspace(int type, struct list_head *workspace)
 {
int idx = type - 1;
-   struct list_head *idle_workspace= &comp_idle_workspace[idx];
-   spinlock_t *workspace_lock  = &comp_workspace_lock[idx];
-   atomic_t *alloc_workspace   = &comp_alloc_workspace[idx];
-   wait_queue_head_t *workspace_wait   = &comp_workspace_wait[idx];
-   int *num_workspace  = &comp_num_workspace[idx];
+   struct list_head *idle_workspace= &comp[idx].idle_workspace;
+   spinlock_t *workspace_lock  = &comp[idx].workspace_lock;
+   atomic_t *alloc_workspace   = &comp[idx].alloc_workspace;
+   wait_queue_head_t *workspace_wait   = &comp[idx].workspace_wait;
+   int *num_workspace  = &comp[idx].num_workspace;
 
spin_lock(workspace_lock);
if (*num_workspace < num_online_cpus()) {
@@ -852,11 +854,11 @@ static void free_workspaces(void)
int i;
 
for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) {
-   while (!list_empty(&comp_idle_workspace[i])) {
-   workspace = comp_idle_workspace[i].next;
+   while (!list_empty(&comp[i].idle_workspace)) {
+   workspace = comp[i].idle_workspace.next;
list_del(workspace);
btrfs_compress_op[i]->free_workspace(workspace);
-   atomic_dec(&comp_alloc_workspace[i]);
+   atomic_dec(&comp[i].alloc_workspace);
}
}
 }
-- 
2.6.1



Re: BTRFS with 8TB SMR drives

2015-10-13 Thread David Sterba
On Mon, Oct 12, 2015 at 06:25:52PM +0200, Henk Slager wrote:
> and looking at this spec:
> http://www.seagate.com/files/www-content/product-content/hdd-fam/seagate-archive-hdd/en-us/docs/archive-hdd-dS1834-3-1411us.pdf
> 
> it seems that it is a drive-managed SMR disk. I am not sure why David
> assumes it is host-managed, maybe drive firmware/functionality can be
> bypassed.

Because the drive-managed ones are not interesting from the filesystem POV.


Re: [PATCH] btrfs: compress: put variables defined per compress type in struct to make cache friendly

2015-10-13 Thread David Sterba
On Wed, Oct 14, 2015 at 01:13:26AM +0900, Byongho Lee wrote:
> The variables below are defined per compress type.
>  - struct list_head comp_idle_workspace[BTRFS_COMPRESS_TYPES]
>  - spinlock_t comp_workspace_lock[BTRFS_COMPRESS_TYPES]
>  - int comp_num_workspace[BTRFS_COMPRESS_TYPES]
>  - atomic_t comp_alloc_workspace[BTRFS_COMPRESS_TYPES]
>  - wait_queue_head_t comp_workspace_wait[BTRFS_COMPRESS_TYPES]
> 
> When accessing these variables for one compress type, the neighbouring
> addresses hold the same variable for the other compress types.
> So this patch puts these variables in a struct to make them cache friendly.

Nice.

> +static struct {
> + struct list_head idle_workspace;
> + spinlock_t workspace_lock;
> + int num_workspace;
> + atomic_t alloc_workspace;
> + wait_queue_head_t workspace_wait;
> +} comp[BTRFS_COMPRESS_TYPES];

The name became too generic, please rename it to btrfs_comp_ws.
btrfs_comp_workspaces would be too long. I won't mind trimming the
members to 'ws' instead of 'workspace' so this does not result in too
wild code formatting. The use of the workspaces is localized only to the
compression code so it will not be confusing.
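
To make the suggestion concrete, here is one way the renamed struct might look. This is only a sketch: btrfs_comp_ws is the name proposed above, but the shortened member names are one possible reading of "trimming the members to 'ws'", the kernel types are replaced by userspace stand-ins so the fragment compiles on its own, and the count of 2 is an assumed value for BTRFS_COMPRESS_TYPES.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_COMPRESS_TYPES 2	/* assumed stand-in for BTRFS_COMPRESS_TYPES */

/* Userspace placeholders for the kernel types, for illustration only. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};
typedef int sketch_spinlock_t;
typedef int sketch_atomic_t;
typedef int sketch_wait_queue_head_t;

/* One element per compress type; all per-type state sits together. */
static struct {
	struct sketch_list_head idle_ws;
	sketch_spinlock_t ws_lock;
	int num_ws;
	sketch_atomic_t alloc_ws;
	sketch_wait_queue_head_t ws_wait;
} btrfs_comp_ws[SKETCH_COMPRESS_TYPES];

/* Same loop shape as btrfs_init_compress(), using the shortened names. */
static void sketch_init_compress(void)
{
	int i;

	for (i = 0; i < SKETCH_COMPRESS_TYPES; i++) {
		/* an empty list head points at itself */
		btrfs_comp_ws[i].idle_ws.next = &btrfs_comp_ws[i].idle_ws;
		btrfs_comp_ws[i].idle_ws.prev = &btrfs_comp_ws[i].idle_ws;
		btrfs_comp_ws[i].ws_lock = 0;
		btrfs_comp_ws[i].num_ws = 0;
		btrfs_comp_ws[i].alloc_ws = 0;
		btrfs_comp_ws[i].ws_wait = 0;
	}
}
```

With the shorter names, an expression such as &btrfs_comp_ws[idx].idle_ws stays close to the length of the original &comp_idle_workspace[idx].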


Re: [PATCH] btrfs: fix resending received snapshot with parent

2015-10-13 Thread Filipe Manana
On Tue, Oct 13, 2015 at 1:31 AM, Ed Tomlinson  wrote:
> On Friday, October 9, 2015 4:24:10 PM EDT, Filipe Manana wrote:
>>
>> On Wed, Sep 30, 2015 at 8:23 PM, Robin Ruede  wrote:
>>>
>>> This fixes a regression introduced by 37b8d27d between v4.1 and v4.2.
>>>
>>> When a snapshot is received, its received_uuid is set to the original
>>> uuid of the subvolume. When that snapshot is then resent to a third
>>> filesystem, its received_uuid is set to the second uuid
>>> instead of the original one. The same was true for the parent_uuid. ...
>>
>> Reviewed-by: Filipe Manana 
>>
>> Thanks for fixing this.
>> I've added this to my integration branch [1] and will send soon a pull
>> request to Chris for 4.4, including this fix plus a few others for
>> send/receive, after some more testing.
>>
>> I've also made an xfstest for it [1, 2]
>
>
> Another thanks for this fix.  It fixes things here.  I am running 4.2.3 with
> the 4.3 btrfs tree pulled on top of it along with this fix.  Incremental
> sends
> are now working again.
> Tested-by: Ed Tomlinson 
>
> This fixes a regression, can we please get into 4.3?

I've tagged it for stable backports in my 4.4 integration branch [1].
Thanks.

[1] http://git.kernel.org/cgit/linux/kernel/git/fdmanana/linux.git/log/?h=integration-4.4

thanks
>
> Thanks
> Ed Tomlinson
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: [PATCH] btrfs: fix use after free iterating extrefs

2015-10-13 Thread Mark Fasheh
On Tue, Oct 13, 2015 at 02:06:48PM -0400, Chris Mason wrote:
> The code for btrfs inode-resolve has never worked properly for
> files with enough hard links to trigger extrefs.  It was trying to
> get the leaf out of a path after freeing the path:
> 
>   btrfs_release_path(path);
>   leaf = path->nodes[0];
>   item_size = btrfs_item_size_nr(leaf, slot);
> 
> The fix here is to use the extent buffer we cloned just a little higher
> up to avoid deadlocks caused by using the leaf in the path.
> 
> Signed-off-by: Chris Mason 
> cc: sta...@vger.kernel.org # v3.7+
> cc: Mark Fasheh 
Reviewed-by: Mark Fasheh 

Thanks for the CC Chris.
--Mark

--
Mark Fasheh


Re: [PATCH v5 8/9] vfs: Add vfs_copy_file_range() support for pagecache copies

2015-10-13 Thread Anna Schumaker
On 10/09/2015 07:15 AM, Pádraig Brady wrote:
> On 08/10/15 02:40, Neil Brown wrote:
>> Anna Schumaker  writes:
>>
>>> @@ -1338,34 +1362,26 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>>> struct file *file_out, loff_t pos_out,
>>> size_t len, unsigned int flags)
>>>  {
>>> -   struct inode *inode_in;
>>> -   struct inode *inode_out;
>>> ssize_t ret;
>>>  
>>> -   if (flags)
>>> +   /* Flags should only be used exclusively. */
>>> +   if ((flags & COPY_FR_COPY) && (flags & ~COPY_FR_COPY))
>>> +   return -EINVAL;
>>> +   if ((flags & COPY_FR_REFLINK) && (flags & ~COPY_FR_REFLINK))
>>> +   return -EINVAL;
>>> +   if ((flags & COPY_FR_DEDUP) && (flags & ~COPY_FR_DEDUP))
>>> return -EINVAL;
>>>  
>>
>> Do you also need:
>>
>>if (flags & ~(COPY_FR_COPY | COPY_FR_REFLINK | COPY_FR_DEDUP))
>>  return -EINVAL;
>>
>> so that future user-space can test if the kernel supports new flags?
> 
> Seems like a good idea, yes.
> 
> Also that got me thinking about COPY_FR_SPARSE.
> What's the current behavior when copying a sparse range?
> Is the hole propagated by default (good), or is it expanded?

I haven't tried it, but I think the hole would be expanded :(.  I'm having 
splice() handle the pagecache copy part, and (as far as I know) splice() 
doesn't know anything about sparse files.  I might be able to put in some kind 
of fallocate() / splice() loop to copy the range in multiple pieces.

I don't want to add COPY_FR_SPARSE_AUTO, because then the kernel will have to 
determine how best to interpret "auto".  I'm more inclined to add a single 
COPY_FR_SPARSE flag to enable creating sparse files, and then have the 
application tell us what to do for any given range.

Anna

> 
> Note cp(1) has --sparse={never,auto,always}. Auto is the default,
> so it would be good I think if that was the default mode for 
> copy_file_range().
> With other sparse modes, we'd have to avoid copy_file_range() unless
> there was control possible with COPY_FR_SPARSE_{AUTO,NONE,ALWAYS}.
> Note currently cp --sparse=always will detect runs of zeros and also
> avoid speculative preallocation by using fallocate (fd, FALLOC_FL_PUNCH_HOLE, 
> ...)
> 
> thanks,
> Pádraig.
> 



Re: [PATCH v5 8/9] vfs: Add vfs_copy_file_range() support for pagecache copies

2015-10-13 Thread Anna Schumaker
On 10/07/2015 09:40 PM, Neil Brown wrote:
> Anna Schumaker  writes:
> 
>> @@ -1338,34 +1362,26 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>>  struct file *file_out, loff_t pos_out,
>>  size_t len, unsigned int flags)
>>  {
>> -struct inode *inode_in;
>> -struct inode *inode_out;
>>  ssize_t ret;
>>  
>> -if (flags)
>> +/* Flags should only be used exclusively. */
>> +if ((flags & COPY_FR_COPY) && (flags & ~COPY_FR_COPY))
>> +return -EINVAL;
>> +if ((flags & COPY_FR_REFLINK) && (flags & ~COPY_FR_REFLINK))
>> +return -EINVAL;
>> +if ((flags & COPY_FR_DEDUP) && (flags & ~COPY_FR_DEDUP))
>>  return -EINVAL;
>>  
> 
> Do you also need:
> 
>     if (flags & ~(COPY_FR_COPY | COPY_FR_REFLINK | COPY_FR_DEDUP))
>             return -EINVAL;
> 
> so that future user-space can test if the kernel supports new flags?

Probably.  I'll add that in!
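
A small user-space sketch of the combined check, for illustration. With only the three exclusivity tests from the patch, an unknown bit passed on its own (no known flag set) is not rejected, which is the gap the extra test closes. The bit values below and the helper name copy_fr_check_flags() are assumptions; the thread does not give the real definitions.

```c
#include <errno.h>

/* Hypothetical bit values; the real ones are not shown in this thread. */
#define COPY_FR_COPY    (1u << 0)
#define COPY_FR_REFLINK (1u << 1)
#define COPY_FR_DEDUP   (1u << 2)
#define COPY_FR_KNOWN   (COPY_FR_COPY | COPY_FR_REFLINK | COPY_FR_DEDUP)

/* Return 0 if flags is acceptable, -EINVAL otherwise. */
int copy_fr_check_flags(unsigned int flags)
{
	/* Reject bits this implementation does not know about. */
	if (flags & ~COPY_FR_KNOWN)
		return -EINVAL;

	/* The known flags are mutually exclusive: at most one bit set. */
	if (flags & (flags - 1))
		return -EINVAL;

	return 0;
}
```

With this shape, user space can probe for a future flag by passing it and checking for EINVAL.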

Thanks,
Anna

> 
> NeilBrown
> 



Re: [PATCH 0/3] btrfs-progs: mkfs: Fix different mixed type by argument sequence

2015-10-13 Thread David Sterba
On Tue, Oct 13, 2015 at 08:52:16PM +0800, Zhao Lei wrote:
> Given a 200G vdd1 and 1G vdd2:
> 
> In current code:
>  mkfs.btrfs -f /dev/vdd1 /dev/vdd2
>  and
>  mkfs.btrfs -f /dev/vdd2 /dev/vdd1
>  will create different "mixed" type.

I think combining large and small devices was not an intended use for
mixed-bg; nevertheless, the current behaviour is not right.

Chandan is working on dropping the forced mixed-bg completely. We've
discussed that on IRC, I'm ok with it but this needs more testing. So
far it looks fine, small filesystems get created and are usable, though
some tuning might be needed.

My intention for 4.3 is to take Chandan's work, provided that we test it
enough. There are about 3 weeks left. In case of problems, I'll take this
patchset so at least we get the inconsistent behaviour fixed.
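
To make the ordering problem concrete, here is a toy model. It assumes the decision looks only at the first device on the command line and uses a 1 GiB threshold; both are assumptions for illustration, and the order-independent rule shown is just one possibility, not necessarily what the patchset does.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed threshold, for illustration only. */
#define SMALL_VOLUME_BYTES (1024ULL * 1024 * 1024)

/* Order-dependent: only the first device on the command line counts. */
int mixed_by_first_device(const uint64_t *sizes, size_t n)
{
	return n > 0 && sizes[0] <= SMALL_VOLUME_BYTES;
}

/* Order-independent: decide from the combined size of all devices. */
int mixed_by_total_size(const uint64_t *sizes, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += sizes[i];
	return total <= SMALL_VOLUME_BYTES;
}
```

With a 200G and a 1G device, the first function gives a different answer for each argument order, while the second gives the same answer for both.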


[PATCH] btrfs: fix use after free iterating extrefs

2015-10-13 Thread Chris Mason
The code for btrfs inode-resolve has never worked properly for
files with enough hard links to trigger extrefs.  It was trying to
get the leaf out of a path after freeing the path:

btrfs_release_path(path);
leaf = path->nodes[0];
item_size = btrfs_item_size_nr(leaf, slot);

The fix here is to use the extent buffer we cloned just a little higher
up to avoid deadlocks caused by using the leaf in the path.

Signed-off-by: Chris Mason 
cc: sta...@vger.kernel.org # v3.7+
cc: Mark Fasheh 
---
 fs/btrfs/backref.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index ecbc63d..9a2ec79 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1828,7 +1828,6 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
         int found = 0;
         struct extent_buffer *eb;
         struct btrfs_inode_extref *extref;
-        struct extent_buffer *leaf;
         u32 item_size;
         u32 cur_offset;
         unsigned long ptr;
@@ -1856,9 +1855,8 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
                 btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
                 btrfs_release_path(path);
 
-                leaf = path->nodes[0];
-                item_size = btrfs_item_size_nr(leaf, slot);
-                ptr = btrfs_item_ptr_offset(leaf, slot);
+                item_size = btrfs_item_size_nr(eb, slot);
+                ptr = btrfs_item_ptr_offset(eb, slot);
                 cur_offset = 0;
 
                 while (cur_offset < item_size) {
@@ -1872,7 +1870,7 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
                         if (ret)
                                 break;
 
-                        cur_offset += btrfs_inode_extref_name_len(leaf, extref);
+                        cur_offset += btrfs_inode_extref_name_len(eb, extref);
                         cur_offset += sizeof(*extref);
                 }
                 btrfs_tree_read_unlock_blocking(eb);
-- 
2.4.6



Re: [PATCH] btrfs: fix use after free iterating extrefs

2015-10-13 Thread Filipe Manana
On Tue, Oct 13, 2015 at 7:06 PM, Chris Mason  wrote:
> The code for btrfs inode-resolve has never worked properly for
> files with enough hard links to trigger extrefs.  It was trying to
> get the leaf out of a path after freeing the path:
>
> btrfs_release_path(path);
> leaf = path->nodes[0];
> item_size = btrfs_item_size_nr(leaf, slot);
>
> The fix here is to use the extent buffer we cloned just a little higher
> up to avoid deadlocks caused by using the leaf in the path.
>
> Signed-off-by: Chris Mason 
> cc: sta...@vger.kernel.org # v3.7+
> cc: Mark Fasheh 
Reviewed-by: Filipe Manana 

Looks good to me.
I failed to notice that problem in commit [1].

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3fe81ce206f3805e0eb5d886aabbf91064655144
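
For readers less familiar with the pattern, a user-space analogy of the fix may help. The structs and helpers below (struct leaf, struct path, release_path(), clone_leaf()) are simplified stand-ins invented for illustration, not the btrfs types or functions. The point is only the ordering: take a private copy while the path is still valid, release the path, and from then on read through the copy instead of through the path.

```c
#include <stdlib.h>
#include <string.h>

struct leaf { unsigned int item_size[4]; };
struct path { struct leaf *node; };

/* Stand-in for releasing a path: the leaf it pointed at is gone afterwards. */
static void release_path(struct path *p)
{
	free(p->node);
	p->node = NULL;
}

/* Stand-in for the cloned extent buffer the commit message mentions. */
static struct leaf *clone_leaf(const struct leaf *src)
{
	struct leaf *eb = malloc(sizeof(*eb));

	if (eb)
		memcpy(eb, src, sizeof(*eb));
	return eb;
}

/*
 * Return the size of the item in the given slot, or 0 if the clone could
 * not be allocated. The path is always released before returning.
 */
unsigned int item_size_after_release(struct path *p, int slot)
{
	struct leaf *eb = clone_leaf(p->node);	/* clone while path is valid */
	unsigned int size;

	release_path(p);
	if (!eb)
		return 0;

	/* Reading p->node here would be the bug; the clone is still ours. */
	size = eb->item_size[slot];
	free(eb);
	return size;
}
```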

> ---
>  fs/btrfs/backref.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index ecbc63d..9a2ec79 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -1828,7 +1828,6 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
>          int found = 0;
>          struct extent_buffer *eb;
>          struct btrfs_inode_extref *extref;
> -        struct extent_buffer *leaf;
>          u32 item_size;
>          u32 cur_offset;
>          unsigned long ptr;
> @@ -1856,9 +1855,8 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
>                  btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
>                  btrfs_release_path(path);
>
> -                leaf = path->nodes[0];
> -                item_size = btrfs_item_size_nr(leaf, slot);
> -                ptr = btrfs_item_ptr_offset(leaf, slot);
> +                item_size = btrfs_item_size_nr(eb, slot);
> +                ptr = btrfs_item_ptr_offset(eb, slot);
>                  cur_offset = 0;
>
>                  while (cur_offset < item_size) {
> @@ -1872,7 +1870,7 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
>                          if (ret)
>                                  break;
>
> -                        cur_offset += btrfs_inode_extref_name_len(leaf, extref);
> +                        cur_offset += btrfs_inode_extref_name_len(eb, extref);
>                          cur_offset += sizeof(*extref);
>                  }
>                  btrfs_tree_read_unlock_blocking(eb);
> --
> 2.4.6
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


State of Dedup / Defrag

2015-10-13 Thread Rich Freeman
What is the current state of Dedup and Defrag in btrfs?  I seem to
recall there having been problems a few months ago and I've stopped
using it, but I haven't seen much news since.

I'm interested both in the 3.18 and subsequent kernel series.

--
Rich


System completely unresponsive after `btrfs balance start -dconvert=raid0 /` and `btrfs fi show /`

2015-10-13 Thread Carmine Paolino
Hi all,

I have a home server with 3 hard drives that I added to the same btrfs 
filesystem. Several hours ago I ran `btrfs balance start -dconvert=raid0 /`, and 
as soon as I ran `btrfs fi show /` I lost my ssh connection to the machine. The 
machine is still on, but it doesn’t even respond to ping: I always get a 
request timeout and sometimes even a "host is down" message. Its fans are 
spinning at full blast and the hard drives’ LEDs show activity all the time. 
I also run Plex Home Theater there, and the display output is stuck at the 
moment when I ran those two commands. I left it running because I am afraid of 
losing everything by powering it down manually.

Should I leave it like this and let it finish? How long might it take? (I have 
a 250 GB internal hard drive, a 120 GB USB 2.0 one and a 2 TB USB 2.0 one, so 
the transfer speeds are pretty low.) Is it safe to power it off manually? 
Should I file a bug afterwards?

Any help would be appreciated.

Thanks,
Carmine


Can't mount btrfs: corrupt leaf, slot offset bad

2015-10-13 Thread EJ Parker
I rebooted my server last night and discovered that my btrfs
filesystem (3-disk raid1) would not mount anymore. After doing some
research and getting nowhere, I went to IRC, where user darkling asked
me a few questions, asked for the output of btrfs-debug-tree, and
ultimately sent me here, saying I should include a handful of things.

Before I go further, let's get the required info out of the way:

uname -a:
Linux archhost1 4.2.3-1-ARCH #1 SMP PREEMPT Sat Oct 3 18:52:50 CEST 2015 x86_64 GNU/Linux
btrfs --version:
btrfs-progs v4.2.1
output from "btrfs fi show":
Label: none  uuid: 5470630f-39f4-4d39-90a2-277d7991722a
        Total devices 3 FS bytes used 3.10TiB
        devid    1 size 3.64TiB used 2.12TiB path /dev/sdd
        devid    2 size 3.64TiB used 2.12TiB path /dev/sde
        devid    3 size 3.64TiB used 2.12TiB path /dev/sdc

First, I am able to mount with -o ro,recovery, but not with just -o
recovery. When I attempt to mount w/o ro, I get this in dmesg:
[44478.800613] BTRFS critical (device sde): corrupt leaf, slot offset bad: block=5674754899968,root=1, slot=147
[44478.802489] BTRFS critical (device sde): corrupt leaf, slot offset bad: block=5674754899968,root=1, slot=147
[44478.804072] BTRFS error (device sde): Error removing orphan entry, stopping orphan cleanup
[44478.805856] BTRFS error (device sde): could not do orphan cleanup -22
[44482.635498] BTRFS: open_ctree failed

Running "btrfs-debug-tree -b 5674754899968 /dev/sde" gave me this:
leaf 5674754899968 items 207 free space 30 generation 884595 owner 5
fs uuid 5470630f-39f4-4d39-90a2-277d7991722a
chunk uuid c269615e-7397-41bc-95d0-dfdb2a696b23
[...]
item 145 key (273094 EXTENT_DATA 364924928) itemoff 8545 itemsize 53
extent data disk byte 8658465382400 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0
item 146 key (273094 EXTENT_DATA 364929024) itemoff 8492 itemsize 53
extent data disk byte 8658465378304 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0
item 147 key (273094 EXTENT_DATA 364933120) itemoff 8439 itemsize 53
extent data disk byte 8677950173184 nr 24576
extent data offset 0 nr 20480 ram 24576
extent compression 0
item 148 key (273094 EXTENT_DATA 364953600) itemoff 8333 itemsize 53
extent data disk byte 8677990363136 nr 20480
extent data offset 0 nr 16384 ram 20480
extent compression 0
item 149 key (273094 EXTENT_DATA 364957696) itemoff 8386 itemsize 53
extent data disk byte 0 nr 0
extent data offset 0 nr 18446744073709514752 ram 18446744073709514752
extent compression 0
item 150 key (273094 EXTENT_DATA 364969984) itemoff 8280 itemsize 53
extent data disk byte 8678063341568 nr 20480
extent data offset 0 nr 16384 ram 20480
extent compression 0
item 151 key (273094 EXTENT_DATA 365002752) itemoff 8227 itemsize 53
extent data disk byte 8678025232384 nr 36864
extent data offset 0 nr 32768 ram 36864
extent compression 0
item 152 key (273094 EXTENT_DATA 365019136) itemoff 8174 itemsize 53
extent data disk byte 8678112104448 nr 36864
extent data offset 0 nr 32768 ram 36864
extent compression 0
item 153 key (273094 EXTENT_DATA 365051904) itemoff 8121 itemsize 53
extent data disk byte 8678052835328 nr 53248
extent data offset 0 nr 49152 ram 53248
extent compression 0
item 154 key (273094 EXTENT_DATA 365101056) itemoff 8068 itemsize 53
extent data disk byte 8678090510336 nr 20480
extent data offset 0 nr 16384 ram 20480
extent compression 0
item 155 key (273094 EXTENT_DATA 365117440) itemoff 8015 itemsize 53
extent data disk byte 8678117130240 nr 20480
extent data offset 0 nr 16384 ram 20480
extent compression 0
[...]


Output from "btrfs check --readonly /dev/sde":
Checking filesystem on /dev/sde
UUID: 5470630f-39f4-4d39-90a2-277d7991722a
checking extents
incorrect offsets 8439 8386
bad block 5674754899968
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
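
Reading the dump above, the pair "8439 8386" seems to correspond to items 147 and 148: item 148 ends at 8333 + 53 = 8386 rather than at 8439, and the offsets of items 148 and 149 look swapped. Here is a minimal sketch of that consistency check, assuming the invariant is that each item's data must start exactly where the next item's data ends. The struct and function names are made up for illustration and are not taken from btrfs-progs.

```c
#include <stddef.h>

struct leaf_item {
	unsigned int slot;
	unsigned int offset;	/* "itemoff" in the dump */
	unsigned int size;	/* "itemsize" in the dump */
};

/* Items 145..150 as printed in the btrfs-debug-tree output above. */
static const struct leaf_item dump_items[] = {
	{ 145, 8545, 53 }, { 146, 8492, 53 }, { 147, 8439, 53 },
	{ 148, 8333, 53 }, { 149, 8386, 53 }, { 150, 8280, 53 },
};
#define DUMP_ITEM_COUNT (sizeof(dump_items) / sizeof(dump_items[0]))

/*
 * Return the index of the first item whose offset differs from the end
 * (offset + size) of the item that follows it, or -1 if none does.
 */
int first_bad_offset(const struct leaf_item *items, size_t n)
{
	size_t i;

	for (i = 0; i + 1 < n; i++) {
		if (items[i].offset != items[i + 1].offset + items[i + 1].size)
			return (int)i;
	}
	return -1;
}
```

Run over dump_items, the first mismatch is at index 2, i.e. slot 147, which agrees with the slot reported in dmesg.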

Output from (failed) "btrfs check --repair /dev/sdc" (which I tried
prior to seeking help):
enabling repair mode
Checking filesystem on /dev/sdc
UUID: 5470630f-39f4-4d39-90a2-277d7991722a
checking extents
incorrect offsets 8439 8386
shifting item nr 148 by bytes in block 5674754899968
items overlap, can't fix
cmds-check.c:4059: fix_item_offset: Assertion `ret` failed.


darkling also mentioned that btrfs-zero-log might help too, but that I
should get confirmation from one of the devs on 

Re: [PATCH] btrfs: compress: put variables defined per compress type in struct to make cache friendly

2015-10-13 Thread Byongho Lee

David Sterba writes:

>
>> +static struct {
>> +struct list_head idle_workspace;
>> +spinlock_t workspace_lock;
>> +int num_workspace;
>> +atomic_t alloc_workspace;
>> +wait_queue_head_t workspace_wait;
>> +} comp[BTRFS_COMPRESS_TYPES];
>
> The name became too generic, please rename it to btrfs_comp_ws.
> btrfs_comp_workspaces would be too long. I won't mind trimming the
> members to 'ws' instead of 'workspace' so this does not result in too
> wild code formatting. The use of the workspaces is localized only to the
> compression code so it will not be confusing.

Thanks for the feedback.
I will prepare a v2 patch that applies your comments.
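
As a rough idea of what the requested rename might look like (a sketch only, not the actual v2): the kernel types are replaced by user-space stand-ins so the fragment compiles on its own, the value of BTRFS_COMPRESS_TYPES is assumed, and the init helper btrfs_comp_ws_init() is hypothetical.

```c
#include <stddef.h>

#define BTRFS_COMPRESS_TYPES 2	/* assumed value, for illustration only */

/* User-space stand-ins for the kernel types used in the quoted struct. */
struct list_head { struct list_head *next, *prev; };
typedef int spinlock_t;
typedef int atomic_t;
typedef int wait_queue_head_t;

/* One entry per compression type keeps related fields next to each other. */
static struct {
	struct list_head idle_ws;
	spinlock_t ws_lock;
	int num_ws;
	atomic_t alloc_ws;
	wait_queue_head_t ws_wait;
} btrfs_comp_ws[BTRFS_COMPRESS_TYPES];

/* Make every idle list empty (head points at itself) and zero the counters. */
void btrfs_comp_ws_init(void)
{
	size_t i;

	for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) {
		btrfs_comp_ws[i].idle_ws.next = &btrfs_comp_ws[i].idle_ws;
		btrfs_comp_ws[i].idle_ws.prev = &btrfs_comp_ws[i].idle_ws;
		btrfs_comp_ws[i].ws_lock = 0;
		btrfs_comp_ws[i].num_ws = 0;
		btrfs_comp_ws[i].alloc_ws = 0;
		btrfs_comp_ws[i].ws_wait = 0;
	}
}
```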