[PATCH 1/3] btrfs: backref: Don't merge refs which are not for same block.

2015-04-01 Thread Qu Wenruo
The old __merge_refs() in backref.c even merges refs whose root_ids
differ, which makes qgroup give wrong results.

Fix it by checking ref_for_same_block() before any mode-specific work.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 fs/btrfs/backref.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index f55721f..7f14275 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -507,7 +507,7 @@ static int __add_missing_keys(struct btrfs_fs_info *fs_info,
 }
 
 /*
- * merge two lists of backrefs and adjust counts accordingly
+ * merge backrefs and adjust counts accordingly
  *
  * mode = 1: merge identical keys, if key is set
  *FIXME: if we add more keys in __add_prelim_ref, we can merge more here.
@@ -535,9 +535,9 @@ static void __merge_refs(struct list_head *head, int mode)
 
 		ref2 = list_entry(pos2, struct __prelim_ref, list);
 
+		if (!ref_for_same_block(ref1, ref2))
+			continue;
 		if (mode == 1) {
-			if (!ref_for_same_block(ref1, ref2))
-				continue;
 			if (!ref1->parent && ref2->parent) {
 				xchg = ref1;
 				ref1 = ref2;
-- 
2.3.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
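
To see why the position of the check matters, here is a minimal userspace sketch. It is not the kernel code: `toy_ref`, `toy_same_block()` and `toy_merge_by_parent()` are hypothetical names, and the same-block test is reduced to level, root_id and parent as an assumption. It models a merge by parent with and without the guard applied first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct __prelim_ref. */
struct toy_ref {
	uint64_t root_id;
	uint64_t parent;
	int level;
	int count;
	int merged;	/* nonzero once folded into an earlier ref */
};

/* Assumed reduction of the same-block test to three fields. */
static int toy_same_block(const struct toy_ref *a, const struct toy_ref *b)
{
	return a->level == b->level && a->root_id == b->root_id &&
	       a->parent == b->parent;
}

/*
 * Merge refs that share a parent. When check_block is nonzero, the
 * same-block guard runs before the parent comparison, mirroring the
 * ordering this patch introduces. Returns the number of surviving refs.
 */
static size_t toy_merge_by_parent(struct toy_ref *refs, size_t n,
				  int check_block)
{
	size_t live = 0;

	assert(refs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (refs[i].merged)
			continue;
		for (size_t j = i + 1; j < n; j++) {
			if (refs[j].merged)
				continue;
			if (check_block && !toy_same_block(&refs[i], &refs[j]))
				continue;
			if (refs[i].parent != refs[j].parent)
				continue;
			refs[i].count += refs[j].count;
			refs[j].merged = 1;
		}
		live++;
	}
	return live;
}
```

With two refs that share a parent but belong to different roots, the unguarded merge collapses them into one entry, so one root disappears from the result; the guarded merge keeps both.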


[PATCH 0/3] Use list to replace rb_tree in btrfs_delayed_ref_head->ref_roots.

2015-04-01 Thread Qu Wenruo
The old rbtree implementation of ref_head->ref_root sacrificed insertion
order in exchange for better delayed_ref_node merging.
However, the out-of-order behavior makes btrfs_find_all_roots() unable to
find the correct roots, since it needs the insertion order to skip later
delayed_nodes.

Without that ability, qgroup can never be accurate (although it is never
accurate anyway, :-( ).

This patchset first fixes a small but deadly (only for qgroup) bug in
backref, which causes btrfs_find_all_roots() to return fewer results than
expected.

The second patch migrates from the rb_tree implementation to the new list
implementation, while keeping the ability to merge delayed_ref_nodes.

The last patch cleans up the functions used only by the rb_tree
implementation.

The new list implementation has simpler logic and a smaller code base (it
removes about 200 lines).

Qu Wenruo (3):
  btrfs: backref: Don't merge refs which are not for same block.
  btrfs: delayed-ref: Use list to replace the ref_root in ref_head.
  btrfs: delayed-ref: Cleanup the unneeded functions.

 fs/btrfs/backref.c |  15 +--
 fs/btrfs/delayed-ref.c | 311 -
 fs/btrfs/delayed-ref.h |  18 ++-
 fs/btrfs/disk-io.c |   8 +-
 fs/btrfs/extent-tree.c |  46 ++--
 5 files changed, 108 insertions(+), 290 deletions(-)

-- 
2.3.4



Re: [PATCH] Btrfs: prevent deletion of mounted subvolumes

2015-04-01 Thread Timo Kokkonen

Hi,

On 01.04.2015 10:03, Omar Sandoval wrote:

> On Tue, Mar 31, 2015 at 10:54:55PM -0500, Eric W. Biederman wrote:
> > Omar Sandoval osan...@osandov.com writes:
> >
> > > On Mon, Mar 30, 2015 at 02:30:34PM +0200, David Sterba wrote:
> > > > On Mon, Mar 30, 2015 at 02:02:17AM -0700, Omar Sandoval wrote:
> > > > > Before commit bafc9b754f75 ("vfs: More precise tests in d_invalidate"),
> > > > > d_invalidate() could return -EBUSY when a dentry for a directory had
> > > > > more than one reference to it. This is what prevented a mounted
> > > > > subvolume from being deleted, as struct vfsmount holds a reference to
> > > > > the subvolume dentry. However, that commit removed that case, and later
> > > > > commits in that patch series removed the return code from d_invalidate()
> > > > > completely, so we don't get that check for free anymore. So, reintroduce
> > > > > it in btrfs_ioctl_snap_destroy().
> > > > >
> > > > > This applies to 4.0-rc6. To be honest, I'm not sure that this is the most
> > > > > correct fix for this bug, but it's equivalent to the pre-3.18 behavior and it's
> > > > > the best that I could come up with. Thoughts?
> > > > >
> > > > > +	spin_lock(&dentry->d_lock);
> > > > > +	err = dentry->d_lockref.count > 1 ? -EBUSY : 0;
> > > > > +	spin_unlock(&dentry->d_lock);
> > > >
> > > > The fix restores the original behaviour, but I don't think opencoding and
> > > > using internals is fine. Either there should be a vfs api for that or
> > > > there's an existing one that can be used instead.
> >
> > I have a problem with restoring the original behavior as is.
> >
> > In some sense it re-introduces the security issue that the d_invalidate
> > changes were built to fix.
> >
> > Any user in the system can create a user namespace, create a mount
> > namespace and keep any subvolume pinned forever.  Which at the very
> > least could make a very nice DOS attack.  I am not familiar enough with
> > how people use subvolumes and
> >
> > So let me ask.  How can userspace not know that a subvolume that they
> > want to delete is already mounted?
>
> Currently, the entry in /proc/mounts doesn't tell you which subvolume is
> mounted. The fix for that could be as simple as:
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 05fef19..9492d83 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1024,6 +1024,10 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
>  	struct btrfs_root *root = info->tree_root;
>  	char *compress_type;
>  
> +	if (dentry != dentry->d_sb->s_root) {
> +		seq_puts(seq, ",subvol=");
> +		seq_dentry(seq, dentry, " \t\n\\");
> +	}
>  	if (btrfs_test_opt(root, DEGRADED))
>  		seq_puts(seq, ",degraded");
>  	if (btrfs_test_opt(root, NODATASUM))
>
> Then, maybe this policy could be pushed up to userspace. It feels
> awkward to do it in the kernel, but users are apparently depending on
> this behavior. Timo, do you mind sharing some more details about how
> your scripts ran into the bug?



We are choosing the active subvolume via kernel command line parameter, eg:

root=/dev/mmcblk0p3 rw rootwait rootflags=subvol=/foobar

In user space we run a script that does some upgrades on the root 
file system and in the process creates new subvolumes and deletes old 
ones. For this to work we mount the root subvolume (e.g. /) 
somewhere so we can take a new snapshot from the currently active 
subvolume, e.g.:


mount /dev/mmcblk0p3 /mnt/root
btrfs subvolume snapshot /mnt/root/foobar /mnt/root/new_foobar
btrfs subvolume delete /mnt/root/old_foobar

and such. But if the user happens to give faulty names for the 
subvolumes to operate on, the script might delete the subvolume that 
the device's root file system is mounted from. That is, 
nothing prevents the user from running this command anymore:


btrfs subvolume delete /mnt/root/foobar

Once you delete this subvolume, things obviously grind to a halt 
pretty soon, as userspace expects / to be present on the system.


That is obviously the wrong thing for the user to do to 
begin with, and easily avoidable with more careful scripting. But I 
think this is very bad if the user is doing root file system snapshot 
management by hand and might easily delete his mounted root file system 
by accident. And I can't think of any reason why the kernel should allow 
the user to do this.


I hope this clears up things a bit.

-Timo


> > I can see having something like is_local_mount_root and denying the
> > subvolume destruction if the mount that is pinning it is in your local
> > mount namespace.
> >
> > > > The bug here seems defined up to the point that we're trying to delete a
> > > > subvolume that's a mountpoint. My next guess is that a check
> > > >
> > > >	if (d_mountpoint(dentry)) { ... }
> > > >
> > > > could work.
> > >
> > > That was my first instinct as well, but d_mountpoint() is true for
> > > dentries that have a filesystem mounted on them (e.g., after mount
> > > /dev/sda1 /mnt, the dentry for /mnt), not the dentry that is mounted.
> > >
> > > I poked around the mount code for awhile and couldn't come up with
> > > anything using the existing interface. Mounting subvolumes bubbles down
> > > to mount_subtree(), which doesn't really leave

[PATCH 2/3] btrfs: delayed-ref: Use list to replace the ref_root in ref_head.

2015-04-01 Thread Qu Wenruo
This patch replaces the rbtree used in ref_head with a list.
This has the following two advantages:
1) Qgroup code now gets an accurate view of the delayed_ref order.
This is the basis for the improved qgroup code.

2) Easier merge logic.
With the new list implementation, we only need to care about merging the
tail ref_node with the new ref_node.
This can be done quite easily at insert time, with no need to do an
explicit merge at run_delayed_refs().

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 fs/btrfs/backref.c |   9 +--
 fs/btrfs/delayed-ref.c | 153 ++---
 fs/btrfs/delayed-ref.h |  18 +-
 fs/btrfs/disk-io.c |   8 +--
 fs/btrfs/extent-tree.c |  46 +++
 5 files changed, 113 insertions(+), 121 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 7f14275..a4c31dc 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -572,8 +572,8 @@ static int __add_delayed_refs(struct btrfs_delayed_ref_head *head, u64 seq,
 			      struct list_head *prefs, u64 *total_refs,
 			      u64 inum)
 {
+	struct btrfs_delayed_ref_node *node;
 	struct btrfs_delayed_extent_op *extent_op = head->extent_op;
-	struct rb_node *n = &head->node.rb_node;
 	struct btrfs_key key;
 	struct btrfs_key op_key = {0};
 	int sgn;
@@ -583,12 +583,7 @@ static int __add_delayed_refs(struct btrfs_delayed_ref_head *head, u64 seq,
 		btrfs_disk_key_to_cpu(&op_key, &extent_op->key);
 
 	spin_lock(&head->lock);
-	n = rb_first(&head->ref_root);
-	while (n) {
-		struct btrfs_delayed_ref_node *node;
-		node = rb_entry(n, struct btrfs_delayed_ref_node,
-				rb_node);
-		n = rb_next(n);
+	list_for_each_entry(node, &head->ref_list, list) {
 		if (node->seq > seq)
 			continue;
 
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 6d16bea..14bd476 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -268,7 +268,7 @@ static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
 		rb_erase(&head->href_node, &delayed_refs->href_root);
 	} else {
 		assert_spin_locked(&head->lock);
-		rb_erase(&ref->rb_node, &head->ref_root);
+		list_del(&ref->list);
 	}
 	ref->in_tree = 0;
 	btrfs_put_delayed_ref(ref);
@@ -328,48 +328,6 @@ static int merge_ref(struct btrfs_trans_handle *trans,
 	return done;
 }
 
-void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
-			      struct btrfs_fs_info *fs_info,
-			      struct btrfs_delayed_ref_root *delayed_refs,
-			      struct btrfs_delayed_ref_head *head)
-{
-	struct rb_node *node;
-	u64 seq = 0;
-
-	assert_spin_locked(&head->lock);
-	/*
-	 * We don't have too much refs to merge in the case of delayed data
-	 * refs.
-	 */
-	if (head->is_data)
-		return;
-
-	spin_lock(&fs_info->tree_mod_seq_lock);
-	if (!list_empty(&fs_info->tree_mod_seq_list)) {
-		struct seq_list *elem;
-
-		elem = list_first_entry(&fs_info->tree_mod_seq_list,
-					struct seq_list, list);
-		seq = elem->seq;
-	}
-	spin_unlock(&fs_info->tree_mod_seq_lock);
-
-	node = rb_first(&head->ref_root);
-	while (node) {
-		struct btrfs_delayed_ref_node *ref;
-
-		ref = rb_entry(node, struct btrfs_delayed_ref_node,
-			       rb_node);
-		/* We can't merge refs that are outside of our seq count */
-		if (seq && ref->seq >= seq)
-			break;
-		if (merge_ref(trans, delayed_refs, head, ref, seq))
-			node = rb_first(&head->ref_root);
-		else
-			node = rb_next(&ref->rb_node);
-	}
-}
-
 int btrfs_check_delayed_seq(struct btrfs_fs_info *fs_info,
 			    struct btrfs_delayed_ref_root *delayed_refs,
 			    u64 seq)
@@ -485,6 +443,71 @@ update_existing_ref(struct btrfs_trans_handle *trans,
 }
 
 /*
+ * Helper to insert the ref_node to the tail or merge with tail.
+ *
+ * Return 0 for insert.
+ * Return >0 for merge.
+ */
+static int
+add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans,
+			   struct btrfs_delayed_ref_root *root,
+			   struct btrfs_delayed_ref_head *href,
+			   struct btrfs_delayed_ref_node *ref)
+{
+	struct btrfs_delayed_ref_node *exist;
+	int mod;
+	int ret = 0;
+
+	/* Check whether we can merge the tail node with ref */
+	if (list_empty(&href->ref_list))
+		goto add_tail;
+	exist = list_entry(href->ref_list.prev, struct btrfs_delayed_ref_node,
+  
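
The tail-merge idea from the commit message can be sketched in userspace C. This is a simplified model under stated assumptions, not the kernel implementation: `toy_node`, `toy_head` and `toy_add_tail_merge()` are hypothetical names, the list is a fixed array kept in insertion order, and two refs are treated as mergeable when their type and seq match.

```c
#include <assert.h>
#include <stddef.h>

enum toy_action { TOY_ADD = 1, TOY_DROP = 2 };

/* Hypothetical, reduced delayed ref node. */
struct toy_node {
	int type;
	unsigned long long seq;
	enum toy_action action;
	int ref_mod;
};

#define TOY_MAX 16

/* Nodes are stored strictly in insertion order. */
struct toy_head {
	struct toy_node nodes[TOY_MAX];
	size_t len;
};

/*
 * Append ref, or fold it into the current tail when both describe the
 * same kind of reference. Returns 0 for insert, 1 for merge, -1 when
 * the toy array is full.
 */
static int toy_add_tail_merge(struct toy_head *h, struct toy_node ref)
{
	struct toy_node *tail;

	assert(h != NULL);
	if (h->len == 0)
		goto add_tail;
	tail = &h->nodes[h->len - 1];
	if (tail->type != ref.type || tail->seq != ref.seq)
		goto add_tail;

	if (tail->action == ref.action) {
		tail->ref_mod += ref.ref_mod;
	} else if (tail->ref_mod < ref.ref_mod) {
		/* The new ref outweighs the tail: flip the action. */
		tail->action = ref.action;
		tail->ref_mod = ref.ref_mod - tail->ref_mod;
	} else {
		tail->ref_mod -= ref.ref_mod;
	}
	/* A tail that cancels out completely is dropped. */
	if (tail->ref_mod == 0)
		h->len--;
	return 1;

add_tail:
	if (h->len == TOY_MAX)
		return -1;
	h->nodes[h->len++] = ref;
	return 0;
}
```

Because only the tail is ever inspected, every earlier node keeps its position, which is the ordering property the qgroup code is said to rely on.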

[PATCH] btrfs-progs doc: emphasis that only mounted device works for btrfs device stats

2015-04-01 Thread Chen Hanxiao
The command line accepts the format <path>|<device>,
but btrfs device stats doesn't work if the device is not mounted.

Also fix some trailing whitespace.

Signed-off-by: Chen Hanxiao chenhanx...@cn.fujitsu.com
---
 Documentation/btrfs-device.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/btrfs-device.txt b/Documentation/btrfs-device.txt
index 66be6b3..3868fd4 100644
--- a/Documentation/btrfs-device.txt
+++ b/Documentation/btrfs-device.txt
@@ -83,14 +83,14 @@ Check device to see if it has all of it's devices in cache for mounting.
 *scan* [(--all-devices|-d)|<device> [<device>...]]::
 Scan devices for a btrfs filesystem.
 +
-If one or more devices are passed, these are scanned for a btrfs filesystem. 
+If one or more devices are passed, these are scanned for a btrfs filesystem.
 If no devices are passed, btrfs uses block devices containing btrfs
 filesystem as listed by blkid.
-Finally, if '--all-devices' or '-d' is passed, all the devices under /dev are 
+Finally, if '--all-devices' or '-d' is passed, all the devices under /dev are
 scanned.
 
 *stats* [-z] <path>|<device>::
-Read and print the device IO stats for all devices of the filesystem
+Read and print the device IO stats for all mounted devices of the filesystem
 identified by <path> or for a single <device>.
 +
 `Options`
-- 
2.1.0



Re: Is rbtree really needed to restore ref_root in btrfs_delayed_ref_head?

2015-04-01 Thread Qu Wenruo

If no one disagrees, I'll try to implement it using a list.

In fact, after a simple patch and some tests, it doesn't cause much 
performance regression, and delayed_ref_nodes are still mergeable.


Thanks,
Qu

-------- Original Message --------
Subject: Is rbtree really needed to restore ref_root in btrfs_delayed_ref_head?
From: Qu Wenruo quwen...@cn.fujitsu.com
To: linux-btrfs linux-btrfs@vger.kernel.org, David Sterba dste...@suse.cz, Chris Mason c...@fb.com, Josef Bacik jba...@fb.com
Date: 2015-03-24 15:21


Hi all and maintainers.

I'm investigating several qgroup bugs, and found that the current delayed ref
implementation has several possible problems which may lead to qgroup bugs.

Although my previous RFC patchset
(http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg42458.html)
tries to resolve some qgroup problems, some deeper problems in
delayed-ref seem to block further fixes.

[Problem]
Seq in ref_node doesn't really make sense.
For example, in Liu Bo's fstests btrfs/017, all DROP_DELAYED_REF
ref_nodes have the same sequence number.

But the qgroup routine, especially with my RFC patchset
(http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg42458.html),
needs the exact insert order to do accurate excl/rfer calculation.

My first idea was to reintroduce the minor sequence number in ref_node,
but I soon realized that we could have a better idea with [FIX].

[FIX]
Why not index ref_node with a list only?
The only user of the rb-tree of ref_root in the current implementation is
update_existing_ref(), which in fact merges ref_nodes with the same
(bytenr, parent) tuple.

But in fact we have merge_refs() and don't need to do such a thing at
insert time.

IMHO using a list to index ref_node should be a quite qgroup-friendly
implementation, where qgroup code can get the exact insert sequence it needs.


Delayed-ref is a somewhat fundamental piece of btrfs, so I am sending this
mail before writing the patch.
It would be quite nice if anyone could point out anything wrong
before I waste several days writing a meaningless patch.

Thanks,
Qu


Re: ERROR: error removing the device '/dev/sdXN' - Inappropriate ioctl for device

2015-04-01 Thread Anand Jain



btrfs device delete /dev/sdf5 /mnt/data2

ERROR: error removing the device '/dev/sdf5' - Inappropriate ioctl for
device


 very strange. 'btrfs fi show -m' shows btrfs fs(s) that are mounted.

- Anand


Re: [PATCH v2] Btrfs: add debugfs file to test transaction aborts

2015-04-01 Thread Anand Jain




+bool debugfs_abort_transaction(struct btrfs_fs_info *fs_info)
+{
+	if (!btrfs_debugfs_label_trans_abort[0])
+		return false;
+	return strcmp(fs_info->super_copy->label,
+		      btrfs_debugfs_label_trans_abort) == 0;
+}
+


 A label is not required to be present.

 Am I missing something?

Thanks, Anand


Re: [PATCH v2] Btrfs: add debugfs file to test transaction aborts

2015-04-01 Thread Filipe David Manana
On Wed, Apr 1, 2015 at 7:41 AM, Anand Jain anand.j...@oracle.com wrote:


 +bool debugfs_abort_transaction(struct btrfs_fs_info *fs_info)
 +{
 +   if (!btrfs_debugfs_label_trans_abort[0])
 +   return false;
 +   return strcmp(fs_info->super_copy->label,
 + btrfs_debugfs_label_trans_abort) == 0;
 +}
 +


  label is not mandatory to be present.

  did I missing something ?

Yes. It's intentional.
This is for testing purposes only, not for users.
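
A minimal userspace sketch of the quoted check, to make the no-label case concrete. `toy_should_abort()` is a hypothetical stand-in that takes both labels as plain strings instead of reading them from fs_info and the debugfs variable:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * fs_label:      label of the filesystem (empty string when unset).
 * trigger_label: label written to the debugfs file (empty when unset).
 */
static bool toy_should_abort(const char *fs_label, const char *trigger_label)
{
	assert(fs_label != NULL && trigger_label != NULL);
	/* No trigger configured: never abort, labelled or not. */
	if (trigger_label[0] == '\0')
		return false;
	return strcmp(fs_label, trigger_label) == 0;
}
```

In this model an unlabelled filesystem can never match a non-empty trigger, so it is simply never selected for a forced abort.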

thanks


 Thanks, Anand




-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


[PATCH] Btrfs: avoid syncing log in the fast fsync path when not necessary

2015-04-01 Thread Filipe Manana
Commit 3a8b36f37806 ("Btrfs: fix data loss in the fast fsync path") introduced
a performance regression that causes an unnecessary sync of the log
trees (fs/subvol and root log trees) when 2 consecutive fsyncs are done
against a file, without any writes or metadata updates to the inode in
between them, and a transaction is committed before the second fsync is
called.

Huang Ying reported this to lkml (https://lkml.org/lkml/2015/3/18/99)
after a sysbench test measured a 62% decrease in file io
requests per second for that test's workload.

The test is:

  echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
  echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
  echo performance > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
  mkfs -t btrfs /dev/sda2
  mount -t btrfs /dev/sda2 /fs/sda2
  cd /fs/sda2
  for ((i = 0; i < 1024; i++)); do fallocate -l 67108864 testfile.$i; done
  sysbench --test=fileio --max-requests=0 --num-threads=4 --max-time=600 \
    --file-test-mode=rndwr --file-total-size=68719476736 --file-io-mode=sync \
    --file-num=1024 run

A test on kvm guest, running a debug kernel gave me the following results:

Without 3a8b36f378060d: 16.01 reqs/sec
With 3a8b36f378060d: 3.39 reqs/sec
With 3a8b36f378060d and this patch: 16.04 reqs/sec

Reported-by: Huang Ying ying.hu...@intel.com
Tested-by: Huang, Ying ying.hu...@intel.com
Signed-off-by: Filipe Manana fdman...@suse.com
---
 fs/btrfs/file.c |  9 ++---
 fs/btrfs/ordered-data.c | 14 ++
 fs/btrfs/ordered-data.h |  3 +++
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 309dd57..379275c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1878,6 +1878,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	struct btrfs_log_ctx ctx;
 	int ret = 0;
 	bool full_sync = 0;
+	const u64 len = end - start + 1;
 
 	trace_btrfs_sync_file(file, datasync);
 
@@ -1906,7 +1907,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		 * all extents are persisted and the respective file extent
 		 * items are in the fs/subvol btree.
 		 */
-		ret = btrfs_wait_ordered_range(inode, start, end - start + 1);
+		ret = btrfs_wait_ordered_range(inode, start, len);
 	} else {
 		/*
 		 * Start any new ordered operations before starting to log the
@@ -1978,8 +1979,10 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 */
 	smp_mb();
 	if (btrfs_inode_in_log(inode, root->fs_info->generation) ||
-	    (full_sync && BTRFS_I(inode)->last_trans <=
-	     root->fs_info->last_trans_committed)) {
+	    (BTRFS_I(inode)->last_trans <=
+	     root->fs_info->last_trans_committed &&
+	     (full_sync ||
+	      !btrfs_have_ordered_extents_in_range(inode, start, len)))) {
 		/*
 		 * We'v had everything committed since the last time we were
 		 * modified so clear this flag in case it was set for whatever
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 157cc54..72b6f0d 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -838,6 +838,20 @@ out:
 	return entry;
 }
 
+bool btrfs_have_ordered_extents_in_range(struct inode *inode,
+					 u64 file_offset,
+					 u64 len)
+{
+	struct btrfs_ordered_extent *oe;
+
+	oe = btrfs_lookup_ordered_range(inode, file_offset, len);
+	if (oe) {
+		btrfs_put_ordered_extent(oe);
+		return true;
+	}
+	return false;
+}
+
 /*
  * lookup and return any extent before 'file_offset'.  NULL is returned
  * if none is found
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index e96cd4c..9ba7209 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -191,6 +191,9 @@ btrfs_lookup_first_ordered_extent(struct inode * inode, u64 file_offset);
 struct btrfs_ordered_extent *btrfs_lookup_ordered_range(struct inode *inode,
 							u64 file_offset,
 							u64 len);
+bool btrfs_have_ordered_extents_in_range(struct inode *inode,
+					 u64 file_offset,
+					 u64 len);
 int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
 				struct btrfs_ordered_extent *ordered);
 int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
-- 
2.1.3


How to avoid certain file being defragmented?

2015-04-01 Thread Wang, Zhiye
Hello,

I have some files whose on-disk data I would like to keep at a fixed location 
on disk. My understanding is that a defragmentation operation can potentially 
move the data blocks of a file.

So, can I prevent certain files from being defragmented in btrfs?


Thanks
Mike



Re: Upgrade to 3.19.2 Kernel fails to boot

2015-04-01 Thread Anand Jain


Eric found something like this and has a fix in the email with the subject:
"I think btrfs: fix leak of path in btrfs_find_item broke stable 
trees ..."


Anand

On 03/24/2015 06:40 PM, Rich Freeman wrote:

On Tue, Mar 24, 2015 at 2:31 AM, Anand Jain anand.j...@oracle.com wrote:

Do you have this fix ..

  [PATCH] Btrfs: release path before starting transaction in can_nocow_extent

could you try ?.


I believe I already have this patch.  3.18.9 contains this:

commit bdeeab62a611f1f7cd48fd285ce568e8dcd0455a
Merge: 797afdf 1bda19e
Author: Linus Torvalds torva...@linux-foundation.org
Date:   Fri Oct 18 16:46:21 2013 -0700

 Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

 Pull btrfs fix from Chris Mason:
  Sage hit a deadlock with ceph on btrfs, and Josef tracked it down to a
   regression in our initial rc1 pull.  When doing nocow writes we were
   sometimes starting a transaction with locks held

 * 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
   Btrfs: release path before starting transaction in can_nocow_extent


Re: [PATCH] Btrfs: prevent deletion of mounted subvolumes

2015-04-01 Thread Omar Sandoval
On Tue, Mar 31, 2015 at 10:54:55PM -0500, Eric W. Biederman wrote:
> Omar Sandoval osan...@osandov.com writes:
>
> > On Mon, Mar 30, 2015 at 02:30:34PM +0200, David Sterba wrote:
> > > On Mon, Mar 30, 2015 at 02:02:17AM -0700, Omar Sandoval wrote:
> > > > Before commit bafc9b754f75 ("vfs: More precise tests in d_invalidate"),
> > > > d_invalidate() could return -EBUSY when a dentry for a directory had
> > > > more than one reference to it. This is what prevented a mounted
> > > > subvolume from being deleted, as struct vfsmount holds a reference to
> > > > the subvolume dentry. However, that commit removed that case, and later
> > > > commits in that patch series removed the return code from d_invalidate()
> > > > completely, so we don't get that check for free anymore. So, reintroduce
> > > > it in btrfs_ioctl_snap_destroy().
> > > >
> > > > This applies to 4.0-rc6. To be honest, I'm not sure that this is the most
> > > > correct fix for this bug, but it's equivalent to the pre-3.18 behavior and it's
> > > > the best that I could come up with. Thoughts?
> > > >
> > > > +	spin_lock(&dentry->d_lock);
> > > > +	err = dentry->d_lockref.count > 1 ? -EBUSY : 0;
> > > > +	spin_unlock(&dentry->d_lock);
> > >
> > > The fix restores the original behaviour, but I don't think opencoding and
> > > using internals is fine. Either there should be a vfs api for that or
> > > there's an existing one that can be used instead.
>
> I have a problem with restoring the original behavior as is.
>
> In some sense it re-introduces the security issue that the d_invalidate
> changes were built to fix.
>
> Any user in the system can create a user namespace, create a mount
> namespace and keep any subvolume pinned forever.  Which at the very
> least could make a very nice DOS attack.  I am not familiar enough with
> how people use subvolumes and
>
> So let me ask.  How can userspace not know that a subvolume that they
> want to delete is already mounted?
>

Currently, the entry in /proc/mounts doesn't tell you which subvolume is
mounted. The fix for that could be as simple as:


diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 05fef19..9492d83 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1024,6 +1024,10 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 	struct btrfs_root *root = info->tree_root;
 	char *compress_type;
 
+	if (dentry != dentry->d_sb->s_root) {
+		seq_puts(seq, ",subvol=");
+		seq_dentry(seq, dentry, " \t\n\\");
+	}
 	if (btrfs_test_opt(root, DEGRADED))
 		seq_puts(seq, ",degraded");
 	if (btrfs_test_opt(root, NODATASUM))


Then, maybe this policy could be pushed up to userspace. It feels
awkward to do it in the kernel, but users are apparently depending on
this behavior. Timo, do you mind sharing some more details about how
your scripts ran into the bug?
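
If the subvolume were exposed in the mount options as the diff above proposes, a userspace tool could look it up before deleting anything. A minimal sketch, assuming the option appears as `subvol=<path>` in a comma-separated option string; `toy_get_subvol()` is a hypothetical helper, not part of btrfs-progs, and it ignores the escaping that seq_dentry() applies:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Scan a comma-separated mount option string for "subvol=<value>".
 * On success copy <value> into out (NUL-terminated) and return 1.
 * Return 0 when the option is absent or the value does not fit.
 */
static int toy_get_subvol(const char *opts, char *out, size_t outlen)
{
	static const char key[] = "subvol=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = opts;

	assert(opts != NULL && out != NULL && outlen > 0);
	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		if (len >= keylen && strncmp(p, key, keylen) == 0) {
			size_t vlen = len - keylen;

			if (vlen >= outlen)
				return 0;	/* value would not fit */
			memcpy(out, p + keylen, vlen);
			out[vlen] = '\0';
			return 1;
		}
		if (!end)
			break;
		p = end + 1;
	}
	return 0;
}
```

A script could compare the returned path with the subvolume it is about to delete and refuse when they match; that comparison would be policy in userspace rather than a kernel check.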

> I can see having something like is_local_mount_root and denying the
> subvolume destruction if the mount that is pinning it is in your local
> mount namespace.
>
> > > The bug here seems defined up to the point that we're trying to delete a
> > > subvolume that's a mountpoint. My next guess is that a check
> > >
> > >	if (d_mountpoint(dentry)) { ... }
> > >
> > > could work.
> >
> > That was my first instinct as well, but d_mountpoint() is true for
> > dentries that have a filesystem mounted on them (e.g., after mount
> > /dev/sda1 /mnt, the dentry for /mnt), not the dentry that is mounted.
> >
> > I poked around the mount code for awhile and couldn't come up with
> > anything using the existing interface. Mounting subvolumes bubbles down
> > to mount_subtree(), which doesn't really leave any traces of which
> > subvolume is mounted except for the dentry in struct vfsmount.
> >
> > (As far as I can tell, under the covers subvolume deletion is more or
> > less equivalent to an rm -rf, and we obviously don't do anything to stop
> > users from doing that on the root of their mounted filesystem, but it
> > appears that users expect the original behavior.)
> >
> > Here's an idea: mark mount root dentries as such in the VFS and check it
> > in the Btrfs code. Adding fsdevel ML for comments
> > (https://lkml.org/lkml/2015/3/30/125 is the original message).
>
> Marking root dentries is needed to fix the bug that you can escape
> the limitations of loopback mounts with a carefully placed rename.
>
> I have a patch cooking that marks mountpoints and tracks all of the
> mounts on a dentry.  So except for the possibility of stepping on each
> others toes I have no objections.
>

We'll see how the discussion here plays out. I'll keep an eye out for
it, feel free to Cc me.

> Eric
>
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 74609b9..8a0933d 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -2384,6 +2384,11 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
> >  			goto out_dput;
> >  	}
> >  
> > +	if (d_is_mount_root(dentry)) {
> > +		err = -EBUSY;
> > +		goto out_dput;
> > +	}
> > +
> >  	mutex_lock(&inode->i_mutex);
> >  
> >  	/*
> > diff

[PATCH 3/3] btrfs: delayed-ref: Cleanup the unneeded functions.

2015-04-01 Thread Qu Wenruo
Clean up the rb_tree merge/insert/update functions, since we now use a list
instead of rb_tree.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 fs/btrfs/delayed-ref.c | 174 -
 1 file changed, 174 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 14bd476..4fff260 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -84,87 +84,6 @@ static int comp_data_refs(struct btrfs_delayed_data_ref *ref2,
 	return 0;
 }
 
-/*
- * entries in the rb tree are ordered by the byte number of the extent,
- * type of the delayed backrefs and content of delayed backrefs.
- */
-static int comp_entry(struct btrfs_delayed_ref_node *ref2,
-		      struct btrfs_delayed_ref_node *ref1,
-		      bool compare_seq)
-{
-	if (ref1->bytenr < ref2->bytenr)
-		return -1;
-	if (ref1->bytenr > ref2->bytenr)
-		return 1;
-	if (ref1->is_head && ref2->is_head)
-		return 0;
-	if (ref2->is_head)
-		return -1;
-	if (ref1->is_head)
-		return 1;
-	if (ref1->type < ref2->type)
-		return -1;
-	if (ref1->type > ref2->type)
-		return 1;
-	if (ref1->no_quota > ref2->no_quota)
-		return 1;
-	if (ref1->no_quota < ref2->no_quota)
-		return -1;
-	/* merging of sequenced refs is not allowed */
-	if (compare_seq) {
-		if (ref1->seq < ref2->seq)
-			return -1;
-		if (ref1->seq > ref2->seq)
-			return 1;
-	}
-	if (ref1->type == BTRFS_TREE_BLOCK_REF_KEY ||
-	    ref1->type == BTRFS_SHARED_BLOCK_REF_KEY) {
-		return comp_tree_refs(btrfs_delayed_node_to_tree_ref(ref2),
-				      btrfs_delayed_node_to_tree_ref(ref1),
-				      ref1->type);
-	} else if (ref1->type == BTRFS_EXTENT_DATA_REF_KEY ||
-		   ref1->type == BTRFS_SHARED_DATA_REF_KEY) {
-		return comp_data_refs(btrfs_delayed_node_to_data_ref(ref2),
-				      btrfs_delayed_node_to_data_ref(ref1));
-	}
-	BUG();
-	return 0;
-}
-
-/*
- * insert a new ref into the rbtree.  This returns any existing refs
- * for the same (bytenr,parent) tuple, or NULL if the new node was properly
- * inserted.
- */
-static struct btrfs_delayed_ref_node *tree_insert(struct rb_root *root,
-						  struct rb_node *node)
-{
-	struct rb_node **p = &root->rb_node;
-	struct rb_node *parent_node = NULL;
-	struct btrfs_delayed_ref_node *entry;
-	struct btrfs_delayed_ref_node *ins;
-	int cmp;
-
-	ins = rb_entry(node, struct btrfs_delayed_ref_node, rb_node);
-	while (*p) {
-		parent_node = *p;
-		entry = rb_entry(parent_node, struct btrfs_delayed_ref_node,
-				 rb_node);
-
-		cmp = comp_entry(entry, ins, 1);
-		if (cmp < 0)
-			p = &(*p)->rb_left;
-		else if (cmp > 0)
-			p = &(*p)->rb_right;
-		else
-			return entry;
-	}
-
-	rb_link_node(node, parent_node, p);
-	rb_insert_color(node, root);
-	return NULL;
-}
-
 /* insert a new ref to head ref rbtree */
 static struct btrfs_delayed_ref_head *htree_insert(struct rb_root *root,
 						   struct rb_node *node)
@@ -277,57 +196,6 @@ static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
 	trans->delayed_ref_updates--;
 }
 
-static int merge_ref(struct btrfs_trans_handle *trans,
-		     struct btrfs_delayed_ref_root *delayed_refs,
-		     struct btrfs_delayed_ref_head *head,
-		     struct btrfs_delayed_ref_node *ref, u64 seq)
-{
-	struct rb_node *node;
-	int mod = 0;
-	int done = 0;
-
-	node = rb_next(&ref->rb_node);
-	while (!done && node) {
-		struct btrfs_delayed_ref_node *next;
-
-		next = rb_entry(node, struct btrfs_delayed_ref_node, rb_node);
-		node = rb_next(node);
-		if (seq && next->seq >= seq)
-			break;
-		if (comp_entry(ref, next, 0))
-			continue;
-
-		if (ref->action == next->action) {
-			mod = next->ref_mod;
-		} else {
-			if (ref->ref_mod < next->ref_mod) {
-				struct btrfs_delayed_ref_node *tmp;
-
-				tmp = ref;
-				ref = next;
-				next = tmp;
-				done = 1;
-			}
-			mod = -next->ref_mod;
-		}
-
-   drop_delayed_ref(trans, 

Re: ERROR: error removing the device '/dev/sdXN' - Inappropriate ioctl for device

2015-04-01 Thread Martin
On 01/04/15 08:06, Anand Jain wrote:
>
> > btrfs device delete /dev/sdf5 /mnt/data2
> >
> > ERROR: error removing the device '/dev/sdf5' - Inappropriate ioctl for
> > device
>
>  very strange. 'btrfs fi show -m' shows btrfs fs(s) that are mounted.

Looks like my /dev/sdf isn't responding with anything useful at all... A
firmware crash?... (This is for a 256GB SSD.)


# btrfs fi show -m

[...]

Label: 'btrfs_data2'  uuid: 3aaee716-b98b-4c86-ba5a-53456994f152
	Total devices 3 FS bytes used 159.31GiB
	devid    1 size 206.47GiB used 206.02GiB path /dev/sdb5
	devid    2 size 206.47GiB used 206.47GiB path /dev/sdd5
	devid    3 size 206.47GiB used 206.47GiB path /dev/sdf5


btrfs-progs v3.19.1



# smartctl -i /dev/sdf
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.14.10-gentoo_s03a_11]
(local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:   /7:0:0:0
Product:
Compliance:   SPC-5
User Capacity:600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
scsiModePageOffset: response length too short, resp_len=47 offset=50
bd_len=46
scsiModePageOffset: response length too short, resp_len=47 offset=50
bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more
'-T permissive' options.



And btrfs is still running ok so far... I'll be swapping that device at
the weekend. (First chance I have for a Sunday shutdown. :-( )


Still... For administering btrfs, it is a little disturbing not to be
able to remove/delete a device unless that device is both mounted and
working...

Or is that where the missing option comes in?



Comments welcomed,

Thanks,
Martin





--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: prevent deletion of mounted subvolumes

2015-04-01 Thread David Sterba
On Wed, Apr 01, 2015 at 12:03:28AM -0700, Omar Sandoval wrote:
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1024,6 +1024,10 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
>  	struct btrfs_root *root = info->tree_root;
>  	char *compress_type;
>  
> +	if (dentry != dentry->d_sb->s_root) {
> +		seq_puts(seq, ",subvol=");
> +		seq_dentry(seq, dentry, " \t\n\\");

Unfortunately this does not work if the default subvolume is not the
toplevel one and the implicit mount (i.e. without subvol=) is used. Then
this leads to subvol=/ although it should be subvol=/the/default .

There was a patch to build the path in the show_options callback, but it
looked too heavy (taking locks, doing lookups). This is unrelated to the
problem reported by Timo, though the fix might also fix this one.


Re: How to avoid certain file being defragmented?

2015-04-01 Thread Hugo Mills
On Wed, Apr 01, 2015 at 12:42:04PM +, Wang, Zhiye wrote:
> Hello,
>
> I have some files which I hope their on-disk data can be on fixed
> location of disk. My understanding is that defragmentation operation
> can potentially move data blocks of a file.
>
> So, can I avoid certain file being defragmented in btrfs?

   Simple: Don't defragment it.

   Also, set the file to nodatacow if you're planning on writing to it
at any point; don't take snapshots or reflink copies of it if that's
the case. Finally, don't ever run a balance on the FS.
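
   For the nodatacow suggestion, the per-file form is the NOCOW inode
attribute, which is what `chattr +C` sets. Below is a minimal sketch of
doing the same from C, assuming a Linux system with <linux/fs.h>
available. The helper name set_nocow is made up for illustration and is
not part of btrfs-progs. On btrfs the flag should be set while the file
is still new or empty.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */

/*
 * Hypothetical helper: set the NOCOW inode flag on a file, the same
 * attribute that `chattr +C <path>` sets.
 * Returns 0 on success or a negative errno value on failure.
 */
int set_nocow(const char *path)
{
	int flags = 0;
	int err = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -errno;

	/* Read the current inode flags, add NOCOW, write them back. */
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
		err = -errno;
	} else {
		flags |= FS_NOCOW_FL;
		if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
			err = -errno;
	}

	close(fd);
	return err;
}
```

   Whether the second ioctl succeeds depends on the filesystem; on
filesystems that do not know the flag it is expected to fail with an
error rather than silently succeed.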

   Alternatively, have a hard think about why you want to bypass the
filesystem, because it's probably a fairly bad idea. Even most
bootloaders these days put the data they can't move in the 64k before
the FS starts, and then have enough code to read the FS through its
normal data structures.

   If you need some kind of raw(ish) block interface, using a file and
a loopback device may be more useful.

   Hugo.

-- 
Hugo Mills | I get nervous when I see words like 'mayhaps' in a
hugo@... carfax.org.uk | novel, because I fear that just round the corner is
http://carfax.org.uk/  | lurking 'forsooth'
PGP: 65E74AC0  |  GRRM's UK editor




Re: ERROR: error removing the device '/dev/sdXN' - Inappropriate ioctl for device

2015-04-01 Thread Anand Jain



 Looks like an option to use devid to delete a device would
 have mitigated the issue. Also, the error reported is
 nowhere near the reality. Will fix them.

 Thanks for reporting.

Anand





Re: WARNING at fs/btrfs/super.c:260 __btrfs_abort_transaction (error -17)

2015-04-01 Thread Chris Mason



On Tue, Mar 24, 2015 at 6:23 PM, Sophie just4pleis...@gmail.com wrote:
> On 24/03/15 17:34, Chris Mason wrote:
> > On Tue, Mar 24, 2015 at 9:43 AM, Sophie Dexter just4pleis...@gmail.com
> > wrote:
> > > On 20/03/2015 15:19, Sophie Dexter wrote:
> > > I'm given to understand that this is the right place to report a btrfs
> > > problem, I apologise if not :-(
> > >
> > > I have been using my router as a simple NFS NAS for around 2 years
> > > with an ext3 formatted 2 TB Western Digital 2.5 USB Passport disk. I
> > > have been slowly moving to BTRFS and thought it about time to convert
> > > this disk too but unfortunately BTRFS is unreliable on my router :-(.
> > >
> > > It doesn't take long for an error to happen causing a 'ro' remount.
> > > However the disk is unreadable after the remount, both for NFS and
> > > locally. Rebooting the router seems to be the only way to access the
> > > disk again.
> > >
> > > I also have a 1 GB swap partition on the disk although swap doesn't
> > > appear to be a factor as the problem occurs whether or not swap is
> > > enabled (this report is without swap).
> > >
> > > I used my laptop to convert the fs to btrfs, not my router. My laptop
> > > has Fedora 21 with 3.18 kernel and tools. No problems are found when I
> > > use my laptop to check and scrub the disk (i.e. with the disk
> > > connected directly to my laptop).
> >
> > You have great timing, there are two reports of a very similar abort
> > with 4.0-rc5, but your report makes it clear these are not a regression
> > from 4.0-rc4.
> >
> > Are you able to run btrfsck on this filesystem?  I'd like to check for
> > metadata inconsistencies.
> >
> > -chris
>
> Hi Chris,
>
> Haha, great timing is the secret of good comedy lol
>
> OpenWrt has only very recently signed off the 3.18 kernel as the
> default kernel for my router, I was using a build with 3.14 when I
> converted my disk and saw the same problem :!: I may have posted
> something I haven't repeated here in the OpenWrt ticket I opened:
>
> https://dev.openwrt.org/ticket/19216
>
> I previously checked and scrubbed the disk when the problem first
> occurred and happily no problems were found then. Although, I had to
> use another computer because btrfs check doesn't complete on my
> router, the process is killed due to lack of memory (btrfs invoked
> oom-killer) :-( Should I start another topic for this or just accept
> that that problem is due to a lack of memory?
>
> I have just run btrfs check again using (yet another) laptop and I
> think everything is still OK:
>
> # btrfs check /dev/sdb1
> Checking filesystem on /dev/sdb1
> UUID: ----
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 930516788539 bytes used err is 0
> total csum bytes: 1234353920
> total tree bytes: 1458515968
> total fs tree bytes: 54571008
> total extent tree bytes: 66936832
> btree space waste bytes: 73372568
> file data blocks allocated: 1264250781696
>  referenced 1264250781696
> Btrfs v3.14.1
> # uname -a
> Linux ##-- 3.16.0-31-generic #43-Ubuntu SMP Tue Mar 10 17:37:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux


Sophie, can you please grab the latest btrfs progs from git:

git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

And try with that btrfsck?

The other image that is reproducing this has an error in the free space 
cache, so I'd like to confirm if you're hitting the same problem.


-chris





Is this normal? Should I use scrub?

2015-04-01 Thread Andy Smith
Hello,

I have a 6 device RAID-1 filesystem:

$ sudo btrfs fi df /srv/tank
Data, RAID1: total=1.24TiB, used=1.24TiB
System, RAID1: total=32.00MiB, used=184.00KiB
Metadata, RAID1: total=3.00GiB, used=1.65GiB
unknown, single: total=512.00MiB, used=0.00
$ sudo btrfs fi sh /srv/tank
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
	Total devices 6 FS bytes used 1.24TiB
	devid    2 size 1.82TiB used 384.03GiB path /dev/sdh
	devid    3 size 1.82TiB used 383.00GiB path /dev/sdg
	devid    4 size 1.82TiB used 384.00GiB path /dev/sdf
	devid    5 size 2.73TiB used 1.13TiB path /dev/sdk
	devid    6 size 1.82TiB used 121.00GiB path /dev/sdj
	devid    7 size 2.73TiB used 116.00GiB path /dev/sde

Btrfs v3.14.2

All of these devices are in an external eSATA enclosure.

A few days ago (I believe) something went wrong with the enclosure
hardware and the SCSI bus kept getting reset over and over. At one
point three of the six devices were kicked out and the filesystem
was left running (read-only) on three devices.

Through some trial and error I determined that the enclosure was
taking exception to one of the devices, and by removing it I was
able to get things up and running with five devices, writeable,
mounted in degraded mode. /dev/sdk is the device that was kept out
of the filesystem.

I do not believe that there is anything wrong with /dev/sdk as I put
it in another system and was able to read it entirely, do SMART long
tests on it, etc.

I wasn't able to prove it is a hardware problem until I took the
enclosure out of service as it's the only enclosure I had. So that's
a task for later.

I have now got a new enclosure and put this system back together
with all six devices. I was not expecting this filesystem to mount
without assistance on boot because of /dev/sdk being stale
compared to the other devices. I suppose this incorrect view is a
holdover from my experience with mdadm.

Anyway, I booted it and /srv/tank was mounted automatically with all
six devices.  I got a bunch of these messages as soon as it was
mounted:

http://pastie.org/private/2ghahjwtzlcm6hwp66hkg

There's lots more of it but it's all like that. That paste is from
the end of the log and there haven't been any more such messages
since, so that's about 20 minutes (the times are in GMT).

Is that normal output indicating that btrfs is repairing the
staleness of sdk from the other copy?

I seem to be able to use the filesystem and a cursory inspection
isn't turning up anything that I can't read or that seems
corrupted. I will now run checksums against my last good backup.

Should I run a scrub as well?

Cheers,
Andy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Tue, Mar 31, 2015 at 4:09 PM, Chris Murphy li...@colorremedies.com wrote:
> > A failure of the
> > HDD cannot be ruled out, low power conditions, cheap consumer part...
>
> Well you have to rule that out before anyone on this list can really
> help. Try booting Fedora 21 install media, and using smartctl -x on
> the drive.

smartctl thinks the drive is ok. Unfortunately, it doesn't have a
truth serum to distinguish whether this drive lies about writes or
not...

[root@localhost liveuser]# smartctl -x /dev/sda
smartctl 6.2 2014-07-16 r3952 [x86_64-linux-3.17.4-301.fc21.x86_64]
(local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Laptop SSHD
Device Model:     ST500LM000-1EJ162
Serial Number:    W3709VQD
LU WWN Device Id: 5 000c50 069f901e9
Firmware Version: SM14
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr  1 11:44:43 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Write SCT (Get) XXX Error Recovery Control Command failed: scsi error
aborted command
Wt Cache Reorder: N/A

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  139) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  98) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1081) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   112   099   006    -    46707576
  3 Spin_Up_Time            PO----   099   099   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    147
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   078   060   030    -    65832005
  9 Power_On_Hours          -O--CK   092   092   000    -    7775
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    159
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    1
189 High_Fly_Writes         -O-RCK   095   095   000    -    5
190 Airflow_Temperature_Cel -O---K   070   058   045    -    30 (Min/Max 27/31)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    25
193 Load_Cycle_Count        -O--CK   066   066   000    -    68484
194 Temperature_Celsius     -O---K   030   042   000    -    30 (0 16 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
254 Free_Fall_Sensor        -O--CK   100   100   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online

Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Wed, Apr 1, 2015 at 12:15 PM, Martin Langhoff
martin.langh...@gmail.com wrote:
> Will try to capture some info from a dracut breakpoint  (I'll try
> mount). At this point this really looks like a regression.

After a couple alternating boots, the problem vanished :-/:

 - 3.18.7-200.fc21 - success
 - 3.18.9-200.fc21 - timeout
 - 3.18.7-200.fc21 - success
 - 3.18.9-200.fc21 - timeout
 - 3.18.7-200.fc21 - success
 - 3.18.9-200.fc21 - with rd.shell rd.break=mount rd.debug -- success
 - 3.18.9-200.fc21 - with no special params -- success

(all boots had rhgb and quiet params removed)

So problem is gone, yet I feel like this
https://c2.staticflickr.com/4/3438/4593531893_f67a757fa1.jpg



m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
Hi Chris, list,

thanks for your debugging ideas so far. Now this gets interesting. I
booted off a LiveUSB disk, and it just mounted sysroot. WTH?

See below. Perhaps the newer kernel (in latest F21) has regressed in
handling some kinds of errors during mount, or the dracut/systemd
mounting process is less resilient than mounting under a fully booted
system?


[root@localhost liveuser]# uname -a
Linux localhost 3.17.4-301.fc21.x86_64 #1 SMP Thu Nov 27 19:09:10 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux

## Before booting into liveUSB, I made a copy of
## rdsosreport.txt in the /home partition
## which is a separate btrfs fs, and seems
## to not be affected by the problem at all
[root@localhost liveuser]# mkdir /myhome
[root@localhost liveuser]# mkdir /mysysroot
[root@localhost liveuser]# mount /dev/sda2 /myhome
[root@localhost liveuser]# ls /myhome
home  rdsosreport.txt

[root@localhost liveuser]# fpaste  /myhome/rdsosreport.txt
Uploading (93.4KiB)...
http://ur1.ca/k2zue - http://paste.fedoraproject.org/205971/01928142

## Strange, on first boot from live USB F21 image, it just mounts
## (I tried about half a dozen cold boots earlier -- all resulting in
##  the same initramfs/dracut/systemd emergency shell...)
[root@localhost liveuser]# mount /dev/sda6 /mysysroot

Apr 01 11:26:51 localhost kernel: BTRFS info (device sda6): disk space
caching is enabled
Apr 01 11:26:56 localhost kernel: BTRFS: checking UUID tree
Apr 01 11:26:56 localhost kernel: SELinux: initialized (dev sda6, type
btrfs), uses xattr

[root@localhost liveuser]# ls /mysysroot
root
[root@localhost liveuser]# ls /mysysroot/root
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root
run  sbin  srv  sys  sysroot  tmp  usr  var

[root@localhost liveuser]# umount /dev/sda6
[root@localhost liveuser]# btrfs check /dev/sda6
Checking filesystem on /dev/sda6
UUID: 94637b35-a294-4be2-aa47-82c52d6d53ef
checking extents
checking free space cache
checking fs roots
root 256 inode 39841 errors 400, nbytes wrong
found 7747100703 bytes used err is 1
total csum bytes: 11912932
total tree bytes: 476725248
total fs tree bytes: 434733056
total extent tree bytes: 22986752
btree space waste bytes: 83962424
file data blocks allocated: 30820143104
 referenced 11997040640
Btrfs v3.17

[root@localhost liveuser]# btrfs check --repair /dev/sda6
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sda6
UUID: 94637b35-a294-4be2-aa47-82c52d6d53ef
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 256 inode 39841 errors 400, nbytes wrong
found 7747100703 bytes used err is 1
total csum bytes: 11912932
total tree bytes: 476725248
total fs tree bytes: 434733056
total extent tree bytes: 22986752
btree space waste bytes: 83962424
file data blocks allocated: 30820143104
 referenced 11997040640
Btrfs v3.17



EOM

cheers,



martin
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: Is this normal? Should I use scrub?

2015-04-01 Thread Hugo Mills
   Hi, Andy,

On Wed, Apr 01, 2015 at 03:11:14PM +, Andy Smith wrote:
> I have a 6 device RAID-1 filesystem:

[snip tale of a filesystem with out of date data on one copy of the RAID]

> I have now got a new enclosure and put this system back together
> with all six devices. I was not expecting this filesystem to mount
> without assistance on boot because of /dev/sdk being stale
> compared to the other devices. I suppose this incorrect view is a
> holdover from my experience with mdadm.
>
> Anyway, I booted it and /srv/tank was mounted automatically with all
> six devices.  I got a bunch of these messages as soon as it was
> mounted:
>
> http://pastie.org/private/2ghahjwtzlcm6hwp66hkg
>
> There's lots more of it but it's all like that. That paste is from
> the end of the log and there haven't been any more such messages
> since, so that's about 20 minutes (the times are in GMT).
>
> Is that normal output indicating that btrfs is repairing the
> staleness of sdk from the other copy?

   Yes, exactly. That output you pasted looks pretty much exactly like
what I'd expect to see in the situation described above. You might
also expect to see some checksum errors corrected in the data, as well
as the metadata messages you're getting.

> I seem to be able to use the filesystem and a cursory inspection
> isn't turning up anything that I can't read or that seems
> corrupted. I will now run checksums against my last good backup.
>
> Should I run a scrub as well?

   Yes. The output you've had so far will be just the pieces that the
FS has tried to read, and where, as a result, it's been able to detect
the out-of-date data. A scrub will check and fix everything.

   Hugo.

-- 
Hugo Mills | My karma has run over my dogma.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: 65E74AC0  |




Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Wed, Apr 1, 2015 at 11:42 AM, Martin Langhoff
martin.langh...@gmail.com wrote:
> See below. Perhaps the newer kernel (in latest F21) has regressed in
> handling some kinds of errors during mount, or the dracut/systemd
> mounting process is less resilient than mounting under a fully booted
> system?

This is getting even more interesting.

Under 3.17.4-301.fc21.x86 from LiveUSB, I could mount, even repair the disk.

Since the repair, the on-disk latest kernel (3.18.9-200.fc21) tries to
boot, but dracut/systemd time out on mounting sysroot after waiting
for quite a while. I don't get a dracut shell anymore so the failure
mode has changed. I may try to set a breakpoint to force a shell.

I do have an earlier F21 kernel on disk-- 3.18.7-200.fc21 -- and this
boots the system without a glitch. After a complete boot with
3.18.7.200, clean shutdown and booting into 3.18.9-200 is still
broken, same failure mode.

Will try to capture some info from a dracut breakpoint  (I'll try
mount). At this point this really looks like a regression.

cheers,



martin
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
Whenever I have these boot problems, I'm noticing that sometimes the
device, /dev/sda5, is showing up with lsblk (libblkid) as
/dev/block/8:5 while everything else (not-Btrfs) on that device shows
up as /dev/sdaX. Does anyone know what that might mean?
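
For context, the /dev/block/MAJOR:MINOR names are udev symlinks keyed by
device number, and 8:5 is the number pair normally given to sda5, so
both names should point at the same device node. That does not by itself
explain why lsblk prints that form here. A small sketch of the mapping,
assuming glibc's <sys/sysmacros.h>; block_link_name is a made-up helper
for illustration only:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>	/* major(), minor(), makedev() */

/*
 * Hypothetical helper: format the /dev/block/MAJOR:MINOR name that
 * corresponds to a device number, for example the st_rdev field that
 * stat(2) returns for /dev/sda5.
 * Returns the number of characters snprintf() wanted to write.
 */
int block_link_name(dev_t dev, char *buf, size_t len)
{
	return snprintf(buf, len, "/dev/block/%u:%u", major(dev), minor(dev));
}
```

Comparing st_rdev of both paths with stat(2) would confirm whether they
are the same device.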


Chris Murphy


Re: [PATCH] Btrfs: prevent deletion of mounted subvolumes

2015-04-01 Thread Omar Sandoval
On Wed, Apr 01, 2015 at 01:22:42PM +0200, David Sterba wrote:
> On Wed, Apr 01, 2015 at 12:03:28AM -0700, Omar Sandoval wrote:
> > --- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -1024,6 +1024,10 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
> >  	struct btrfs_root *root = info->tree_root;
> >  	char *compress_type;
> >  
> > +	if (dentry != dentry->d_sb->s_root) {
> > +		seq_puts(seq, ",subvol=");
> > +		seq_dentry(seq, dentry, " \t\n\\");
>
> Unfortunately this does not work if the default subvolume is not the
> toplevel one and the implicit mount (i.e. without subvol=) is used. Then
> this leads to subvol=/ although it should be subvol=/the/default .
>
> There was a patch to build the path in the show_options callback, but it
> looked too heavy (taking locks, doing lookups). This is unrelated to the
> problem reported by Timo, though the fix might also fix this one.

Hm, yeah, that's unfortunate, thanks for pointing that out. It looks
like we can get the subvolume ID reliably:


diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 05fef19..a74ddb3 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1024,6 +1024,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 	struct btrfs_root *root = info->tree_root;
 	char *compress_type;
 
+	seq_printf(seq, ",subvolid=%llu",
+		  BTRFS_I(d_inode(dentry))->root->root_key.objectid);
 	if (btrfs_test_opt(root, DEGRADED))
 		seq_puts(seq, ",degraded");
 	if (btrfs_test_opt(root, NODATASUM))


With that, userspace has enough information to determine whether a
subvolume is mounted. That would be racy with concurrent mounts,
though...
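
As a rough illustration of the userspace side, here is a sketch that
pulls the ID out of a mount option string. It assumes the option is
printed as subvolid=<decimal> exactly as in the diff above, and
parse_subvolid is a hypothetical helper, not anything in btrfs-progs. It
does nothing about the race with concurrent mounts.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace helper: extract the value of "subvolid=" from
 * a comma-separated mount option string, such as the options field of
 * a /proc/mounts line.
 * Returns 0 and stores the ID on success, -1 if the option is absent
 * or malformed.
 */
int parse_subvolid(const char *opts, unsigned long long *id)
{
	static const char key[] = "subvolid=";
	const char *p = opts;

	while (p && *p) {
		/* Only match at the start of an option. */
		if (strncmp(p, key, sizeof(key) - 1) == 0) {
			const char *val = p + sizeof(key) - 1;
			char *end;

			*id = strtoull(val, &end, 10);
			if (end == val || (*end != ',' && *end != '\0'))
				return -1;
			return 0;
		}
		/* Advance to the start of the next option, if any. */
		p = strchr(p, ',');
		if (p)
			p++;
	}
	return -1;
}
```

A caller would compare the parsed ID against the ID of the subvolume it
is about to delete.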

Just to throw another idea out there, what about doing something like my
VFS patch, but then making it optional whether the kernel should error
out on a mounted subvolume, e.g., with a flag to the ioctl? btrfs-progs
could default to the original EBUSY behavior for users who depend on it,
but we could add a force flag to `btrfs subvolume delete` in order to
avert the DoS situation Eric wants to avoid. Thoughts on that?

-- 
Omar


[PATCH] Btrfs: RENAME_EXCHANGE semantic for renameat2()

2015-04-01 Thread Davide Italiano
Signed-off-by: Davide Italiano dccitali...@gmail.com
---
 fs/btrfs/inode.c | 190 ++-
 1 file changed, 189 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d2e732d..49b0867 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8890,6 +8890,190 @@ static int btrfs_getattr(struct vfsmount *mnt,
return 0;
 }
 
+static int btrfs_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
+{
+   struct btrfs_trans_handle *trans;
+   struct btrfs_root *root = BTRFS_I(old_dir)-root;
+   struct btrfs_root *dest = BTRFS_I(new_dir)-root;
+   struct inode *new_inode = new_dentry-d_inode;
+   struct inode *old_inode = old_dentry-d_inode;
+   struct timespec ctime = CURRENT_TIME;
+   u64 old_ino = btrfs_ino(old_inode);
+   u64 new_ino = btrfs_ino(new_inode);
+   u64 old_idx = 0;
+   u64 new_idx = 0;
+   u64 root_objectid;
+   int ret;
+
+   /* we only allow rename subvolume link between subvolumes */
+   if (old_ino != BTRFS_FIRST_FREE_OBJECTID  root != dest)
+   return -EXDEV;
+
+   /* close the racy window with snapshot create/destroy ioctl */
+   if (old_ino == BTRFS_FIRST_FREE_OBJECTID)
+   down_read(root-fs_info-subvol_sem);
+   if (new_ino == BTRFS_FIRST_FREE_OBJECTID)
+   down_read(dest-fs_info-subvol_sem);
+
+   /*
+* We want to reserve the absolute worst case amount of items.  So if
+* both inodes are subvols and we need to unlink them then that would
+* require 4 item modifications, but if they are both normal inodes it
+* would require 5 item modifications, so we'll assume their normal
+* inodes.  So 5 * 2 is 10, plus 2 for the new links, so 12 total items
+* should cover the worst case number of items we'll modify.
+*/
+   trans = btrfs_start_transaction(root, 12);
+   if (IS_ERR(trans)) {
+ret = PTR_ERR(trans);
+goto out_notrans;
+}
+
+   /*
+* We need to find a free sequence number both in the source and
+* in the destination directory for the exchange.
+*/
+   ret = btrfs_set_inode_index(new_dir, &old_idx);
+   if (ret)
+   goto out_fail;
+   ret = btrfs_set_inode_index(old_dir, &new_idx);
+   if (ret)
+   goto out_fail;
+
+   BTRFS_I(old_inode)->dir_index = 0ULL;
+   BTRFS_I(new_inode)->dir_index = 0ULL;
+
+   /* Reference for the source. */
+   if (unlikely(old_ino == BTRFS_FIRST_FREE_OBJECTID)) {
+   /* force full log commit if subvolume involved. */
+   btrfs_set_log_full_commit(root->fs_info, trans);
+   } else {
+   ret = btrfs_insert_inode_ref(trans, dest,
+new_dentry->d_name.name,
+new_dentry->d_name.len,
+old_ino,
+btrfs_ino(new_dir), old_idx);
+   if (ret)
+   goto out_fail;
+   btrfs_pin_log_trans(root);
+   }
+
+   /* And now for the dest. */
+   if (unlikely(new_ino == BTRFS_FIRST_FREE_OBJECTID)) {
+   /* force full log commit if subvolume involved. */
+   btrfs_set_log_full_commit(dest->fs_info, trans);
+   } else {
+   ret = btrfs_insert_inode_ref(trans, root,
+old_dentry->d_name.name,
+old_dentry->d_name.len,
+new_ino,
+btrfs_ino(old_dir), new_idx);
+   if (ret)
+   goto out_fail;
+   btrfs_pin_log_trans(dest);
+   }
+   }
+
+   /*
+* Update i-node version and ctime/mtime.
+*/
+   inode_inc_iversion(old_dir);
+   inode_inc_iversion(new_dir);
+   inode_inc_iversion(old_inode);
+   inode_inc_iversion(new_inode);
+   old_dir->i_ctime = old_dir->i_mtime = ctime;
+   new_dir->i_ctime = new_dir->i_mtime = ctime;
+   old_inode->i_ctime = ctime;
+   new_inode->i_ctime = ctime;
+
+   if (old_dentry->d_parent != new_dentry->d_parent) {
+   btrfs_record_unlink_dir(trans, old_dir, old_inode, 1);
+   btrfs_record_unlink_dir(trans, new_dir, new_inode, 1);
+   }
+
+   /* src is a subvolume */
+   if (unlikely(old_ino == BTRFS_FIRST_FREE_OBJECTID)) {
+   root_objectid = BTRFS_I(old_inode)->root->root_key.objectid;
+   ret = btrfs_unlink_subvol(trans, root, old_dir,
+ root_objectid,
+ old_dentry->d_name.name,
+

[PATCH] Btrfs: implement RENAME_EXCHANGE semantic

2015-04-01 Thread Davide Italiano
This is an attempt to implement RENAME_EXCHANGE in btrfs.
It survived basic testing and I think it's ready for others' feedback.
I'll stress test {and, or} rewrite it depending on people's comments.

Davide Italiano (1):
  Btrfs: RENAME_EXCHANGE semantic for renameat2()

 fs/btrfs/inode.c | 190 ++-
 1 file changed, 189 insertions(+), 1 deletion(-)

-- 
2.3.4



[PATCH 1/2] btrfs-progs: convert: Make ext*_image file obey datacsum setting.

2015-04-01 Thread Qu Wenruo
Before this patch, the ext*_image file always has the NODATACSUM inode
flag set. However, btrfs-convert sets the DATACSUM flag on normal files
by default and generates checksums for regular file extents.

As a result, a regular file extent can be shared by a btrfs file inode
with DATACSUM and by ext*_image with NODATACSUM, while it has a checksum
in the csum tree. This makes btrfsck complain about an odd checksum,
since ext*_image is set NODATACSUM but has a checksum generated from the
regular file extent.

This patch makes convert fully obey the datacsum setting, meaning
btrfs-convert will generate csums for every file extent by default.

Reported-by: Tsutomu Itoh t-i...@jp.fujitsu.com
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 btrfs-convert.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index 4dc33b3..d742307 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -1161,7 +1161,7 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, u64 objectid,
   struct btrfs_inode_item *inode,
   u64 start_byte, u64 end_byte,
-  ext2_filsys ext2_fs)
+  ext2_filsys ext2_fs, int datacsum)
 {
u32 blocksize = ext2_fs->blocksize;
u32 block = start_byte / blocksize;
@@ -1176,7 +1176,7 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
.disk_block = 0,
.num_blocks = 0,
.boundary   = (u64)-1,
-   .checksum   = 0,
+   .checksum   = datacsum,
.errcode= 0,
};
for (; start_byte < end_byte; block++, start_byte += blocksize) {
@@ -1191,7 +1191,7 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
if (data.num_blocks > 0) {
ret = record_file_blocks(trans, root, objectid, inode,
 data.first_block, data.disk_block,
-data.num_blocks, 0);
+data.num_blocks, datacsum);
if (ret)
goto fail;
data.first_block += data.num_blocks;
@@ -1199,7 +1199,7 @@ static int create_image_file_range(struct btrfs_trans_handle *trans,
if (last_block > data.first_block) {
ret = record_file_blocks(trans, root, objectid, inode,
 data.first_block, 0, last_block -
-data.first_block, 0);
+data.first_block, datacsum);
if (ret)
goto fail;
}
@@ -1210,7 +1210,7 @@ fail:
  * Create the ext2fs image file.
  */
 static int create_ext2_image(struct btrfs_root *root, ext2_filsys ext2_fs,
-const char *name)
+const char *name, int datacsum)
 {
int ret;
struct btrfs_key key;
@@ -1231,11 +1231,14 @@ static int create_ext2_image(struct btrfs_root *root, ext2_filsys ext2_fs,
u64 last_byte;
u64 first_free;
u64 total_bytes;
+   u64 flags = BTRFS_INODE_READONLY;
u32 sectorsize = root->sectorsize;
 
total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
first_free =  BTRFS_SUPER_INFO_OFFSET + sectorsize * 2 - 1;
first_free &= ~((u64)sectorsize - 1);
+   if (!datacsum)
+   flags |= BTRFS_INODE_NODATASUM;
 
memset(&btrfs_inode, 0, sizeof(btrfs_inode));
btrfs_set_stack_inode_generation(&btrfs_inode, 1);
@@ -1243,8 +1246,7 @@ static int create_ext2_image(struct btrfs_root *root, ext2_filsys ext2_fs,
btrfs_set_stack_inode_nlink(&btrfs_inode, 1);
btrfs_set_stack_inode_nbytes(&btrfs_inode, 0);
btrfs_set_stack_inode_mode(&btrfs_inode, S_IFREG | 0400);
-   btrfs_set_stack_inode_flags(&btrfs_inode, BTRFS_INODE_NODATASUM |
-   BTRFS_INODE_READONLY);
+   btrfs_set_stack_inode_flags(&btrfs_inode,  flags);
btrfs_init_path(&path);
trans = btrfs_start_transaction(root, 1);
BUG_ON(!trans);
@@ -1271,6 +1273,12 @@ static int create_ext2_image(struct btrfs_root *root, ext2_filsys ext2_fs,
   key.objectid, sectorsize);
if (ret)
goto fail;
+   if (datacsum) {
+   ret = csum_disk_extent(trans, root, key.objectid,
+  sectorsize);
+   if (ret)
+   goto fail;
+   }
}
 
while(1) {
@@ -1323,7 +1331,8 @@ next:
if (bytenr > last_byte) {
ret = 

[PATCH 2/2] btrfs-progs: convert-test: Add test for converting ext* with regular file extent.

2015-04-01 Thread Qu Wenruo
Before the previous patch, btrfs-convert made fsck complain if there
was any regular file extent in the newly converted btrfs.

Add a test case for it.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 tests/convert-tests.sh | 85 --
 1 file changed, 55 insertions(+), 30 deletions(-)

diff --git a/tests/convert-tests.sh b/tests/convert-tests.sh
index f6b919d..5c4f22e 100644
--- a/tests/convert-tests.sh
+++ b/tests/convert-tests.sh
@@ -4,48 +4,73 @@
 # clean.
 #
 
-here=`pwd`
+unset top
+unset LANG
+LANG=C
+script_dir=$(dirname $(realpath $0))
+top=$(realpath $script_dir/../)
+TEST_DEV=${TEST_DEV:-}
+TEST_MNT=${TEST_MNT:-$top/tests/mnt}
+RESULT=$top/tests/convert-tests-results.txt
+IMAGE=$script_dir/test.img
 
-_fail()
-{
-   echo $* | tee -a convert-tests-results.txt
-   exit 1
-}
+source $top/tests/common
+export top
+export RESULT
+# For comprehensive convert test which needs write something into ext*
+export TEST_MNT
+export LANG
+
+rm -f $RESULT
+mkdir -p $TEST_MNT || _fail "unable to create mount point on $TEST_MNT"
+
+# test reply on btrfs-convert
+check_prereq btrfs-convert
+check_prereq btrfs
 
-rm -f convert-tests-results.txt
 
-test(){
+convert_test(){
echo "[TEST]   $1"
nodesize=$2
shift 2
-   echo "creating ext image with: $*" >> convert-tests-results.txt
+   echo "creating ext image with: $*" >> $RESULT
# 256MB is the smallest acceptable btrfs image.
-   rm -f $here/test.img >> convert-tests-results.txt 2>&1 \
+   rm -f $IMAGE >> $RESULT 2>&1 \
|| _fail "could not remove test image file"
-   truncate -s 256M $here/test.img >> convert-tests-results.txt 2>&1 \
+   truncate -s 256M $IMAGE >> $RESULT 2>&1 \
|| _fail "could not create test image file"
-   $* -F $here/test.img >> convert-tests-results.txt 2>&1 \
+   $* -F $IMAGE >> $RESULT 2>&1 \
|| _fail "filesystem create failed"
-   $here/btrfs-convert -N $nodesize $here/test.img \
-   >> convert-tests-results.txt 2>&1 \
+
+   # write a file with regular file extent
+   $SUDO_HELPER mount $IMAGE $TEST_MNT
+   $SUDO_HELPER dd if=/dev/zero bs=$nodesize count=4 of=$TEST_MNT/test \
+   1>/dev/null 2>&1
+   $SUDO_HELPER umount $TEST_MNT
+
+   # do convert test
+   $top/btrfs-convert -N $nodesize $script_dir/test.img \
+   >> $RESULT 2>&1 \
|| _fail "btrfs-convert failed"
-   $here/btrfs check $here/test.img >> convert-tests-results.txt 2>&1 \
+   $top/btrfs check $script_dir/test.img >> $RESULT 2>&1 \
|| _fail "btrfs check detected errors"
 }
 
+setup_root_helper
+
 # btrfs-convert requires 4k blocksize.
-test "ext2 4k nodesize" 4096 mke2fs -b 4096
-test "ext3 4k nodesize" 4096 mke2fs -j -b 4096
-test "ext4 4k nodesize" 4096 mke2fs -t ext4 -b 4096
-test "ext2 8k nodesize" 8192 mke2fs -b 4096
-test "ext3 8k nodesize" 8192 mke2fs -j -b 4096
-test "ext4 8k nodesize" 8192 mke2fs -t ext4 -b 4096
-test "ext2 16k nodesize" 16384 mke2fs -b 4096
-test "ext3 16k nodesize" 16384 mke2fs -j -b 4096
-test "ext4 16k nodesize" 16384 mke2fs -t ext4 -b 4096
-test "ext2 32k nodesize" 32768 mke2fs -b 4096
-test "ext3 32k nodesize" 32768 mke2fs -j -b 4096
-test "ext4 32k nodesize" 32768 mke2fs -t ext4 -b 4096
-test "ext2 64k nodesize" 65536 mke2fs -b 4096
-test "ext3 64k nodesize" 65536 mke2fs -j -b 4096
-test "ext4 64k nodesize" 65536 mke2fs -t ext4 -b 4096
+convert_test "ext2 4k nodesize" 4096 mke2fs -b 4096
+convert_test "ext3 4k nodesize" 4096 mke2fs -j -b 4096
+convert_test "ext4 4k nodesize" 4096 mke2fs -t ext4 -b 4096
+convert_test "ext2 8k nodesize" 8192 mke2fs -b 4096
+convert_test "ext3 8k nodesize" 8192 mke2fs -j -b 4096
+convert_test "ext4 8k nodesize" 8192 mke2fs -t ext4 -b 4096
+convert_test "ext2 16k nodesize" 16384 mke2fs -b 4096
+convert_test "ext3 16k nodesize" 16384 mke2fs -j -b 4096
+convert_test "ext4 16k nodesize" 16384 mke2fs -t ext4 -b 4096
+convert_test "ext2 32k nodesize" 32768 mke2fs -b 4096
+convert_test "ext3 32k nodesize" 32768 mke2fs -j -b 4096
+convert_test "ext4 32k nodesize" 32768 mke2fs -t ext4 -b 4096
+convert_test "ext2 64k nodesize" 65536 mke2fs -b 4096
+convert_test "ext3 64k nodesize" 65536 mke2fs -j -b 4096
+convert_test "ext4 64k nodesize" 65536 mke2fs -t ext4 -b 4096
-- 
2.3.4



Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Wed, Apr 1, 2015 at 2:04 PM, Chris Murphy li...@colorremedies.com wrote:
> When I had this same btrfs check error, it was the exact inode number
> and same /etc/shadow file. I didn't diff the two shadow files, but I

That's too bizarre for words. Two folks, on two different systems,
getting btrfs problems on similar kernels on the exact same filepath.
In my case, the file was last frobbed by yum/rpm. Do we have a strange
interaction between a kernel regression and yum/rpm rubbing the
filesystem the wrong way?

BTW, I did not change/touch the file at all. My only fix action was
the btrfs check --repair mentioned earlier. Right now, on the booted
system I did

# uname -a
Linux tp-martin.remote-learner.net 3.18.9-200.fc21.x86_64 #1 SMP Mon
Mar 9 15:10:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs scrub start -BrR   /
scrub done for 94637b35-a294-4be2-aa47-82c52d6d53ef
scrub started at Wed Apr  1 13:46:20 2015 and finished after 266 seconds
data_extents_scrubbed: 344155
tree_extents_scrubbed: 58048
data_bytes_scrubbed: 11896840192
tree_bytes_scrubbed: 951058432
read_errors: 0
csum_errors: 0
verify_errors: 0
no_csum: 20268
csum_discards: 254459
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 0
last_physical: 23928504320
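
For anyone reading these counters, here is a minimal sketch of totalling the `*_errors` fields in one go, assuming the raw `-R` statistics format shown above. `count_scrub_errors` is a hypothetical helper name, not part of btrfs-progs, and it counts `corrected_errors` along with the rest.

```shell
# Sum every "*_errors:" counter from raw scrub statistics read on stdin.
# count_scrub_errors is a hypothetical helper, not a btrfs-progs command.
count_scrub_errors() {
    awk -F': *' '/_errors:/ { sum += $2 } END { print sum + 0 }'
}

# Example with a few counters in the format shown above.
printf 'read_errors: 0\ncsum_errors: 0\nno_csum: 20268\nsuper_errors: 0\n' \
    | count_scrub_errors
```

It could be fed straight from the command used above, e.g. `btrfs scrub start -BrR / | count_scrub_errors`; a non-zero total means at least one error counter needs a closer look.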

cheers,



m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
On Wed, Apr 1, 2015 at 12:16 PM, Martin Langhoff
martin.langh...@gmail.com wrote:
> On Wed, Apr 1, 2015 at 2:04 PM, Chris Murphy li...@colorremedies.com wrote:
>> When I had this same btrfs check error, it was the exact inode number
>> and same /etc/shadow file. I didn't diff the two shadow files, but I
>
> That's too bizarre for words. Two folks, on two different systems,
> getting btrfs problems on similar kernels on the exact same filepath.
> In my case, the file was last frobbed by yum/rpm. Do we have a strange
> interaction between a kernel regression and yum/rpm rubbing the
> filesystem the wrong way?

No idea, but it happened to me more than once, same inode number, same file.

> BTW, I did not change/touch the file at all. My only fix action was
> the btrfs check --repair mentioned earlier.

That won't fix it. Once errors 400 appears, at this point you have to
replace the affected file.





-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Wed, Apr 1, 2015 at 2:20 PM, Chris Murphy li...@colorremedies.com wrote:
> That won't fix it. Once errors 400 appears, at this point you have to
> replace the affected file.

Interesting.

Right now I am booting without problems. I have no evidence of
continued problems. What would I do to check whether I see an error
similar to yours on this fs?

Trying to ascertain whether my fs is cured, and whether we can learn
something else about this oddity...

cheers,


m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
On Wed, Apr 1, 2015 at 9:42 AM, Martin Langhoff
martin.langh...@gmail.com wrote:
> # mount /dev/sda6 /mysysroot
>
> Apr 01 11:26:51 localhost kernel: BTRFS info (device sda6): disk space
> caching is enabled
> Apr 01 11:26:56 localhost kernel: BTRFS: checking UUID tree
> Apr 01 11:26:56 localhost kernel: SELinux: initialized (dev sda6, type
> btrfs), uses xattr

Right so it mounts fine with no errors from live media, but won't
mount at boot time. Same problem I was having.


> # btrfs check /dev/sda6
> Checking filesystem on /dev/sda6
> UUID: 94637b35-a294-4be2-aa47-82c52d6d53ef
> checking extents
> checking free space cache
> checking fs roots
> root 256 inode 39841 errors 400, nbytes wrong

mount /dev/sda6 /mnt
btrfs inspect-internal inode-resolve 39841 /mnt

It should resolve a path to file for that inode. Chances are you can
just use cp to make a new copy of it, delete the original, and rename
the copy to match the original file name. Unmount. And now the btrfs
check error won't happen.
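
The copy/delete/rename steps above can be sketched as a small helper. This is only an illustration under stated assumptions: `recreate_file` and the `.new` suffix are hypothetical names, GNU `cp -a` is assumed so that mode, ownership and timestamps are carried over, and the file is briefly absent between the `rm` and the `mv`.

```shell
# Replace a file with a fresh copy of itself so the data lands in a new inode.
# recreate_file is a hypothetical helper; ".new" is an arbitrary temporary suffix.
recreate_file() {
    cp -a -- "$1" "$1.new" || return 1   # make a new copy, keeping attributes
    rm -f -- "$1" || return 1            # delete the original
    mv -- "$1.new" "$1"                  # rename the copy to the original name
}
```

With the filesystem mounted on /mnt as above, it would be called on whatever path `btrfs inspect-internal inode-resolve 39841 /mnt` prints.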



> [root@localhost liveuser]# btrfs check --repair /dev/sda6
> enabling repair mode
> Fixed 0 roots.
> Checking filesystem on /dev/sda6
> UUID: 94637b35-a294-4be2-aa47-82c52d6d53ef
> checking extents
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> root 256 inode 39841 errors 400, nbytes wrong
> found 7747100703 bytes used err is 1
> total csum bytes: 11912932
> total tree bytes: 476725248
> total fs tree bytes: 434733056
> total extent tree bytes: 22986752
> btree space waste bytes: 83962424
> file data blocks allocated: 30820143104
>  referenced 11997040640
> Btrfs v3.17

Yeah I don't know what this errors 400 nbytes wrong means, but at the
moment btrfs-progs doesn't fix it.


-- 
Chris Murphy


Re: ERROR: error removing the device '/dev/sdXN' - Inappropriate ioctl for device

2015-04-01 Thread Martin
Anand,

Thanks for picking up.

The devid option sounds good.


Thanks,
Martin



On 01/04/15 14:15, Anand Jain wrote:
 
 
  Looks like an option to use devid to delete a device would
  have mitigated the issue.  Also the error reported is no
  where near the reality. Will fix them.
 
  Thanks for reporting.
 
 Anand
 
 
 On 04/01/2015 06:54 PM, Martin wrote:
 On 01/04/15 08:06, Anand Jain wrote:

 btrfs device delete /dev/sdf5 /mnt/data2

 ERROR: error removing the device '/dev/sdf5' - Inappropriate ioctl
 for
 device

   very strange. 'btrfs fi show -m' shows btrfs fs(s) that are mounted.

 Looks like my /dev/sdf isn't responding with anything useful at all... A
 firmware crash?... (This is for a 256GB SSD.)


 # btrfs fi show -m

 [...]

 Label: 'btrfs_data2'  uuid: 3aaee716-b98b-4c86-ba5a-53456994f152
  Total devices 3 FS bytes used 159.31GiB
  devid1 size 206.47GiB used 206.02GiB path /dev/sdb5
  devid2 size 206.47GiB used 206.47GiB path /dev/sdd5
  devid3 size 206.47GiB used 206.47GiB path /dev/sdf5


 btrfs-progs v3.19.1



 # smartctl -i /dev/sdf
 smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.14.10-gentoo_s03a_11]
 (local build)
 Copyright (C) 2002-14, Bruce Allen, Christian Franke,
 www.smartmontools.org

 === START OF INFORMATION SECTION ===
 Vendor:   /7:0:0:0
 Product:
 Compliance:   SPC-5
 User Capacity:600,332,565,813,390,450 bytes [600 PB]
 Logical block size:   774843950 bytes
 scsiModePageOffset: response length too short, resp_len=47 offset=50
 bd_len=46
 scsiModePageOffset: response length too short, resp_len=47 offset=50
 bd_len=46
 Terminate command early due to bad response to IEC mode page
 A mandatory SMART command failed: exiting. To continue, add one or more
 '-T permissive' options.



 And btrfs is still running ok so far... I'll be swapping that device at
 the weekend. (First chance I have for a Sunday shutdown. :-( )


 Still... For administering btrfs, it is a little disturbing not to be
 able to remove/delete a device unless that device is both mounted and
 working...

 Or is that where the missing option comes in?



 Comments welcomed,

 Thanks,
 Martin
 




Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
On Wed, Apr 1, 2015 at 10:15 AM, Martin Langhoff
martin.langh...@gmail.com wrote:
> On Wed, Apr 1, 2015 at 11:42 AM, Martin Langhoff
> martin.langh...@gmail.com wrote:
>> See below. Perhaps the newer kernel (in latest F21) has regressed in
>> handling some kinds of errors during mount, or the dracut/systemd
>> mounting process is less resilient than mounting under a fully booted
>> system?
>
> This is getting even more interesting.
>
> Under 3.17.4-301.fc21.x86 from LiveUSB, I could mount, even repair the disk.
>
> Since the repair, the on-disk latest kernel (3.18.9-200.fc21) tries to
> boot, but dracut/systemd time out on mounting sysroot after waiting
> for quite a while. I don't get a dracut shell anymore so the failure
> mode has changed. I may try to set a breakpoint to force a shell.
>
> I do have an earlier F21 kernel on disk-- 3.18.7-200.fc21 -- and this
> boots the system without a glitch. After a complete boot with
> 3.18.7.200, clean shutdown and booting into 3.18.9-200 is still
> broken, same failure mode.
>
> Will try to capture some info from a dracut breakpoint  (I'll try
> mount). At this point this really looks like a regression.

Yeah I don't know what's going on, but with a new file system, and
disabled i915 to avoid crashes, and thus no crashes since the new fs
was created, I get boot failure with 3.19.3 but not 3.19.2, and I
can't figure out why. I get the systemd cylon eye with 5 services
pending so I can't actually tell which one it's hung up on, but one of
them is looking for the fs volume UUID and apparently can't find it
which is completely bogus.

-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Wed, Apr 1, 2015 at 1:03 PM, Chris Murphy li...@colorremedies.com wrote:
> mount /dev/sda6 /mnt
> btrfs inspect-internal inode-resolve 39841 /mnt

on the booted system...
# uname -a
Linux tp-martin.remote-learner.net 3.18.9-200.fc21.x86_64 #1 SMP Mon
Mar 9 15:10:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs inspect-internal inode-resolve 39841 /
//etc/shadow-
# diff -u /etc/shadow{,-}
--- /etc/shadow 2015-03-04 02:26:59.478255332 -0500
+++ /etc/shadow- 2015-03-04 02:26:59.0 -0500
@@ -42,4 +42,3 @@
 systemd-timesync:!!:16498::
 systemd-network:!!:16498::
 systemd-resolve:!!:16498::
-systemd-bus-proxy:!!:16498::

Bizarre.

cheers,



m
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: [PATCH] fs: btrfs: Add missing include file

2015-04-01 Thread Chris Mason
On Sun, Mar 29, 2015 at 11:24 PM, Guenter Roeck li...@roeck-us.net wrote:
> On Fri, Mar 13, 2015 at 01:58:46AM -0700, Guenter Roeck wrote:
>> Building alpha:allmodconfig fails with
>>
>> fs/btrfs/inode.c: In function 'check_direct_IO':
>> fs/btrfs/inode.c:8050:2: error: implicit declaration of function 'iov_iter_alignment'
>>
>> due to a missing include file.
>>
>> Fixes: 3737c63e1fb0 ("fs: move struct kiocb to fs.h")
>> Cc: Christoph Hellwig h...@lst.de
>> Signed-off-by: Guenter Roeck li...@roeck-us.net
>> ---
>
> This problem still affects the following builds as of today.
>
> alpha:allmodconfig
> i386:allyesconfig
> i386:allmodconfig
> m68k:allmodconfig
> mips:allmodconfig
> xtensa:allmodconfig
>
> and thus probably many other allmodconfig builds which I don't try to build.
>
> This is getting really annoying, and prevents us from finding and fixing
> other build problems.
>
> It has been more than two weeks since I submitted the patch. This suggests
> that the patch got lost or that the Powers That Be don't care. Which one
> is it ?
>
> Should I request to revert 3737c63e1fb0 instead ?

I'll put the include into my branch for -next, thanks!

-chris






Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
On Wed, Apr 1, 2015 at 12:29 PM, Martin Langhoff
martin.langh...@gmail.com wrote:
> On Wed, Apr 1, 2015 at 2:20 PM, Chris Murphy li...@colorremedies.com wrote:
>> That won't fix it. Once errors 400 appears, at this point you have to
>> replace the affected file.
>
> Interesting.
>
> Right now I am booting without problems. I have no evidence of
> continued problems. What would I do to check whether I see an error
> similar to yours on this fs?
>
> Trying to ascertain whether my fs is cured, and whether we can learn
> something else about this oddity...

Re-run the btrfs check. The error is still there even after a --repair.
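
To make the re-check less of an eyeball exercise, a minimal sketch, assuming the "root N inode M errors 400" line format quoted earlier in this thread, that pulls the affected inode numbers out of the check output. `inodes_with_errors_400` is a hypothetical helper name, not a btrfs-progs feature.

```shell
# Print the inode numbers that "btrfs check" output (read on stdin) reports
# with "errors 400". The pattern follows the output quoted in this thread.
inodes_with_errors_400() {
    sed -n 's/^ *root [0-9][0-9]* inode \([0-9][0-9]*\) errors 400.*/\1/p'
}
```

Something like `btrfs check /dev/sda6 2>&1 | inodes_with_errors_400` would then print nothing once the affected file has been replaced, and each printed number can be handed to `btrfs inspect-internal inode-resolve`.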

-- 
Chris Murphy


Re: [PATCH] xfstests: generic: test for discard properly discarding unused extents

2015-04-01 Thread Brian Foster
On Mon, Mar 30, 2015 at 03:11:06PM -0400, Jeff Mahoney wrote:
> This test covers four conditions where discard can potentially not
> discard unused extents completely.
>
> We test, with -o discard and with fstrim, scenarios of removing many
> relatively small files and removing several large files.
>
> The important part of the two scenarios is that the large files must be
> large enough to span a blockgroup alone. It's possible for an
> entire block group to be emptied and dropped without an opportunity to
> discard individual extents as would happen with smaller files.
>
> The test confirms the discards have occurred by using a sparse file
> mounted via loopback to punch holes and then check how many blocks
> are still allocated within the file.
>
> Signed-off-by: Jeff Mahoney je...@suse.com
> ---

The code looks mostly Ok to me, a few notes below. Those aside, this is
a longish test. It takes me about 8 minutes to run on my typical low end
vm.

Is the 1GB block group magic value mutable in any way, or is it a
hardcoded thing (for btrfs I presume)? It would be nice if we could
shrink that a bit. If not, perhaps there are some other ways to reduce
the runtime...

- Is there any reason a single discard or trim test instance must be all
large or small files? In other words, is there something that this
wouldn't catch if the 10GB were 50% filled with large files and 50% with
small files? That would allow us to trim the maximum on the range of
small file creation and only have two invocations instead of four.

- If the 1GB thing is in fact a btrfs thing, could we make the core test
a bit more size agnostic (e.g., perhaps pass the file count/size
values as parameters) and then scale the parameters up exclusively for
btrfs? For example, set defaults of fssize=1G, largefile=100MB,
smallfile=[512b-5MB] or something of that nature and override them to
the 10GB, 1GB, 32k-... values for btrfs? That way we don't need to write
as much data for fs' where it might not be necessary.
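
The parameter idea could look roughly like the sketch below. Every name in it (`set_discard_test_sizes`, `fssize_mb`, `largefile_mb`, `smallfile_min`) is hypothetical and not taken from the patch; the values are the ones floated above, and only the lower bound of the small-file range is set because the btrfs upper bound is not spelled out here.

```shell
# Choose test sizes per filesystem type. All names are hypothetical;
# the values come from the suggestion above.
set_discard_test_sizes() {
    case "$1" in
    btrfs)
        fssize_mb=10240        # 10GB image
        largefile_mb=1024      # 1GB, the block group size discussed above
        smallfile_min=32768    # 32k lower bound for small files
        ;;
    *)
        fssize_mb=1024         # 1GB image
        largefile_mb=100       # 100MB large files
        smallfile_min=512      # 512 byte lower bound for small files
        ;;
    esac
}

# FSTYP is the filesystem type variable xfstests already exports;
# fall back to a generic default when it is unset.
set_discard_test_sizes "${FSTYP:-generic}"
```

The core of the test would then read these variables instead of hardcoding 10240 and 1GB, so other filesystems write far less data.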

>  tests/generic/326     | 164 ++
>  tests/generic/326.out |   5 ++
>  tests/generic/group   |   1 +
>  3 files changed, 170 insertions(+)
>  create mode 100644 tests/generic/326
>  create mode 100644 tests/generic/326.out
>
> diff --git a/tests/generic/326 b/tests/generic/326
> new file mode 100644
> index 000..923a27f
> --- /dev/null
> +++ b/tests/generic/326
> @@ -0,0 +1,164 @@
> +#! /bin/bash
> +# FSQA Test No. 326
> +#
> +# This test uses a loopback mount with PUNCH_HOLE support to test
> +# whether discard operations are working as expected.
> +#
> +# It tests both -odiscard and fstrim.
> +#
> +# Copyright (C) 2015 SUSE. All Rights Reserved.
> +# Author: Jeff Mahoney je...@suse.com
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +loopdev=
> +tmpdir=
> +_cleanup()
> +{
> + [ -n "$tmpdir" ] && umount $tmpdir
> + [ -n "$loopdev" ] && losetup -d $loopdev
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +_need_to_be_root
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_fstrim
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs  $seqres.full
> +_require_fs_space $SCRATCH_MNT $(( 10 * 1024 * 1024 ))
> +_scratch_mount
> +
> +test_discard()
> +{
> + discard=$1
> + files=$2
> +
> + tmpfile=$SCRATCH_MNT/testfs.img.$$
> + tmpdir=$SCRATCH_MNT/testdir.$$
> + mkdir -p $tmpdir || _fail "!!! failed to create temp mount dir"
> +
> + # Create a sparse file to host the file system
> + dd if=/dev/zero of=$tmpfile bs=1M count=1 seek=10240  $seqres.full \
> + || _fail "!!! failed to create fs image file"

xfs_io -c truncate ... ?

> +
> + opts=
> + if [ $discard = discard ]; then
> + opts="-o discard"
> + fi
> + losetup -f $tmpfile
> + loopdev=$(losetup -j $tmpfile|awk -F: '{print $1}')
> + _mkfs_dev $loopdev  $seqres.full
> + $MOUNT_PROG $opts $loopdev $tmpdir \
> + || _fail "!!! failed to loopback mount"
> +
> + if [ $files = large ]; then
> + # Create files larger than 1GB so each one occupies
> + # more than one 

Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Wed, Apr 1, 2015 at 2:54 PM, Chris Murphy li...@colorremedies.com wrote:
> Re-run the btrfs check. The error is still there even after a --repair.

Bingo! You are right the error persists.

It has no effect on my use of the system right now. Is anyone
interested in debugging this further?

cheers,



martin
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: [PATCH] xfstests: generic: test for discard properly discarding unused extents

2015-04-01 Thread Jeff Mahoney
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 4/1/15 2:44 PM, Brian Foster wrote:
> On Mon, Mar 30, 2015 at 03:11:06PM -0400, Jeff Mahoney wrote:
>> This test covers four conditions where discard can potentially
>> not discard unused extents completely.
>>
>> We test, with -o discard and with fstrim, scenarios of removing
>> many relatively small files and removing several large files.
>>
>> The important part of the two scenarios is that the large files
>> must be large enough to span a blockgroup alone. It's possible
>> for an entire block group to be emptied and dropped without an
>> opportunity to discard individual extents as would happen with
>> smaller files.
>>
>> The test confirms the discards have occurred by using a sparse
>> file mounted via loopback to punch holes and then check how many
>> blocks are still allocated within the file.
>>
>> Signed-off-by: Jeff Mahoney je...@suse.com ---
>
> The code looks mostly Ok to me, a few notes below. Those aside,
> this is a longish test. It takes me about 8 minutes to run on my
> typical low end vm.

My test hardware is a 16 core / 16 GB RAM machine using a commodity
SSD. It ran pretty quickly.

I suppose I should start by explaining that I wrote the test to be
btrfs specific and then realized that the only thing that was
/actually/ btrfs-specific was the btrfs filesystem sync call. I ran it
on XFS to ensure it worked as expected, but didn't have any reason to
try to adapt it to work in any other environment.

> Is the 1GB block group magic value mutable in any way, or is it a
> hardcoded thing (for btrfs I presume)? It would be nice if we could
> shrink that a bit. If not, perhaps there are some other ways to
> reduce the runtime...

It's not hardcoded for btrfs, but it is by far the most common sized
block group. I'd prefer to test what people are using.

 - Is there any reason a single discard or trim test instance must 
 be all large or small files? In other words, is there something 
 that this wouldn't catch if the 10GB were 50% filled with large 
 files and 50% with small files? That would allow us to trim the 
 maximum on the range of small file creation and only have two 
 invocations instead of four.

Only to draw attention to the obvious failure cases, which are
probably specific to btrfs. If a file spans an entire block group and
is removed, it skips the individual discards and depends on the block
group removal to discard the entire thing (this wasn't happening). If
there are lots of small files, it hits different paths, and I wanted
to make it clear which one each mode of the test was targeting.
Otherwise, whoever hits the failure is going to end up having to do it
manually, which defeats the purpose of having an automated test case, IMO.

 - If the 1GB thing is in fact a btrfs thing, could we make the
 core test a bit more size agnostic (e.g., perhaps pass the file 
 count/size values as parameters) and then scale the parameters up 
 exclusively for btrfs? For example, set defaults of fssize=1G, 
 largefile=100MB, smallfile=[512b-5MB] or something of that nature 
 and override them to the 10GB, 1GB, 32k-... values for btrfs? That 
 way we don't need to write as much data for fs' where it might not 
 be necessary.

If someone wants to weigh in on what sane defaults for other file
systems might be, sure.
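
A minimal sketch of the per-filesystem parameterization suggested in the quoted review, assuming the sizes mentioned there. The function and variable names are hypothetical and not part of the patch. The upper bound for small files on btrfs is elided in the thread ("32k-..."), so the sketch leaves it at the generic default purely as a placeholder.

```shell
# Hypothetical helper (names are illustrative, not from the patch):
# pick test sizes based on the filesystem type.
set_discard_test_params() {
    fstyp=$1

    # Generic defaults proposed in the review: 1G filesystem,
    # 100MB large files, small files from 512 bytes to 5MB.
    fssize_mb=1024
    largefile_mb=100
    smallfile_min_bytes=512
    smallfile_max_bytes=$((5 * 1024 * 1024))

    case "$fstyp" in
    btrfs)
        # Values named in the thread for btrfs: 10GB filesystem and 1GB
        # large files (enough to span a typical block group), with small
        # files starting at 32k.
        fssize_mb=$((10 * 1024))
        largefile_mb=1024
        smallfile_min_bytes=$((32 * 1024))
        # ASSUMPTION: the btrfs upper bound is not given in the thread,
        # so smallfile_max_bytes stays at the generic default here.
        ;;
    esac
}

set_discard_test_params btrfs
echo "fssize=${fssize_mb}MB largefile=${largefile_mb}MB"
```

Keeping the defaults small and overriding only for btrfs would let other filesystems avoid writing 10GB of data, while still exercising the block-group-sized case where it matters.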

 tests/generic/326     | 164 ++
 tests/generic/326.out |   5 ++
 tests/generic/group   |   1 +
 3 files changed, 170 insertions(+)
 create mode 100644 tests/generic/326
 create mode 100644 tests/generic/326.out
 
 diff --git a/tests/generic/326 b/tests/generic/326
 new file mode 100644
 index 000..923a27f
 --- /dev/null
 +++ b/tests/generic/326
 @@ -0,0 +1,164 @@
 +#! /bin/bash
 +# FSQA Test No. 326
 +#
 +# This test uses a loopback mount with PUNCH_HOLE support to test
 +# whether discard operations are working as expected.
 +#
 +# It tests both -odiscard and fstrim.
 +#
 +# Copyright (C) 2015 SUSE. All Rights Reserved.
 +# Author: Jeff Mahoney je...@suse.com
 +#
 +# This program is free software; you can redistribute it and/or
 +# modify it under the terms of the GNU General Public License as
 +# published by the Free Software Foundation.
 +#
 +# This program is distributed in the hope that it would be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 +# GNU General Public License for more details.
 +#
 +# You should have received a copy of the GNU General Public License
 +# along with this program; if not, write the Free Software Foundation,
 +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 +#-----------------------------------------------------------------------
 +#
 +
 +seq=`basename $0`
 +seqres=$RESULT_DIR/$seq
 +echo "QA output created by $seq"
 +
 +tmp=/tmp/$$
 +status=1    # failure is the default!
 +trap "_cleanup; exit \$status" 0 1 2 3 15
 +
 +loopdev=
 +tmpdir=
 +_cleanup()
 +{
 +[ -n 

Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
On Wed, Apr 1, 2015 at 11:26 AM, Martin Langhoff
martin.langh...@gmail.com wrote:
 On Wed, Apr 1, 2015 at 1:03 PM, Chris Murphy li...@colorremedies.com wrote:
 mount /dev/sda6 /mnt
 btrfs inspect-internal inode-resolve 39841 /mnt

 on the booted system...
 # uname -a
 Linux tp-martin.remote-learner.net 3.18.9-200.fc21.x86_64 #1 SMP Mon
 Mar 9 15:10:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
 # btrfs inspect-internal inode-resolve 39841 /
 //etc/shadow-
 # diff -u /etc/shadow{,-}
 --- /etc/shadow 2015-03-04 02:26:59.478255332 -0500
 +++ /etc/shadow- 2015-03-04 02:26:59.0 -0500
 @@ -42,4 +42,3 @@
  systemd-timesync:!!:16498::
  systemd-network:!!:16498::
  systemd-resolve:!!:16498::
 -systemd-bus-proxy:!!:16498::

 Bizarre.

When I had this same btrfs check error, it was the exact inode number
and same /etc/shadow file. I didn't diff the two shadow files, but I
did the cp, mv, rm routine, and then the system booted. Goofy cakes.
It's almost like an April Fools joke.

-- 
Chris Murphy


Re: [PATCH] xfstests: generic: test for discard properly discarding unused extents

2015-04-01 Thread Dave Chinner
On Wed, Apr 01, 2015 at 03:01:07PM -0400, Jeff Mahoney wrote:
 On 4/1/15 2:44 PM, Brian Foster wrote:
  On Mon, Mar 30, 2015 at 03:11:06PM -0400, Jeff Mahoney wrote:
  This test tests four conditions where discard can potentially 
  not discard unused extents completely.
  
  We test, with -o discard and with fstrim, scenarios of removing 
  many relatively small files and removing several large files.
  
  The important part of the two scenarios is that the large files 
  must be large enough to span a blockgroup alone. It's possible 
  for an entire block group to be emptied and dropped without an 
  opportunity to discard individual extents as would happen with 
  smaller files.
  
  The test confirms the discards have occurred by using a sparse 
  file mounted via loopback to punch holes and then check how many 
  blocks are still allocated within the file.
  
  Signed-off-by: Jeff Mahoney je...@suse.com ---
  
  The code looks mostly Ok to me, a few notes below. Those aside, 
  this is a longish test. It takes me about 8 minutes to run on my 
  typical low end vm.
 
 My test hardware is a 16 core / 16 GB RAM machine using a commodity
 SSD. It ran pretty quickly.

Yup, I have a test VM like that, too. However, like many other
people, I also have small VMs that share single spindles with other
test VMs, so we need to cater for them, too.

  +  if [ $FSTYP = btrfs ]; then
  +  _run_btrfs_util_prog filesystem sync $tmpdir
  +  fi
  +  sync
  +  sync
  
  Any reason for the double syncs?
 
 Superstition? IIRC, at one point in what is probably the ancient past,
 btrfs needed two syncs to be safe. I kept running into false failures
 without both a sync and the btrfs filesystem sync, so I just hit the
 "no really, just do it" button.

Urk. If btrfs requires two sync passes to really sync data/metadata,
then that's a bug that needs to be fixed. Let's not encode
superstition or work around bugs that really should be fixed in the
test code.
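
For illustration only, the sync step with the redundant second `sync` dropped might look something like the sketch below. The function name `sync_scratch` is hypothetical, and the sketch calls `btrfs filesystem sync` directly rather than the test suite's `_run_btrfs_util_prog` wrapper so that it stands alone. Whether a single pass is actually sufficient on btrfs is exactly the open question in this thread.

```shell
# Hypothetical helper (not from the patch): flush a test directory once.
sync_scratch() {
    fstyp=$1
    dir=$2

    if [ "$fstyp" = btrfs ]; then
        # btrfs-specific flush of the filesystem containing $dir.
        btrfs filesystem sync "$dir" || return 1
    fi

    # A single system-wide sync; if this is not enough on some
    # filesystem, that would be a bug to fix there, not in the test.
    sync
}
```

A caller would use it as `sync_scratch "$FSTYP" "$tmpdir"` in place of the three separate calls.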

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: Upgrade to 3.19.2 Kernel fails to boot

2015-04-01 Thread Rich Freeman
On Wed, Apr 1, 2015 at 2:50 AM, Anand Jain anand.j...@oracle.com wrote:

 Eric found something like this and has a fix within the email.
 Sub: I think btrfs: fix leak of path in btrfs_find_item broke stable
 trees ...


I don't mind trying this patch if the maintainers recommend it.  I'm
still getting panics every few days and 3.18.10 won't mount my root
filesystem, so I've been running on 3.18.8.

--
Rich


Test message...

2015-04-01 Thread Jim Mills
I am attempting to see if this message goes through.  My first one got
returned as I didn't know gmail defaulted to HTML.

My second and third never seemed to be distributed.

Are attachments allowed?

According to this,
https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list, they are,
but other things I have seen seem to say they are not.


--
Jim Mills (jmills...@gmail.com)


Re: [PATCH] fs: btrfs: Add missing include file

2015-04-01 Thread Guenter Roeck

On 04/01/2015 12:28 PM, Chris Mason wrote:

On Sun, Mar 29, 2015 at 11:24 PM, Guenter Roeck li...@roeck-us.net wrote:

On Fri, Mar 13, 2015 at 01:58:46AM -0700, Guenter Roeck wrote:

 Building alpha:allmodconfig fails with

 fs/btrfs/inode.c: In function 'check_direct_IO':
 fs/btrfs/inode.c:8050:2: error: implicit declaration of function 'iov_iter_alignment'

 due to a missing include file.

 Fixes: 3737c63e1fb0 (fs: move struct kiocb to fs.h)
 Cc: Christoph Hellwig h...@lst.de
 Signed-off-by: Guenter Roeck li...@roeck-us.net
 ---


This problem still affects the following builds as of today.

alpha:allmodconfig
i386:allyesconfig
i386:allmodconfig
m68k:allmodconfig
mips:allmodconfig
xtensa:allmodconfig

and thus probably many other allmodconfig builds which I don't try to build.

This is getting really annoying, and prevents us from finding and fixing
other build problems.

It has been more than two weeks since I submitted the patch. This suggests
that the patch got lost or that the Powers That Be don't care. Which one
is it ?

Should I request to revert 3737c63e1fb0 instead ?


I'll put the include into my branch for -next, thanks!



I have not seen the problem in the latest -next build,
which presumably means that some other patch must have
fixed the problem or at least hides it now. No idea
which one, though.

Guenter



Re: [PATCH] fs: btrfs: Add missing include file

2015-04-01 Thread Chris Mason
On Wed, Apr 1, 2015 at 3:58 PM, Guenter Roeck li...@roeck-us.net 
wrote:

On 04/01/2015 12:28 PM, Chris Mason wrote:
On Sun, Mar 29, 2015 at 11:24 PM, Guenter Roeck li...@roeck-us.net 
wrote:

On Fri, Mar 13, 2015 at 01:58:46AM -0700, Guenter Roeck wrote:

 Building alpha:allmodconfig fails with

 fs/btrfs/inode.c: In function 'check_direct_IO':
 fs/btrfs/inode.c:8050:2: error: implicit declaration of function 'iov_iter_alignment'


 due to a missing include file.

 Fixes: 3737c63e1fb0 (fs: move struct kiocb to fs.h)
 Cc: Christoph Hellwig h...@lst.de
 Signed-off-by: Guenter Roeck li...@roeck-us.net
 ---


This problem still affects the following builds as of today.

alpha:allmodconfig
i386:allyesconfig
i386:allmodconfig
m68k:allmodconfig
mips:allmodconfig
xtensa:allmodconfig

and thus probably many other allmodconfig builds which I don't try to build.


This is getting really annoying, and prevents us from finding and fixing
other build problems.

It has been more than two weeks since I submitted the patch. This suggests
that the patch got lost or that the Powers That Be don't care. Which one
is it ?

Should I request to revert 3737c63e1fb0 instead ?


I'll put the include into my branch for -next, thanks!



I have not seen the problem in the latest -next build,
which presumably means that some other patch must have
fixed the problem or at least hides it now. No idea
which one, though.


It's not immediately obvious what might have fixed it, so I'll keep 
this patch in my -next for today at least ;)


-chris





Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
Related bugs:

https://bugzilla.kernel.org/show_bug.cgi?id=68411
https://bugzilla.redhat.com/show_bug.cgi?id=1037963

The RHBZ one also mentioned the shadow file.

Anyway, it seems to be a somewhat known problem, but it's just not
known yet what causes it.

Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
On Wed, Apr 1, 2015 at 1:23 PM, Martin Langhoff
martin.langh...@gmail.com wrote:
 On Wed, Apr 1, 2015 at 2:54 PM, Chris Murphy li...@colorremedies.com wrote:
 Re-run the btrfs check. The error is still there even after a --repair.

 Bingo! You are right the error persists.

 It has no effect on my use of the system right now. Is anyone
 interested in debugging this further?

400 errors, nbytes wrong, isn't repaired by current btrfs check
https://bugzilla.kernel.org/show_bug.cgi?id=90071

What's interesting in that bug report that I'd forgotten about?

# btrfs inspect inode 804 /mnt/root
/mnt/root/etc/shadow-

Different inode number, but the shadow file is affected. In every
single case I've had now (about half a dozen) with this "errors 400"
message, it's involved the shadow file. I have no idea what's going on
between Btrfs and the shadow file, but something seems to be. Or it's
quite a coincidence.




-- 
Chris Murphy