Re: [PATCH 4/9] staging/lustre: clean up and remove libcfs/linux/linux-fs.c

2013-06-04 Thread Peng Tao
On Tue, Jun 4, 2013 at 4:32 PM, Dan Carpenter dan.carpen...@oracle.com wrote:
 On Mon, Jun 03, 2013 at 09:58:17PM +0800, Peng Tao wrote:
  int libcfs_kkuc_msg_put(struct file *filp, void *payload)
  {
  	struct kuc_hdr *kuch = (struct kuc_hdr *)payload;
 +	ssize_t count = kuch->kuc_msglen;
 +	loff_t offset = 0;
 +	mm_segment_t fs;
  	int rc = -ENOSYS;

  	if (filp == NULL || IS_ERR(filp))
 @@ -165,11 +168,18 @@ int libcfs_kkuc_msg_put(struct file *filp, void *payload)
  		return -ENOSYS;
  	}

 -	{
 -		loff_t offset = 0;
 -		rc = filp_user_write(filp, payload, kuch->kuc_msglen,
 -				     &offset);
 +	fs = get_fs();
 +	set_fs(KERNEL_DS);
 +	while ((ssize_t)count > 0) {
 +		rc = vfs_write(filp, (const void __user *)payload,
 +			       count, &offset);
 +		if (rc < 0)
 +			break;
 +		count -= rc;
 +		payload += rc;
 +		rc = 0;
  	}
 +	set_fs(fs);

  	if (rc < 0)
  		CWARN("message send failed (%d)\n", rc);

 This was all there in the original code, it has just been shifted.
 Still, I had some questions about it.  payload is a pointer to
 kernel stack memory, wouldn't the access_ok() check in vfs_write()
 fail every time on x86?

Thanks for reviewing. I am not familiar with access_ok() but I think
you are right and I'll test to see if it really breaks. In the
meantime, I took a look at other kernel callers of vfs_write() and
btrfs is also calling vfs_write() to write kernel stack memory
(fs/btrfs/send.c: send_header->write_buf->vfs_write). So I CC'ed btrfs
list in case someone knows better than I do.

 Some of the casting is not needed.  No need to cast count because
 it is already ssize_t.  I haven't tested but I think Sparse will
 complain about the __user cast until __force is added.  No need for
 the cast to const.

Thanks. I will fix it.

 I worry about partial writes and that this could be a forever loop
 but I don't know enough about anything to say if that's possible.
 Probably not.

For partial writes, both payload and count are advanced properly. So
it won't loop forever.
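
For anyone following along, a minimal self-contained sketch of the pattern
under discussion -- writing a kernel buffer through vfs_write() under
set_fs(KERNEL_DS) and advancing on partial writes -- might look like the
following. The helper name kkuc_write_kernel_buf is invented for
illustration, and it assumes the old get_fs()/set_fs() API used by these
kernels:

#include <linux/fs.h>
#include <linux/uaccess.h>

static ssize_t kkuc_write_kernel_buf(struct file *filp, void *buf, size_t count)
{
	mm_segment_t old_fs = get_fs();
	loff_t offset = 0;
	ssize_t rc = 0;

	set_fs(KERNEL_DS);	/* make access_ok() accept kernel pointers */
	while (count > 0) {
		/* sparse wants a __force on this cast, as Dan notes above */
		rc = vfs_write(filp, (const char __user *)buf, count, &offset);
		if (rc < 0)
			break;		/* real error, give up */
		count -= rc;		/* partial write: advance and retry */
		buf += rc;
		rc = 0;
	}
	set_fs(old_fs);

	return rc;
}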

Thanks,
Tao


Re: [PATCH 4/9] staging/lustre: clean up and remove libcfs/linux/linux-fs.c

2013-06-04 Thread Peng Tao
On Tue, Jun 4, 2013 at 6:23 PM, Peng Tao bergw...@gmail.com wrote:
 On Tue, Jun 4, 2013 at 4:32 PM, Dan Carpenter dan.carpen...@oracle.com 
 wrote:
 On Mon, Jun 03, 2013 at 09:58:17PM +0800, Peng Tao wrote:
  int libcfs_kkuc_msg_put(struct file *filp, void *payload)
  {
  	struct kuc_hdr *kuch = (struct kuc_hdr *)payload;
 +	ssize_t count = kuch->kuc_msglen;
 +	loff_t offset = 0;
 +	mm_segment_t fs;
  	int rc = -ENOSYS;

  	if (filp == NULL || IS_ERR(filp))
 @@ -165,11 +168,18 @@ int libcfs_kkuc_msg_put(struct file *filp, void *payload)
  		return -ENOSYS;
  	}

 -	{
 -		loff_t offset = 0;
 -		rc = filp_user_write(filp, payload, kuch->kuc_msglen,
 -				     &offset);
 +	fs = get_fs();
 +	set_fs(KERNEL_DS);
 +	while ((ssize_t)count > 0) {
 +		rc = vfs_write(filp, (const void __user *)payload,
 +			       count, &offset);
 +		if (rc < 0)
 +			break;
 +		count -= rc;
 +		payload += rc;
 +		rc = 0;
  	}
 +	set_fs(fs);

  	if (rc < 0)
  		CWARN("message send failed (%d)\n", rc);

 This was all there in the original code, it has just been shifted.
 Still, I had some questions about it.  payload is a pointer to
 kernel stack memory, wouldn't the access_ok() check in vfs_write()
 fail every time on x86?

I've run some tests and it turns out that access_ok() doesn't reject
vfs_write() when it is passed dynamically allocated kernel memory (e.g., in
mdc_wr_kuc()). There is indeed a bug in mdc_wr_kuc() for which I will
send a fix later. However, I cannot test the specific stack memory
write path because of a known Lustre issue
(https://jira.hpdd.intel.com/browse/LU-2489). And FYI, the Lustre HSM
feature is still under development.
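
For what it's worth, the likely explanation is the set_fs(KERNEL_DS) call in
the new code: the historical x86 access_ok() only compares the range against
the current thread's addr_limit, and KERNEL_DS raises that limit to cover the
whole address space, so kernel heap and stack pointers pass equally. A rough,
simplified sketch of that check (illustration only, not the literal kernel
source):

/* simplified illustration of the old x86 check, not the real macro */
#define access_ok_sketch(addr, size) \
	(((unsigned long)(addr) + (size)) <= current_thread_info()->addr_limit.seg)

/*
 * set_fs(KERNEL_DS) sets addr_limit to the top of the address space, so any
 * kernel pointer passes; set_fs(USER_DS) restores the usual TASK_SIZE limit.
 */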

Thanks,
Tao

 Thanks for reviewing. I am not familiar with access_ok() but I think
 you are right and I'll test to see if it really breaks. In the
 meantime, I took a look at other kernel callers of vfs_write() and
 btrfs is also calling vfs_write() to write kernel stack memory
 (fs/btrfs/send.c: send_header->write_buf->vfs_write). So I CC'ed btrfs
 list in case someone knows better than I do.

 Some of the casting is not needed.  No need to cast count because
 it is already ssize_t.  I haven't tested but I think Sparse will
 complain about the __user cast until __force is added.  No need for
 the cast to const.

 Thanks. I will fix it.

 I worry about partial writes and that this could be a forever loop
 but I don't know enough about anything to say if that's possible.
 Probably not.

 For partial writes, both payload and count are advanced properly. So
 it won't loop forever.

 Thanks,
 Tao


Re: [PATCH 0/2] introduce list_for_each_entry_del

2013-06-04 Thread Christoph Hellwig
On Mon, Jun 03, 2013 at 03:55:55PM -0400, Jörn Engel wrote:
 Actually, when I compare the two invocations, I prefer the
 list_for_each_entry_del() variant over list_pop_entry().
 
 while ((ref = list_pop_entry(&prefs, struct __prelim_ref, list))) {
 list_for_each_entry_del(ref, &prefs, list) {
 
 Christoph?

I really don't like something that looks like an iterator (*for_each*)
to modify a list.  Maybe it's just me, so I'd love to hear others chime
in.
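
To make the comparison concrete, here is a rough sketch of how the two forms
would be spelled out over a local list of struct __prelim_ref entries.
list_pop_entry() is not an existing list.h helper, so its definition below is
only a guess at the shape being proposed:

/* hypothetical helper: detach and return the first entry, or NULL */
#define list_pop_entry(head, type, member)				\
({									\
	type *__e = NULL;						\
	if (!list_empty(head)) {					\
		__e = list_first_entry(head, type, member);		\
		list_del(&__e->member);					\
	}								\
	__e;								\
})

/* explicit-pop style */
while ((ref = list_pop_entry(&prefs, struct __prelim_ref, list)))
	kfree(ref);

/* iterator-style draining macro from the series under discussion */
list_for_each_entry_del(ref, &prefs, list)
	kfree(ref);

Both drain the list; the disagreement is only over whether the destructive
loop should look like a for_each iterator.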



Re: [PATCH 0/2] introduce list_for_each_entry_del

2013-06-04 Thread Chris Mason
Quoting Christoph Hellwig (2013-06-04 10:48:56)
 On Mon, Jun 03, 2013 at 03:55:55PM -0400, Jörn Engel wrote:
  Actually, when I compare the two invocations, I prefer the
  list_for_each_entry_del() variant over list_pop_entry().
  
  while ((ref = list_pop_entry(&prefs, struct __prelim_ref, list))) {
  list_for_each_entry_del(ref, &prefs, list) {
  
  Christoph?
 
 I really don't like something that looks like an iterator (*for_each*)
 to modify a list.  Maybe it's just me, so I'd love to hear others chime
 in.

Have to agree with Christoph.  I just couldn't put my finger on why I
didn't like it until I saw the list_pop_entry suggestion.

-chris



wait_block_group_cache_progress() waits forever in case of drive failure

2013-06-04 Thread Alex Lyakas
Greetings all,
when testing drive failures, I occasionally hit the following hang:

# Block group is being cached in by caching_thread()
# caching_thread() experiences an error, e.g., in btrfs_search_slot,
because of drive failure:
ret = btrfs_search_slot(NULL, extent_root, &key, path, 0, 0);
if (ret < 0)
goto err;

# caching thread exits:
err:
btrfs_free_path(path);
up_read(&fs_info->extent_commit_sem);

free_excluded_extents(extent_root, block_group);

mutex_unlock(&caching_ctl->mutex);
out:
wake_up(&caching_ctl->wait);

put_caching_control(caching_ctl);
btrfs_put_block_group(block_group);

However, wait_block_group_cache_progress() is still stuck in a stack like this:
[816ec509] schedule+0x29/0x70
[a044bd42] wait_block_group_cache_progress+0xe2/0x110 [btrfs]
[8107fc10] ? add_wait_queue+0x60/0x60
[8107fc10] ? add_wait_queue+0x60/0x60
[a04568d6] find_free_extent+0x306/0xb90 [btrfs]
[a04462ee] ? btrfs_search_slot+0x2fe/0x820 [btrfs]
[a0457200] btrfs_reserve_extent+0xa0/0x1b0 [btrfs]
...
because of:
wait_event(caching_ctl->wait, block_group_cache_done(cache) ||
   (cache->free_space_ctl->free_space >= num_bytes));

But cache->cached never becomes BTRFS_CACHE_FINISHED, and
cache->free_space_ctl->free_space will also not grow enough, so the
wait never finishes.
At this point, the system totally hangs.

Same problem can happen with wait_block_group_cache_done().

I am thinking: can we add an additional condition, like:
wait_event(caching_ctl->wait,
   test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state) ||
   block_group_cache_done(cache) ||
   (cache->free_space_ctl->free_space >= num_bytes));

So that when a transaction aborts and the FS is marked as bad, all these
waits complete and the user can unmount?
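
If it helps to see the idea in one place, here is a rough sketch of that
proposal, against the 3.9-era structures named above; cache_wait_done() is a
made-up helper name and this is only an illustration, not a tested patch:

/* sketch only: the proposed wait condition factored into one helper */
static bool cache_wait_done(struct btrfs_fs_info *fs_info,
			    struct btrfs_block_group_cache *cache,
			    u64 num_bytes)
{
	return test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state) ||
	       block_group_cache_done(cache) ||
	       cache->free_space_ctl->free_space >= num_bytes;
}

/*
 * Waiter:  wait_event(caching_ctl->wait,
 *                     cache_wait_done(fs_info, cache, num_bytes));
 * Abort:   after setting BTRFS_FS_STATE_ERROR, the abort path would also
 *          have to wake_up(&caching_ctl->wait) on each caching control,
 *          otherwise the new condition is never re-evaluated.
 */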

Or some other way to fix this problem?

Thanks,
Alex.

P.S: should I open a bugzilla for this?


Re: [PATCH 0/2] introduce list_for_each_entry_del

2013-06-04 Thread Arne Jansen
On 06/04/13 16:53, Chris Mason wrote:
 Quoting Christoph Hellwig (2013-06-04 10:48:56)
 On Mon, Jun 03, 2013 at 03:55:55PM -0400, Jörn Engel wrote:
 Actually, when I compare the two invocations, I prefer the
 list_for_each_entry_del() variant over list_pop_entry().

  while ((ref = list_pop_entry(&prefs, struct __prelim_ref, list))) {
  list_for_each_entry_del(ref, &prefs, list) {

 Christoph?

 I really don't like something that looks like an iterator (*for_each*)
 to modify a list.  Maybe it's just me, so I'd love to hear others chime
 in.
 
 Have to agree with Christoph.  I just couldn't put my finger on why I
 didn't like it until I saw the list_pop_entry suggestion.

list_pop_each_entry?

 
 -chris
 
 



Re: [PATCH 0/2] introduce list_for_each_entry_del

2013-06-04 Thread Jörn Engel
On Tue, 4 June 2013 22:09:13 +0200, Arne Jansen wrote:
 On 06/04/13 16:53, Chris Mason wrote:
  Quoting Christoph Hellwig (2013-06-04 10:48:56)
  On Mon, Jun 03, 2013 at 03:55:55PM -0400, Jörn Engel wrote:
  Actually, when I compare the two invocations, I prefer the
  list_for_each_entry_del() variant over list_pop_entry().
 
   while ((ref = list_pop_entry(&prefs, struct __prelim_ref, list))) {
   list_for_each_entry_del(ref, &prefs, list) {
 
  Christoph?
 
  I really don't like something that looks like an iterator (*for_each*)
  to modify a list.  Maybe it's just me, so I'd love to hear others chime
  in.
  
  Have to agree with Christoph.  I just couldn't put my finger on why I
  didn't like it until I saw the list_pop_entry suggestion.
 
 list_pop_each_entry?

Or while_list_drain?

I agree the *for_each* cover didn't exactly match the content.  But if
we find a better name and you are not opposed to the concept, ...

Jörn

--
tglx1 thinks that joern should get a (TM) for Thinking Is Hard
-- Thomas Gleixner


[PATCH] Btrfs-progs: fix incorrect root backref errors in fsck

2013-06-04 Thread Josef Bacik
A user reported that fsck was complaining about unresolved refs for some
snapshots.  You can reproduce this by doing

mkfs.btrfs /dev/sdb
mount /dev/sdb /mnt
btrfs subvol snap /mnt/ /mnt/a
btrfs subvol snap /mnt/ /mnt/b
btrfs subvol del /mnt/a
umount /mnt
btrfsck /dev/sdb

and you'd get this

unresolved ref root 258 dir 256 index 2 namelen 1 name a error 600

because snapshot b has a dir item that points to a.  But we encode the dirid of
the ref holder in our root ref, and if it doesn't match we just give back an
empty directory, since we can't hardlink directories.  This makes the check in
btrfsck bogus: when we delete a, we remove its ref key, so any lookup into
/mnt/b/a will just give a blank directory, as it is supposed to.  Fix this by
only saying the backref is reachable if there is both a DIR_ITEM and a REF_KEY
for the given root.  With this patch I no longer see errors when running this
reproducer.  Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 cmds-check.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index bbef89a..4083298 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -1507,14 +1507,16 @@ static int add_root_backref(struct cache_tree *root_cache,
	}
 
	if (item_type == BTRFS_DIR_ITEM_KEY) {
+		if (backref->found_forward_ref)
+			rec->found_ref++;
		backref->found_dir_item = 1;
-		backref->reachable = 1;
-		rec->found_ref++;
	} else if (item_type == BTRFS_DIR_INDEX_KEY) {
		backref->found_dir_index = 1;
	} else if (item_type == BTRFS_ROOT_REF_KEY) {
		if (backref->found_forward_ref)
			backref->errors |= REF_ERR_DUP_ROOT_REF;
+		else if (backref->found_dir_item)
+			rec->found_ref++;
		backref->found_forward_ref = 1;
	} else if (item_type == BTRFS_ROOT_BACKREF_KEY) {
		if (backref->found_back_ref)
@@ -1524,6 +1526,8 @@ static int add_root_backref(struct cache_tree *root_cache,
		BUG_ON(1);
	}
 
+	if (backref->found_forward_ref && backref->found_dir_item)
+		backref->reachable = 1;
	return 0;
 }
 
-- 
1.7.7.6



Recovering from btrfsck --init-csum-tree

2013-06-04 Thread Robin Kreis
Hello,

how can I recover from running btrfsck --init-csum-tree on my 2TB
btrfs?  Every attempt to read a file results in no checksum being
found, followed by a checksum mismatch which leads to the data block
being zeroed out (see label zeroit in inode.c).  My current fix is
simply skipping the whole checksum validation, which enables me to
read my files again.  I still think at least one of the following
should happen:

1. Add a warning that --init-csum-tree is pretty destructive.
2. Make --init-csum-tree mark all inodes with INODE_NODATASUM.
3. Have some global flag or mount option to disable CRC checks.

What do you think?


[PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-04 Thread Zach Brown
Hi gang,

I finally sat down to fix that readdir hang that has been in the back
of my mind for a while.  I *hope* that the fix is pretty simple: just
don't manufacture a fake f_pos, I *think* we can abuse f_version as an
indicator that we shouldn't return entries.  Does this look reasonable?
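
For context, the reason f_version can serve as a one-shot marker here (stated
from memory of the generic seek helpers, so treat it as an assumption) is that
the generic llseek path clears it whenever the position actually changes,
roughly like this sketch (sketch_llseek_update is a made-up name; the real
logic lives in the generic llseek helpers in fs/read_write.c):

static loff_t sketch_llseek_update(struct file *file, loff_t offset)
{
	if (offset != file->f_pos) {
		file->f_pos = offset;
		file->f_version = 0;	/* this is what clears the EOF marker */
	}
	return offset;
}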

We still have the problem that we can generate valid large f_pos values
that can confuse 32bit userspace, but that's a different problem.  I
think we'll want filldir generation of EOVERFLOW like what exists for
large inodes. 

The rest of the patches are cleanups that I saw when absorbing the
code.  It's all lightly tested with xfstests but it wouldn't surprise
me if I missed something so review is appreciated.

Thanks!

- z


[PATCH 1/6] btrfs: set readdir f_pos only after filldir

2013-06-04 Thread Zach Brown
The only time we need to advance f_pos is after we've successfully given
a result to userspace via filldir.  This simplification gets rid of the
is_curr variable used to update f_pos for the delayed item readdir
entries.

Signed-off-by: Zach Brown z...@redhat.com
---
 fs/btrfs/delayed-inode.c |  5 +++--
 fs/btrfs/inode.c | 17 +++--
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 5615eac..4d846a2 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1696,8 +1696,6 @@ int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
			continue;
		}
 
-		filp->f_pos = curr->key.offset;
-
		di = (struct btrfs_dir_item *)curr->data;
		name = (char *)(di + 1);
		name_len = le16_to_cpu(di->name_len);
@@ -1708,6 +1706,9 @@ int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
		over = filldir(dirent, name, name_len, curr->key.offset,
			       location.objectid, d_type);
 
+		if (!over)
+			filp->f_pos = curr->key.offset + 1;
+
		if (atomic_dec_and_test(&curr->refs))
			kfree(curr);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b11a95e..6a5784b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5007,7 +5007,6 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
	char tmp_name[32];
	char *name_ptr;
	int name_len;
-	int is_curr = 0;	/* filp->f_pos points to the current index? */
 
	/* FIXME, use a real flag for deciding about the key type */
	if (root->fs_info->tree_root == root)
@@ -5076,9 +5075,6 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
					      found_key.offset))
			goto next;
 
-		filp->f_pos = found_key.offset;
-		is_curr = 1;
-
di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
di_cur = 0;
di_total = btrfs_item_size(leaf, item);
@@ -5130,6 +5126,9 @@ skip:
 
			if (over)
				goto nopos;
+
+			filp->f_pos = found_key.offset + 1;
+
di_len = btrfs_dir_name_len(leaf, di) +
 btrfs_dir_data_len(leaf, di) + sizeof(*di);
di_cur += di_len;
@@ -5140,23 +5139,21 @@ next:
}
 
if (key_type == BTRFS_DIR_INDEX_KEY) {
-		if (is_curr)
-			filp->f_pos++;
		ret = btrfs_readdir_delayed_dir_index(filp, dirent, filldir,
						      &ins_list);
if (ret)
goto nopos;
}
 
-	/* Reached end of directory/root. Bump pos past the last item. */
-	if (key_type == BTRFS_DIR_INDEX_KEY)
+	/* Reached end of directory/root */
+	if (key_type == BTRFS_DIR_INDEX_KEY) {
		/*
		 * 32-bit glibc will use getdents64, but then strtol -
		 * so the last number we can serve is this.
		 */
		filp->f_pos = 0x7fffffff;
-	else
-		filp->f_pos++;
+   }
+
 nopos:
ret = 0;
 err:
-- 
1.7.11.7



[PATCH 2/6] btrfs: fix readdir hang with offsets past INT_MAX

2013-06-04 Thread Zach Brown
To work around bugs in userspace btrfs_real_readdir() sets f_pos to an
offset that will prevent any future entries from being returned once the
last entry is hit.  Over time this supposedly impossible offset was
decreased from the initial U64_MAX to INT_MAX to appease 32bit
userspace.

  https://oss.oracle.com/pipermail/btrfs-devel/2008-January/000437.html
  commit c2a8b6e11009398ca9363d8ba8d4e7e40fb897fd
  commit 89f135d8b53bcccafd91a075366d2704ba257cf3
  commit 406266ab9ac8ed8b085c58aacd9e3161480dc5d5

The remaining problem is that resetting f_pos to some impossible offset
causes userspace to spin when it's, well, possible for an entry to have
that offset.  It takes a single thread on a modern cpu about nine hours
of constant file creation and removal to hit an offset past INT_MAX on a
single spindle.

Instead of trying to find an impossible f_pos that doesn't break various
layers of the stack, let's use f_version to indicate that readdir should
stop returning entries until seek changes f_pos and clears f_version.

Signed-off-by: Zach Brown z...@redhat.com
---
 fs/btrfs/inode.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6a5784b..e6e2b86 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4983,6 +4983,16 @@ unsigned char btrfs_filetype_table[] = {
DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
 };
 
+/*
+ * There have been buggy applications that can't handle one readdir pass
+ * returning the same name for different inodes that are unlinked and
+ * re-created during the readdir pass.  This was partially worked around
+ * by trying to set f_pos to magic values that broke either 32bit userspace
+ * or entries with huge offsets.  Now we set f_version to a magic value
+ * which prevents readdir results until seek resets f_pos and f_version.
+ */
+#define BTRFS_READDIR_EOF ~0ULL
+
 static int btrfs_real_readdir(struct file *filp, void *dirent,
  filldir_t filldir)
 {
@@ -5008,6 +5018,9 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
	char *name_ptr;
	int name_len;
 
+	if (filp->f_version == BTRFS_READDIR_EOF)
+		return 0;
+
	/* FIXME, use a real flag for deciding about the key type */
	if (root->fs_info->tree_root == root)
key_type = BTRFS_DIR_ITEM_KEY;
@@ -5145,14 +5158,9 @@ next:
goto nopos;
}
 
-	/* Reached end of directory/root */
-	if (key_type == BTRFS_DIR_INDEX_KEY) {
-		/*
-		 * 32-bit glibc will use getdents64, but then strtol -
-		 * so the last number we can serve is this.
-		 */
-		filp->f_pos = 0x7fffffff;
-	}
+	/* prevent further readdir results without seeking once we hit EOF */
+	if (key_type == BTRFS_DIR_INDEX_KEY)
+		filp->f_version = BTRFS_READDIR_EOF;
 
 nopos:
ret = 0;
-- 
1.7.11.7



[PATCH 4/6] btrfs: simplify finding next/prev delayed items

2013-06-04 Thread Zach Brown
I'd like to use the currently unused next/prev arguments to
__btrfs_lookup_delayed_item() in a future patch.  I noticed that the
code could be simplified.

We don't need to use rb_next() or rb_prev() to walk back up the tree
once we've failed to find the key at a leaf.  We can record the most
recent next and prev items as we descend and return those when the
search fails.

Signed-off-by: Zach Brown z...@redhat.com
---
 fs/btrfs/delayed-inode.c | 54 +++-
 1 file changed, 21 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index bc753fc..67e0f9f 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -326,11 +326,11 @@ static struct btrfs_delayed_item *btrfs_alloc_delayed_item(u32 data_len)
  * __btrfs_lookup_delayed_item - look up the delayed item by key
  * @delayed_node: pointer to the delayed node
  * @key: the key to look up
- * @prev:used to store the prev item if the right item isn't found
- * @next:used to store the next item if the right item isn't found
+ * @prev:set to the item before @key when it isn't found
+ * @next:set to the item after @key when it isn't found
  *
- * Note: if we don't find the right item, we will return the prev item and
- * the next item.
+ * @prev and @next are only correct if the search terminates at a leaf so
+ * they're only set if an item with @key is not found.
  */
 static struct btrfs_delayed_item *__btrfs_lookup_delayed_item(
struct rb_root *root,
@@ -338,48 +338,36 @@ static struct btrfs_delayed_item *__btrfs_lookup_delayed_item(
	struct btrfs_delayed_item **prev,
	struct btrfs_delayed_item **next)
 {
-	struct rb_node *node, *prev_node = NULL;
+	struct rb_node *node = root->rb_node;
+   struct btrfs_delayed_item *prev_item = NULL;
+   struct btrfs_delayed_item *next_item = NULL;
struct btrfs_delayed_item *delayed_item = NULL;
-   int ret = 0;
+   int ret;
 
-	node = root->rb_node;
+   if (prev)
+   *prev = NULL;
+   if (next)
+   *next = NULL;
 
while (node) {
delayed_item = rb_entry(node, struct btrfs_delayed_item,
rb_node);
-   prev_node = node;
		ret = btrfs_comp_cpu_keys(&delayed_item->key, key);
-		if (ret < 0)
+		if (ret < 0) {
+			prev_item = delayed_item;
			node = node->rb_right;
-		else if (ret > 0)
+		} else if (ret > 0) {
+			next_item = delayed_item;
			node = node->rb_left;
-   else
+   } else
return delayed_item;
}
 
-   if (prev) {
-   if (!prev_node)
-   *prev = NULL;
-		else if (ret < 0)
-   *prev = delayed_item;
-   else if ((node = rb_prev(prev_node)) != NULL) {
-   *prev = rb_entry(node, struct btrfs_delayed_item,
-rb_node);
-   } else
-   *prev = NULL;
-   }
+   if (prev)
+   *prev = prev_item;
+   if (next)
+   *next = next_item;
 
-   if (next) {
-   if (!prev_node)
-   *next = NULL;
-		else if (ret > 0)
-   *next = delayed_item;
-   else if ((node = rb_next(prev_node)) != NULL) {
-   *next = rb_entry(node, struct btrfs_delayed_item,
-rb_node);
-   } else
-   *next = NULL;
-   }
return NULL;
 }
 
-- 
1.7.11.7



[PATCH 5/6] btrfs: add helper to get delayed item root

2013-06-04 Thread Zach Brown
This just moves some duplicated code into a helper.  I couldn't bring
myself to add another copy in an upcoming patch.

The delayed_root BUG() in __btrfs_remove_delayed_item() wasn't needed.
The pointer deref will oops later if its null.

And now the remaining BUG() is in one place! :)

Signed-off-by: Zach Brown z...@redhat.com
---
 fs/btrfs/delayed-inode.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 67e0f9f..fcce951 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -382,6 +382,16 @@ static struct btrfs_delayed_item *__btrfs_lookup_delayed_insertion_item(
	return item;
 }
 
+static struct rb_root *get_ins_del_root(struct btrfs_delayed_node *delayed_node,
+					int ins_del)
+{
+	if (ins_del == BTRFS_DELAYED_INSERTION_ITEM)
+		return &delayed_node->ins_root;
+	if (ins_del == BTRFS_DELAYED_DELETION_ITEM)
+		return &delayed_node->del_root;
+	BUG();
+}
+
 static int __btrfs_add_delayed_item(struct btrfs_delayed_node *delayed_node,
struct btrfs_delayed_item *ins,
int action)
@@ -392,12 +402,7 @@ static int __btrfs_add_delayed_item(struct btrfs_delayed_node *delayed_node,
	struct btrfs_delayed_item *item;
	int cmp;
 
-	if (action == BTRFS_DELAYED_INSERTION_ITEM)
-		root = &delayed_node->ins_root;
-	else if (action == BTRFS_DELAYED_DELETION_ITEM)
-		root = &delayed_node->del_root;
-	else
-		BUG();
+	root = get_ins_del_root(delayed_node, action);
	p = &root->rb_node;
	node = &ins->rb_node;
 
@@ -460,15 +465,8 @@ static void __btrfs_remove_delayed_item(struct btrfs_delayed_item *delayed_item)
 
	delayed_root = delayed_item->delayed_node->root->fs_info->delayed_root;
 
-	BUG_ON(!delayed_root);
-	BUG_ON(delayed_item->ins_or_del != BTRFS_DELAYED_DELETION_ITEM &&
-	       delayed_item->ins_or_del != BTRFS_DELAYED_INSERTION_ITEM);
-
-	if (delayed_item->ins_or_del == BTRFS_DELAYED_INSERTION_ITEM)
-		root = &delayed_item->delayed_node->ins_root;
-	else
-		root = &delayed_item->delayed_node->del_root;
-
+	root = get_ins_del_root(delayed_item->delayed_node,
+				delayed_item->ins_or_del);
	rb_erase(&delayed_item->rb_node, root);
	delayed_item->delayed_node->count--;
 
-- 
1.7.11.7



[PATCH 3/6] btrfs: trivial delayed item readdir list cleanups

2013-06-04 Thread Zach Brown
Just call btrfs_put_delayed_items() for each list rather than having two
list arguments and duplicated code.

list_for_each_entry_safe() can handle an empty list.

We don't have to conditionally use and tear down the lists if we always
initialize them to be empty.  They're only populated when needed and the
rest of the uses just find empty lists.

Signed-off-by: Zach Brown z...@redhat.com
---
 fs/btrfs/delayed-inode.c | 17 ++---
 fs/btrfs/delayed-inode.h |  3 +--
 fs/btrfs/inode.c | 17 ++---
 3 files changed, 9 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 4d846a2..bc753fc 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1618,18 +1618,11 @@ void btrfs_get_delayed_items(struct inode *inode, struct list_head *ins_list,
	atomic_dec(&delayed_node->refs);
 }
 
-void btrfs_put_delayed_items(struct list_head *ins_list,
-struct list_head *del_list)
+void btrfs_put_delayed_items(struct list_head *list)
 {
struct btrfs_delayed_item *curr, *next;
 
-	list_for_each_entry_safe(curr, next, ins_list, readdir_list) {
-		list_del(&curr->readdir_list);
-		if (atomic_dec_and_test(&curr->refs))
-			kfree(curr);
-	}
-
-	list_for_each_entry_safe(curr, next, del_list, readdir_list) {
+	list_for_each_entry_safe(curr, next, list, readdir_list) {
		list_del(&curr->readdir_list);
		if (atomic_dec_and_test(&curr->refs))
kfree(curr);
@@ -1642,9 +1635,6 @@ int btrfs_should_delete_dir_index(struct list_head *del_list,
	struct btrfs_delayed_item *curr, *next;
	int ret;
 
-	if (list_empty(del_list))
-		return 0;
-
	list_for_each_entry_safe(curr, next, del_list, readdir_list) {
		if (curr->key.offset > index)
			break;
@@ -1679,9 +1669,6 @@ int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
int over = 0;
unsigned char d_type;
 
-   if (list_empty(ins_list))
-   return 0;
-
/*
 * Changing the data of the delayed item is impossible. So
 * we needn't lock them. And we have held i_mutex of the
diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
index 1d5c5f7..573506b 100644
--- a/fs/btrfs/delayed-inode.h
+++ b/fs/btrfs/delayed-inode.h
@@ -135,8 +135,7 @@ void btrfs_destroy_delayed_inodes(struct btrfs_root *root);
 /* Used for readdir() */
 void btrfs_get_delayed_items(struct inode *inode, struct list_head *ins_list,
 struct list_head *del_list);
-void btrfs_put_delayed_items(struct list_head *ins_list,
-struct list_head *del_list);
+void btrfs_put_delayed_items(struct list_head *list);
 int btrfs_should_delete_dir_index(struct list_head *del_list,
  u64 index);
 int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e6e2b86..53a8696 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5003,8 +5003,8 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
struct btrfs_key key;
struct btrfs_key found_key;
struct btrfs_path *path;
-   struct list_head ins_list;
-   struct list_head del_list;
+   LIST_HEAD(ins_list);
+   LIST_HEAD(del_list);
int ret;
struct extent_buffer *leaf;
int slot;
@@ -5048,11 +5048,8 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
 
	path->reada = 1;
 
-	if (key_type == BTRFS_DIR_INDEX_KEY) {
-		INIT_LIST_HEAD(&ins_list);
-		INIT_LIST_HEAD(&del_list);
+	if (key_type == BTRFS_DIR_INDEX_KEY)
		btrfs_get_delayed_items(inode, &ins_list, &del_list);
-	}
 
	btrfs_set_key_type(&key, key_type);
	key.offset = filp->f_pos;
@@ -5083,9 +5080,7 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
			break;
		if (found_key.offset < filp->f_pos)
			goto next;
-		if (key_type == BTRFS_DIR_INDEX_KEY &&
-		    btrfs_should_delete_dir_index(&del_list,
-						  found_key.offset))
+		if (btrfs_should_delete_dir_index(&del_list, found_key.offset))
goto next;
 
di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
@@ -5165,8 +5160,8 @@ next:
 nopos:
ret = 0;
 err:
-	if (key_type == BTRFS_DIR_INDEX_KEY)
-		btrfs_put_delayed_items(&ins_list, &del_list);
+	btrfs_put_delayed_items(&ins_list);
+	btrfs_put_delayed_items(&del_list);
btrfs_free_path(path);
return ret;
 }
-- 
1.7.11.7


[PATCH 6/6] btrfs: get fewer delayed item refs during readdir

2013-06-04 Thread Zach Brown
On every readdir call all the delayed items for the dir are put on a
private list with a held reference.  If they're outside the f_pos values
that this readdir call ends up using they're just dropped and removed
from the list.  We can make some tiny changes to cut down on this
overhead.

First, let's use the delayed item's key-sorted rbtree to skip items that
are before f_pos and will never be used.

Second, let's only acquire the new delayed items after we've exhausted
the existing in-tree items and still have room in the readdir buffer for
more entries.

Signed-off-by: Zach Brown z...@redhat.com
---
 fs/btrfs/delayed-inode.c | 21 ++---
 fs/btrfs/delayed-inode.h |  4 ++--
 fs/btrfs/inode.c | 14 +++---
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index fcce951..2c3ec89 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1567,28 +1567,27 @@ int btrfs_inode_delayed_dir_index_count(struct inode *inode)
return 0;
 }
 
-void btrfs_get_delayed_items(struct inode *inode, struct list_head *ins_list,
-struct list_head *del_list)
+void btrfs_get_delayed_items(struct inode *inode, struct list_head *list,
+struct btrfs_key *key, int action)
 {
struct btrfs_delayed_node *delayed_node;
struct btrfs_delayed_item *item;
+   struct btrfs_delayed_item *next;
+   struct rb_root *root;
 
delayed_node = btrfs_get_delayed_node(inode);
if (!delayed_node)
return;
 
-	mutex_lock(&delayed_node->mutex);
-	item = __btrfs_first_delayed_insertion_item(delayed_node);
-	while (item) {
-		atomic_inc(&item->refs);
-		list_add_tail(&item->readdir_list, ins_list);
-		item = __btrfs_next_delayed_item(item);
-	}
+	root = get_ins_del_root(delayed_node, action);
 
-	item = __btrfs_first_delayed_deletion_item(delayed_node);
+	mutex_lock(&delayed_node->mutex);
+	item = __btrfs_lookup_delayed_item(root, key, NULL, &next);
+	if (item == NULL)
+		item = next;
	while (item) {
		atomic_inc(&item->refs);
-		list_add_tail(&item->readdir_list, del_list);
+		list_add_tail(&item->readdir_list, list);
		item = __btrfs_next_delayed_item(item);
	}
	mutex_unlock(&delayed_node->mutex);
diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
index 573506b..7c401e1 100644
--- a/fs/btrfs/delayed-inode.h
+++ b/fs/btrfs/delayed-inode.h
@@ -133,8 +133,8 @@ void btrfs_kill_all_delayed_nodes(struct btrfs_root *root);
 void btrfs_destroy_delayed_inodes(struct btrfs_root *root);
 
 /* Used for readdir() */
-void btrfs_get_delayed_items(struct inode *inode, struct list_head *ins_list,
-struct list_head *del_list);
+void btrfs_get_delayed_items(struct inode *inode, struct list_head *list,
+struct btrfs_key *key, int action);
 void btrfs_put_delayed_items(struct list_head *list);
 int btrfs_should_delete_dir_index(struct list_head *del_list,
  u64 index);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 53a8696..ad42724 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5048,13 +5048,14 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
 
	path->reada = 1;
 
-	if (key_type == BTRFS_DIR_INDEX_KEY)
-		btrfs_get_delayed_items(inode, &ins_list, &del_list);
-
	btrfs_set_key_type(&key, key_type);
	key.offset = filp->f_pos;
	key.objectid = btrfs_ino(inode);
 
+	if (key_type == BTRFS_DIR_INDEX_KEY)
+		btrfs_get_delayed_items(inode, &del_list, &key,
+					BTRFS_DELAYED_DELETION_ITEM);
+
	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
	if (ret < 0)
		goto err;
@@ -5146,7 +5147,14 @@ next:
	path->slots[0]++;
}
 
+   /* don't acquire delayed item mutex while holding locked path */
+   btrfs_free_path(path);
+   path = NULL;
+
	if (key_type == BTRFS_DIR_INDEX_KEY) {
+		key.offset = filp->f_pos;
+		btrfs_get_delayed_items(inode, &ins_list, &key,
+					BTRFS_DELAYED_INSERTION_ITEM);
		ret = btrfs_readdir_delayed_dir_index(filp, dirent, filldir,
						      &ins_list);
if (ret)
-- 
1.7.11.7



Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-04 Thread Chris Mason
Quoting Zach Brown (2013-06-04 18:17:54)
 Hi gang,
 
 I finally sat down to fix that readdir hang that has been in the back
 of my mind for a while.  I *hope* that the fix is pretty simple: just
 don't manufacture a fake f_pos, I *think* we can abuse f_version as an
 indicator that we shouldn't return entries.  Does this look reasonable?

I like it, and it doesn't look too far away from how others are abusing
f_version.  Have you tried with NFS?  I don't think it'll hurt, but NFS
loves to surprise me.

-chris


Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-04 Thread Zach Brown
On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
 Quoting Zach Brown (2013-06-04 18:17:54)
  Hi gang,
  
  I finally sat down to fix that readdir hang that has been in the back
  of my mind for a while.  I *hope* that the fix is pretty simple: just
  don't manufacture a fake f_pos, I *think* we can abuse f_version as an
  indicator that we shouldn't return entries.  Does this look reasonable?
 
 I like it, and it doesn't look too far away from how others are abusing
 f_version.  Have you tried with NFS?  I don't think it'll hurt, but NFS
 loves to surprise me.

Mm, no, I hadn't.  I'll give it a go tomorrow.  What could go wrong? :)

- z


Re: [PATCH 1/6] btrfs: set readdir f_pos only after filldir

2013-06-04 Thread Miao Xie
On  tue, 4 Jun 2013 15:17:55 -0700, Zach Brown wrote:
 The only time we need to advance f_pos is after we've successfully given
 a result to userspace via filldir.  This simplification gets rid of the
 is_curr variable used to update f_pos for the delayed item readdir
 entries.
 
 Signed-off-by: Zach Brown z...@redhat.com

Reviewed-by: Miao Xie mi...@cn.fujitsu.com

 ---
  fs/btrfs/delayed-inode.c |  5 +++--
  fs/btrfs/inode.c | 17 +++--
  2 files changed, 10 insertions(+), 12 deletions(-)
 
 diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
 index 5615eac..4d846a2 100644
 --- a/fs/btrfs/delayed-inode.c
 +++ b/fs/btrfs/delayed-inode.c
 @@ -1696,8 +1696,6 @@ int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
   			continue;
   		}
 
 -		filp->f_pos = curr->key.offset;
 -
   		di = (struct btrfs_dir_item *)curr->data;
   		name = (char *)(di + 1);
   		name_len = le16_to_cpu(di->name_len);
 @@ -1708,6 +1706,9 @@ int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
   		over = filldir(dirent, name, name_len, curr->key.offset,
   			       location.objectid, d_type);
 
 +		if (!over)
 +			filp->f_pos = curr->key.offset + 1;
 +
   		if (atomic_dec_and_test(&curr->refs))
   kfree(curr);
  
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index b11a95e..6a5784b 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -5007,7 +5007,6 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
   	char tmp_name[32];
   	char *name_ptr;
   	int name_len;
 -	int is_curr = 0;	/* filp->f_pos points to the current index? */
 
   	/* FIXME, use a real flag for deciding about the key type */
   	if (root->fs_info->tree_root == root)
 @@ -5076,9 +5075,6 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
   					      found_key.offset))
   			goto next;
 
 -		filp->f_pos = found_key.offset;
 -		is_curr = 1;
 -
   di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
   di_cur = 0;
   di_total = btrfs_item_size(leaf, item);
 @@ -5130,6 +5126,9 @@ skip:
  
   if (over)
   goto nopos;
 +
 +		filp->f_pos = found_key.offset + 1;
 +
   di_len = btrfs_dir_name_len(leaf, di) +
btrfs_dir_data_len(leaf, di) + sizeof(*di);
   di_cur += di_len;
 @@ -5140,23 +5139,21 @@ next:
   }
  
   if (key_type == BTRFS_DIR_INDEX_KEY) {
 -		if (is_curr)
 -			filp->f_pos++;
   		ret = btrfs_readdir_delayed_dir_index(filp, dirent, filldir,
   						      &ins_list);
   if (ret)
   goto nopos;
   }
  
 -	/* Reached end of directory/root. Bump pos past the last item. */
 -	if (key_type == BTRFS_DIR_INDEX_KEY)
 +	/* Reached end of directory/root */
 +	if (key_type == BTRFS_DIR_INDEX_KEY) {
   		/*
   		 * 32-bit glibc will use getdents64, but then strtol -
   		 * so the last number we can serve is this.
   		 */
   		filp->f_pos = 0x7fffffff;
 -	else
 -		filp->f_pos++;
 + }
 +
  nopos:
   ret = 0;
  err:
 



Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-04 Thread Miao Xie
On  tue, 4 Jun 2013 16:26:57 -0700, Zach Brown wrote:
 On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
 Quoting Zach Brown (2013-06-04 18:17:54)
 Hi gang,

 I finally sat down to fix that readdir hang that has been in the back
 of my mind for a while.  I *hope* that the fix is pretty simple: just
 don't manufacture a fake f_pos, I *think* we can abuse f_version as an
 indicator that we shouldn't return entries.  Does this look reasonable?

 I like it, and it doesn't look too far away from how others are abusing
 f_version.  Have you tried with NFS?  I don't think it'll hurt, but NFS
 loves to surprise me.
 
 Mm, no, I hadn't.  I'll give it a go tomorrow.  What could go wrong? :)

If we cannot use f_version, we can use private_data instead. I think this
variant is safe.
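
A minimal sketch of that private_data variant, with invented names and the
assumption that readdir would allocate the state on first use and
fops->release would free it:

/* hypothetical per-open readdir state kept in filp->private_data */
struct btrfs_readdir_state {
	bool eof;	/* replaces the f_version magic value */
};

static bool btrfs_readdir_hit_eof(struct file *filp)
{
	struct btrfs_readdir_state *rs = filp->private_data;

	return rs && rs->eof;
}

static void btrfs_readdir_set_eof(struct file *filp)
{
	struct btrfs_readdir_state *rs = filp->private_data;

	if (rs)
		rs->eof = true;
}

/* llseek and release would clear/free the state, mirroring how the
 * f_version approach relies on seek resetting f_version. */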

Miao

 
 - z
 



[PATCH 1/2] list: add while_list_drain_entry

2013-06-04 Thread Jörn Engel
I have seen a lot of boilerplate code that either follows the pattern of
while (!list_empty(head)) {
pos = list_entry(head->next, struct foo, list);
list_del(&pos->list);
...
}
or some variant thereof.

With this patch in, people can use
while_list_drain_entry(pos, head, list) {
...
}

The patch also adds a while_list_drain variant, even though I have
only found a single user for that one so far.

Signed-off-by: Joern Engel jo...@logfs.org
---
 include/linux/list.h |   18 ++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/list.h b/include/linux/list.h
index 6a1f8df..ab39c7d 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -557,6 +557,24 @@ static inline void list_splice_tail_init(struct list_head *list,
 #define list_safe_reset_next(pos, n, member)				\
	n = list_entry(pos->member.next, typeof(*pos), member)
 
+/**
+ * while_list_drain - removes an entry from the list until it is empty
+ * @pos:   the struct list_head to use as a loop cursor.
+ * @head:  the head of your list.
+ */
+#define while_list_drain(pos, head) \
+	while (list_empty(head) ? 0 : (pos = (head)->next, list_del(pos), 1))
+
+/**
+ * while_list_drain_entry - removes an entry from the list until it is empty
+ * @pos:   the type * to use as loop cursor.
+ * @head:  the head of your list.
+ * @member:the name of the list_struct within the struct
+ */
+#define while_list_drain_entry(pos, head, member) \
+	while (!list_empty(head) && (pos = list_first_entry((head), \
+		typeof(*pos), member), list_del((head)->next), 1))
+
 /*
  * Double linked lists with a single pointer list head.
  * Mostly useful for hash tables where the two pointer list head is
-- 
1.7.10.4



[PATCH 2/2] btrfs: use while_list_drain_entry

2013-06-04 Thread Jörn Engel
Signed-off-by: Joern Engel jo...@logfs.org
---
 fs/btrfs/backref.c  |   15 +++
 fs/btrfs/compression.c  |4 +---
 fs/btrfs/disk-io.c  |6 +-
 fs/btrfs/extent-tree.c  |   17 +++--
 fs/btrfs/extent_io.c|8 ++--
 fs/btrfs/inode.c|   16 +++-
 fs/btrfs/ordered-data.c |7 +--
 fs/btrfs/qgroup.c   |   22 --
 fs/btrfs/relocation.c   |6 +-
 fs/btrfs/scrub.c|9 +++--
 fs/btrfs/transaction.c  |5 +
 fs/btrfs/volumes.c  |   11 ++-
 12 files changed, 25 insertions(+), 101 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index bd605c8..3a45e75 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -893,9 +893,7 @@ again:
if (ret)
goto out;
 
-	while (!list_empty(&prefs)) {
-		ref = list_first_entry(&prefs, struct __prelim_ref, list);
-		list_del(&ref->list);
+	while_list_drain_entry(ref, &prefs, list) {
		WARN_ON(ref->count < 0);
		if (ref->count && ref->root_id && ref->parent == 0) {
/* no parent == root of tree */
@@ -937,17 +935,10 @@ again:
 
 out:
btrfs_free_path(path);
-	while (!list_empty(&prefs)) {
-		ref = list_first_entry(&prefs, struct __prelim_ref, list);
-		list_del(&ref->list);
+	while_list_drain_entry(ref, &prefs, list)
		kfree(ref);
-	}
-	while (!list_empty(&prefs_delayed)) {
-		ref = list_first_entry(&prefs_delayed, struct __prelim_ref,
-				       list);
-		list_del(&ref->list);
+	while_list_drain_entry(ref, &prefs_delayed, list)
kfree(ref);
-   }
 
return ret;
 }
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 15b9408..e5a7475 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -841,9 +841,7 @@ static void free_workspaces(void)
int i;
 
	for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) {
-		while (!list_empty(&comp_idle_workspace[i])) {
-			workspace = comp_idle_workspace[i].next;
-			list_del(workspace);
+		while_list_drain(workspace, &comp_idle_workspace[i]) {
			btrfs_compress_op[i]->free_workspace(workspace);
			atomic_dec(&comp_alloc_workspace[i]);
}
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6d19a0a..2767b18 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3289,11 +3289,7 @@ static void del_fs_roots(struct btrfs_fs_info *fs_info)
struct btrfs_root *gang[8];
int i;
 
-	while (!list_empty(&fs_info->dead_roots)) {
-		gang[0] = list_entry(fs_info->dead_roots.next,
-				     struct btrfs_root, root_list);
-		list_del(&gang[0]->root_list);
-
+	while_list_drain_entry(gang[0], &fs_info->dead_roots, root_list) {
		if (gang[0]->in_radix) {
btrfs_free_fs_root(fs_info, gang[0]);
} else {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d55123..f7afb9e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2435,10 +2435,7 @@ int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
	if (!trans->delayed_ref_elem.seq)
		return 0;
 
-	while (!list_empty(&trans->qgroup_ref_list)) {
-		qgroup_update = list_first_entry(&trans->qgroup_ref_list,
-						 struct qgroup_update, list);
-		list_del(&qgroup_update->list);
+	while_list_drain_entry(qgroup_update, &trans->qgroup_ref_list, list) {
		if (!ret)
			ret = btrfs_qgroup_account_ref(
					trans, fs_info, qgroup_update->node,
@@ -7821,12 +7818,8 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
struct rb_node *n;
 
	down_write(&info->extent_commit_sem);
-	while (!list_empty(&info->caching_block_groups)) {
-		caching_ctl = list_entry(info->caching_block_groups.next,
-					 struct btrfs_caching_control, list);
-		list_del(&caching_ctl->list);
+	while_list_drain_entry(caching_ctl, &info->caching_block_groups, list)
		put_caching_control(caching_ctl);
-	}
	up_write(&info->extent_commit_sem);
 
	spin_lock(&info->block_group_cache_lock);
@@ -7868,10 +7861,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 
release_global_block_rsv(info);
 
-	while (!list_empty(&info->space_info)) {
-		space_info = list_entry(info->space_info.next,
-					struct btrfs_space_info,
-					list);
+   while_list_drain_entry(space_info, 

Re: [PATCH 0/2] introduce list_for_each_entry_del

2013-06-04 Thread Jörn Engel
On Tue, 4 June 2013 14:44:35 -0400, Jörn Engel wrote:
 
 Or while_list_drain?

Not sure if the silence is approval or lack of interest, but a new set
of patches is posted.  By playing around with the implementation a
bit, I have actually found a variant that makes the object code
shrink.  Not one variant gave same-size object code.  There's compiler
optimization for you.

Jörn

--
Money can buy bandwidth, but latency is forever.
-- John R. Mashey


v3.9 bug at /fs/btrfs/free-space-cache.c:1567 after powercycle

2013-06-04 Thread Sage Weil
Hi-

I'm pretty reliably triggering the following bug after powercycling an 
active btrfs + ceph workload and then trying to remount.  Is this a known 
issue?

sage


2013-06-04T18:54:28.532988-07:00 plana71 kernel: [   39.311120] [ 
cut here ]
2013-06-04T18:54:28.533002-07:00 plana71 kernel: [   39.315802] kernel BUG at 
/srv/autobuild-ceph/gitbuilder.git/build/fs/btrfs/free-space-cache.c:1567!
2013-06-04T18:54:28.533004-07:00 plana71 kernel: [   39.325013] invalid opcode: 
 [#1] SMP 
2013-06-04T18:54:28.533012-07:00 plana71 kernel: [   39.329278] Modules linked 
in: coretemp kvm_intel kvm ghash_clmulni_intel aesni_intel ablk_helper nfsd 
cryptd lrw aes_x86_64 nfs_acl xts auth_rpcgss gf128mul microcode psmouse 
exportfs nfs dcdbas lpc_ich mfd_core i7core_edac edac_core joydev serio_raw hed 
lp parport fscache lockd sunrpc hid_generic usbhid hid btrfs raid6_pq ixgbe dca 
ptp pps_core mdio mptsas mptscsih mptbase scsi_transport_sas bnx2 xor 
zlib_deflate crc32c_intel libcrc32c
2013-06-04T18:54:28.533014-07:00 plana71 kernel: [   39.370984] CPU 1 
2013-06-04T18:54:28.533017-07:00 plana71 kernel: [   39.372867] Pid: 1679, 
comm: mount Not tainted 3.9.0-ceph-00303-g19bb6a8 #1 Dell Inc. PowerEdge 
R410/01V648
2013-06-04T18:54:28.533021-07:00 plana71 kernel: [   39.382928] RIP: 
0010:[a0187917]  [a0187917] remove_from_bitmap+0x1b7/0x1c0 
[btrfs]
2013-06-04T18:54:28.533023-07:00 plana71 kernel: [   39.392479] RSP: 
0018:880212d456e8  EFLAGS: 00010287
2013-06-04T18:54:28.533025-07:00 plana71 kernel: [   39.397853] RAX: 
 RBX: 88021fea1100 RCX: 0034
2013-06-04T18:54:28.533028-07:00 plana71 kernel: [   39.405051] RDX: 
00048000 RSI: 61b0a000 RDI: 78c0
2013-06-04T18:54:28.533030-07:00 plana71 kernel: [   39.412249] RBP: 
880212d45738 R08: 8802200350f0 R09: 0740
2013-06-04T18:54:28.533032-07:00 plana71 kernel: [   39.419447] R10: 
88020a4e65d0 R11: 0001 R12: 880212d45760
2013-06-04T18:54:28.533034-07:00 plana71 kernel: [   39.426644] R13: 
88020ad97400 R14: 6940 R15: 880212d45758
2013-06-04T18:54:28.533036-07:00 plana71 kernel: [   39.433843] FS:  
7f035dc1a800() GS:88022722() knlGS:
2013-06-04T18:54:28.533038-07:00 plana71 kernel: [   39.442011] CS:  0010 DS: 
 ES:  CR0: 8005003b
2013-06-04T18:54:28.533040-07:00 plana71 kernel: [   39.447819] CR2: 
7ff46d4524d0 CR3: 00020a52c000 CR4: 07e0
2013-06-04T18:54:28.533042-07:00 plana71 kernel: [   39.455017] DR0: 
 DR1:  DR2: 
2013-06-04T18:54:28.533044-07:00 plana71 kernel: [   39.462215] DR3: 
 DR6: 0ff0 DR7: 0400
2013-06-04T18:54:28.533047-07:00 plana71 kernel: [   39.469414] Process mount 
(pid: 1679, threadinfo 880212d44000, task 88020a4e5eb0)
2013-06-04T18:54:28.533048-07:00 plana71 kernel: [   39.477669] Stack:
2013-06-04T18:54:28.533050-07:00 plana71 kernel: [   39.479739]  
 88020ad97454 61b0c000 00048000
2013-06-04T18:54:28.533052-07:00 plana71 kernel: [   39.487431]  
8802 88020ad97400  88020ad97454
2013-06-04T18:54:28.533054-07:00 plana71 kernel: [   39.495123]  
88022012b400 61b0a000 880212d45798 a0189073
2013-06-04T18:54:28.533055-07:00 plana71 kernel: [   39.502817] Call Trace:
2013-06-04T18:54:28.533058-07:00 plana71 kernel: [   39.505339]  
[a0189073] btrfs_remove_free_space+0x53/0x290 [btrfs]
2013-06-04T18:54:28.533061-07:00 plana71 kernel: [   39.512465]  
[a0135d20] btrfs_alloc_logged_file_extent+0x1c0/0x1e0 [btrfs]
2013-06-04T18:54:28.533063-07:00 plana71 kernel: [   39.520295]  
[a01215fa] ? btrfs_free_path+0x2a/0x40 [btrfs]
2013-06-04T18:54:28.533065-07:00 plana71 kernel: [   39.526815]  
[a0183aef] replay_one_extent+0x5ef/0x650 [btrfs]
2013-06-04T18:54:28.533068-07:00 plana71 kernel: [   39.533510]  
[a017da3a] ? btrfs_tree_read_lock+0x5a/0x140 [btrfs]
2013-06-04T18:54:28.533070-07:00 plana71 kernel: [   39.540553]  
[a0183e2b] replay_one_buffer+0x2db/0x390 [btrfs]
2013-06-04T18:54:28.533075-07:00 plana71 kernel: [   39.547247]  
[a01629f1] ? mark_extent_buffer_accessed+0x51/0x70 [btrfs]
2013-06-04T18:54:28.533078-07:00 plana71 kernel: [   39.554821]  
[a013f6d0] ? verify_parent_transid+0x160/0x160 [btrfs]
2013-06-04T18:54:28.533080-07:00 plana71 kernel: [   39.562037]  
[a017ee53] walk_up_log_tree+0x1c3/0x250 [btrfs]
2013-06-04T18:54:28.533082-07:00 plana71 kernel: [   39.568644]  
[a017fce1] walk_log_tree+0xb1/0x1f0 [btrfs]
2013-06-04T18:54:28.533085-07:00 plana71 kernel: [   39.574903]  
[a0186032] btrfs_recover_log_trees+0x212/0x3c0 [btrfs]
2013-06-04T18:54:28.533087-07:00 plana71 kernel: [   39.582119]  
[a0183b50] ?