Re: Writes in idle/How to debug filesystem activity

2011-08-30 Thread Tomasz Chmielewski

I use btrfs on a SD card. When my computer is running in idle (i.e.
nothing running but wmii and a few background tasks that don't do any
i/o; freshly synced), from time to time some [btrfs-*] tasks do writes
(for 5 seconds or so).  What are those processes doing on my disk,
having nothing to write/read? That leads to the second question: How can
I debug file system activity in inotify style?


There was once a similar thread about this issue; unfortunately, without 
any constructive answers:


http://thread.gmane.org/gmane.comp.file-systems.btrfs/10840


--
Tomasz Chmielewski
http://wpkg.org

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Writes in idle/How to debug filesystem activity

2011-08-30 Thread cwillu
 There was once a similar thread about this issue; unfortunately, without any
 constructive answers:

 http://thread.gmane.org/gmane.comp.file-systems.btrfs/10840

I believe sync does a transaction commit every time.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at fs/btrfs/inode.c:2299

2011-08-30 Thread Maciej Marcin Piechotka
On Tue, 2011-08-30 at 14:27 +0800, Miao Xie wrote:
 On mon, 29 Aug 2011 02:45:07 +0100, Maciej Marcin Piechotka wrote:
  I receive the bug when I try to snapshot from fcron:
  
  2011-08-29T02:00:46.529238+01:00 picard kernel: [ 4155.76]
  [ cut here ]
  2011-08-29T02:00:46.529253+01:00 picard kernel: [ 4155.90] kernel
  BUG at fs/btrfs/inode.c:2299!
  2011-08-29T02:00:46.529256+01:00 picard kernel: [ 4155.98] invalid
  opcode:  [#1] PREEMPT SMP 
  2011-08-29T02:00:46.529259+01:00 picard kernel: [ 4155.444506] CPU 1 
  2011-08-29T02:00:46.529264+01:00 picard kernel: [ 4155.444508] Modules
  linked in: microcode reiserfs btrfs zlib_deflate libcrc32c arc4
  snd_hda_codec_conexant iwlagn mac80211 uvcvideo videodev
  v4l2_compat_ioctl32 cfg80211 snd_hda_intel snd_hda_codec snd_hwdep
  snd_pcm snd_timer snd thinkpad_acpi soundcore snd_page_alloc hwmon
  rfkill nvram e1000e zram(C) rtc_cmos tp_smapi thinkpad_ec psmouse
  fscache kvm_intel kvm configs ipv6 autofs4 usbhid uhci_hcd firewire_ohci
  firewire_core sdhci_pci ehci_hcd crc_itu_t sdhci mmc_core sd_mod usbcore
  sr_mod cdrom
  2011-08-29T02:00:46.529267+01:00 picard kernel: [ 4155.444569] 
  2011-08-29T02:00:46.529270+01:00 picard kernel: [ 4155.444576] Pid:
  3167, comm: btrfs Tainted: P C  3.0.3-gentoo #1 LENOVO
  2242CTO/2242CTO
  2011-08-29T02:00:46.529274+01:00 picard kernel: [ 4155.444586] RIP:
  0010:[a031339b]  [a031339b] btrfs_orphan_del
  +0x8d/0xa9 [btrfs]
  2011-08-29T02:00:46.529276+01:00 picard kernel: [ 4155.444611] RSP:
  :880111c21b58  EFLAGS: 00010282
  2011-08-29T02:00:46.529279+01:00 picard kernel: [ 4155.444619] RAX:
  fffe RBX: 88011141bd60 RCX: 1000
  2011-08-29T02:00:46.529289+01:00 picard kernel: [ 4155.444626] RDX:
  000f RSI: 880110c8d250 RDI: 880134092d80
  2011-08-29T02:00:46.529292+01:00 picard kernel: [ 4155.444634] RBP:
   R08: 880111c21a30 R09: 0011
  2011-08-29T02:00:46.529297+01:00 picard kernel: [ 4155.444642] R10:
  01dc R11: 0286 R12: 880136866300
  2011-08-29T02:00:46.529299+01:00 picard kernel: [ 4155.444650] R13:
  880130cac800 R14: 0001 R15: 880130cacb88
  2011-08-29T02:00:46.529302+01:00 picard kernel: [ 4155.444658] FS:
  7f126715a740() GS:88013bc8() knlGS:
  2011-08-29T02:00:46.529304+01:00 picard kernel: [ 4155.444666] CS:  0010
  DS:  ES:  CR0: 8005003b
  2011-08-29T02:00:46.529307+01:00 picard kernel: [ 4155.444673] CR2:
  7f5149eb7008 CR3: 9caca000 CR4: 000406e0
  2011-08-29T02:00:46.529309+01:00 picard kernel: [ 4155.444681] DR0:
   DR1:  DR2: 
  2011-08-29T02:00:46.529311+01:00 picard kernel: [ 4155.444689] DR3:
   DR6: 0ff0 DR7: 0400
  2011-08-29T02:00:46.529314+01:00 picard kernel: [ 4155.444697] Process
  btrfs (pid: 3167, threadinfo 880111c2, task 88010b35a140)
  2011-08-29T02:00:46.529316+01:00 picard kernel: [ 4155.444705] Stack:
  2011-08-29T02:00:46.529318+01:00 picard kernel: [ 4155.444711]
  00c369c5 880130cac800 880110c8dbe0 00c369c5
  2011-08-29T02:00:46.529321+01:00 picard kernel: [ 4155.444720]
   880130cacb88 880130cacb90 a03173c4
  2011-08-29T02:00:46.562822+01:00 picard kernel: [ 4155.444730]
  880136866300 88011141bd60 880110c8dbe0 fffb
  2011-08-29T02:00:46.562959+01:00 picard kernel: [ 4155.444739] Call
  Trace:
  2011-08-29T02:00:46.562964+01:00 picard kernel: [ 4155.444756]
  [a03173c4] ? btrfs_orphan_cleanup+0x196/0x2d9 [btrfs]
  2011-08-29T02:00:46.562966+01:00 picard kernel: [ 4155.444773]
  [a0317862] ? btrfs_lookup_dentry+0x35b/0x392 [btrfs]
  2011-08-29T02:00:46.562969+01:00 picard kernel: [ 4155.444790]
  [a03178a2] ? btrfs_lookup+0x9/0x1f [btrfs]
  2011-08-29T02:00:46.562971+01:00 picard kernel: [ 4155.444801]
  [810ccf63] ? d_alloc_and_lookup+0x3a/0x60
  2011-08-29T02:00:46.562973+01:00 picard kernel: [ 4155.444810]
  [810ce356] ? walk_component+0x210/0x3b7
  2011-08-29T02:00:46.562975+01:00 picard kernel: [ 4155.444818]
  [810cea4a] ? path_lookupat+0x7c/0x29e
  2011-08-29T02:00:46.562979+01:00 picard kernel: [ 4155.444827]
  [810cfc34] ? do_path_lookup+0x1c/0x87
  2011-08-29T02:00:46.562981+01:00 picard kernel: [ 4155.444835]
  [810cffe2] ? user_path_at+0x49/0x7b
  2011-08-29T02:00:46.562983+01:00 picard kernel: [ 4155.444843]
  [810c8781] ? vfs_fstatat+0x3c/0x6a
  2011-08-29T02:00:46.562986+01:00 picard kernel: [ 4155.444853]
  [8102d38e] ? sub_preempt_count+0x83/0x94
  2011-08-29T02:00:46.562988+01:00 picard kernel: [ 4155.444861]
  [810c88ab] ? sys_newstat+0x12/0x2b
  2011-08-29T02:00:46.562990+01:00 picard kernel: [ 4155.444870]
  

[PATCH] Btrfs: handle enospc accounting for free space inodes

2011-08-30 Thread Josef Bacik
Since free space inodes now use normal checksumming we need to make sure to
account for their metadata use.  So reserve metadata space, and then if we fail
to write out the metadata we can just release it, otherwise it will be freed up
when the io completes.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/extent-tree.c  |   18 
 fs/btrfs/free-space-cache.c |   44 --
 fs/btrfs/inode-map.c|6 +++-
 fs/btrfs/inode.c|2 +-
 4 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1f1d3e8..ccdc4d1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2755,16 +2755,20 @@ again:
num_pages *= 16;
num_pages *= PAGE_CACHE_SIZE;
 
-   ret = btrfs_check_data_free_space(inode, num_pages);
+   ret = btrfs_delalloc_reserve_space(inode, num_pages);
if (ret)
goto out_put;
 
ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, num_pages,
  num_pages, num_pages,
  alloc_hint);
-   if (!ret)
+   if (!ret) {
dcs = BTRFS_DC_SETUP;
-   btrfs_free_reserved_data_space(inode, num_pages);
+   btrfs_free_reserved_data_space(inode, num_pages);
+   } else {
+   btrfs_delalloc_release_space(inode, num_pages);
+   }
+
 out_put:
iput(inode);
 out_free:
@@ -4002,9 +4006,13 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, 
u64 num_bytes)
struct btrfs_block_rsv *block_rsv = root-fs_info-delalloc_block_rsv;
u64 to_reserve = 0;
unsigned nr_extents = 0;
+   int flush = 1;
int ret;
 
-   if (btrfs_transaction_in_commit(root-fs_info))
+   if (btrfs_is_free_space_inode(root, inode))
+   flush = 0;
+
+   if (flush  btrfs_transaction_in_commit(root-fs_info))
schedule_timeout(1);
 
num_bytes = ALIGN(num_bytes, root-sectorsize);
@@ -4023,7 +4031,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, 
u64 num_bytes)
to_reserve += calc_csum_metadata_size(inode, num_bytes, 1);
spin_unlock(BTRFS_I(inode)-lock);
 
-   ret = reserve_metadata_bytes(NULL, root, block_rsv, to_reserve, 1);
+   ret = reserve_metadata_bytes(NULL, root, block_rsv, to_reserve, flush);
if (ret) {
u64 to_free = 0;
unsigned dropped;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index d56ba54..13c8fb3 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -528,6 +528,19 @@ out:
return ret;
 }
 
+/**
+ * __btrfs_write_out_cache - write out cached info to an inode
+ * @root - the root the inode belongs to
+ * @ctl - the free space cache we are going to write out
+ * @block_group - the block_group for this cache if it belongs to a block_group
+ * @trans - the trans handle
+ * @path - the path to use
+ * @offset - the offset for the key we'll insert
+ *
+ * This function writes out a free space cache struct to disk for quick 
recovery
+ * on mount.  This will return 0 if it was successfull in writing the cache 
out,
+ * and -1 if it was not.
+ */
 int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
struct btrfs_free_space_ctl *ctl,
struct btrfs_block_group_cache *block_group,
@@ -551,7 +564,8 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct 
inode *inode,
int index = 0, num_pages = 0;
int entries = 0;
int bitmaps = 0;
-   int ret = -1;
+   int ret;
+   int err = -1;
bool next_page = false;
bool out_of_space = false;
 
@@ -559,7 +573,7 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct 
inode *inode,
 
node = rb_first(ctl-free_space_offset);
if (!node)
-   return 0;
+   return -1;
 
if (!i_size_read(inode))
return -1;
@@ -763,7 +777,6 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct 
inode *inode,
unlock_extent_cached(BTRFS_I(inode)-io_tree, 0,
 i_size_read(inode) - 1, cached_state,
 GFP_NOFS);
-   ret = 0;
goto out;
}
 
@@ -785,10 +798,8 @@ int __btrfs_write_out_cache(struct btrfs_root *root, 
struct inode *inode,
unlock_extent_cached(BTRFS_I(inode)-io_tree, 0,
 i_size_read(inode) - 1, cached_state, GFP_NOFS);
 
-   if (ret) {
-   ret = 0;
+   if (ret)
goto out;
-   }
 
BTRFS_I(inode)-generation = trans-transid;
 
@@ -800,7 +811,6 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct 
inode *inode,
 
ret = 

[PATCH] Btrfs: use the transactions block_rsv for the csum root

2011-08-30 Thread Josef Bacik
The alloc warnings everybody has been seeing is because we have been reserving
space for csums, but we weren't actually using that space.  So make
get_block_rsv() return the trans-block_rsv if we're modifying the csum root.
Also set the trans-block_rsv to NULL so that if we modify the csum root when
running delayed ref's that comes out of the global reserve like it's supposed
to.  With this patch I'm not seeing those alloc warnings anymore.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/extent-tree.c |   15 +--
 fs/btrfs/transaction.c |1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ccdc4d1..53f6dbd 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3565,10 +3565,12 @@ out:
 static struct btrfs_block_rsv *get_block_rsv(struct btrfs_trans_handle *trans,
 struct btrfs_root *root)
 {
-   struct btrfs_block_rsv *block_rsv;
-   if (root-ref_cows)
+   struct btrfs_block_rsv *block_rsv = NULL;
+
+   if (root-ref_cows || root == root-fs_info-csum_root)
block_rsv = trans-block_rsv;
-   else
+
+   if (!block_rsv)
block_rsv = root-block_rsv;
 
if (!block_rsv)
@@ -3865,12 +3867,13 @@ static void release_global_block_rsv(struct 
btrfs_fs_info *fs_info)
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
  struct btrfs_root *root)
 {
+   struct btrfs_block_rsv *block_rsv;
+
if (!trans-bytes_reserved)
return;
 
-   BUG_ON(trans-block_rsv != root-fs_info-trans_block_rsv);
-   btrfs_block_rsv_release(root, trans-block_rsv,
-   trans-bytes_reserved);
+   block_rsv = root-fs_info-trans_block_rsv;
+   btrfs_block_rsv_release(root, block_rsv, trans-bytes_reserved);
trans-bytes_reserved = 0;
 }
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3418acc..329e80a 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -453,6 +453,7 @@ static int __btrfs_end_transaction(struct 
btrfs_trans_handle *trans,
return 0;
}
 
+   trans-block_rsv = NULL;
while (count  4) {
unsigned long cur = trans-delayed_ref_updates;
trans-delayed_ref_updates = 0;
-- 
1.7.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: stop passing a trans handle all around the reservation code

2011-08-30 Thread Josef Bacik
The only thing that we need to have a trans handle for is in
reserve_metadata_bytes and thats to know how much flushing we can do.  So
instead of passing it around, just check current-journal_info for a
trans_handle so we know if we can commit a transaction to try and free up space
or not.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/ctree.h|6 +---
 fs/btrfs/extent-tree.c  |   45 ++-
 fs/btrfs/free-space-cache.c |4 +--
 fs/btrfs/inode.c|4 +-
 fs/btrfs/relocation.c   |   15 ++---
 fs/btrfs/transaction.c  |8 +++---
 6 files changed, 39 insertions(+), 43 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index caa73cd..a5faf8e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2237,12 +2237,10 @@ void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv);
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root);
 void btrfs_free_block_rsv(struct btrfs_root *root,
  struct btrfs_block_rsv *rsv);
-int btrfs_block_rsv_add(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root,
+int btrfs_block_rsv_add(struct btrfs_root *root,
struct btrfs_block_rsv *block_rsv,
u64 num_bytes);
-int btrfs_block_rsv_check(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
+int btrfs_block_rsv_check(struct btrfs_root *root,
  struct btrfs_block_rsv *block_rsv,
  u64 min_reserved, int min_factor, int flush);
 int btrfs_block_rsv_migrate(struct btrfs_block_rsv *src_rsv,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d236df7..9bb7159 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3404,29 +3404,34 @@ static int shrink_delalloc(struct btrfs_trans_handle 
*trans,
return reclaimed = to_reclaim;
 }
 
-/*
- * Retries tells us how many times we've called reserve_metadata_bytes.  The
- * idea is if this is the first call (retries == 0) then we will add to our
- * reserved count if we can't make the allocation in order to hold our place
- * while we go and try and free up space.  That way for retries  1 we don't 
try
- * and add space, we just check to see if the amount of unused space is = the
- * total space, meaning that our reservation is valid.
+/**
+ * reserve_metadata_bytes - try to reserve bytes from the block_rsv's space
+ * @root - the root we're allocating for
+ * @block_rsv - the block_rsv we're allocating for
+ * @orig_bytes - the number of bytes we want
+ * @flush - wether or not we can flush to make our reservation
  *
- * However if we don't intend to retry this reservation, pass -1 as retries so
- * that it short circuits this logic.
+ * This will reserve orgi_bytes number of bytes from the space info associated
+ * with the block_rsv.  If there is not enough space it will make an attempt to
+ * flush out space to make room.  It will do this by flushing delalloc if
+ * possible or committing the transaction.  If flush is 0 then no attempts to
+ * regain reservations will be made and this will fail if there is not enough
+ * space already.
  */
-static int reserve_metadata_bytes(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
+static int reserve_metadata_bytes(struct btrfs_root *root,
  struct btrfs_block_rsv *block_rsv,
  u64 orig_bytes, int flush)
 {
struct btrfs_space_info *space_info = block_rsv-space_info;
+   struct btrfs_trans_handle *trans;
u64 unused;
u64 num_bytes = orig_bytes;
int retries = 0;
int ret = 0;
bool committed = false;
bool flushing = false;
+
+   trans = (struct btrfs_trans_handle *)current-journal_info;
 again:
ret = 0;
spin_lock(space_info-lock);
@@ -3689,8 +3694,7 @@ void btrfs_free_block_rsv(struct btrfs_root *root,
kfree(rsv);
 }
 
-int btrfs_block_rsv_add(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root,
+int btrfs_block_rsv_add(struct btrfs_root *root,
struct btrfs_block_rsv *block_rsv,
u64 num_bytes)
 {
@@ -3699,7 +3703,7 @@ int btrfs_block_rsv_add(struct btrfs_trans_handle *trans,
if (num_bytes == 0)
return 0;
 
-   ret = reserve_metadata_bytes(trans, root, block_rsv, num_bytes, 1);
+   ret = reserve_metadata_bytes(root, block_rsv, num_bytes, 1);
if (!ret) {
block_rsv_add_bytes(block_rsv, num_bytes, 1);
return 0;
@@ -3708,8 +3712,7 @@ int btrfs_block_rsv_add(struct btrfs_trans_handle *trans,
return ret;
 }
 
-int btrfs_block_rsv_check(struct btrfs_trans_handle *trans,
- struct btrfs_root *root,
+int btrfs_block_rsv_check(struct btrfs_root *root,
 

Re: [PATCH] btrfs: fix warning in iput for bad-inode

2011-08-30 Thread Sergei Trofimovich
 Running 'sync' program after the load does not finish and eats 100%CPU
 busy-waiting for something in kernel.
 
 It's easy to reproduce hang with patch for me. I just run liferea and sync
 after it. Without patch I haven't managed to hang btrfs up.

And I think it's another btrfs bug. I've managed to reproduce it _without_
your patch and _without_ autodefrag enabled by manually running the
following commands:
$ btrfs fi defrag file-with-20_000-extents
$ sync

I think your patch just shuffles things a bit and forces autodefrag races
to pop-up sooner (which is good! :])

-- 

  Sergei


signature.asc
Description: PGP signature


Re: [PATCH] xfstests 255: add a seek_data/seek_hole tester

2011-08-30 Thread Sunil Mushran

On 08/27/2011 01:30 AM, Marco Stornelli wrote:

Il 26/08/2011 16:41, Zach Brown ha scritto:

Hole: a range of the file that contains no data or is made up
entirely of  NULL (zero) data. Holes include preallocated ranges of
files that have not had actual data written to them.



No for me. A hole is made up of zero data? It's a strange definition
for me.


It's a very natural definition for me.  It mirrors the behaviour of
read() of sparse data inside i_size that file system authors already
have to consider.

It's also a reminder for people that this interface is about avoiding
reading zeros.  Systems that track contents can do this for files that
had tons of zeros written.  The data is there but the app is
specifically asking us to skip it by using SEEK_DATA.

- z



I think we need to consider a hole and data not present/not written yet as 
two different cases even they are related. For example, if I do an fallocate without keep 
size option, then I do a read, I have the same behavior of sparse data inside i_size, but 
the blocks are allocated so no sparse data in this case. Simply there are no difference 
from app point of view.


Exactly. That's why seek_hole should identify them alike, if possible.
But that should not be a requirement because the sole aim here is
to improve performance. Identifying a hole as data is not the end of
the world. In some cases it may be more efficient. We just have to
ensure that we don't identify data as a hole.

BTW, we still have the fiemap interface that allows users to identify
unwritten extents, etc. Use that if you want that kind of detail.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix warning in iput for bad-inode

2011-08-30 Thread Josef Bacik
On 08/30/2011 12:53 PM, Sergei Trofimovich wrote:
 Running 'sync' program after the load does not finish and eats 
 100%CPU busy-waiting for something in kernel.
 
 It's easy to reproduce hang with patch for me. I just run
 liferea and sync after it. Without patch I haven't managed to
 hang btrfs up.
 
 And I think it's another btrfs bug. I've managed to reproduce it 
 _without_ your patch and _without_ autodefrag enabled by manually 
 running the following commands: $ btrfs fi defrag 
 file-with-20_000-extents $ sync
 
 I think your patch just shuffles things a bit and forces
 autodefrag races to pop-up sooner (which is good! :])
 

Sergei, can you do sysrq+w when this is happening, and maybe turn on
the softlockup detector so we can see where sync is getting stuck?
Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix warning in iput for bad-inode

2011-08-30 Thread Sergei Trofimovich
On Tue, 30 Aug 2011 14:02:37 -0400
Josef Bacik jo...@redhat.com wrote:

 On 08/30/2011 12:53 PM, Sergei Trofimovich wrote:
  Running 'sync' program after the load does not finish and eats 
  100%CPU busy-waiting for something in kernel.
  
  It's easy to reproduce hang with patch for me. I just run
  liferea and sync after it. Without patch I haven't managed to
  hang btrfs up.
  
  And I think it's another btrfs bug. I've managed to reproduce it 
  _without_ your patch and _without_ autodefrag enabled by manually 
  running the following commands: $ btrfs fi defrag 
  file-with-20_000-extents $ sync
  
  I think your patch just shuffles things a bit and forces
  autodefrag races to pop-up sooner (which is good! :])
  
 
 Sergei, can you do sysrq+w when this is happening, and maybe turn on
 the softlockup detector so we can see where sync is getting stuck?
 Thanks,

Sure. As I keep telling about 2 cases in IRC I will state both here explicitely:

==The First Issue (aka The Hung sync() case) ==

- it's an unpatched linus's v3.1-rc4-80-g0f43dd5
- /dev/root on / type btrfs (rw,noatime,compress=lzo)
- 50% full 30GB filesystem (usual nonmixed mode)

How I hung it:
$ /usr/sbin/filefrag ~/.bogofilter/wordlist.db 
/home/st/.bogofilter/wordlist.db: 19070 extents found
  the file is 138MB sqlite database for bayesian SPAM filter,
  it's being read and written every 20 minutes or so.
  Maybe, it was writtent even in defrag/sync time!
$~/dev/git/btrfs-progs-unstable/btrfs fi defrag ~/.bogofilter/wordlist.db 
$ sync
^Chung in D-state

I didn't try to reproduce it yet.
As for lockdep I'll try but I'm afraid I will fail to reproduce, but I'll try 
tomorrow.
I suspect I'll need to seriously fragment some file first down to such horrible 
state.

With help of David I've some (hopefully relevant) info:
#!/bin/sh -x

for i in $(ps aux|grep  D[+ ]\?|awk '{print $2}'); do
ps $i
sudo cat /proc/$i/stack
done

  PID TTY  STAT   TIME COMMAND
 1291 ?D  0:00 [btrfs-endio-wri]
[8130055d] btrfs_tree_read_lock+0x6d/0x120
[812b8e88] btrfs_search_slot+0x698/0x8b0
[812c9e18] btrfs_lookup_csum+0x68/0x190
[812ca10f] __btrfs_lookup_bio_sums+0x1cf/0x3e0
[812ca371] btrfs_lookup_bio_sums+0x11/0x20
[812d6a50] btrfs_submit_bio_hook+0x140/0x170
[812ed594] submit_one_bio+0x64/0xa0
[812f14f5] extent_readpages+0xe5/0x100
[812d7aaa] btrfs_readpages+0x1a/0x20
[810a6a02] __do_page_cache_readahead+0x1d2/0x280
[810a6d8c] ra_submit+0x1c/0x20
[810a6ebd] ondemand_readahead+0x12d/0x270
[810a70cc] page_cache_sync_readahead+0x2c/0x40
[81309987] __load_free_space_cache+0x1a7/0x5b0
[81309e61] load_free_space_cache+0xd1/0x190
[812be07b] cache_block_group+0xab/0x290
[812c3def] find_free_extent.clone.71+0x39f/0xab0
[812c5160] btrfs_reserve_extent+0xe0/0x170
[812c56df] btrfs_alloc_free_block+0xcf/0x330
[812b498d] __btrfs_cow_block+0x11d/0x4a0
[812b4df8] btrfs_cow_block+0xe8/0x1a0
[812b8965] btrfs_search_slot+0x175/0x8b0
[812c9e18] btrfs_lookup_csum+0x68/0x190
[812caf6e] btrfs_csum_file_blocks+0xbe/0x670
[812d7d91] add_pending_csums.clone.39+0x41/0x60
[812da528] btrfs_finish_ordered_io+0x218/0x310
[812da635] btrfs_writepage_end_io_hook+0x15/0x20
[8130c71a] end_compressed_bio_write+0x7a/0xe0
[811146f8] bio_endio+0x18/0x30
[812cd8fc] end_workqueue_fn+0xec/0x120
[812fb0ac] worker_loop+0xac/0x520
[8105d486] kthread+0x96/0xa0
[815f9214] kernel_thread_helper+0x4/0x10
[] 0x
  PID TTY  STAT   TIME COMMAND
 1296 ?D  0:00 [btrfs-transacti]
[812d3d6e] btrfs_commit_transaction+0x22e/0x870
[812cd5bb] transaction_kthread+0x26b/0x280
[8105d486] kthread+0x96/0xa0
[815f9214] kernel_thread_helper+0x4/0x10
[] 0x
  PID TTY  STAT   TIME COMMAND
 2891 pts/0D+ 0:00 sync
[8109d069] sleep_on_page+0x9/0x10
[8109d28e] wait_on_page_bit+0x6e/0x80
[8109d9aa] filemap_fdatawait_range+0xfa/0x190
[8109da62] filemap_fdatawait+0x22/0x30
[8110ae5a] sync_inodes_sb+0x17a/0x1e0
[8110eae8] __sync_filesystem+0x78/0x80
[8110eb07] sync_one_sb+0x17/0x20
[810e9cc5] iterate_supers+0x95/0xf0
[8110ea2b] sync_filesystems+0x1b/0x20
[8110eb8c] sys_sync+0x1c/0x40
[815f813b] system_call_fastpath+0x16/0x1b
[] 0x
  PID TTY  STAT   TIME COMMAND
cat: /proc/4454/stack: Нет такого файла или каталога

disassebbled btrfs_tree_read_lock:

813004f0 btrfs_tree_read_lock push   %rbp
813004f1 btrfs_tree_read_lock+0x1 mov%gs:0xb5c0,%rax
813004fa btrfs_tree_read_lock+0xa mov%rsp,%rbp
813004fd 

Re: [PATCH] btrfs: fix warning in iput for bad-inode

2011-08-30 Thread Josef Bacik
On 08/30/2011 03:31 PM, Sergei Trofimovich wrote:
 On Tue, 30 Aug 2011 14:02:37 -0400 Josef Bacik jo...@redhat.com
 wrote:
 
 On 08/30/2011 12:53 PM, Sergei Trofimovich wrote:
 Running 'sync' program after the load does not finish and
 eats 100%CPU busy-waiting for something in kernel.
 
 It's easy to reproduce hang with patch for me. I just run 
 liferea and sync after it. Without patch I haven't managed
 to hang btrfs up.
 
 And I think it's another btrfs bug. I've managed to reproduce
 it _without_ your patch and _without_ autodefrag enabled by
 manually running the following commands: $ btrfs fi defrag 
 file-with-20_000-extents $ sync
 
 I think your patch just shuffles things a bit and forces 
 autodefrag races to pop-up sooner (which is good! :])
 
 
 Sergei, can you do sysrq+w when this is happening, and maybe turn
 on the softlockup detector so we can see where sync is getting
 stuck? Thanks,
 
 Sure. As I keep telling about 2 cases in IRC I will state both here
 explicitely:
 
 ==The First Issue (aka The Hung sync() case) ==
 
 - it's an unpatched linus's v3.1-rc4-80-g0f43dd5 - /dev/root on /
 type btrfs (rw,noatime,compress=lzo) - 50% full 30GB filesystem
 (usual nonmixed mode)
 
 How I hung it: $ /usr/sbin/filefrag ~/.bogofilter/wordlist.db 
 /home/st/.bogofilter/wordlist.db: 19070 extents found the file is
 138MB sqlite database for bayesian SPAM filter, it's being read and
 written every 20 minutes or so. Maybe, it was writtent even in
 defrag/sync time! $~/dev/git/btrfs-progs-unstable/btrfs fi defrag
 ~/.bogofilter/wordlist.db $ sync ^Chung in D-state
 
 I didn't try to reproduce it yet. As for lockdep I'll try but I'm
 afraid I will fail to reproduce, but I'll try tomorrow. I suspect
 I'll need to seriously fragment some file first down to such
 horrible state.
 
 With help of David I've some (hopefully relevant) info: #!/bin/sh
 -x
 
 for i in $(ps aux|grep  D[+ ]\?|awk '{print $2}'); do ps $i sudo
 cat /proc/$i/stack done
 
 PID TTY  STAT   TIME COMMAND 1291 ?D  0:00
 [btrfs-endio-wri] [8130055d]
 btrfs_tree_read_lock+0x6d/0x120 [812b8e88]
 btrfs_search_slot+0x698/0x8b0 [812c9e18]
 btrfs_lookup_csum+0x68/0x190 [812ca10f]
 __btrfs_lookup_bio_sums+0x1cf/0x3e0 [812ca371]
 btrfs_lookup_bio_sums+0x11/0x20 [812d6a50]
 btrfs_submit_bio_hook+0x140/0x170 [812ed594]
 submit_one_bio+0x64/0xa0 [812f14f5]
 extent_readpages+0xe5/0x100 [812d7aaa]
 btrfs_readpages+0x1a/0x20 [810a6a02]
 __do_page_cache_readahead+0x1d2/0x280 [810a6d8c]
 ra_submit+0x1c/0x20 [810a6ebd]
 ondemand_readahead+0x12d/0x270 [810a70cc]
 page_cache_sync_readahead+0x2c/0x40 [81309987]
 __load_free_space_cache+0x1a7/0x5b0 [81309e61]
 load_free_space_cache+0xd1/0x190 [812be07b]
 cache_block_group+0xab/0x290 [812c3def]
 find_free_extent.clone.71+0x39f/0xab0 [812c5160]
 btrfs_reserve_extent+0xe0/0x170 [812c56df]
 btrfs_alloc_free_block+0xcf/0x330 [812b498d]
 __btrfs_cow_block+0x11d/0x4a0 [812b4df8]
 btrfs_cow_block+0xe8/0x1a0 [812b8965]
 btrfs_search_slot+0x175/0x8b0 [812c9e18]
 btrfs_lookup_csum+0x68/0x190 [812caf6e]
 btrfs_csum_file_blocks+0xbe/0x670 [812d7d91]
 add_pending_csums.clone.39+0x41/0x60 [812da528]
 btrfs_finish_ordered_io+0x218/0x310 [812da635]
 btrfs_writepage_end_io_hook+0x15/0x20 [8130c71a]
 end_compressed_bio_write+0x7a/0xe0 [811146f8]
 bio_endio+0x18/0x30 [812cd8fc]
 end_workqueue_fn+0xec/0x120 [812fb0ac]
 worker_loop+0xac/0x520 [8105d486] kthread+0x96/0xa0 
 [815f9214] kernel_thread_helper+0x4/0x10 
 [] 0x

Ok this should have been fixed with

Btrfs: use the commit_root for reading free_space_inode crcs

which is commit # 2cf8572dac62cc2ff7e995173e95b6c694401b3f.  Does your
kernel have this commit?  Because if it does then we did something
wrong.  If not it should be in linus's latest tree, so update and it
should go away (hopefully).  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: skip locking if searching the commit root in lookup_csums

2011-08-30 Thread Josef Bacik
It's not enough to just search the commit root, since we could be cow'ing the
very block we need to search through, which would mean that its locked and we'll
still deadlock.  So use path-skip_locking as well.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/file-item.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index b910694..a1cb782 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -183,8 +183,10 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 * read from the commit root and sidestep a nasty deadlock
 * between reading the free space cache and updating the csum tree.
 */
-   if (btrfs_is_free_space_inode(root, inode))
+   if (btrfs_is_free_space_inode(root, inode)) {
path-search_commit_root = 1;
+   path-skip_locking = 1;
+   }
 
disk_bytenr = (u64)bio-bi_sector  9;
if (dio)
-- 
1.7.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix warning in iput for bad-inode

2011-08-30 Thread Josef Bacik
On 08/30/2011 03:40 PM, Josef Bacik wrote:
 On 08/30/2011 03:31 PM, Sergei Trofimovich wrote:
 On Tue, 30 Aug 2011 14:02:37 -0400 Josef Bacik jo...@redhat.com
 wrote:

 On 08/30/2011 12:53 PM, Sergei Trofimovich wrote:
 Running 'sync' program after the load does not finish and
 eats 100%CPU busy-waiting for something in kernel.

 It's easy to reproduce hang with patch for me. I just run 
 liferea and sync after it. Without patch I haven't managed
 to hang btrfs up.

 And I think it's another btrfs bug. I've managed to reproduce
 it _without_ your patch and _without_ autodefrag enabled by
 manually running the following commands: $ btrfs fi defrag 
 file-with-20_000-extents $ sync

 I think your patch just shuffles things a bit and forces 
 autodefrag races to pop-up sooner (which is good! :])


 Sergei, can you do sysrq+w when this is happening, and maybe turn
 on the softlockup detector so we can see where sync is getting
 stuck? Thanks,

 Sure. As I keep telling about 2 cases in IRC I will state both here
 explicitely:

 ==The First Issue (aka The Hung sync() case) ==

 - it's an unpatched linus's v3.1-rc4-80-g0f43dd5 - /dev/root on /
 type btrfs (rw,noatime,compress=lzo) - 50% full 30GB filesystem
 (usual nonmixed mode)

 How I hung it: $ /usr/sbin/filefrag ~/.bogofilter/wordlist.db 
 /home/st/.bogofilter/wordlist.db: 19070 extents found the file is
 138MB sqlite database for bayesian SPAM filter, it's being read and
 written every 20 minutes or so. Maybe, it was writtent even in
 defrag/sync time! $~/dev/git/btrfs-progs-unstable/btrfs fi defrag
 ~/.bogofilter/wordlist.db $ sync ^Chung in D-state

 I didn't try to reproduce it yet. As for lockdep I'll try but I'm
 afraid I will fail to reproduce, but I'll try tomorrow. I suspect
 I'll need to seriously fragment some file first down to such
 horrible state.

 With help of David I've some (hopefully relevant) info: #!/bin/sh
 -x

 for i in $(ps aux|grep  D[+ ]\?|awk '{print $2}'); do ps $i sudo
 cat /proc/$i/stack done

 PID TTY  STAT   TIME COMMAND 1291 ?D  0:00
 [btrfs-endio-wri] [8130055d]
 btrfs_tree_read_lock+0x6d/0x120 [812b8e88]
 btrfs_search_slot+0x698/0x8b0 [812c9e18]
 btrfs_lookup_csum+0x68/0x190 [812ca10f]
 __btrfs_lookup_bio_sums+0x1cf/0x3e0 [812ca371]
 btrfs_lookup_bio_sums+0x11/0x20 [812d6a50]
 btrfs_submit_bio_hook+0x140/0x170 [812ed594]
 submit_one_bio+0x64/0xa0 [812f14f5]
 extent_readpages+0xe5/0x100 [812d7aaa]
 btrfs_readpages+0x1a/0x20 [810a6a02]
 __do_page_cache_readahead+0x1d2/0x280 [810a6d8c]
 ra_submit+0x1c/0x20 [810a6ebd]
 ondemand_readahead+0x12d/0x270 [810a70cc]
 page_cache_sync_readahead+0x2c/0x40 [81309987]
 __load_free_space_cache+0x1a7/0x5b0 [81309e61]
 load_free_space_cache+0xd1/0x190 [812be07b]
 cache_block_group+0xab/0x290 [812c3def]
 find_free_extent.clone.71+0x39f/0xab0 [812c5160]
 btrfs_reserve_extent+0xe0/0x170 [812c56df]
 btrfs_alloc_free_block+0xcf/0x330 [812b498d]
 __btrfs_cow_block+0x11d/0x4a0 [812b4df8]
 btrfs_cow_block+0xe8/0x1a0 [812b8965]
 btrfs_search_slot+0x175/0x8b0 [812c9e18]
 btrfs_lookup_csum+0x68/0x190 [812caf6e]
 btrfs_csum_file_blocks+0xbe/0x670 [812d7d91]
 add_pending_csums.clone.39+0x41/0x60 [812da528]
 btrfs_finish_ordered_io+0x218/0x310 [812da635]
 btrfs_writepage_end_io_hook+0x15/0x20 [8130c71a]
 end_compressed_bio_write+0x7a/0xe0 [811146f8]
 bio_endio+0x18/0x30 [812cd8fc]
 end_workqueue_fn+0xec/0x120 [812fb0ac]
 worker_loop+0xac/0x520 [8105d486] kthread+0x96/0xa0 
 [815f9214] kernel_thread_helper+0x4/0x10 
 [] 0x
 
 Ok this should have been fixed with
 
 Btrfs: use the commit_root for reading free_space_inode crcs
 
 which is commit # 2cf8572dac62cc2ff7e995173e95b6c694401b3f.  Does your
 kernel have this commit?  Because if it does then we did something
 wrong.  If not it should be in linus's latest tree, so update and it
 should go away (hopefully).  Thanks,
 

Oops looks like that patch won't fix it completely, I just sent another
patch that will fix this problem totally, sorry about that

[PATCH] Btrfs: skip locking if searching the commit root in lookup_csums

Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix warning in iput for bad-inode

2011-08-30 Thread Sergei Trofimovich
 On 08/30/2011 03:40 PM, Josef Bacik wrote:
  On 08/30/2011 03:31 PM, Sergei Trofimovich wrote:
  On Tue, 30 Aug 2011 14:02:37 -0400 Josef Bacik jo...@redhat.com
  wrote:
 
  On 08/30/2011 12:53 PM, Sergei Trofimovich wrote:
  Running 'sync' program after the load does not finish and
  eats 100%CPU busy-waiting for something in kernel.
 
  It's easy to reproduce hang with patch for me. I just run 
  liferea and sync after it. Without patch I haven't managed
  to hang btrfs up.
 
  And I think it's another btrfs bug. I've managed to reproduce
  it _without_ your patch and _without_ autodefrag enabled by
  manually running the following commands: $ btrfs fi defrag 
  file-with-20_000-extents $ sync
 
  I think your patch just shuffles things a bit and forces 
  autodefrag races to pop-up sooner (which is good! :])
 
 
  Sergei, can you do sysrq+w when this is happening, and maybe turn
  on the softlockup detector so we can see where sync is getting
  stuck? Thanks,
 
  Sure. As I keep telling about 2 cases in IRC I will state both here
  explicitely:
 
  ==The First Issue (aka The Hung sync() case) ==
 
  - it's an unpatched linus's v3.1-rc4-80-g0f43dd5 - /dev/root on /
  type btrfs (rw,noatime,compress=lzo) - 50% full 30GB filesystem
  (usual nonmixed mode)
 
  How I hung it: $ /usr/sbin/filefrag ~/.bogofilter/wordlist.db 
  /home/st/.bogofilter/wordlist.db: 19070 extents found the file is
  138MB sqlite database for bayesian SPAM filter, it's being read and
  written every 20 minutes or so. Maybe, it was writtent even in
  defrag/sync time! $~/dev/git/btrfs-progs-unstable/btrfs fi defrag
  ~/.bogofilter/wordlist.db $ sync ^Chung in D-state
 
  I didn't try to reproduce it yet. As for lockdep I'll try but I'm
  afraid I will fail to reproduce, but I'll try tomorrow. I suspect
  I'll need to seriously fragment some file first down to such
  horrible state.
 
  With help of David I've some (hopefully relevant) info: #!/bin/sh
  -x
 
  for i in $(ps aux|grep  D[+ ]\?|awk '{print $2}'); do ps $i sudo
  cat /proc/$i/stack done
 
  PID TTY  STAT   TIME COMMAND 1291 ?D  0:00
  [btrfs-endio-wri] [8130055d]
  btrfs_tree_read_lock+0x6d/0x120 [812b8e88]
  btrfs_search_slot+0x698/0x8b0 [812c9e18]
  btrfs_lookup_csum+0x68/0x190 [812ca10f]
  __btrfs_lookup_bio_sums+0x1cf/0x3e0 [812ca371]
  btrfs_lookup_bio_sums+0x11/0x20 [812d6a50]
  btrfs_submit_bio_hook+0x140/0x170 [812ed594]
  submit_one_bio+0x64/0xa0 [812f14f5]
  extent_readpages+0xe5/0x100 [812d7aaa]
  btrfs_readpages+0x1a/0x20 [810a6a02]
  __do_page_cache_readahead+0x1d2/0x280 [810a6d8c]
  ra_submit+0x1c/0x20 [810a6ebd]
  ondemand_readahead+0x12d/0x270 [810a70cc]
  page_cache_sync_readahead+0x2c/0x40 [81309987]
  __load_free_space_cache+0x1a7/0x5b0 [81309e61]
  load_free_space_cache+0xd1/0x190 [812be07b]
  cache_block_group+0xab/0x290 [812c3def]
  find_free_extent.clone.71+0x39f/0xab0 [812c5160]
  btrfs_reserve_extent+0xe0/0x170 [812c56df]
  btrfs_alloc_free_block+0xcf/0x330 [812b498d]
  __btrfs_cow_block+0x11d/0x4a0 [812b4df8]
  btrfs_cow_block+0xe8/0x1a0 [812b8965]
  btrfs_search_slot+0x175/0x8b0 [812c9e18]
  btrfs_lookup_csum+0x68/0x190 [812caf6e]
  btrfs_csum_file_blocks+0xbe/0x670 [812d7d91]
  add_pending_csums.clone.39+0x41/0x60 [812da528]
  btrfs_finish_ordered_io+0x218/0x310 [812da635]
  btrfs_writepage_end_io_hook+0x15/0x20 [8130c71a]
  end_compressed_bio_write+0x7a/0xe0 [811146f8]
  bio_endio+0x18/0x30 [812cd8fc]
  end_workqueue_fn+0xec/0x120 [812fb0ac]
  worker_loop+0xac/0x520 [8105d486] kthread+0x96/0xa0 
  [815f9214] kernel_thread_helper+0x4/0x10 
  [] 0x
  
  Ok this should have been fixed with
  
  Btrfs: use the commit_root for reading free_space_inode crcs
  
  which is commit # 2cf8572dac62cc2ff7e995173e95b6c694401b3f.  Does your
  kernel have this commit?  Because if it does then we did something
  wrong.  If not it should be in linus's latest tree, so update and it
  should go away (hopefully).  Thanks,

Yeah, this one was in my local tree when hung.

 Oops looks like that patch won't fix it completely, I just sent another
 patch that will fix this problem totally, sorry about that
 
 [PATCH] Btrfs: skip locking if searching the commit root in lookup_csums

I'll try to reproduce/test it tomorrow.

About the second one:

 ==The Second Issue (aka The Busy Looping sync() case) ==
 The box is different from first, so conditions are a bit different.
 - /dev/root on / type btrfs (rw,noatime,autodefrag)
   (note autodefrag!)
 - 15% full 594GB filesystem (usual nonmixed mode)
 
$ liferea
wait it to calm down. it does a lot of SQLite reads/writes
$ sync
Got CPU is 100% loaded hung

Still reproducible with 2 patches above + $SUBJ 

Re: [PATCH] btrfs: fix warning in iput for bad-inode

2011-08-30 Thread Sergei Trofimovich
 About the second one:
 
  ==The Second Issue (aka The Busy Looping sync() case) ==
  The box is different from first, so conditions are a bit different.
  - /dev/root on / type btrfs (rw,noatime,autodefrag)
(note autodefrag!)
  - 15% full 594GB filesystem (usual nonmixed mode)
  
 $ liferea
 wait it to calm down. it does a lot of SQLite reads/writes
 $ sync
 Got CPU is 100% loaded hung
 
 Still reproducible with 2 patches above + $SUBJ one. strace says it hangs in
 strace() syscall. Stack trace is odd:
 # cat /proc/`pidof sync`/stack
 [] 0x

got stack of the looper with such hack:
$ while :; do sudo echo -; cat /proc/2304/stack | fgrep -v 
'[] 0x'; done | uniq

[8109d39d] find_get_pages_tag+0x14d/0x1a0
[81076981] __lock_acquire+0x5a1/0xbd0
[8109e7bd] filemap_fdatawait_range+0x9d/0x1b0
[8109e8f2] filemap_fdatawait+0x22/0x30
[81106a04] sync_inodes_sb+0x1c4/0x250
[8110b398] __sync_filesystem+0x78/0x80
[8110b3b7] sync_one_sb+0x17/0x20
[810e3fbe] iterate_supers+0x7e/0xe0
[8110b400] sys_sync+0x40/0x60
[81456afb] system_call_fastpath+0x16/0x1b
-
[811069f1] sync_inodes_sb+0x1b1/0x250
[8110b398] __sync_filesystem+0x78/0x80
[8110b3b7] sync_one_sb+0x17/0x20
[810e3fbe] iterate_supers+0x7e/0xe0
[8110b400] sys_sync+0x40/0x60
[81456afb] system_call_fastpath+0x16/0x1b
-
[811069f1] sync_inodes_sb+0x1b1/0x250
[8110b398] __sync_filesystem+0x78/0x80
[8110b3b7] sync_one_sb+0x17/0x20
[810e3fbe] iterate_supers+0x7e/0xe0
[8110b400] sys_sync+0x40/0x60
[81456afb] system_call_fastpath+0x16/0x1b
-
[814564b6] retint_kernel+0x26/0x30
[8109d398] find_get_pages_tag+0x148/0x1a0
[810a7a60] pagevec_lookup_tag+0x20/0x30
[8109e7bd] filemap_fdatawait_range+0x9d/0x1b0
[8109e8f2] filemap_fdatawait+0x22/0x30
[81106a04] sync_inodes_sb+0x1c4/0x250
[8110b398] __sync_filesystem+0x78/0x80
[8110b3b7] sync_one_sb+0x17/0x20
[810e3fbe] iterate_supers+0x7e/0xe0
[8110b400] sys_sync+0x40/0x60
[81456afb] system_call_fastpath+0x16/0x1b

Looks like dirty inode count (or dirty page count) is affected by
the $SUBJ patch.

-- 

  Sergei


signature.asc
Description: PGP signature


Re: [PATCH 1/4] fs: add SEEK_HOLE and SEEK_DATA flags

2011-08-30 Thread Dave Chinner
On Mon, Aug 22, 2011 at 07:56:31PM +0200, Marco Stornelli wrote:
 Il 22/08/2011 17:57, Sunil Mushran ha scritto:
 Any proposal that differentiates between holes is wrong. It should not
 matter where the hole is.
 
 Think of it from the usage-pov.
 
 doff = 0;
 while ((doff = lseek(SEEK_DATA, doff)) != -ENXIO) {
 hoff = lseek(SEEK_HOLE, doff);
 read_offset = doff;
 read_len = hoff -doff;
 process();
 doff = hoff;
 }
 
 The goal is to make this as efficient as follows. Treating the last
 hole differently adds more code for no benefit.
 
 Mmmm.It seems that Josef has to be clear in this point. However I
 looked for the seek hole test in xfs test suite, but I didn't find
 anything. Btrfs guys, how have you got tested the implementation? What
 do you think about this corner case? Al, what do you think about it?
 
 
 The following test was used to test the early implementations.
 http://oss.oracle.com/~smushran/seek_data/
 
 
 Thank you very much!! I found another point. Your test fails with my
 implementation because here
 (http://www.austingroupbugs.net/view.php?id=415) says: If whence is
 SEEK_DATA, the file offset shall be set to the smallest location of
 a byte not within a hole and not less than offset. It shall be an
 error if no such byte exists. So in this case I return ENXIO but
 the test expects another value. I have to say that there is a bit of
 confusion about the real behavior of this new feature :)

Which is exactly why I'm trying to get the definitions clarified
first, then the behaviour codified in a single test suite we can
call the 'authoritive test'.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfstests 255: add a seek_data/seek_hole tester

2011-08-30 Thread Sunil Mushran

On 08/25/2011 06:35 PM, Dave Chinner wrote:

Agreed, that's the way I'd interpret it, too. So perhaps we need to
ensure that this interpretation is actually tested by this test?

How about some definitions to work by:

Data: a range of the file that contains valid data, regardless of
whether it exists in memory or on disk. The valid data can be
preceeded and/or followed by an arbitrary number of zero bytes
dependent on the underlying implementation of hole detection.

Hole: a range of the file that contains no data or is made up
entirely of  NULL (zero) data. Holes include preallocated ranges of
files that have not had actual data written to them.

Does that make sense? It has sufficient flexibility in it for the
existing generic non-implementation, allows for filesystems to
define their own hole detection boundaries (e.g. filesystem block
size), and effectively defines how preallocated ranges from
fallocate() should be treated (i.e. as holes). If we can agree on
those definitions, I think that we should document them in both the
kernel and the man page that defines SEEK_HOLE/SEEK_DATA so everyone
is on the same page...


We should not tie in the definition to existing fs technologies. Instead
we should let the fs weigh the cost of providing accurate information
with the possible gain in performance.

Data:
A range in a file that could contain something other than nulls.
If in doubt, it is data.

Hole:
A range in a file that only contains nulls.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfstests 255: add a seek_data/seek_hole tester

2011-08-30 Thread Dave Chinner
On Tue, Aug 30, 2011 at 06:17:02PM -0700, Sunil Mushran wrote:
 On 08/25/2011 06:35 PM, Dave Chinner wrote:
 Agreed, that's the way I'd interpret it, too. So perhaps we need to
 ensure that this interpretation is actually tested by this test?
 
 How about some definitions to work by:
 
 Data: a range of the file that contains valid data, regardless of
 whether it exists in memory or on disk. The valid data can be
 preceeded and/or followed by an arbitrary number of zero bytes
 dependent on the underlying implementation of hole detection.
 
 Hole: a range of the file that contains no data or is made up
 entirely of  NULL (zero) data. Holes include preallocated ranges of
 files that have not had actual data written to them.
 
 Does that make sense? It has sufficient flexibility in it for the
 existing generic non-implementation, allows for filesystems to
 define their own hole detection boundaries (e.g. filesystem block
 size), and effectively defines how preallocated ranges from
 fallocate() should be treated (i.e. as holes). If we can agree on
 those definitions, I think that we should document them in both the
 kernel and the man page that defines SEEK_HOLE/SEEK_DATA so everyone
 is on the same page...
 
 We should not tie in the definition to existing fs technologies.

Such as? If we don't use well known, well defined terminology, we
end up with ambiguous, vague functionality and inconsistent
implementations.

 Instead
 we should let the fs weigh the cost of providing accurate information
 with the possible gain in performance.
 
 Data:
 A range in a file that could contain something other than nulls.
 If in doubt, it is data.
 
 Hole:
 A range in a file that only contains nulls.

And that's -exactly- the ambiguous, vague definition that has raised
all these questions in the first place. I was in doubt about whether
unwritten extents can be considered a hole, and by your definition
that means it should be data. But Andreas seems to be in no doubt it
should be considered a hole.

Hence if I implement XFS support and Andreas implements ext4 support
by your defintion, we end with vastly different behaviour even
though the two filesystems use the same underlying technology for
preallocated ranges. That's exactly the inconsistency in
implementation that I'd like us to avoid.

IOWs, the definition needs to be clear enough to prevent these
inconsistencies from occurring. Indeed, the phrase preallocated
ranges that have not had data written to them is as independent of
filesystem implementation or technologies as possible. However,
because Linux supports preallocation (unlike our reference
platform), and we encourage developers to use it where appropriate,
it is best that we define how we expect such ranges to behave
clearly. That makes life easier for everyone.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfstests 255: add a seek_data/seek_hole tester

2011-08-30 Thread david

On Wed, 31 Aug 2011, Dave Chinner wrote:


On Tue, Aug 30, 2011 at 06:17:02PM -0700, Sunil Mushran wrote:

On 08/25/2011 06:35 PM, Dave Chinner wrote:

Agreed, that's the way I'd interpret it, too. So perhaps we need to
ensure that this interpretation is actually tested by this test?

How about some definitions to work by:

Data: a range of the file that contains valid data, regardless of
whether it exists in memory or on disk. The valid data can be
preceeded and/or followed by an arbitrary number of zero bytes
dependent on the underlying implementation of hole detection.

Hole: a range of the file that contains no data or is made up
entirely of  NULL (zero) data. Holes include preallocated ranges of
files that have not had actual data written to them.

Does that make sense? It has sufficient flexibility in it for the
existing generic non-implementation, allows for filesystems to
define their own hole detection boundaries (e.g. filesystem block
size), and effectively defines how preallocated ranges from
fallocate() should be treated (i.e. as holes). If we can agree on
those definitions, I think that we should document them in both the
kernel and the man page that defines SEEK_HOLE/SEEK_DATA so everyone
is on the same page...


We should not tie in the definition to existing fs technologies.


Such as? If we don't use well known, well defined terminology, we
end up with ambiguous, vague functionality and inconsistent
implementations.


Instead
we should let the fs weigh the cost of providing accurate information
with the possible gain in performance.

Data:
A range in a file that could contain something other than nulls.
If in doubt, it is data.

Hole:
A range in a file that only contains nulls.


And that's -exactly- the ambiguous, vague definition that has raised
all these questions in the first place. I was in doubt about whether
unwritten extents can be considered a hole, and by your definition
that means it should be data. But Andreas seems to be in no doubt it
should be considered a hole.

Hence if I implement XFS support and Andreas implements ext4 support
by your defintion, we end with vastly different behaviour even
though the two filesystems use the same underlying technology for
preallocated ranges. That's exactly the inconsistency in
implementation that I'd like us to avoid.

IOWs, the definition needs to be clear enough to prevent these
inconsistencies from occurring. Indeed, the phrase preallocated
ranges that have not had data written to them is as independent of
filesystem implementation or technologies as possible. However,
because Linux supports preallocation (unlike our reference
platform), and we encourage developers to use it where appropriate,
it is best that we define how we expect such ranges to behave
clearly. That makes life easier for everyone.


Since a sparse file has the holes filled by nulls by definition, it seems 
fairly clear that they chould count as holes. In fact, I would not be 
surprised to see some filesystem _only_ report the unwritten pieces of 
sparse files as holes (not any other ranges of nulls)


the question I have is how large does the range of nulls need to be before 
it's reported as a hole? disk sectors, filesystem blocks, other?


David Lang
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs-progs: specify label length larger than 255 bytes cause mkfs.btrfs buffer overflow

2011-08-30 Thread Jeff Liu

Hello,

While going through the mkfs.c, I noticed there is an issue for label 
length checking, mkfs.btrfs will crashed if the label length exceeding 
255 bytes, it's easy to triggered that out as below:


jeff@pibroch:~/opensource/btrfs-progs$ sudo ./mkfs.btrfs -L `perl -e 
'print Ax256'` /usr/src/linux-3.0/img0


WARNING! - Btrfs v0.19-35-g1b444cd IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

*** buffer overflow detected ***: ./mkfs.btrfs terminated
=== Backtrace: =
/lib/i386-linux-gnu/libc.so.6(__fortify_fail+0x50)[0xb7774df0]
/lib/i386-linux-gnu/libc.so.6(+0xe4cca)[0xb7773cca]
/lib/i386-linux-gnu/libc.so.6(__strcpy_chk+0x3f)[0xb777305f]
./mkfs.btrfs[0x805acc4]
./mkfs.btrfs[0x805def6]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0xb76a5e37]
./mkfs.btrfs[0x8048ef1]
=== Memory map: 
..


a tiny patch could fix it.

Signed-off-by: Jie Liu jeff@oracle.com

---
 mkfs.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 2e99b95..1598aae 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -308,9 +308,9 @@ static char *parse_label(char *input)
 int i;
 int len = strlen(input);

-if (len  BTRFS_LABEL_SIZE) {
+if (len = BTRFS_LABEL_SIZE) {
 fprintf(stderr, Label %s is too long (max %d)\n, input,
-BTRFS_LABEL_SIZE);
+BTRFS_LABEL_SIZE - 1);
 exit(1);
 }
 for (i = 0; i  len; i++) {
--
1.7.4.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfstests 255: add a seek_data/seek_hole tester

2011-08-30 Thread Sunil Mushran

On 8/30/2011 8:29 PM, Dave Chinner wrote:

And that's -exactly- the ambiguous, vague definition that has raised
all these questions in the first place. I was in doubt about whether
unwritten extents can be considered a hole, and by your definition
that means it should be data. But Andreas seems to be in no doubt it
should be considered a hole.


Fair enough. Let me rephrase.

Data:
A range in a file when read could return something other than nulls.

Hole:
A range in a file when read only returns nulls.

Considering preallocated extents only return null, they should
be considered holes.


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfstests 255: add a seek_data/seek_hole tester

2011-08-30 Thread Dan Merillat
On Tue, Aug 30, 2011 at 11:29 PM, Dave Chinner da...@fromorbit.com wrote:
 On Tue, Aug 30, 2011 at 06:17:02PM -0700, Sunil Mushran wrote:
 Instead
 we should let the fs weigh the cost of providing accurate information
 with the possible gain in performance.

 Data:
 A range in a file that could contain something other than nulls.
 If in doubt, it is data.

 Hole:
 A range in a file that only contains nulls.

 And that's -exactly- the ambiguous, vague definition that has raised
 all these questions in the first place. I was in doubt about whether
 unwritten extents can be considered a hole, and by your definition
 that means it should be data. But Andreas seems to be in no doubt it
 should be considered a hole.

That's fine, though.   Different filesystems have different abilities
to recognize a data hole - FAT can't do it at all.   Perhaps the
requirements would be better stated in reverse:  If the filesystem
knows that a read() will return nulls (for whatever reason based on
it's internal knowledge), it can report a hole.  If it can't guarantee
that, it's data.   It's an absolute requirement that SEEK_DATA never
miss data.  SEEK_HOLE working is a nicety that userspace would
appreciate - remember that the consumer here is cp(1), using it to
skip empty portions of files and create sparse destination files.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html