Re: kernel BUG at fs/btrfs/extent-tree.c:6164!

2011-06-07 Thread Tsutomu Itoh
(2011/06/08 0:46), Chris Mason wrote:
> Excerpts from liubo's message of 2011-06-07 04:36:56 -0400:
>> On 06/07/2011 04:24 PM, Tsutomu Itoh wrote:
>>> (2011/06/07 15:17), Tsutomu Itoh wrote:
 (2011/06/07 14:59), Tsutomu Itoh wrote:
> Hi liubo,
>
> (2011/06/07 14:31), liubo wrote:
>> On 06/06/2011 04:33 PM, Tsutomu Itoh wrote:
>>> Hi,
>>>
>>> I encountered following panic using 'btrfs-unstable + for-linus'
>>> kernel.
>>>
>>> I ran "btrfs fi bal /test5" command, and mount option of /test5
>>> is as follows:
>>>
>>>  /dev/sdc3 on /test5 type btrfs 
>>> (rw,space_cache,compress=lzo,inode_cache)
>>>
>> So, just a "btrfs fi bal" would lead to the bug?
> I think so.
> 
> It should be specific to the inode caching code.  The balancing code is
> finding the inode map cache extents, but it doesn't know how to relocate
> them.

However, the panic occurs even when inode_cache is turned off.
Is this another problem?

---
Tsutomu



device fsid a46d03b5cb35c93-4713fead8acc709e devid 1 transid 7 /dev/sdc3
btrfs: enabling disk space caching
btrfs: use lzo compression
device fsid 914b303425ef9825-e448135c0d20babe devid 1 transid 7 /dev/sdd4
btrfs: disk space caching is enabled
btrfs: relocating block group 1103101952 flags 9
btrfs: found 540 extents
btrfs: found 540 extents
[ cut here ]
kernel BUG at fs/btrfs/extent-tree.c:1424!
invalid opcode:  [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand 
acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd 
dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg 
pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug 
i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom 
megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last 
unloaded: microcode]

Pid: 26884, comm: btrfs Not tainted 2.6.39btrfs-test+ #4 FUJITSU-SV  
PRIMERGY/D2399
RIP: 0010:[]  [] 
lookup_inline_extent_backref+0x2d2/0x3f0 [btrfs]
RSP: 0018:8801475db748  EFLAGS: 00010202
RAX: 0001 RBX: 880141d1a6d0 RCX: 8801475da000
RDX: 0008 RSI: 8800 RDI: 
RBP: 8801475db7e8 R08: 0001 R09: 6db6db6db6db6db7
R10: 0001 R11: 0014 R12: 00b8
R13: 880142bc8a08 R14: 0001 R15: 000d
FS:  7fbbaa8b0740() GS:88019fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0033cfeda340 CR3: 000145c04000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs (pid: 26884, threadinfo 8801475da000, task 880160806ab0)
Stack:
 8801475db778 a0331ca6 88018c1087c8 8801475db830
 0821 000181f43000 8801475db7e8 88012cc27800
 082e 0794475db9a9 000181f43000 004000a8
Call Trace:
 [] ? btrfs_mark_buffer_dirty+0xb6/0x130 [btrfs]
 [] insert_inline_extent_backref+0x69/0x100 [btrfs]
 [] ? kmem_cache_alloc+0x186/0x190
 [] __btrfs_inc_extent_ref+0xa3/0x1e0 [btrfs]
 [] ? update_block_group+0xd9/0x2a0 [btrfs]
 [] run_clustered_refs+0x664/0x7f0 [btrfs]
 [] btrfs_run_delayed_refs+0xc8/0x210 [btrfs]
 [] btrfs_commit_transaction+0x7d/0x790 [btrfs]
 [] ? wake_up_bit+0x40/0x40
 [] prepare_to_merge+0x1fd/0x230 [btrfs]
 [] relocate_block_group+0x476/0x660 [btrfs]
 [] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
 [] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
 [] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
 [] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
 [] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
 [] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [] ? btrfs_previous_item+0xb1/0x150 [btrfs]
 [] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [] btrfs_balance+0x21a/0x2b0 [btrfs]
 [] ? path_openat+0x101/0x3d0
 [] btrfs_ioctl+0x51c/0xc40 [btrfs]
 [] ? handle_mm_fault+0x148/0x270
 [] ? do_page_fault+0x1d8/0x4b0
 [] do_vfs_ioctl+0x9a/0x540
 [] sys_ioctl+0xa1/0xb0
 [] system_call_fastpath+0x16/0x1b
Code: 48 8b 75 20 48 89 c3 48 8b 7d 18 e8 c9 bd ff ff 48 39 d8 77 26 b8 1d 00 
00 00 e9 15 ff ff ff a8 01 0f 85 8c fe ff ff 0f 0b eb fe <0f> 0b eb fe 0f 0b 0f 
1f 84 00 00 00 00 00 eb f6 4c 89 fb 44 8b
RIP  [] lookup_inline_extent_backref+0x2d2/0x3f0 [btrfs]
 RSP 


> 
> I think we need to switch the inode map cache over to regular extents
> that are not preallocated.  It will fix the overflow problem and it will
> fix the balancing.
> 
> There are a lot of special cases for the free extent cache that don't
> apply to the inode map cache, and I think sharing the extent
> preallocation is hurting us.
> 
> -chris
> 
> 

--
To unsubscribe from this list: se

[PATCH] Btrfs: avoid stack bloat in btrfs_ioctl_fs_info()

2011-06-07 Thread Li Zefan
The size of struct btrfs_ioctl_fs_info_args is about 1KB, so
don't declare the variable on the stack.

Signed-off-by: Li Zefan 
---
 fs/btrfs/ioctl.c |   23 ++-
 1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ac37040..9705c5c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2054,29 +2054,34 @@ static long btrfs_ioctl_rm_dev(struct btrfs_root *root, void __user *arg)
 
 static long btrfs_ioctl_fs_info(struct btrfs_root *root, void __user *arg)
 {
-   struct btrfs_ioctl_fs_info_args fi_args;
+   struct btrfs_ioctl_fs_info_args *fi_args;
struct btrfs_device *device;
struct btrfs_device *next;
struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
+   int ret = 0;
 
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
 
-   fi_args.num_devices = fs_devices->num_devices;
-   fi_args.max_id = 0;
-   memcpy(&fi_args.fsid, root->fs_info->fsid, sizeof(fi_args.fsid));
+   fi_args = kzalloc(sizeof(*fi_args), GFP_KERNEL);
+   if (!fi_args)
+   return -ENOMEM;
+
+   fi_args->num_devices = fs_devices->num_devices;
+   memcpy(&fi_args->fsid, root->fs_info->fsid, sizeof(fi_args->fsid));
 
mutex_lock(&fs_devices->device_list_mutex);
list_for_each_entry_safe(device, next, &fs_devices->devices, dev_list) {
-   if (device->devid > fi_args.max_id)
-   fi_args.max_id = device->devid;
+   if (device->devid > fi_args->max_id)
+   fi_args->max_id = device->devid;
}
mutex_unlock(&fs_devices->device_list_mutex);
 
-   if (copy_to_user(arg, &fi_args, sizeof(fi_args)))
-   return -EFAULT;
+   if (copy_to_user(arg, fi_args, sizeof(*fi_args)))
+   ret = -EFAULT;
 
-   return 0;
+   kfree(fi_args);
+   return ret;
 }
 
 static long btrfs_ioctl_dev_info(struct btrfs_root *root, void __user *arg)
-- 1.7.3.1 

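A minimal userspace sketch of the pattern in this patch follows. It assumes a stand-in struct laid out like btrfs_ioctl_fs_info_args: two u64 fields, a 16-byte fsid, and u64 padding up to 1KB. Treat that layout as an assumption. calloc() plays the role of kzalloc(). Because the buffer is zeroed, the explicit max_id = 0 can be dropped. memcpy() stands in for copy_to_user(). Once the variable is a pointer, the copy length must be sizeof(*ptr), because sizeof(ptr) is only the size of the pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct btrfs_ioctl_fs_info_args (~1KB). */
struct fs_info_args {
	unsigned long long max_id;
	unsigned long long num_devices;
	unsigned char fsid[16];
	unsigned long long reserved[124];	/* padding up to 1KB */
};

/*
 * Heap-allocate the args instead of putting ~1KB on the stack, fill them,
 * and copy them out to dst. Returns bytes copied, or -1 on ENOMEM.
 */
long fill_and_copy(void *dst, unsigned long long num_devices)
{
	struct fs_info_args *args = calloc(1, sizeof(*args)); /* like kzalloc */
	long copied;

	if (!args)
		return -1;
	args->num_devices = num_devices;	/* max_id is already 0 */

	/* sizeof(*args) is the struct size; sizeof(args) is a pointer size. */
	memcpy(dst, args, sizeof(*args));
	copied = (long)sizeof(*args);
	free(args);
	return copied;
}
```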

[PATCH] Btrfs: use join_transaction in btrfs_evict_inode()

2011-06-07 Thread Li Zefan
The WARN_ON() in start_transaction() was triggered while balancing.

The cause is that btrfs_relocate_chunk() started a transaction and
then called iput() on the inode that stores the free space cache,
and iput() called btrfs_start_transaction() again.

Reported-by: Tsutomu Itoh 
Signed-off-by: Li Zefan 
---
 fs/btrfs/inode.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 02ff4a1..4e9aa28 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3646,7 +3646,7 @@ void btrfs_evict_inode(struct inode *inode)
btrfs_i_size_write(inode, 0);
 
while (1) {
-   trans = btrfs_start_transaction(root, 0);
+   trans = btrfs_join_transaction(root);
BUG_ON(IS_ERR(trans));
trans->block_rsv = root->orphan_block_rsv;
 
-- 1.7.3.1 

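The model below (not btrfs code) sketches why join is the right call here. A task-local pointer plays the role of current->journal_info. A "start" from a task that already holds a handle is counted as the WARN_ON() case. "Join" just takes another reference on the handle the task already holds. All names are made up for the sketch.

```c
#include <stddef.h>

/* Hypothetical model of a transaction handle with a use count. */
struct trans_handle {
	int use_count;
};

static struct trans_handle running_handle;
static struct trans_handle *current_handle;	/* like current->journal_info */
static int nested_start_warnings;		/* stands in for WARN_ON() hits */

/* "start" is meant for callers that are not already in a transaction. */
struct trans_handle *model_start_transaction(void)
{
	if (current_handle) {
		nested_start_warnings++;	/* the warning seen while balancing */
		current_handle->use_count++;
		return current_handle;
	}
	running_handle.use_count = 1;
	current_handle = &running_handle;
	return current_handle;
}

/* "join" quietly reuses the handle the task already holds, if any. */
struct trans_handle *model_join_transaction(void)
{
	if (current_handle) {
		current_handle->use_count++;
		return current_handle;
	}
	return model_start_transaction();
}

void model_end_transaction(struct trans_handle *h)
{
	if (--h->use_count == 0)
		current_handle = NULL;
}
```

In this model, relocation starts the outer handle. The eviction path then either joins it cleanly or trips the nested-start counter.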

Re: Delayed inode operations not doing the right thing with enospc

2011-06-07 Thread Josef Bacik
On 06/06/2011 09:39 PM, Miao Xie wrote:
> On fri, 03 Jun 2011 14:46:10 -0400, Josef Bacik wrote:
>> I got a lot of these when running stress.sh on my test box
>>
>> [ 9792.654889] [ cut here ]
>> [ 9792.654898] WARNING: at fs/btrfs/extent-tree.c:5681
>> btrfs_alloc_free_block+0xca/0x27c [btrfs]()
>> [ 9792.654899] Hardware name: To Be Filled By O.E.M.
>> [ 9792.654900] Modules linked in: btrfs zlib_deflate libcrc32c
>> ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
>> arc4 rt61pci rt2x00pci rt2x00lib snd_hda_codec_hdmi mac80211
>> snd_hda_codec_realtek cfg80211 snd_hda_intel edac_core snd_seq rfkill
>> pcspkr serio_raw snd_hda_codec eeprom_93cx6 edac_mce_amd sp5100_tco
>> i2c_piix4 k10temp snd_hwdep snd_seq_device snd_pcm floppy r8169 xhci_hcd
>> mii snd_timer snd soundcore snd_page_alloc ipv6 firewire_ohci pata_acpi
>> ata_generic firewire_core pata_via crc_itu_t radeon ttm drm_kms_helper
>> drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
>> [ 9792.654919] Pid: 2762, comm: rm Tainted: GW   2.6.39+ #1
>> [ 9792.654920] Call Trace:
>> [ 9792.654922]  [] warn_slowpath_common+0x83/0x9b
>> [ 9792.654925]  [] warn_slowpath_null+0x1a/0x1c
>> [ 9792.654933]  [] btrfs_alloc_free_block+0xca/0x27c
>> [btrfs]
>> [ 9792.654945]  [] ? map_extent_buffer+0x6e/0xa8 [btrfs]
>> [ 9792.654953]  [] __btrfs_cow_block+0xfc/0x30c [btrfs]
>> [ 9792.654963]  [] ? btrfs_buffer_uptodate+0x47/0x58
>> [btrfs]
>> [ 9792.654970]  [] ? read_block_for_search+0x94/0x368
>> [btrfs]
>> [ 9792.654978]  [] btrfs_cow_block+0xfe/0x146 [btrfs]
>> [ 9792.654986]  [] btrfs_search_slot+0x14d/0x4b6 [btrfs]
>> [ 9792.654997]  [] ? map_extent_buffer+0x6e/0xa8 [btrfs]
>> [ 9792.655022]  [] btrfs_lookup_inode+0x2f/0x8f [btrfs]
>> [ 9792.655025]  [] ? _cond_resched+0xe/0x22
>> [ 9792.655027]  [] ? mutex_lock+0x29/0x50
>> [ 9792.655039]  []
>> btrfs_update_delayed_inode+0x72/0x137 [btrfs]
>> [ 9792.655051]  [] btrfs_run_delayed_items+0x90/0xdb
>> [btrfs]
>> [ 9792.655062]  []
>> btrfs_commit_transaction+0x228/0x654 [btrfs]
>> [ 9792.655064]  [] ? remove_wait_queue+0x3a/0x3a
>> [ 9792.655075]  [] btrfs_evict_inode+0x14d/0x202 [btrfs]
>> [ 9792.655077]  [] evict+0x71/0x111
>> [ 9792.655079]  [] iput+0x12a/0x132
>> [ 9792.655081]  [] do_unlinkat+0x106/0x155
>> [ 9792.655083]  [] ? path_put+0x1f/0x23
>> [ 9792.655085]  [] ? audit_syscall_entry+0x145/0x171
>> [ 9792.655087]  [] ? putname+0x34/0x36
>> [ 9792.655090]  [] sys_unlinkat+0x29/0x2b
>> [ 9792.655092]  [] system_call_fastpath+0x16/0x1b
>> [ 9792.655093] ---[ end trace 02b696eb02b3f768 ]---
>>
>>
>> This is because use_block_rsv() is having to do a
>> reserve_metadata_bytes(), which shouldn't happen, as we should have
>> reserved enough space for those operations to complete.  This is
>> happening because use_block_rsv() calls get_block_rsv(), which returns
>> trans->block_rsv if root->ref_cows is set (which is the case on all fs
>> roots), and trans->block_rsv only has what the current transaction
>> starter had reserved.
>>
>> What needs to be done instead is to have a special block reserve:
>> any reservation done at create time for these inodes is migrated
>> to this special reserve, and then when you run the delayed inode items
>> you set trans->block_rsv to the special block reserve so the
>> accounting is all done properly.
>>
>> This is just off the top of my head; there may be a better way to do it.
>> I've not actually looked at the delayed inode code at all.
>>
>> I would do this myself, but I have an ever-increasing list of shit to do,
>> so will somebody please pick this up and fix it?  Thanks,
> 
> Sorry, it's my mistake.
> I forgot to set trans->block_rsv to global_block_rsv, since we have migrated
> the space from trans_block_rsv to global_block_rsv.
> 
> I'll fix it soon.
> 

There is another problem: we're failing xfstests 204.  I tried making
reserve_metadata_bytes commit the transaction regardless of whether or
not there were pinned bytes, but the test just hung.  Usually it
takes 7 seconds to run, and I ctrl+c'ed it after a couple of minutes.
204 just creates a crap ton of files, which is what is killing us.
There needs to be a way to start flushing delayed inode items so we can
reclaim the space they are holding onto and not hit enospc, and it
needs to be better than just committing the transaction, because that is
dog slow.  Thanks,

Josef

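The following is a rough model of the block reserve scheme proposed above. It assumes a reserve is just a (size, reserved) pair. Space reserved when the inode is created is migrated into a dedicated reserve. The delayed-item code then consumes from that reserve instead of from whatever trans->block_rsv the committer happened to bring. rsv_migrate() and rsv_use() are hypothetical helpers, not btrfs functions.

```c
#include <stdint.h>

/* Hypothetical block reserve: target size and bytes actually held. */
struct block_rsv {
	uint64_t size;
	uint64_t reserved;
};

/* Move num_bytes of reservation from src to dst; -1 if src lacks it. */
int rsv_migrate(struct block_rsv *src, struct block_rsv *dst,
		uint64_t num_bytes)
{
	if (src->reserved < num_bytes)
		return -1;
	src->reserved -= num_bytes;
	src->size = src->size > num_bytes ? src->size - num_bytes : 0;
	dst->reserved += num_bytes;
	dst->size += num_bytes;
	return 0;
}

/*
 * Consume num_bytes for a tree block. -1 models the case where the
 * caller would have to fall back to reserve_metadata_bytes() (the WARN).
 */
int rsv_use(struct block_rsv *rsv, uint64_t num_bytes)
{
	if (rsv->reserved < num_bytes)
		return -1;
	rsv->reserved -= num_bytes;
	return 0;
}
```

With the committer's small reserve, running the delayed items fails in this model. With the migrated reserve, it succeeds.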

[PATCH 1/2] Btrfs: do transaction space reservation before joining the transaction

2011-06-07 Thread Josef Bacik
We have to do weird things when handling enospc in the transaction joining code.
Because we've already joined the transaction, we cannot commit it within the
reservation code since that would deadlock, so we have to return EAGAIN and
then make sure we don't retry too many times.  Instead of doing this, just
do the reservation the normal way before we join the transaction.  That way we
can do whatever we want to try to reclaim space, and if it still fails we know
for sure we are out of space and can return ENOSPC.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/ctree.h   |3 ---
 fs/btrfs/extent-tree.c |   20 
 fs/btrfs/transaction.c |   36 +---
 3 files changed, 17 insertions(+), 42 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0c62c6c..6034a23 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2205,9 +2205,6 @@ void btrfs_set_inode_space_info(struct btrfs_root *root, struct inode *ionde);
 void btrfs_clear_space_info_full(struct btrfs_fs_info *info);
 int btrfs_check_data_free_space(struct inode *inode, u64 bytes);
 void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes);
-int btrfs_trans_reserve_metadata(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root,
-   int num_items);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
struct btrfs_root *root);
 int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index aa2b592a..b1c3ff7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3878,26 +3878,6 @@ int btrfs_truncate_reserve_metadata(struct btrfs_trans_handle *trans,
return 0;
 }
 
-int btrfs_trans_reserve_metadata(struct btrfs_trans_handle *trans,
-struct btrfs_root *root,
-int num_items)
-{
-   u64 num_bytes;
-   int ret;
-
-   if (num_items == 0 || root->fs_info->chunk_root == root)
-   return 0;
-
-   num_bytes = btrfs_calc_trans_metadata_size(root, num_items);
-   ret = btrfs_block_rsv_add(trans, root, &root->fs_info->trans_block_rsv,
- num_bytes);
-   if (!ret) {
-   trans->bytes_reserved += num_bytes;
-   trans->block_rsv = &root->fs_info->trans_block_rsv;
-   }
-   return ret;
-}
-
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
  struct btrfs_root *root)
 {
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index dd71966..c277448 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -203,7 +203,7 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
 {
struct btrfs_trans_handle *h;
struct btrfs_transaction *cur_trans;
-   int retries = 0;
+   u64 num_bytes = 0;
int ret;
 
if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
@@ -217,6 +217,19 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
h->block_rsv = NULL;
goto got_it;
}
+
+   /*
+* Do the reservation before we join the transaction so we can do all
+* the appropriate flushing if need be.
+*/
+   if (num_items > 0 && root != root->fs_info->chunk_root) {
+   num_bytes = btrfs_calc_trans_metadata_size(root, num_items);
+   ret = btrfs_block_rsv_add(NULL, root,
+ &root->fs_info->trans_block_rsv,
+ num_bytes);
+   if (ret)
+   return ERR_PTR(ret);
+   }
 again:
h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS);
if (!h)
@@ -253,24 +266,9 @@ again:
goto again;
}
 
-   if (num_items > 0) {
-   ret = btrfs_trans_reserve_metadata(h, root, num_items);
-   if (ret == -EAGAIN && !retries) {
-   retries++;
-   btrfs_commit_transaction(h, root);
-   goto again;
-   } else if (ret == -EAGAIN) {
-   /*
-* We have already retried and got EAGAIN, so really we
-* don't have space, so set ret to -ENOSPC.
-*/
-   ret = -ENOSPC;
-   }
-
-   if (ret < 0) {
-   btrfs_end_transaction(h, root);
-   return ERR_PTR(ret);
-   }
+   if (num_bytes) {
+   h->block_rsv = &root->fs_info->trans_block_rsv;
+   h->bytes_reserved = num_bytes;
}
 
 got_it:
-- 
1.7.2.3


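The ordering argument in this patch can be sketched with a toy space model (hypothetical, not btrfs code). Pinned bytes only become free after a commit. Committing is safe only while the caller has not yet joined a transaction. Reserving first therefore lets the code commit and retry once, and a failure after that is a genuine ENOSPC.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical space model. */
struct space {
	uint64_t free_bytes;	/* reservable right now */
	uint64_t pinned_bytes;	/* freed only by a transaction commit */
	uint64_t may_use;	/* outstanding reservations */
};

static void model_commit(struct space *s)
{
	s->free_bytes += s->pinned_bytes;
	s->pinned_bytes = 0;
}

/*
 * Reserve before joining: with no transaction held we may commit to
 * reclaim pinned space, so a failure afterwards is a real ENOSPC.
 */
int reserve_then_join(struct space *s, uint64_t num_bytes, int *joined)
{
	*joined = 0;
	if (s->free_bytes < num_bytes)
		model_commit(s);	/* safe: nothing joined yet */
	if (s->free_bytes < num_bytes)
		return -ENOSPC;
	s->free_bytes -= num_bytes;
	s->may_use += num_bytes;
	*joined = 1;			/* only now attach to the transaction */
	return 0;
}
```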
[PATCH 2/2] Btrfs: serialize flushers in reserve_metadata_bytes

2011-06-07 Thread Josef Bacik
We keep having problems with early enospc, and that's because our method of
making space is inherently racy.  The problem is that one guy can be trying to
make space for himself while other people come in and steal his
reservation.  To stop this, we add a waitqueue and put anybody who
comes into reserve_metadata_bytes on that waitqueue while somebody is trying to
make more space.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/ctree.h   |3 ++
 fs/btrfs/extent-tree.c |   69 
 2 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6034a23..8857d82 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -756,6 +756,8 @@ struct btrfs_space_info {
   chunks for this space */
unsigned int chunk_alloc:1; /* set if we are allocating a chunk */
 
+   unsigned int flush:1;   /* set if we are trying to make space */
+
unsigned int force_alloc;   /* set if we need to force a chunk
   alloc for this space */
 
@@ -766,6 +768,7 @@ struct btrfs_space_info {
spinlock_t lock;
struct rw_semaphore groups_sem;
atomic_t caching_threads;
+   wait_queue_head_t wait;
 };
 
 struct btrfs_block_rsv {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b1c3ff7..d86f7c5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2932,6 +2932,8 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
found->full = 0;
found->force_alloc = CHUNK_ALLOC_NO_FORCE;
found->chunk_alloc = 0;
+   found->flush = 0;
+   init_waitqueue_head(&found->wait);
*space_info = found;
list_add_rcu(&found->list, &info->space_info);
atomic_set(&found->caching_threads, 0);
@@ -3314,9 +3316,13 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
if (reserved == 0)
return 0;
 
-   /* nothing to shrink - nothing to reclaim */
-   if (root->fs_info->delalloc_bytes == 0)
+   smp_mb();
+   if (root->fs_info->delalloc_bytes == 0) {
+   if (trans)
+   return 0;
+   btrfs_wait_ordered_extents(root, 0, 0);
return 0;
+   }
 
max_reclaim = min(reserved, to_reclaim);
 
@@ -3360,6 +3366,8 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
}
 
}
+   if (reclaimed >= to_reclaim && !trans)
+   btrfs_wait_ordered_extents(root, 0, 0);
return reclaimed >= to_reclaim;
 }
 
@@ -3384,15 +3392,36 @@ static int reserve_metadata_bytes(struct btrfs_trans_handle *trans,
u64 num_bytes = orig_bytes;
int retries = 0;
int ret = 0;
-   bool reserved = false;
bool committed = false;
+   bool flushing = false;
 
 again:
-   ret = -ENOSPC;
-   if (reserved)
-   num_bytes = 0;
-
+   ret = 0;
spin_lock(&space_info->lock);
+   /*
+* We only want to wait if somebody other than us is flushing and we are
+* actually allowed to flush.
+*/
+   while (flush && !flushing && space_info->flush) {
+   spin_unlock(&space_info->lock);
+   /*
+* If we have a trans handle we can't wait because the flusher
+* may have to commit the transaction, which would mean we would
+* deadlock since we are waiting for the flusher to finish, but
+* hold the current transaction open.
+*/
+   if (trans)
+   return -EAGAIN;
+   ret = wait_event_interruptible(space_info->wait,
+  !space_info->flush);
+   /* Must have been interrupted, return */
+   if (ret)
+   return -EINTR;
+
+   spin_lock(&space_info->lock);
+   }
+
+   ret = -ENOSPC;
unused = space_info->bytes_used + space_info->bytes_reserved +
 space_info->bytes_pinned + space_info->bytes_readonly +
 space_info->bytes_may_use;
@@ -3407,8 +3436,7 @@ again:
if (unused <= space_info->total_bytes) {
unused = space_info->total_bytes - unused;
if (unused >= num_bytes) {
-   if (!reserved)
-   space_info->bytes_may_use += orig_bytes;
+   space_info->bytes_may_use += orig_bytes;
ret = 0;
} else {
/*
@@ -3433,17 +3461,14 @@ again:
 * to reclaim space we can actually use it instead of somebody else
 * stealing it from us.
 */
-   if (ret && !reserved) {
-   space_info->bytes_may_use += orig_bytes;
-   reserved = true;
+   if (ret && 

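This is a simplified userspace analogue of the flusher serialization, assuming pthreads. A mutex stands in for space_info->lock, a condition variable for the new wait queue, and a bool for space_info->flush. Unlike the patch, begin_flush() both waits for the current flusher and claims the flag, without re-checking free space in between. It also ignores signal interruption (the patch uses wait_event_interruptible()). The -EAGAIN path for callers holding a transaction mirrors the comment in the diff.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified stand-in for the relevant parts of btrfs_space_info. */
struct space_info_model {
	pthread_mutex_t lock;	/* like space_info->lock */
	pthread_cond_t wait;	/* like the new wait queue */
	bool flush;		/* set while somebody is making space */
};

/* Become the only flusher, waiting for any current flusher to finish. */
int begin_flush(struct space_info_model *s, bool have_trans)
{
	pthread_mutex_lock(&s->lock);
	while (s->flush) {
		if (have_trans) {
			/*
			 * The flusher may need to commit the transaction we
			 * hold open, so sleeping here could deadlock.
			 */
			pthread_mutex_unlock(&s->lock);
			return -EAGAIN;
		}
		pthread_cond_wait(&s->wait, &s->lock);
	}
	s->flush = true;
	pthread_mutex_unlock(&s->lock);
	return 0;
}

/* Drop the flusher role and wake every waiter. */
void end_flush(struct space_info_model *s)
{
	pthread_mutex_lock(&s->lock);
	s->flush = false;
	pthread_cond_broadcast(&s->wait);
	pthread_mutex_unlock(&s->lock);
}
```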
[PATCH 0/2] Fix ENOSPC regression

2011-06-07 Thread Josef Bacik
Sergei accidentally introduced a regression with

c4f675cd40d955d539180506c09515c90169b15b

The problem isn't his patch; it's that we are entirely too touchy to changes in
this area because the way we deal with pressure is racy in general.  The other
problem is that even though delalloc bytes are 0, we still may not have reclaimed
space; rather, we need to wait for the ordered extents to reclaim the space.  So
this patch set does that, and it serializes the flushers to close this race we've
always had.  This fixes the normal enospc cases we were seeing.  Thanks,

Josef

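The point about ordered extents can be shown with a toy accounting model. The names are hypothetical, and real btrfs releases metadata reservations through more indirect paths. In the model, writeback moves bytes from delalloc into ordered extents, and the reservation is dropped only when those extents complete. With delalloc_bytes already at 0, stopping early reclaims nothing, whereas waiting for the ordered extents does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; assumes a 1:1 reservation per pending byte. */
struct fs_model {
	uint64_t delalloc_bytes;	/* dirty, not yet submitted */
	uint64_t ordered_bytes;		/* submitted, not yet completed */
	uint64_t bytes_may_use;		/* outstanding reservations */
};

static void start_writeback(struct fs_model *fs)
{
	fs->ordered_bytes += fs->delalloc_bytes;
	fs->delalloc_bytes = 0;
}

static void wait_ordered_extents(struct fs_model *fs)
{
	/* Completion of ordered extents releases their reservation. */
	fs->bytes_may_use -= fs->ordered_bytes;
	fs->ordered_bytes = 0;
}

/* Returns bytes reclaimed by one shrink pass. */
uint64_t shrink(struct fs_model *fs, bool wait_ordered)
{
	uint64_t before = fs->bytes_may_use;

	if (fs->delalloc_bytes)
		start_writeback(fs);
	if (wait_ordered)
		wait_ordered_extents(fs);
	return before - fs->bytes_may_use;
}
```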

[PATCH] Btrfs: fix btrfs_update_reserved_bytes usage

2011-06-07 Thread Josef Bacik
For some reason btrfs_update_reserved_bytes was only ever updating the
bytes_reserved counter of the space info if the space info was data.  I assume
this is because the original enospc stuff used bytes_reserved to account for
space reserved for enospc accounting, but now that we're using bytes_may_use
that's incorrect.  So this patch fixes btrfs_update_reserved_bytes to always
update the space_info as well.  It also fixes a weird case where we tried to add
the space to the enospc accounting stuff.  Rather than doing that, just add it
back to the space info so that it can be accounted for later.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/ctree.h|2 +-
 fs/btrfs/extent-tree.c  |   76 --
 fs/btrfs/free-space-cache.c |4 +-
 3 files changed, 25 insertions(+), 57 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 93a409f..0c62c6c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2177,7 +2177,7 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
 
 int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len);
 int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
-   u64 num_bytes, int reserve, int sinfo);
+   u64 num_bytes, int reserve);
 int btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans,
struct btrfs_root *root);
 int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 933d7dc..aa2b592a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4194,44 +4194,33 @@ int btrfs_pin_extent(struct btrfs_root *root,
 
 /*
  * update size of reserved extents. this function may return -EAGAIN
- * if 'reserve' is true or 'sinfo' is false.
+ * if 'reserve' is true.
  */
 int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
-   u64 num_bytes, int reserve, int sinfo)
+   u64 num_bytes, int reserve)
 {
int ret = 0;
-   if (sinfo) {
-   struct btrfs_space_info *space_info = cache->space_info;
-   spin_lock(&space_info->lock);
-   spin_lock(&cache->lock);
-   if (reserve) {
-   if (cache->ro) {
-   ret = -EAGAIN;
-   } else {
-   cache->reserved += num_bytes;
-   space_info->bytes_reserved += num_bytes;
-   }
-   } else {
-   if (cache->ro)
-   space_info->bytes_readonly += num_bytes;
-   cache->reserved -= num_bytes;
-   space_info->bytes_reserved -= num_bytes;
-   space_info->reservation_progress++;
-   }
-   spin_unlock(&cache->lock);
-   spin_unlock(&space_info->lock);
-   } else {
-   spin_lock(&cache->lock);
+   struct btrfs_space_info *space_info = cache->space_info;
+
+   spin_lock(&space_info->lock);
+   spin_lock(&cache->lock);
+   if (reserve) {
if (cache->ro) {
ret = -EAGAIN;
} else {
-   if (reserve)
-   cache->reserved += num_bytes;
-   else
-   cache->reserved -= num_bytes;
+   cache->reserved += num_bytes;
+   space_info->bytes_reserved += num_bytes;
}
-   spin_unlock(&cache->lock);
+   } else {
+   if (cache->ro)
+   space_info->bytes_readonly += num_bytes;
+   cache->reserved -= num_bytes;
+   WARN_ON(space_info->bytes_reserved < num_bytes);
+   space_info->bytes_reserved -= num_bytes;
+   space_info->reservation_progress++;
}
+   spin_unlock(&cache->lock);
+   spin_unlock(&space_info->lock);
return ret;
 }
 
@@ -4679,27 +4668,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, &buf->bflags));
 
btrfs_add_free_space(cache, buf->start, buf->len);
-   ret = btrfs_update_reserved_bytes(cache, buf->len, 0, 0);
-   if (ret == -EAGAIN) {
-   /* block group became read-only */
-   btrfs_update_reserved_bytes(cache, buf->len, 0, 1);
-   goto out;
-   }
-
-   ret = 1;
-   spin_lock(&block_rsv->lock);
-   if (block_rsv->reserved < block_rsv->size) {
-   block_rsv->reserved += buf->len;
-   ret = 0;
-   }
-   spin_unlock(&block_rsv->lock);
-
-

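For reference, the patched btrfs_update_reserved_bytes() logic can be modeled in userspace, with pthread mutexes standing in for the spinlocks. The struct types are simplified stand-ins, an assumption of this sketch. The lock order (space_info before block group) and the counter updates follow the diff above. The WARN_ON() becomes a counter.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Simplified stand-ins for btrfs_space_info / btrfs_block_group_cache. */
struct space_info_m {
	pthread_mutex_t lock;
	uint64_t bytes_reserved;
	uint64_t bytes_readonly;
	uint64_t reservation_progress;
};

struct block_group_m {
	pthread_mutex_t lock;
	struct space_info_m *space_info;
	uint64_t reserved;
	int ro;
};

static int reserved_underflow_warnings;	/* stands in for WARN_ON() */

int update_reserved_bytes(struct block_group_m *cache, uint64_t num_bytes,
			  int reserve)
{
	struct space_info_m *space_info = cache->space_info;
	int ret = 0;

	/* Lock order as in the patch: space_info first, then cache. */
	pthread_mutex_lock(&space_info->lock);
	pthread_mutex_lock(&cache->lock);
	if (reserve) {
		if (cache->ro) {
			ret = -EAGAIN;	/* block group went read-only */
		} else {
			cache->reserved += num_bytes;
			space_info->bytes_reserved += num_bytes;
		}
	} else {
		if (cache->ro)
			space_info->bytes_readonly += num_bytes;
		cache->reserved -= num_bytes;
		if (space_info->bytes_reserved < num_bytes)
			reserved_underflow_warnings++;
		space_info->bytes_reserved -= num_bytes;
		space_info->reservation_progress++;
	}
	pthread_mutex_unlock(&cache->lock);
	pthread_mutex_unlock(&space_info->lock);
	return ret;
}
```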
[PATCH] Btrfs: account for space reservations properly V2

2011-06-07 Thread Josef Bacik
We have been using space_info->bytes_reserved in the metadata case to cover our
reservations for ENOSPC.  The problem is that this is horribly wrong.  We use
bytes_reserved to keep track of how many bytes the allocator has outstanding
that haven't actually been made into extents yet.  So what has been happening is
that we've been using bytes_reserved for both our ENOSPC reservations and our
allocations.

Currently that isn't a big deal; everything is being accounted for
appropriately.  The only thing this affects is how we allocate chunks, so we've
grown all these horrible things to make sure we don't end up with a stupid
amount of metadata chunks.  The problem is that we think the entire space is
used up, because we use bytes_used and bytes_reserved to estimate how much
is actually in use by real data, but that's not the case.

So switch over to using bytes_may_use, which the data space info stuff has
already been using for the same exact reason.  This will allow us to go back to
pre-emptively allocating chunks in the enospc code.  Thanks,

Signed-off-by: Josef Bacik 
---
V1->V2:
-fixed updating bytes_reserved in free_tree_block
-update bytes_may_use in unpin_extent_range

 fs/btrfs/ctree.h   |2 +-
 fs/btrfs/extent-tree.c |   22 +++---
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 91806fe..93a409f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -745,7 +745,7 @@ struct btrfs_space_info {
 
/*
 * we bump reservation progress every time we decrement
-* bytes_reserved.  This way people waiting for reservations
+* bytes_may_use.  This way people waiting for reservations
 * know something good has happened and they can check
 * for progress.  The number here isn't to be trusted, it
 * just shows reclaim activity
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b42efc2..933d7dc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3308,7 +3308,7 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
space_info = block_rsv->space_info;
 
smp_mb();
-   reserved = space_info->bytes_reserved;
+   reserved = space_info->bytes_may_use;
progress = space_info->reservation_progress;
 
if (reserved == 0)
@@ -3328,9 +3328,9 @@ static int shrink_delalloc(struct btrfs_trans_handle *trans,
writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages);
 
spin_lock(&space_info->lock);
-   if (reserved > space_info->bytes_reserved)
-   reclaimed += reserved - space_info->bytes_reserved;
-   reserved = space_info->bytes_reserved;
+   if (reserved > space_info->bytes_may_use)
+   reclaimed += reserved - space_info->bytes_may_use;
+   reserved = space_info->bytes_may_use;
spin_unlock(&space_info->lock);
 
loops++;
@@ -3408,7 +3408,7 @@ again:
unused = space_info->total_bytes - unused;
if (unused >= num_bytes) {
if (!reserved)
-   space_info->bytes_reserved += orig_bytes;
+   space_info->bytes_may_use += orig_bytes;
ret = 0;
} else {
/*
@@ -3434,7 +3434,7 @@ again:
 * stealing it from us.
 */
if (ret && !reserved) {
-   space_info->bytes_reserved += orig_bytes;
+   space_info->bytes_may_use += orig_bytes;
reserved = true;
}
 
@@ -3495,7 +3495,7 @@ again:
 out:
if (reserved) {
spin_lock(&space_info->lock);
-   space_info->bytes_reserved -= orig_bytes;
+   space_info->bytes_may_use -= orig_bytes;
spin_unlock(&space_info->lock);
}
 
@@ -3579,7 +3579,7 @@ static void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
}
if (num_bytes) {
spin_lock(&space_info->lock);
-   space_info->bytes_reserved -= num_bytes;
+   space_info->bytes_may_use -= num_bytes;
space_info->reservation_progress++;
spin_unlock(&space_info->lock);
}
@@ -3791,12 +3791,12 @@ static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
if (sinfo->total_bytes > num_bytes) {
num_bytes = sinfo->total_bytes - num_bytes;
block_rsv->reserved += num_bytes;
-   sinfo->bytes_reserved += num_bytes;
+   sinfo->bytes_may_use += num_bytes;
}
 
if (block_rsv->reserved >= block_rsv->size) {
num_bytes = block_rsv->reserved - block_rsv->size;
-   sinfo->bytes_reserved -= num_bytes;
+   sinfo->bytes_may

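This small sketch shows the accounting split described in this patch, using a hypothetical struct with the counters named in the diff. ENOSPC reservations are charged to bytes_may_use. The free-space check sums every counter, as the 'unused' computation in reserve_metadata_bytes does. An estimate that looks only at bytes_used plus bytes_reserved, as the chunk allocation logic is described as doing, no longer sees those reservations. really_in_use() is a made-up helper for illustration.

```c
#include <stdint.h>

/* Hypothetical copy of the space_info counters named in the patch. */
struct space_counters {
	uint64_t total_bytes;
	uint64_t bytes_used;
	uint64_t bytes_reserved;	/* allocator: handed out, not extents yet */
	uint64_t bytes_pinned;
	uint64_t bytes_readonly;
	uint64_t bytes_may_use;		/* ENOSPC reservations after this patch */
};

/* Returns 1 and charges bytes_may_use if num_bytes fits, else 0. */
int try_reserve(struct space_counters *s, uint64_t num_bytes)
{
	uint64_t used = s->bytes_used + s->bytes_reserved + s->bytes_pinned +
			s->bytes_readonly + s->bytes_may_use;

	if (used > s->total_bytes || s->total_bytes - used < num_bytes)
		return 0;
	s->bytes_may_use += num_bytes;
	return 1;
}

/* Space really in use by data/metadata, ignoring ENOSPC reservations. */
uint64_t really_in_use(const struct space_counters *s)
{
	return s->bytes_used + s->bytes_reserved;
}
```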
Re: New btrfsck status

2011-06-07 Thread Jeff Putney
Me too.  I've got a 9TB filesystem that I haven't been able to mount since
rebooting during a rebalance.  I want to get the fs as repaired as possible, but
I am not in a hurry, and I have enough space at present to make a
duplicate and play with test versions of the repair.

--jeff

On Mon, Jun 6, 2011 at 9:41 AM, Christian Hesse  wrote:
> Chris Mason on 10 Feb 13:17:
>> Excerpts from Ben Gamari's message of 2011-02-09 21:52:20 -0500:
>> > Over the last several months there have been many claims regarding
>> > the release of the rewritten btrfsck. Unfortunately, despite
>> > numerous claims that it will be released Real Soon Now(c), I have
>> > yet to see even a repository with preliminary code. Did I miss an
>> > announcement? There is something to be said for "release early,
>> > release often." Is there a timeline for getting btrfsck into some
>> > sort of usable form?
>>
>> Yes, but it's still real soon now.  I've been about 90% done since
>> Christmas.  It would have been out last week, but I've been chasing and
>> debugging a very difficult corruption under load.
>>
>> I finally found a race in btrfs causing the corruption and now I'm
>> back on fsck full time again.
>
> This mail was about four months ago...
> Any news on this topic?
>
> I really would like to test btrfs on my desktop systems, but I still
> hesitate because of the missing fsck.
> --
> Schoene Gruesse
> Chris
>


Re: kernel BUG at fs/btrfs/extent-tree.c:6164!

2011-06-07 Thread Chris Mason
Excerpts from liubo's message of 2011-06-07 04:36:56 -0400:
> On 06/07/2011 04:24 PM, Tsutomu Itoh wrote:
> > (2011/06/07 15:17), Tsutomu Itoh wrote:
> >> (2011/06/07 14:59), Tsutomu Itoh wrote:
> >>> Hi liubo,
> >>>
> >>> (2011/06/07 14:31), liubo wrote:
>  On 06/06/2011 04:33 PM, Tsutomu Itoh wrote:
> > Hi,
> >
> > I encountered following panic using 'btrfs-unstable + for-linus'
> > kernel.
> >
> > I ran "btrfs fi bal /test5" command, and mount option of /test5
> > is as follows:
> >
> >  /dev/sdc3 on /test5 type btrfs 
> > (rw,space_cache,compress=lzo,inode_cache)
> >
>  So, just a "btrfs fi bal" would lead to the bug?
> >>> I think so.

It should be specific to the inode caching code.  The balancing code is
finding the inode map cache extents, but it doesn't know how to relocate
them.

I think we need to switch the inode map cache over to regular extents
that are not preallocated.  It will fix the overflow problem and it will
fix the balancing.

There are a lot of special cases for the free extent cache that don't
apply to the inode map cache, and I think sharing the extent
preallocation is hurting us.
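A toy model may help picture the explanation above: balance walks every extent in a block group and relocates it, and any kind of extent it has no relocation path for stops it cold. The sketch below is hypothetical C; toy_extent, toy_relocate_extent() and toy_balance() are invented for illustration and are not btrfs code. It simply reports the extents it cannot move instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent kinds for illustration only; not btrfs definitions. */
enum toy_extent_kind {
	TOY_EXTENT_REGULAR,              /* ordinary extent the relocator understands */
	TOY_EXTENT_INODE_CACHE_PREALLOC  /* preallocated inode map cache extent */
};

struct toy_extent {
	unsigned long long start;
	unsigned long long len;
	enum toy_extent_kind kind;
};

/*
 * Move one extent to new_start. Returns 0 on success and -1 when there is
 * no relocation path for this kind of extent.
 */
int toy_relocate_extent(const struct toy_extent *e, unsigned long long new_start,
			struct toy_extent *out)
{
	switch (e->kind) {
	case TOY_EXTENT_REGULAR:
		*out = *e;
		out->start = new_start;
		return 0;
	default:
		return -1;	/* the relocator does not know this kind */
	}
}

/*
 * "Balance" a block group: relocate every extent, packing the movable ones
 * contiguously from dest. Returns how many extents could not be moved.
 */
size_t toy_balance(const struct toy_extent *ext, size_t n,
		   unsigned long long dest, struct toy_extent *moved)
{
	size_t failures = 0;

	assert(n == 0 || (ext != NULL && moved != NULL));
	for (size_t i = 0; i < n; i++) {
		if (toy_relocate_extent(&ext[i], dest, &moved[i]) != 0) {
			failures++;
			continue;
		}
		dest += ext[i].len;
	}
	return failures;
}
```

In this model, switching the inode map cache to regular, non-preallocated extents corresponds to every extent being TOY_EXTENT_REGULAR, in which case toy_balance() returns 0.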

-chris


Re: Delayed inode operations not doing the right thing with enospc

2011-06-07 Thread Josef Bacik

On 06/06/2011 09:39 PM, Miao Xie wrote:

On fri, 03 Jun 2011 14:46:10 -0400, Josef Bacik wrote:

I got a lot of these when running stress.sh on my test box:

[ 9792.654889] [ cut here ]
[ 9792.654898] WARNING: at fs/btrfs/extent-tree.c:5681 btrfs_alloc_free_block+0xca/0x27c [btrfs]()
[ 9792.654899] Hardware name: To Be Filled By O.E.M.
[ 9792.654900] Modules linked in: btrfs zlib_deflate libcrc32c
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
arc4 rt61pci rt2x00pci rt2x00lib snd_hda_codec_hdmi mac80211
snd_hda_codec_realtek cfg80211 snd_hda_intel edac_core snd_seq rfkill
pcspkr serio_raw snd_hda_codec eeprom_93cx6 edac_mce_amd sp5100_tco
i2c_piix4 k10temp snd_hwdep snd_seq_device snd_pcm floppy r8169 xhci_hcd
mii snd_timer snd soundcore snd_page_alloc ipv6 firewire_ohci pata_acpi
ata_generic firewire_core pata_via crc_itu_t radeon ttm drm_kms_helper
drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[ 9792.654919] Pid: 2762, comm: rm Tainted: GW   2.6.39+ #1
[ 9792.654920] Call Trace:
[ 9792.654922]  [] warn_slowpath_common+0x83/0x9b
[ 9792.654925]  [] warn_slowpath_null+0x1a/0x1c
[ 9792.654933]  [] btrfs_alloc_free_block+0xca/0x27c [btrfs]
[ 9792.654945]  [] ? map_extent_buffer+0x6e/0xa8 [btrfs]
[ 9792.654953]  [] __btrfs_cow_block+0xfc/0x30c [btrfs]
[ 9792.654963]  [] ? btrfs_buffer_uptodate+0x47/0x58 [btrfs]
[ 9792.654970]  [] ? read_block_for_search+0x94/0x368 [btrfs]
[ 9792.654978]  [] btrfs_cow_block+0xfe/0x146 [btrfs]
[ 9792.654986]  [] btrfs_search_slot+0x14d/0x4b6 [btrfs]
[ 9792.654997]  [] ? map_extent_buffer+0x6e/0xa8 [btrfs]
[ 9792.655022]  [] btrfs_lookup_inode+0x2f/0x8f [btrfs]
[ 9792.655025]  [] ? _cond_resched+0xe/0x22
[ 9792.655027]  [] ? mutex_lock+0x29/0x50
[ 9792.655039]  [] btrfs_update_delayed_inode+0x72/0x137 [btrfs]
[ 9792.655051]  [] btrfs_run_delayed_items+0x90/0xdb [btrfs]
[ 9792.655062]  [] btrfs_commit_transaction+0x228/0x654 [btrfs]
[ 9792.655064]  [] ? remove_wait_queue+0x3a/0x3a
[ 9792.655075]  [] btrfs_evict_inode+0x14d/0x202 [btrfs]
[ 9792.655077]  [] evict+0x71/0x111
[ 9792.655079]  [] iput+0x12a/0x132
[ 9792.655081]  [] do_unlinkat+0x106/0x155
[ 9792.655083]  [] ? path_put+0x1f/0x23
[ 9792.655085]  [] ? audit_syscall_entry+0x145/0x171
[ 9792.655087]  [] ? putname+0x34/0x36
[ 9792.655090]  [] sys_unlinkat+0x29/0x2b
[ 9792.655092]  [] system_call_fastpath+0x16/0x1b
[ 9792.655093] ---[ end trace 02b696eb02b3f768 ]---


This is because use_block_rsv() has to call reserve_metadata_bytes(),
which shouldn't happen, as we should have reserved enough space for those
operations to complete.  It happens because use_block_rsv() calls
get_block_rsv(), which, if root->ref_cows is set (as it is on all fs
roots), returns trans->block_rsv, and that only holds what the current
transaction starter had reserved.

What we need instead is a dedicated block reserve: any reservation made
at create time for these inodes is migrated to this special reserve, and
then, when the delayed inode items are run, trans->block_rsv is set to
that special reserve so the accounting is all done properly.

This is just off the top of my head; there may be a better way to do it.
I haven't actually looked at the delayed inode code at all.

I would do this myself, but I have an ever-increasing list of shit to do,
so will somebody please pick this up and fix it?  Thanks,


Sorry, my mistake.
I forgot to set trans->block_rsv to global_block_rsv, which is needed because
we have migrated the space from trans_block_rsv to global_block_rsv.

I'll fix it soon.



Great, thanks Miao,

Josef


btrfs: remove 64bit alignment padding to allow extent_buffer to fit into one fewer cacheline

2011-06-07 Thread Richard Kennedy
Reorder struct extent_buffer to remove 8 bytes of alignment padding on
64-bit builds. This shrinks it to 128 bytes, allowing it to fit into one
fewer cache line and allowing more objects per slab in its kmem_cache.
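The saving comes from how a 64-bit compiler pads a 4-byte atomic_t placed between 8-byte-aligned members. The sketch below uses simplified stand-in structs, not the real struct extent_buffer: the field types, the trailing 8-byte member, and a 4-byte spinlock_t (as in a non-debug build) are illustrative assumptions. It shows that moving refs next to the lock lets the two 4-byte fields share one 8-byte slot. Tools such as pahole report holes like these in real kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The model assumes 4-byte stand-ins for atomic_t and spinlock_t. */
static_assert(sizeof(int) == 4, "model assumes 4-byte int");

/* Stand-ins for kernel types on a 64-bit build (assumptions). */
struct toy_list_head { void *next, *prev; };                /* like struct list_head */
struct toy_rcu_head { void *next; void (*func)(void *); };  /* like struct rcu_head */

/* Field order before the change: refs sits between 8-byte-aligned members. */
struct toy_eb_before {
	uint64_t start;
	unsigned long len;
	void *first_page;
	unsigned long bflags;
	int refs;                        /* 4 bytes, then a 4-byte hole */
	struct toy_list_head leak_list;
	struct toy_rcu_head rcu_head;
	int lock;                        /* 4 bytes, then another 4-byte hole */
	void *next_field;                /* hypothetical 8-byte member after the lock */
};

/* After the change: refs moved next to the 4-byte lock; the pair fills 8 bytes. */
struct toy_eb_after {
	uint64_t start;
	unsigned long len;
	void *first_page;
	unsigned long bflags;
	struct toy_list_head leak_list;
	struct toy_rcu_head rcu_head;
	int refs;
	int lock;
	void *next_field;
};

/* Bytes of padding removed by the reordering. */
size_t toy_padding_saved(void)
{
	return sizeof(struct toy_eb_before) - sizeof(struct toy_eb_after);
}
```

On a typical LP64 build the first struct is 88 bytes and the second 80, the same 8-byte saving the patch reports for the real structure.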

slabinfo extent_buffer reports :-

 before:-
Sizes (bytes) Slabs
--
Object : 136  Total  : 123
SlabObj: 136  Full   : 121
SlabSiz:4096  Partial:   0
Loss   :   0  CpuSlab:   2
Align  :   8  Objects:  30

 after :-
Object : 128  Total  :   4
SlabObj: 128  Full   :   2
SlabSiz:4096  Partial:   0
Loss   :   0  CpuSlab:   2
Align  :   8  Objects:  32
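The Objects figures above follow from integer division of the 4096-byte slab by the object size (30 for 136 bytes, 32 for 128 bytes), and the cache-line claim from rounding the object size up to whole lines. This small sketch assumes 64-byte cache lines (typical on x86_64) and ignores any per-slab overhead. Because the cache aligns objects to only 8 bytes, the line count is a best case for an object that starts on a line boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Objects of obj_size bytes that fit in one slab of slab_size bytes. */
size_t objs_per_slab(size_t slab_size, size_t obj_size)
{
	assert(obj_size > 0);
	return slab_size / obj_size;
}

/* Fewest cache lines an object can touch (object starts on a line boundary). */
size_t min_cache_lines(size_t obj_size, size_t line_size)
{
	assert(line_size > 0);
	return (obj_size + line_size - 1) / line_size;
}
```

For example, objs_per_slab(4096, 136) is 30 and objs_per_slab(4096, 128) is 32, while min_cache_lines() drops from 3 to 2 when the object shrinks from 136 to 128 bytes.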

Signed-off-by: Richard Kennedy 
---
patch against v3.0-rc2
compiled & tested on x86_64

This has only had some light testing on a scratch volume, but it
seems to work.

regards
Richard


diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 4e8445a..a11a92e 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -126,9 +126,9 @@ struct extent_buffer {
 	unsigned long map_len;
 	struct page *first_page;
 	unsigned long bflags;
-	atomic_t refs;
 	struct list_head leak_list;
 	struct rcu_head rcu_head;
+	atomic_t refs;
 
 	/* the spinlock is used to protect most operations */
 	spinlock_t lock;




Re: kernel BUG at fs/btrfs/extent-tree.c:6164!

2011-06-07 Thread liubo
On 06/07/2011 04:24 PM, Tsutomu Itoh wrote:
> (2011/06/07 15:17), Tsutomu Itoh wrote:
>> (2011/06/07 14:59), Tsutomu Itoh wrote:
>>> Hi liubo,
>>>
>>> (2011/06/07 14:31), liubo wrote:
 On 06/06/2011 04:33 PM, Tsutomu Itoh wrote:
> Hi,
>
> I encountered the following panic using the 'btrfs-unstable + for-linus'
> kernel.
>
> I ran the "btrfs fi bal /test5" command; the mount options of /test5
> are as follows:
>
>  /dev/sdc3 on /test5 type btrfs (rw,space_cache,compress=lzo,inode_cache)
>
 So, just a "btrfs fi bal" would lead to the bug?
>>> I think so.
>>>
 I've figured out the warnings, but not reproduced the bug yet...
 I used 'btrfs-unstable + for-linus' whose top commit is

 commit aa0467d8d2a00e75b2bb6a56a4ee6d70c5d1928f
 Author: David Sterba 
 Date:   Fri Jun 3 16:29:08 2011 +0200

 btrfs: fix uninitialized variable warning
>>> It's the same as my environment.
>>>
 
 and tried 1) a single disk, 2) two disks and 3) four disks,
 but none of them led to the bug below.
>>> The test script and volume layout I am using are the same as in the
>>> following mail.
>>>
>>>   http://marc.info/?l=linux-btrfs&m=130680171426371&w=2
>>>
>>> and, in my environment, the panic occurs within about 30 minutes of
>>> starting the test script.
> 
> I forgot to mention:
> I am adding '-o inode_cache' to the mount option in my test script.
> 

Yep, I've added this and reproduced it.
It seems there are several bugs.

Anyway, thanks for the report.  I'm trying to work it out. :)

thanks,
liubo

>> Another panic occurred when I executed it again.
>>
> 
> I rebuilt the kernel with 3.0-rc2, but the same problem occurred.
> 
> 
> <4>[  131.708325] WARNING: at fs/btrfs/transaction.c:213 start_transaction+0x74/0x259 [btrfs]()
> <4>[  131.708329] Hardware name: PRIMERGY
> <4>[  131.708330] Modules linked in: autofs4 sunrpc 8021q garp stp llc 
> cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c 
> libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev 
> parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support 
> tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod 
> crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata 
> scsi_mod floppy [last unloaded: microcode]
> <4>[  131.708378] Pid: 3041, comm: btrfs Not tainted 3.0.0-rc2test #1
> <4>[  131.708381] Call Trace:
> <4>[  131.708388]  [] warn_slowpath_common+0x85/0x9d
> <4>[  131.708392]  [] warn_slowpath_null+0x1a/0x1c
> <4>[  131.708410]  [] start_transaction+0x74/0x259 [btrfs]
> <4>[  131.708430]  [] ? btrfs_wait_ordered_range+0xf9/0x11d [btrfs]
> <4>[  131.708448]  [] btrfs_start_transaction+0x13/0x15 [btrfs]
> <4>[  131.708467]  [] btrfs_evict_inode+0x113/0x22d [btrfs]
> <4>[  131.708471]  [] evict+0x77/0x118
> <4>[  131.708475]  [] iput+0x13d/0x146
> <4>[  131.708489]  [] btrfs_remove_block_group+0x14d/0x35b [btrfs]
> <4>[  131.708508]  [] btrfs_relocate_chunk+0x464/0x50d [btrfs]
> <4>[  131.708527]  [] ? btrfs_item_key_to_cpu+0x2a/0x46 [btrfs]
> <4>[  131.708545]  [] btrfs_balance+0x1ca/0x219 [btrfs]
> <4>[  131.708563]  [] btrfs_ioctl+0x890/0xb87 [btrfs]
> <4>[  131.708567]  [] ? handle_mm_fault+0x233/0x24a
> <4>[  131.708572]  [] ? do_page_fault+0x340/0x3b2
> <4>[  131.708577]  [] do_vfs_ioctl+0x474/0x4c3
> <4>[  131.708581]  [] ? virt_to_head_page+0xe/0x31
> <4>[  131.708585]  [] ? kmem_cache_free+0x20/0xae
> <4>[  131.708588]  [] sys_ioctl+0x56/0x79
> <4>[  131.708592]  [] system_call_fastpath+0x16/0x1b
> <4>[  131.708595] ---[ end trace 5f962f46d3ba5425 ]---
> <6>[  131.708777] btrfs: relocating block group 29360128 flags 20
> <6>[  132.385682] btrfs: found 85 extents
> <0>[  132.798892] [ cut here ]
> <2>[  132.799014] kernel BUG at fs/btrfs/extent-tree.c:1424!
> <0>[  132.799014] invalid opcode:  [#1] SMP
> <4>[  132.799014] CPU 0
> <4>[  132.799014] Modules linked in: autofs4 sunrpc 8021q garp stp llc 
> cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c 
> libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev 
> parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support 
> tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod 
> crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata 
> scsi_mod floppy [last unloaded: microcode]
> <4>[  132.799014]
> <4>[  132.799014] Pid: 3041, comm: btrfs Tainted: GW   3.0.0-rc2test #1 FUJITSU-SV  PRIMERGY/D2399
> <4>[  132.799014] RIP: 0010:[]  [] lookup_inline_extent_backref+0xe3/0x3a9 [btrfs]
> <4>[  132.799014] RSP: 0018:880193aa5808  EFLAGS: 00010202
> <4>[  132.799014] RAX: 0001 RBX: 880192fac000 RCX: 0002
> <4>[  132.799014] RDX: 0002 RSI:  RDI:
> <4>[  132.

Re: kernel BUG at fs/btrfs/extent-tree.c:6164!

2011-06-07 Thread Tsutomu Itoh
(2011/06/07 15:17), Tsutomu Itoh wrote:
> (2011/06/07 14:59), Tsutomu Itoh wrote:
>> Hi liubo,
>>
>> (2011/06/07 14:31), liubo wrote:
>>> On 06/06/2011 04:33 PM, Tsutomu Itoh wrote:
 Hi,

 I encountered the following panic using the 'btrfs-unstable + for-linus'
 kernel.

 I ran the "btrfs fi bal /test5" command; the mount options of /test5
 are as follows:

  /dev/sdc3 on /test5 type btrfs (rw,space_cache,compress=lzo,inode_cache)

>>>
>>> So, just a "btrfs fi bal" would lead to the bug?
>>
>> I think so.
>>
>>>
>>> I've figured out the warnings, but not reproduced the bug yet...
>>> I used 'btrfs-unstable + for-linus' whose top commit is
>>>
>>> commit aa0467d8d2a00e75b2bb6a56a4ee6d70c5d1928f
>>> Author: David Sterba 
>>> Date:   Fri Jun 3 16:29:08 2011 +0200
>>>
>>> btrfs: fix uninitialized variable warning
>>
>> It's the same as my environment.
>>
>>> 
>>> and tried 1) a single disk, 2) two disks and 3) four disks,
>>> but none of them led to the bug below.
>>
>> The test script and volume layout I am using are the same as in the
>> following mail.
>>
>>   http://marc.info/?l=linux-btrfs&m=130680171426371&w=2
>>
>> and, in my environment, the panic occurs within about 30 minutes of
>> starting the test script.

I forgot to mention:
I am adding '-o inode_cache' to the mount option in my test script.

> 
> Another panic occurred when I executed it again.
> 

I rebuilt the kernel with 3.0-rc2, but the same problem occurred.


<4>[  131.708325] WARNING: at fs/btrfs/transaction.c:213 start_transaction+0x74/0x259 [btrfs]()
<4>[  131.708329] Hardware name: PRIMERGY
<4>[  131.708330] Modules linked in: autofs4 sunrpc 8021q garp stp llc 
cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c 
libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev 
parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 
shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod 
crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata 
scsi_mod floppy [last unloaded: microcode]
<4>[  131.708378] Pid: 3041, comm: btrfs Not tainted 3.0.0-rc2test #1
<4>[  131.708381] Call Trace:
<4>[  131.708388]  [] warn_slowpath_common+0x85/0x9d
<4>[  131.708392]  [] warn_slowpath_null+0x1a/0x1c
<4>[  131.708410]  [] start_transaction+0x74/0x259 [btrfs]
<4>[  131.708430]  [] ? btrfs_wait_ordered_range+0xf9/0x11d [btrfs]
<4>[  131.708448]  [] btrfs_start_transaction+0x13/0x15 [btrfs]
<4>[  131.708467]  [] btrfs_evict_inode+0x113/0x22d [btrfs]
<4>[  131.708471]  [] evict+0x77/0x118
<4>[  131.708475]  [] iput+0x13d/0x146
<4>[  131.708489]  [] btrfs_remove_block_group+0x14d/0x35b [btrfs]
<4>[  131.708508]  [] btrfs_relocate_chunk+0x464/0x50d [btrfs]
<4>[  131.708527]  [] ? btrfs_item_key_to_cpu+0x2a/0x46 [btrfs]
<4>[  131.708545]  [] btrfs_balance+0x1ca/0x219 [btrfs]
<4>[  131.708563]  [] btrfs_ioctl+0x890/0xb87 [btrfs]
<4>[  131.708567]  [] ? handle_mm_fault+0x233/0x24a
<4>[  131.708572]  [] ? do_page_fault+0x340/0x3b2
<4>[  131.708577]  [] do_vfs_ioctl+0x474/0x4c3
<4>[  131.708581]  [] ? virt_to_head_page+0xe/0x31
<4>[  131.708585]  [] ? kmem_cache_free+0x20/0xae
<4>[  131.708588]  [] sys_ioctl+0x56/0x79
<4>[  131.708592]  [] system_call_fastpath+0x16/0x1b
<4>[  131.708595] ---[ end trace 5f962f46d3ba5425 ]---
<6>[  131.708777] btrfs: relocating block group 29360128 flags 20
<6>[  132.385682] btrfs: found 85 extents
<0>[  132.798892] [ cut here ]
<2>[  132.799014] kernel BUG at fs/btrfs/extent-tree.c:1424!
<0>[  132.799014] invalid opcode:  [#1] SMP
<4>[  132.799014] CPU 0
<4>[  132.799014] Modules linked in: autofs4 sunrpc 8021q garp stp llc 
cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c 
libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev 
parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 
shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod 
crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata 
scsi_mod floppy [last unloaded: microcode]
<4>[  132.799014]
<4>[  132.799014] Pid: 3041, comm: btrfs Tainted: GW   3.0.0-rc2test #1 FUJITSU-SV  PRIMERGY/D2399
<4>[  132.799014] RIP: 0010:[]  [] lookup_inline_extent_backref+0xe3/0x3a9 [btrfs]
<4>[  132.799014] RSP: 0018:880193aa5808  EFLAGS: 00010202
<4>[  132.799014] RAX: 0001 RBX: 880192fac000 RCX: 0002
<4>[  132.799014] RDX: 0002 RSI:  RDI:
<4>[  132.799014] RBP: 880193aa58a8 R08: 029c R09: 880193aa56f0
<4>[  132.799014] R10: 880193aa5648 R11: c2d107e744029d66 R12: 00b2
<4>[  132.799014] R13: 880195075b88 R14: 0001 R15:
<4>[  132.799014] FS:  7faaaf421740() GS:88019fc0() knlGS:
<4>[  132.799014] CS