Re: umount oops

2008-07-25 Thread Lukas Vacek
sorry, I made a typo in the testcase (the second mount)
Basically, it might be enough to mount two different btrfs filesystems
to two different locations, umount one of them and watch
/var/log/kern.log for the oops

dd if=/dev/zero of=mountme bs=4k count=10
dd if=/dev/zero of=mountme2 bs=4k count=10
mkfs.btrfs mountme
mkfs.btrfs mountme2
mkdir loop loop2
mount -o loop mountme loop
mount -o loop mountme2 loop2
umount loop
# wait a moment

On 7/24/08, Lukas Vacek [EMAIL PROTECTED] wrote:
 Hi,

 I tried out the very promising btrfs to test it a little and hit a
 small bug in the implementation. I'm not sure where the bug lies, but
 the following reproduces the problem quite reliably:

 dd if=/dev/zero of=mountme bs=4k count=10
 dd if=/dev/zero of=mountme2 bs=4k count=10
 mkfs.btrfs mountme
 mkfs.btrfs mountme2
 mkdir loop loop2
 mount -o loop mountme loop
 mount -o loop mountme loop2
 umount loop
 # wait a moment

 An SMP machine may be necessary to reproduce this.

 thanks for the (otherwise ;-)) great work and have a nice day,
 Lukas V.

 The interesting part of the log follows:

 Jul 24 22:44:00 minerva kernel: [ 1478.326985] device fsid
 5442602040543dd1-d32561672012f7a2 devid 1 transid 12 /dev/loop1
 Jul 24 22:44:54 minerva kernel: [ 1532.882212] Unable to handle kernel
 paging request at 7effdc171f47 RIP:
 Jul 24 22:44:54 minerva kernel: [ 1532.882256]  [wq_per_cpu+0x15/0x20]
 wq_per_cpu+0x15/0x20
 Jul 24 22:44:54 minerva kernel: [ 1532.882405] PGD 0
 Jul 24 22:44:54 minerva kernel: [ 1532.882476] Oops:  [1] SMP
 Jul 24 22:44:54 minerva kernel: [ 1532.882572] CPU 1
 Jul 24 22:44:54 minerva kernel: [ 1532.882641] Modules linked in: loop
 binfmt_misc af_packet i915 drm rfcomm l2cap bluetooth ppdev ipv6
 acpi_cpufreq cpufreq_ondemand cpufreq_stats freq_table
 cpufreq_userspace cpufreq_conservative cpufreq_powersave video output
 container sbs sbshc dock battery iptable_filter ip_tables x_tables
 aes_x86_64 dm_crypt dm_mod ac btrfs libcrc32c lp snd_hda_intel
 snd_hwdep snd_pcm_oss snd_pcm snd_page_alloc snd_mixer_oss
 snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event
 snd_seq snd_timer iTCO_wdt iTCO_vendor_support snd_seq_device button
 snd evdev shpchp pci_hotplug intel_agp soundcore parport_pc parport
 pcspkr ext3 jbd mbcache usbhid hid sg sd_mod ata_piix pata_acpi floppy
 ehci_hcd ata_generic libata scsi_mod r8169 uhci_hcd usbcore thermal
 processor fan fbcon tileblit font bitblit softcursor fuse
 Jul 24 22:44:54 minerva kernel: [ 1532.885056] Pid: 3959, comm:
 btrfs-transacti Not tainted 2.6.24-19-generic #1
 Jul 24 22:44:54 minerva kernel: [ 1532.885117] RIP:
 0010:[wq_per_cpu+0x15/0x20]  [wq_per_cpu+0x15/0x20]
 wq_per_cpu+0x15/0x20
 Jul 24 22:44:54 minerva kernel: [ 1532.885224] RSP:
 0018:81003e6a1c08  EFLAGS: 00010246
 Jul 24 22:44:54 minerva kernel: [ 1532.885281] RAX: 7effdc171f3f
 RBX: 81003db0b770 RCX: 
 Jul 24 22:44:54 minerva kernel: [ 1532.885342] RDX: 
 RSI: 0001 RDI: 810023e8e940
 Jul 24 22:44:54 minerva kernel: [ 1532.885404] RBP: 81003db0a000
 R08:  R09: 882c9d80
 Jul 24 22:44:54 minerva kernel: [ 1532.885465] R10: 81003653ade0
 R11:  R12: 81003653ade0
 Jul 24 22:44:54 minerva kernel: [ 1532.885526] R13: 0001
 R14: 810014366680 R15: 
 Jul 24 22:44:54 minerva kernel: [ 1532.885588] FS:
 () GS:81003e401700()
 knlGS:
 Jul 24 22:44:54 minerva kernel: [ 1532.885671] CS:  0010 DS: 0018 ES:
 0018 CR0: 8005003b
 Jul 24 22:44:54 minerva kernel: [ 1532.885728] CR2: 7effdc171f47
 CR3: 225cc000 CR4: 06e0
 Jul 24 22:44:54 minerva kernel: [ 1532.885789] DR0: 
 DR1:  DR2: 
 Jul 24 22:44:54 minerva kernel: [ 1532.885851] DR3: 
 DR6: 0ff0 DR7: 0400
 Jul 24 22:44:54 minerva kernel: [ 1532.885912] Process btrfs-transacti
 (pid: 3959, threadinfo 81003e6a, task 81003c74c7e0)
 Jul 24 22:44:54 minerva kernel: [ 1532.885996] Stack:
 8024f97c 81001e3db480 882ca4ce 882c9d80
 Jul 24 22:44:54 minerva kernel: [ 1532.886193]  01c08fff
 810014366680 81003653ad40 1000
 Jul 24 22:44:54 minerva kernel: [ 1532.886364]  81003ee140e8
 0001 882e19cf 01c09000
 Jul 24 22:44:54 minerva kernel: [ 1532.886495] Call Trace:
 Jul 24 22:44:54 minerva kernel: [ 1532.886587]
 [shpchp:queue_work+0x2c/0x50] queue_work+0x2c/0x50
 Jul 24 22:44:54 minerva kernel: [ 1532.886665]
 [btrfs:btrfs_wq_submit_bio+0xbe/0xf0]
 :btrfs:btrfs_wq_submit_bio+0xbe/0xf0
 Jul 24 22:44:54 minerva kernel: [ 1532.886743]
 [btrfs:__btree_submit_bio_hook+0x0/0x60]
 :btrfs:__btree_submit_bio_hook+0x0/0x60
 Jul 24 22:44:54 minerva kernel: [ 1532.886826]
 [btrfs:submit_one_bio+0xcf/0x120] :btrfs:submit_one_bio+0xcf/0x120

Re: umount oops

2008-07-25 Thread Chris Mason
On Fri, 2008-07-25 at 13:14 +0200, Lukas Vacek wrote:
 sorry, I made a typo in the testcase (the second mount)
 Basically, it might be enough to mount two different btrfs filesystems
 to two different locations, umount one of them and watch
 /var/log/kern.log for the oops
 

Thanks for this bug report. Which Btrfs version was in use?

-chris


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] async-thread: fix possible memory leak

2008-07-25 Thread Chris Mason
On Fri, 2008-07-25 at 13:34 +0800, Li Zefan wrote:
 When kthread_run() returns failure, this worker hasn't been
 added to the list, so btrfs_stop_workers() won't free it.
 

Thanks, I've queued this one up for inclusion.
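
The leak described in the quoted text can be sketched in userspace C. This
is only an illustration under assumptions, not the btrfs async-thread code:
struct worker, struct worker_pool, start_thread(), pool_add_worker() and
pool_stop() are hypothetical names, and start_thread() merely stands in for
kthread_run(). The point is that a stop routine which walks a list cannot
free an object that never made it onto that list, so the failure path has
to free it itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the actual btrfs async-thread structures. */
struct worker {
	struct worker *next;
	int started;
};

struct worker_pool {
	struct worker *head;	/* only workers on this list are freed at stop */
	int live_allocs;	/* allocation counter, used to observe leaks */
};

/* Stand-in for kthread_run(): returns 0 on success, -1 on failure. */
static int start_thread(struct worker *w, int should_fail)
{
	if (should_fail)
		return -1;
	w->started = 1;
	return 0;
}

/* Start one worker; on failure free it here, since it never reached the list. */
static int pool_add_worker(struct worker_pool *pool, int should_fail)
{
	struct worker *w = calloc(1, sizeof(*w));

	if (!w)
		return -1;
	pool->live_allocs++;

	if (start_thread(w, should_fail) != 0) {
		/* Without this free, pool_stop() could never find w. */
		free(w);
		pool->live_allocs--;
		return -1;
	}

	w->next = pool->head;
	pool->head = w;
	return 0;
}

/* Analogue of a stop routine: frees only what is linked into the list. */
static void pool_stop(struct worker_pool *pool)
{
	while (pool->head) {
		struct worker *w = pool->head;

		pool->head = w->next;
		free(w);
		pool->live_allocs--;
	}
}
```

Calling pool_add_worker() once with should_fail set and then pool_stop()
should leave live_allocs at zero; removing the free() in the failure path
would leave it at one.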

-chris




Re: New data=ordered code pushed out to btrfs-unstable

2008-07-25 Thread Chris Mason
On Mon, 2008-07-21 at 15:23 -0400, Ric Wheeler wrote:

  [ lock timeouts and stalls ]
 
 
  Ok, I've made a few changes that should lower overall contention on the
  allocation mutex.  I'm getting better performance on a 3 million file
  run, please give it a shot.
 
  After an update, clean rebuild & reboot, the test is running along and 
  has hit about 10 million files. I still see some messages like:
 
  INFO: task pdflush:4051 blocked for more than 120 seconds.

The latest code in btrfs-unstable has everything I can safely do right
now :)  

Basically the stalls come from someone doing IO with the allocation
mutex held.  It is surprising that we should be stalling for such a long
time; it is probably a mixture of elevator starvation and btrfs fun.

But, btrfs-unstable also has code to replace the page lock with a
per-tree block mutex, which will allow me to get rid of the big
allocation mutex over the long term.  I was able to break up most of the
long operations and have them drop/reacquire the allocation mutex to
prevent this starvation most of the time.
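
The drop/reacquire idea can be sketched in userspace C with pthreads. This
is a minimal sketch under assumptions, not the btrfs code: alloc_mutex and
process_in_batches() are hypothetical names, and the batch size is an
arbitrary knob chosen for illustration. A long loop that would otherwise
hold the mutex from start to finish releases it every few items, so other
threads waiting on the mutex get a chance to run.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names; a userspace sketch, not btrfs code. */
static pthread_mutex_t alloc_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Process nr_items units of work under alloc_mutex, releasing and
 * retaking the mutex after every `batch` items so waiters can get in.
 * Returns how many times the lock was dropped in the middle of the run.
 */
static int process_in_batches(int nr_items, int batch, long *sum)
{
	int drops = 0;

	assert(batch > 0);

	pthread_mutex_lock(&alloc_mutex);
	for (int i = 0; i < nr_items; i++) {
		*sum += i;	/* stand-in for the real per-item work */

		if ((i + 1) % batch == 0 && i + 1 < nr_items) {
			/*
			 * Shared state may change across this gap, so real
			 * code has to revalidate it after relocking.
			 */
			pthread_mutex_unlock(&alloc_mutex);
			drops++;
			pthread_mutex_lock(&alloc_mutex);
		}
	}
	pthread_mutex_unlock(&alloc_mutex);
	return drops;
}
```

The cost of this pattern is that nothing read under the mutex before a drop
can be trusted after it, which is presumably why only operations that can
tolerate that are broken up this way.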

-chris




[PATCH] btrfs-progs: add orphan support to print-tree

2008-07-25 Thread Josef Bacik
Hello,

This adds orphan support to print-tree, so when debug_tree hits an orphan item
it will print "orphan item" under it so you know what it is.  Thanks,

Josef


diff -r e08f2f90e4f8 ctree.h
--- a/ctree.h   Thu Jul 24 13:52:04 2008 -0400
+++ b/ctree.h   Fri Jul 25 16:18:38 2008 -0400
@@ -54,6 +54,9 @@ struct btrfs_trans_handle;
 
 /* directory objectid inside the root tree */
 #define BTRFS_ROOT_TREE_DIR_OBJECTID 6ULL
+
+/* orphan objectid for tracking unlinked/truncated files */
+#define BTRFS_ORPHAN_OBJECTID -5ULL
 
 /*
  * All files have objectids higher than this.
@@ -564,6 +567,7 @@ struct btrfs_root {
 #define BTRFS_INODE_ITEM_KEY   1
 #define BTRFS_INODE_REF_KEY    2
 #define BTRFS_XATTR_ITEM_KEY   8
+#define BTRFS_ORPHAN_ITEM_KEY  9
 
 /* reserve 3-15 close to the inode for later flexibility */
 
diff -r e08f2f90e4f8 print-tree.c
--- a/print-tree.c  Thu Jul 24 13:52:04 2008 -0400
+++ b/print-tree.c  Fri Jul 25 16:18:38 2008 -0400
@@ -183,6 +183,9 @@ void btrfs_print_leaf(struct btrfs_root 
 			di = btrfs_item_ptr(l, i, struct btrfs_dir_item);
 			print_dir_item(l, item, di);
 			break;
+		case BTRFS_ORPHAN_ITEM_KEY:
+			printf("\t\torphan item\n");
+			break;
 		case BTRFS_ROOT_ITEM_KEY:
 			ri = btrfs_item_ptr(l, i, struct btrfs_root_item);
 			read_extent_buffer(l, &root_item, (unsigned long)ri,
 					   sizeof(root_item));
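
A note on the -5ULL in the ctree.h hunk: negating an unsigned value wraps
modulo 2^64, so the define evaluates to a very large objectid near the top
of the 64-bit range rather than to a negative number. The small C sketch
below shows the arithmetic; EXAMPLE_ORPHAN_OBJECTID and orphan_objectid()
are names made up for this illustration and only the expression -5ULL comes
from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as the define in the patch, under a local name. */
#define EXAMPLE_ORPHAN_OBJECTID -5ULL

/* Return the objectid as a 64-bit unsigned value. */
static uint64_t orphan_objectid(void)
{
	/* Unary minus on an unsigned operand wraps: 2^64 - 5. */
	return EXAMPLE_ORPHAN_OBJECTID;
}
```

orphan_objectid() returns 0xFFFFFFFFFFFFFFFB, that is, UINT64_MAX - 4.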


Re: umount oops

2008-07-25 Thread Chris Mason
On Fri, 2008-07-25 at 15:54 +0200, Lukas Vacek wrote:
 the newest in the mercurial repo
 
 changeset:   558:9da425337329
 tag: tip

This should be fixed in the unstable tree; the transaction work queues
were not being torn down properly.

-chris




Re: [PATCH] initial version of reference cache

2008-07-25 Thread Yan Zheng
I missed two newly created files in the previous patch; please use this one. Thanks

---
diff -r eb4767aa190e Makefile
--- a/Makefile  Thu Jul 24 12:25:50 2008 -0400
+++ b/Makefile  Sat Jul 26 03:47:26 2008 +0800
@@ -6,7 +6,8 @@ btrfs-y := super.o ctree.o extent-tree.o
   hash.o file-item.o inode-item.o inode-map.o disk-io.o \
   transaction.o bit-radix.o inode.o file.o tree-defrag.o \
   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
-  extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o
+  extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
+  ref-cache.o

 btrfs-$(CONFIG_FS_POSIX_ACL)   += acl.o
 else
diff -r eb4767aa190e ctree.c
--- a/ctree.c   Thu Jul 24 12:25:50 2008 -0400
+++ b/ctree.c   Sat Jul 26 03:47:26 2008 +0800
@@ -165,7 +165,7 @@ int btrfs_copy_root(struct btrfs_trans_h
 	btrfs_clear_header_flag(cow, BTRFS_HEADER_FLAG_WRITTEN);

 	WARN_ON(btrfs_header_generation(buf) > trans->transid);
-	ret = btrfs_inc_ref(trans, new_root, buf);
+	ret = btrfs_inc_ref(trans, new_root, buf, 0);
 	kfree(new_root);

 	if (ret)
@@ -232,7 +232,7 @@ int __btrfs_cow_block(struct btrfs_trans
 	WARN_ON(btrfs_header_generation(buf) > trans->transid);
 	if (btrfs_header_generation(buf) != trans->transid) {
 		different_trans = 1;
-		ret = btrfs_inc_ref(trans, root, buf);
+		ret = btrfs_inc_ref(trans, root, buf, 1);
 		if (ret)
 			return ret;
 	} else {
diff -r eb4767aa190e ctree.h
--- a/ctree.h   Thu Jul 24 12:25:50 2008 -0400
+++ b/ctree.h   Sat Jul 26 03:47:26 2008 +0800
@@ -592,6 +592,10 @@ struct btrfs_fs_info {
 	u64 last_alloc;
 	u64 last_data_alloc;

+	spinlock_t ref_cache_lock;
+	u64 total_ref_cache_size;
+	u64 running_ref_cache_size;
+
 	u64 avail_data_alloc_bits;
 	u64 avail_metadata_alloc_bits;
 	u64 avail_system_alloc_bits;
@@ -613,6 +617,8 @@ struct btrfs_root {
 	spinlock_t node_lock;

 	struct extent_buffer *commit_root;
+	struct btrfs_leaf_ref_tree *ref_tree;
+
 	struct btrfs_root_item root_item;
 	struct btrfs_key root_key;
 	struct btrfs_fs_info *fs_info;
@@ -1430,7 +1436,7 @@ int btrfs_reserve_extent(struct btrfs_tr
  u64 search_end, struct btrfs_key *ins,
  u64 data);
 int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
- struct extent_buffer *buf);
+ struct extent_buffer *buf, int cache_ref);
 int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root
  *root, u64 bytenr, u64 num_bytes,
  u64 root_objectid, u64 ref_generation,
diff -r eb4767aa190e disk-io.c
--- a/disk-io.c Thu Jul 24 12:25:50 2008 -0400
+++ b/disk-io.c Sat Jul 26 03:47:26 2008 +0800
@@ -716,6 +716,7 @@ static int __setup_root(u32 nodesize, u3
 	root->node = NULL;
 	root->inode = NULL;
 	root->commit_root = NULL;
+	root->ref_tree = NULL;
 	root->sectorsize = sectorsize;
 	root->nodesize = nodesize;
 	root->leafsize = leafsize;
@@ -1165,12 +1166,19 @@ static int transaction_kthread(void *arg
 		vfs_check_frozen(root->fs_info->sb, SB_FREEZE_WRITE);
 		mutex_lock(&root->fs_info->transaction_kthread_mutex);

+		printk("btrfs: total reference cache size %Lu\n",
+			root->fs_info->total_ref_cache_size);
+
 		mutex_lock(&root->fs_info->trans_mutex);
 		cur = root->fs_info->running_transaction;
 		if (!cur) {
 			mutex_unlock(&root->fs_info->trans_mutex);
 			goto sleep;
 		}
+
+		printk("btrfs: running reference cache size %Lu\n",
+			root->fs_info->running_ref_cache_size);
+
 		now = get_seconds();
 		if (now < cur->start_time || now - cur->start_time < 30) {
 			mutex_unlock(&root->fs_info->trans_mutex);
@@ -1233,6 +1241,7 @@ struct btrfs_root *open_ctree(struct sup
 	spin_lock_init(&fs_info->hash_lock);
 	spin_lock_init(&fs_info->delalloc_lock);
 	spin_lock_init(&fs_info->new_trans_lock);
+	spin_lock_init(&fs_info->ref_cache_lock);

 	init_completion(&fs_info->kobj_unregister);
 	fs_info->tree_root = tree_root;
@@ -1699,6 +1708,11 @@ int close_ctree(struct btrfs_root *root)
 		printk("btrfs: at unmount delalloc count %Lu\n",
 		       fs_info->delalloc_bytes);
 	}
+	if (fs_info->total_ref_cache_size) {
+		printk("btrfs: at umount reference cache size %Lu\n",
+			fs_info->total_ref_cache_size);
+	}
+
 	if (fs_info->extent_root->node)
 		free_extent_buffer(fs_info->extent_root->node);

diff -r eb4767aa190e extent-tree.c
--- 

Re: [PATCH] initial version of reference cache

2008-07-25 Thread Chris Mason
On Fri, 2008-07-25 at 14:29 -0500, Yan Zheng wrote:
 Hello,
 
 This is the initial version of the leaf reference cache. The cache stores
 leaf nodes' extent references in memory, which can improve the performance
 of snapshot dropping. In outline, this patch will (1) allocate struct
 dirty_root when starting a transaction, (2) put the reference cache in
 struct dirty_root, (3) cache extent references when tree leaves are cow'ed,
 and (4) when dropping a snapshot, use the cached references directly to
 avoid reading the tree leaf.
 
 I can only access a notebook currently, so my benchmarking is limited. I
 would appreciate any help and comments.
 

I have modified this locally to always cache leaves, even when they
don't have file extents in them.  That way, walk_down_tree will find the
cache and won't have to read the leaf (that doesn't have any extents).

So far, it is working very well.  I did a run with fs_mark to create 58
million files and had very steady numbers.  The unmount took 4 seconds.
It used to take over an hour.

One question: why not use the block number (byte number) as the key to
the rbtree instead of the key?
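
To make the question concrete, here is a rough userspace sketch of a leaf
cache indexed by byte number. It is not the patch's code: struct leaf_ref,
ref_cache_insert() and ref_cache_lookup() are hypothetical names, and a
plain unbalanced binary search tree is swapped in for the kernel rbtree to
keep the example short. It also caches leaves with zero extents, as
described above, so a lookup hit means the leaf need not be read at all.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cache entry; field names are illustrative only. */
struct leaf_ref {
	uint64_t bytenr;	/* start byte of the leaf, used as the sort key */
	int nr_extents;		/* 0 is valid: a cached leaf with no file extents */
	struct leaf_ref *left, *right;
};

/* Insert a leaf; returns 0 on success, -1 on a duplicate or failed allocation. */
static int ref_cache_insert(struct leaf_ref **root, uint64_t bytenr,
			    int nr_extents)
{
	while (*root) {
		if (bytenr == (*root)->bytenr)
			return -1;
		root = bytenr < (*root)->bytenr ? &(*root)->left
						: &(*root)->right;
	}
	*root = calloc(1, sizeof(**root));
	if (!*root)
		return -1;
	(*root)->bytenr = bytenr;
	(*root)->nr_extents = nr_extents;
	return 0;
}

/* Look up by byte number: one integer comparison per level. */
static struct leaf_ref *ref_cache_lookup(struct leaf_ref *root, uint64_t bytenr)
{
	while (root && root->bytenr != bytenr)
		root = bytenr < root->bytenr ? root->left : root->right;
	return root;
}
```

With a single u64 as the sort key, each step of the search is one integer
comparison, and a caller only needs the block's byte number to probe the
cache.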

-chris

