Honest timeline for btrfsck

2011-08-03 Thread Erik Jensen
The lack of any information on when btrfsck might be ready is a real
headache to those deciding what to do with a corrupted file system.

I am currently sitting on a btrfs array of 10 disks that has been
reporting "parent transid verify failed" since last November. While
the data on the drive is by no means irreplaceable, it would take a
fair amount of effort. At the time I was told that a btrfsck would
almost certainly be released by the end of the year. In January, it
was finally almost ready, and toward the end of May it was going to
be released in a couple of days (hopefully).

Had I known back in November 9 months would go by with no such tool, I
would have certainly wiped the array and started over, as it was
certainly not worth the wait. So here I am, several assurances of
imminent release later, still wondering whether it would be better to
wait or cut my losses.

I understand that everyone is working hard, and I deeply appreciate
the effort being put into this filesystem. I'm not looking for an
exact date, just a rough order of magnitude on which to base
decisions.

Thank you very much.

--Erik Jensen
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: use plain page_address() in header fields setget functions

2011-08-03 Thread Li Zefan
We've stopped using highmem for extent buffers.

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/ctree.h |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 365c4e1..746e6b4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1415,17 +1415,15 @@ void btrfs_set_##name(struct extent_buffer *eb, type *s, u##bits val);
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)		\
 static inline u##bits btrfs_##name(struct extent_buffer *eb)		\
 {									\
-	type *p = kmap_atomic(eb->first_page, KM_USER0);		\
+	type *p = page_address(eb->first_page);				\
 	u##bits res = le##bits##_to_cpu(p->member);			\
-	kunmap_atomic(p, KM_USER0);					\
 	return res;							\
 }									\
 static inline void btrfs_set_##name(struct extent_buffer *eb,		\
 				    u##bits val)			\
 {									\
-	type *p = kmap_atomic(eb->first_page, KM_USER0);		\
+	type *p = page_address(eb->first_page);				\
 	p->member = cpu_to_le##bits(val);				\
-	kunmap_atomic(p, KM_USER0);					\
 }
 
 #define BTRFS_SETGET_STACK_FUNCS(name, type, member, bits)		\
-- 
1.7.3.1


[PATCH] Btrfs: rewrite BTRFS_SETGET_FUNCS macro

2011-08-03 Thread Li Zefan
BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and
btrfs_foo() functions, which read and write specific fields in the
extent buffer.

The total number of set/get functions is ~200, but in fact we only
need 8 functions: 2 for u8 field, 2 for u16, 2 for u32 and 2 for u64.

It results in a reduction of ~22K bytes.

   text    data     bss     dec     hex filename
 528069    4328    1060  533457   823d1 fs/btrfs/btrfs.o.orig
 505997    4328    1060  511385   7cd99 fs/btrfs/btrfs.o

Comparing btrfs_set_bits() with btrfs_set_foo(), the only extra runtime
overhead is that we have to pass one more argument.
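
The idea of collapsing the per-field functions into a handful of
width-specific helpers can be illustrated outside the kernel. The following
is only a user-space sketch, not the code from this patch: struct
demo_buffer, struct demo_item, DEMO_SETGET_FUNCS and the demo_* helpers are
hypothetical stand-ins, and only the 32-bit pair is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct extent_buffer: one flat byte array. */
struct demo_buffer {
	unsigned char data[64];
};

/*
 * One out-of-line getter/setter pair per field width. "ptr" is not a real
 * pointer but an offset into the buffer that has been cast to a typed
 * pointer, mirroring how the source comment describes metadata pointers.
 */
static uint32_t demo_get_u32(const struct demo_buffer *eb, const void *ptr,
			     unsigned long off)
{
	unsigned long pos = (unsigned long)(uintptr_t)ptr + off;
	const unsigned char *p;

	assert(pos + 4 <= sizeof(eb->data));
	p = eb->data + pos;
	/* Fields are stored little-endian regardless of the host CPU. */
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void demo_set_u32(struct demo_buffer *eb, const void *ptr,
			 unsigned long off, uint32_t val)
{
	unsigned long pos = (unsigned long)(uintptr_t)ptr + off;
	int i;

	assert(pos + 4 <= sizeof(eb->data));
	for (i = 0; i < 4; i++)
		eb->data[pos + i] = (unsigned char)(val >> (8 * i));
}

/* Per-field wrappers shrink to inline calls that add one offset argument. */
#define DEMO_SETGET_FUNCS(name, type, member, bits)			\
static inline uint##bits##_t demo_##name(const struct demo_buffer *eb,	\
					 const type *s)			\
{									\
	_Static_assert(sizeof(uint##bits##_t) ==			\
		       sizeof(((type *)0)->member), "field width");	\
	return demo_get_u##bits(eb, s, offsetof(type, member));		\
}									\
static inline void demo_set_##name(struct demo_buffer *eb,		\
				   const type *s, uint##bits##_t val)	\
{									\
	demo_set_u##bits(eb, s, offsetof(type, member), val);		\
}

/* Hypothetical item layout with two 32-bit fields. */
struct demo_item {
	uint32_t flags;
	uint32_t size;
};

DEMO_SETGET_FUNCS(item_flags, struct demo_item, flags, 32)
DEMO_SETGET_FUNCS(item_size, struct demo_item, size, 32)
```

Each additional field then costs only a thin inline wrapper, while the real
work stays in one shared function per width, which is where the size
reduction would come from.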

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/ctree.h|   26 +--
 fs/btrfs/struct-funcs.c |  118 ---
 2 files changed, 83 insertions(+), 61 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 746e6b4..fae542e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1406,11 +1406,29 @@ struct btrfs_ioctl_defrag_range_args {
 			    offsetof(type, member),			\
 			   sizeof(((type *)0)->member)))
 
-#ifndef BTRFS_SETGET_FUNCS
+#define DECLARE_BTRFS_SETGET_BITS(bits)					\
+u##bits btrfs_get_u##bits(struct extent_buffer *eb, void *ptr,		\
+			  unsigned long off);				\
+void btrfs_set_u##bits(struct extent_buffer *eb, void *ptr,		\
+		       unsigned long off, u##bits val)
+
+DECLARE_BTRFS_SETGET_BITS(8);
+DECLARE_BTRFS_SETGET_BITS(16);
+DECLARE_BTRFS_SETGET_BITS(32);
+DECLARE_BTRFS_SETGET_BITS(64);
+
 #define BTRFS_SETGET_FUNCS(name, type, member, bits)			\
-u##bits btrfs_##name(struct extent_buffer *eb, type *s);		\
-void btrfs_set_##name(struct extent_buffer *eb, type *s, u##bits val);
-#endif
+static inline u##bits btrfs_##name(struct extent_buffer *eb, type *s)	\
+{									\
+	BUILD_BUG_ON(sizeof(u##bits) != sizeof(((type *)0)->member));	\
+	return btrfs_get_u##bits(eb, s, offsetof(type, member));	\
+}									\
+static inline void btrfs_set_##name(struct extent_buffer *eb, type *s,	\
+				    u##bits val)			\
+{									\
+	BUILD_BUG_ON(sizeof(u##bits) != sizeof(((type *)0))->member);	\
+	btrfs_set_u##bits(eb, s, offsetof(type, member), val);		\
+}
 
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)		\
 static inline u##bits btrfs_##name(struct extent_buffer *eb)		\
diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c
index bc1f6ad..9f76745 100644
--- a/fs/btrfs/struct-funcs.c
+++ b/fs/btrfs/struct-funcs.c
@@ -17,80 +17,84 @@
  */
 
 #include <linux/highmem.h>
+#include <asm/unaligned.h>
 
-/* this is some deeply nasty code.  ctree.h has a different
- * definition for this BTRFS_SETGET_FUNCS macro, behind a #ifndef
- *
- * The end result is that anyone who #includes ctree.h gets a
- * declaration for the btrfs_set_foo functions and btrfs_foo functions
- *
- * This file declares the macros and then #includes ctree.h, which results
- * in cpp creating the function here based on the template below.
- *
+#include "ctree.h"
+
+/*
  * These setget functions do all the extent_buffer related mapping
  * required to efficiently read and write specific fields in the extent
  * buffers.  Every pointer to metadata items in btrfs is really just
  * an unsigned long offset into the extent buffer which has been
  * cast to a specific type.  This gives us all the gcc type checking.
  *
- * The extent buffer api is used to do all the kmapping and page
- * spanning work required to get extent buffers in highmem and have
- * a metadata blocksize different from the page size.
+ * The extent buffer api is used to do the page spanning work required
+ * to have a metadata blocksize different from the page size.
  *
  * The macro starts with a simple function prototype declaration so that
  * sparse won't complain about it being static.
  */
 
-#define BTRFS_SETGET_FUNCS(name, type, member, bits)   \
-u##bits btrfs_##name(struct extent_buffer *eb, type *s);   \
-void btrfs_set_##name(struct extent_buffer *eb, type *s, u##bits val); \
-u##bits btrfs_##name(struct extent_buffer *eb, \
-  type *s) \
+#define DEFINE_BTRFS_SETGET_BITS(bits) \
+u##bits btrfs_get_u##bits(struct extent_buffer *eb, void *ptr, \
+ unsigned long off);   \
+void btrfs_set_u##bits(struct extent_buffer *eb, void *ptr,\
+  unsigned long off, u##bits 

[PATCH] Btrfs: fix byte order issue in free space cache

2011-08-03 Thread Li Zefan
We should convert the generation number to little endian before saving
it to disk.

We've just changed to use the normal checksumming infrastructure for
free space cache, so it's the perfect time to fix this bug.
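
As a rough user-space illustration of the problem (not the kernel code
itself), the sketch below stores and checks a generation number in an
explicitly little-endian layout. The demo_* names are hypothetical, and the
8-byte placeholder crc followed by the 8-byte generation is an assumption
based on the comments visible in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value as little-endian bytes, independent of host order. */
static void demo_put_le64(unsigned char *dst, uint64_t val)
{
	int i;

	for (i = 0; i < 8; i++)
		dst[i] = (unsigned char)(val >> (8 * i));
}

/* Read a little-endian 64-bit value back into host order. */
static uint64_t demo_get_le64(const unsigned char *src)
{
	uint64_t val = 0;
	int i;

	for (i = 0; i < 8; i++)
		val |= (uint64_t)src[i] << (8 * i);
	return val;
}

/* Assumed layout: 8 bytes of placeholder crc, then the generation. */
static void demo_write_generation(unsigned char *page, uint64_t transid)
{
	assert(page != NULL);
	memset(page, 0, 8);
	demo_put_le64(page + 8, transid);
}

/* Returns nonzero when the stored generation equals the inode's. */
static int demo_generation_matches(const unsigned char *page,
				   uint64_t inode_generation)
{
	assert(page != NULL);
	return demo_get_le64(page + 8) == inode_generation;
}
```

Writing the raw host-order value instead would produce different bytes on a
big-endian machine, so a cache written there would fail the generation check
when read on a little-endian one, and vice versa.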

Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/free-space-cache.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 6377713..9277d65 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -325,7 +325,7 @@ int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 		addr = kmap(page);
 
 		if (index == 0) {
-			u64 *gen;
+			u64 gen;
 
 			/*
 			 * We put a bogus crc in the front of the first page in
@@ -335,11 +335,11 @@ int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 			addr += sizeof(u64);
 			offset += sizeof(u64);
 
-			gen = addr;
-			if (*gen != BTRFS_I(inode)->generation) {
+			gen = le64_to_cpu(*(__le64 *)addr);
+			if (gen != BTRFS_I(inode)->generation) {
 				printk(KERN_ERR "btrfs: space cache generation"
 				       " (%llu) does not match inode (%llu)\n",
-				       (unsigned long long)*gen,
+				       (unsigned long long)gen,
 				       (unsigned long long)
 				       BTRFS_I(inode)->generation);
 				kunmap(page);
@@ -636,7 +636,7 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 
 		orig = addr = kmap(page);
 		if (index == 0) {
-			u64 *gen;
+			__le64 *gen;
 
 			/*
 			 * We're going to put in a bogus crc for this page to
@@ -647,7 +647,7 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 			offset += sizeof(u64);
 
 			gen = addr;
-			*gen = trans->transid;
+			*gen = cpu_to_le64(trans->transid);
 			addr += sizeof(u64);
 			offset += sizeof(u64);
 		}
-- 
1.7.3.1


Re: Honest timeline for btrfsck

2011-08-03 Thread Jan Schmidt


On 03.08.2011 08:57, Erik Jensen wrote:
> Had I known back in November 9 months would go by with no such tool, I
> would have certainly wiped the array and started over, as it was
> certainly not worth the wait. So here I am, several assurances of
> imminent release later, still wondering whether it would be better to
> wait or cut my losses.

If you want to try a patch that might give you read-only access to your
data, have a look at this one:

> Date: Thu, 23 Jun 2011 15:54:09 -0400
> From: Josef Bacik <jo...@redhat.com>
> To:   Chris Mason <chris.ma...@oracle.com>
> Cc:   Andrej Podzimek <and...@podzimek.org>,
>       Josef Bacik <jo...@redhat.com>,
>       linux-btrfs <linux-btrfs@vger.kernel.org>
> Subject: Re: parent transid verify failures on 2.6.39
> Message-ID: <20110623195409.ga21...@dhcp231-156.rdu.redhat.com>

-Jan



WARNING: at fs/btrfs/extent-tree.c:5703

2011-08-03 Thread Tsutomu Itoh
I ran a subvol & balance test script on the current for-linus branch and
got the following warning messages.

Thanks,
Tsutomu


Aug  3 17:54:01 luna kernel: [21310.079308] ------------[ cut here ]------------
Aug  3 17:54:01 luna kernel: [21310.079326] WARNING: at fs/btrfs/extent-tree.c:5703 btrfs_alloc_free_block+0xc4/0x286 [btrfs]()
Aug  3 17:54:01 luna kernel: [21310.079329] Hardware name: PRIMERGY
Aug  3 17:54:01 luna kernel: [21310.079331] Modules linked in: btrfs zlib_deflate crc32c libcrc32c autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]
Aug  3 17:54:01 luna kernel: [21310.079374] Pid: 28048, comm: btrfs-freespace Tainted: G        W   3.0.0test2+ #1
Aug  3 17:54:01 luna kernel: [21310.079377] Call Trace:
Aug  3 17:54:01 luna kernel: [21310.079383]  [81045426] warn_slowpath_common+0x85/0x9d
Aug  3 17:54:01 luna kernel: [21310.079387]  [81045458] warn_slowpath_null+0x1a/0x1c
Aug  3 17:54:01 luna kernel: [21310.079401]  [a037d63a] btrfs_alloc_free_block+0xc4/0x286 [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079414]  [a0370ebc] split_leaf+0x2d2/0x52a [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079427]  [a036e4be] ? btrfs_leaf_free_space+0x3a/0x7e [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079439]  [a037224c] btrfs_search_slot+0x558/0x5fc [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079455]  [a037fa42] btrfs_csum_file_blocks+0x20b/0x54f [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079473]  [a038a430] add_pending_csums+0x3b/0x57 [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079491]  [a0391a63] btrfs_finish_ordered_io+0x239/0x2bb [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079509]  [a0391b44] btrfs_writepage_end_io_hook+0x5f/0x7a [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079528]  [a03a05e2] end_bio_extent_writepage+0xae/0x159 [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079545]  [a0383a23] ? end_workqueue_fn+0x106/0x120 [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079549]  [811370e7] bio_endio+0x2d/0x2f
Aug  3 17:54:01 luna kernel: [21310.079564]  [a0383a2e] end_workqueue_fn+0x111/0x120 [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079582]  [a03a8abf] worker_loop+0x18a/0x4bb [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079600]  [a03a8935] ? btrfs_queue_worker+0x224/0x224 [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079618]  [a03a8935] ? btrfs_queue_worker+0x224/0x224 [btrfs]
Aug  3 17:54:01 luna kernel: [21310.079621]  [810608ac] kthread+0x82/0x8a
Aug  3 17:54:01 luna kernel: [21310.079625]  [813ac2a4] kernel_thread_helper+0x4/0x10
Aug  3 17:54:01 luna kernel: [21310.079629]  [8106082a] ? kthread_worker_fn+0x14a/0x14a
Aug  3 17:54:01 luna kernel: [21310.079633]  [813ac2a0] ? gs_change+0x13/0x13
Aug  3 17:54:01 luna kernel: [21310.079635] ---[ end trace f6966ebbfde87a2f ]---


[fs/btrfs/extent-tree.c]
5699         ret = block_rsv_use_bytes(block_rsv, blocksize);
5700         if (!ret)
5701                 return block_rsv;
5702         if (ret) {
5703                 WARN_ON(1);
5704                 ret = reserve_metadata_bytes(trans, root, block_rsv, blocksize,
5705                                              0);
5706                 if (!ret) {




[PATCH] Btrfs: check if there is enough space for balancing smarter

2011-08-03 Thread Liu Bo
When checking if there is enough space for balancing a block group,
we do not take raid types into consideration, so we do not account for
the correct amount of space that we need.  This makes us do some extra
work before we get ENOSPC.
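
The per-profile adjustment can be sketched in isolation. The following is a
simplified user-space illustration rather than the kernel function: the
demo_* names are hypothetical, the profile order is taken from the index
comment in the diff, and a plain array holding each device's largest free
extent stands in for the device list and find_free_dev_extent().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Profile order follows the index comment in the patch. */
enum demo_profile {
	DEMO_RAID10 = 0,
	DEMO_RAID1  = 1,
	DEMO_DUP    = 2,
	DEMO_RAID0  = 3,
	DEMO_SINGLE = 4,
};

struct demo_requirement {
	uint64_t min_free;	/* bytes that must be free on a device */
	unsigned int dev_min;	/* number of devices that must have them */
};

/* Scale the used bytes of a block group by its raid profile. */
static struct demo_requirement
demo_relocate_requirement(enum demo_profile profile, uint64_t used,
			  unsigned int rw_devices)
{
	struct demo_requirement req = { used, 1 };

	switch (profile) {
	case DEMO_RAID10:
		req.dev_min = 4;
		req.min_free /= 2;
		break;
	case DEMO_RAID1:
		req.dev_min = 2;
		break;
	case DEMO_DUP:
		req.min_free *= 2;
		break;
	case DEMO_RAID0:
		assert(rw_devices > 0);
		req.dev_min = rw_devices;
		req.min_free /= rw_devices;
		break;
	case DEMO_SINGLE:
		break;
	}
	return req;
}

/* Returns 1 once enough devices can each hold min_free bytes, else 0. */
static int demo_can_relocate(const uint64_t *free_extent, size_t ndev,
			     struct demo_requirement req)
{
	unsigned int dev_nr = 0;
	size_t i;

	for (i = 0; i < ndev; i++) {
		if (free_extent[i] >= req.min_free)
			dev_nr++;
		if (dev_nr >= req.dev_min)
			return 1;
	}
	return 0;
}
```

For example, a raid10 block group with 1000 bytes used would need 500 free
bytes on each of four devices under this model, so three devices are not
enough no matter how much space they have.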

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |   41 +++++++++++++++++++++++++++++++++++------
 1 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3213c39..02216cc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6682,6 +6682,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	struct btrfs_space_info *space_info;
 	struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
 	struct btrfs_device *device;
+	u64 min_free;
+	int index;
+	int dev_nr = 0;
+	int dev_min = 1;
 	int full = 0;
 	int ret = 0;
 
@@ -6691,8 +6695,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	if (!block_group)
 		return -1;
 
+	min_free = btrfs_block_group_used(&block_group->item);
+
 	/* no bytes used, we're good */
-	if (!btrfs_block_group_used(&block_group->item))
+	if (!min_free)
 		goto out;
 
 	space_info = block_group->space_info;
@@ -6708,10 +6714,9 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	 * all of the extents from this block group.  If we can, we're good
 	 */
 	if ((space_info->total_bytes != block_group->key.offset) &&
-	   (space_info->bytes_used + space_info->bytes_reserved +
-	    space_info->bytes_pinned + space_info->bytes_readonly +
-	    btrfs_block_group_used(&block_group->item) <
-	    space_info->total_bytes)) {
+	    (space_info->bytes_used + space_info->bytes_reserved +
+	     space_info->bytes_pinned + space_info->bytes_readonly +
+	     min_free < space_info->total_bytes)) {
 		spin_unlock(&space_info->lock);
 		goto out;
 	}
@@ -6728,9 +6733,29 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	if (full)
 		goto out;
 
+	/*
+	 * index:
+	 *      0: raid10
+	 *      1: raid1
+	 *      2: dup
+	 *      3: raid0
+	 *      4: single
+	 */
+	index = get_block_group_index(block_group);
+	if (index == 0) {
+		dev_min = 4;
+		min_free /= 2;
+	} else if (index == 1) {
+		dev_min = 2;
+	} else if (index == 2) {
+		min_free *= 2;
+	} else if (index == 3) {
+		dev_min = fs_devices->rw_devices;
+		min_free /= dev_min;
+	}
+
 	mutex_lock(&root->fs_info->chunk_mutex);
 	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
-		u64 min_free = btrfs_block_group_used(&block_group->item);
 		u64 dev_offset;
 
 		/*
@@ -6741,7 +6766,11 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 			ret = find_free_dev_extent(NULL, device, min_free,
 						   &dev_offset, NULL);
 			if (!ret)
+				dev_nr++;
+
+			if (dev_nr >= dev_min)
 				break;
+
 			ret = -1;
 		}
 	}
-- 
1.6.5.2



Re: disk-io.c:416: find_and_setup_root: Assertion `!(!root->node)' failed.

2011-08-03 Thread Dave
OK so I have recovered all of my data.  This was sort of a nerve-racking
experience.  I'll share what I've done in case others are experiencing the same
problem (I've seen other threads complaining of the same assertion that
drew no response).

So, I filled open_ctree_fd with printf statements to find exactly where it was
failing.  I found (per my previous mail to this list) that the assertion was
happening in the following call:

ret = find_and_setup_root(tree_root, fs_info, BTRFS_CSUM_TREE_OBJECTID, csum_root);

I also found that the fs would mount read-only on an older kernel but 85% of the
files read reported I/O errors.  It looks like the b-tree which stores checksums
was broken.  The breakage is likely high up on the tree and thus affects most,
but not all files.  Trying to determine how to get btrfs to ignore checksums
led me here:

http://kerneltrap.org/mailarchive/linux-btrfs/2010/2/25/6806053/thread#mid-6806053

So I grabbed a copy of 2.6.32.10 and patched compression.c and inode.c.  I'm now
able to read ALL of the data when mounting read-only.

This whole process has left a bit of a bad taste in my mouth.  A checksum tree
seems like a great way to add fault tolerance but in this case it was another
point of failure, rendering perfectly uncorrupted data inaccessible.  I suppose
this is something a proper fsck would have to contend with.  My
questions for the developers are:

1. Would repairing or rebuilding a broken checksum tree be a trivial task for a
functional fsck?

2. Does a mount option which ignores the checksum tree altogether make sense?
Strictly for recovery purposes of course.  Not everyone is inclined to hack the
kernel to get access to their data.

Either way I've kept the dump of the broken filesystem.  If fsck ever makes it
out of development purgatory I'll definitely be running it against this as a
test case.  I saw an email to this list earlier today asking about the status of
fsck.  It seems like it would be reasonable to know approximately when
something will be released to the public.  Not asking for a specific day, more
like which quarter of which year.


Re: [RFC] btrfs send and receive

2011-08-03 Thread Jan Schmidt
On 02.08.2011 19:42, Goffredo Baroncelli wrote:
>> Furthermore, receiving should not need kernel support at all (except for
>> an optional interface to create a file with a certain inode, we'll see).
>> Thus, replicating metadata corruptions should be very unlikely.
>
> I think that for receiving we can have three level, which may represent three
> level in the develop:
>
> 1) we store the information as a pax|tar|git|... file format. Then is the user
> that can expand this file when needed. I think that in case of backup this is
> more useful than having a full filesystem. No help from kernel required.
>
> 2) we expand the stream in files; so the final results would be a filesystem.

How would you test your stream from 1) if you can't unpack it?

> 2.1) as above but preserving the inode number (small help from kernel
> required, may be file-system independent also)

I would skip that and add it as an extension, later.

> 2.2) as above but preserving the COW properties: if we update an already
> snapshotted file, btrfs store the original one and the modified data. The same
> would be in the destination filesystem: if exists the previous file snapshot,
> in the filesystem is COW-ed the file updating only the new data. (help from
> kernel side. I don't know if it is possible to adapt this strategy for other
> filesystem than BTRFS)

Again, I'd rather gather that information (possibly with help from the
kernel) when generating the stream. This is what I answered and tried to
explain by example in my mail yesterday. Please tell me which part was
unclear and I'll try to explain better.

With the algorithm outlined yesterday, you don't need any kernel support
when receiving, so it should be adaptable by any filesystem that
supports snapshots.

> 3) extracting from the source filesystem the btree structure, and injecting in
> the btrfs filesystem this structure. I think that this has the best
> performance, both in terms of CPU-power and in bandwidth. Full kernel support
> required.

This is like a diff-aware dd, or did I get you wrong? If it is: do you
really think we need it? What for?

>> One more thing to add: We have to make sure our stream doesn't get
>> corrupted. So if the file format we're choosing does not include it, we
>> should keep in mind to add something ourselves.
>
> The best would be using the BTRFS checksum.

Sounds interesting. How would you add a btrfs checksum to a stream file
(no matter what format we'll use)? And how would you verify it?

>> I'll try to make a plan how it could be implemented with git, so that we
>> have something we can compare.
>
> I suggest to give a look to the fast-import/export format, which is de facto
> standard about sharing information between the new CVS system.

Thanks for the hint, I will include that in my considerations.

>>> In terms of transmitting snapshot details, I always assumed we would
>>> need a snapshot tool that added extra metadata about parent
>>> relationships on the snapshots.  I didn't want to enforce this in the
>>> metadata on disk, but I have no problems with saying the send/receive
>>> tool requires extra metadata to tell us about parents.

>> Oh, right. That's something that might not only need kernel support for
>> send to determine a parent, but also a new key representing a
>> snapshot's parent relationship information.
>
> I think that this information already exists. In fact every snapshot has a
> reference to the original data, on the basis of which it is possible to obtain
> the snapshot's parent relationship information.

How can that be done? I don't see such a link.

> However we need to be sure that when we send the delta between two snapshot
> to the receiver side, the receiver side:
> 1) has a copy of the previous snapshot
> 2) this copy is in sync to the original one
>
> I think (please Chris confirm that) that we can check this with the subvolume
> id and the generation-no of every snapshot, which should be unique.

uuid + generation was my suggestion as well, should be unique, yes.
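
To make that check concrete, here is a purely illustrative sketch; nothing
in this thread defines such a structure. It assumes that a snapshot can be
identified by a 16-byte uuid plus a 64-bit generation, and that the receiver
keeps a list of the snapshots it already holds. All demo_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical snapshot identity: uuid plus generation (an assumption). */
struct demo_snapshot_id {
	unsigned char uuid[16];
	uint64_t generation;
};

/* Two ids match only if both the uuid and the generation are equal. */
static int demo_same_snapshot(const struct demo_snapshot_id *a,
			      const struct demo_snapshot_id *b)
{
	return memcmp(a->uuid, b->uuid, sizeof(a->uuid)) == 0 &&
	       a->generation == b->generation;
}

/*
 * Returns 1 if the receiver already holds the exact parent snapshot the
 * delta was generated against, 0 otherwise.
 */
static int demo_receiver_has_parent(const struct demo_snapshot_id *known,
				    size_t count,
				    const struct demo_snapshot_id *parent)
{
	size_t i;

	assert(known != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (demo_same_snapshot(&known[i], parent))
			return 1;
	}
	return 0;
}
```

A receiver following this model would refuse a delta whose parent id is not
in its list, which covers both a missing copy and one that is out of sync.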

-Jan



Re: Applications using fsync cause hangs for several seconds every few minutes

2011-08-03 Thread mck
On Mon, 2011-07-18 at 14:17 -0400, Josef Bacik wrote:
> I've been looking into this and I have a suspicion.  Would you run
> with this patch and see if the problem goes away?

Didn't help me.

2.6.39 is not usable. 3.0.0 is ok for a few hours, then it too becomes
unusable. This is discussed in later threads, e.g. "Btrfs slowdown".

Dummying out fsync (libeatmydata) gives only marginal improvements...

~mck

-- 
Bombing for peace is like F***ing for virginity | www.semb.wever.org |
www.sesat.no | tech.finn.no | http://xss-http-filter.sf.net




Re: Btrfs slowdown

2011-08-03 Thread mck

I can confirm this as well (64-bit, Core i7, single-disk).

> The issue seems to be gone in 3.0.0.

After a few hours of work, 3.0.0 slows down on me too. The performance
becomes unusable and a reboot is a must. Certain applications
(particularly evolution and firefox) are next to permanently greyed out.

I have had a couple of corrupted tree logs recently and had to use
btrfs-zero-log (mentioned in an earlier thread). Otherwise returning to
2.6.38 is the workaround.

~mck

-- 
A mind that has been stretched will never return to it's original
dimension. Albert Einstein 
| www.semb.wever.org | www.sesat.no 
| http://tech.finn.no | http://xss-http-filter.sf.net





[RFC, crash][PATCH] btrfs: allow cross-subvolume file clone

2011-08-03 Thread David Sterba
Hi,

I'm working on a patch to fix cross-volume cloning; it worked for simple cases
like cloning a single file. When I cloned a full linux-2.6 tree there was an
immediate BUG_ON (after the third cloned file) in btrfs_delayed_update_inode
with -ENOSPC:

[  925.546266] ------------[ cut here ]------------
[  925.549921] kernel BUG at fs/btrfs/delayed-inode.c:1693!
[  925.549921] invalid opcode:  [#1] SMP
[  925.549921] CPU 0
[  925.549921] Modules linked in: btrfs
[  925.549921]
[  925.549921] Pid: 31167, comm: clone-file Not tainted 3.0.0-default+ #98 Intel Corporation Santa Rosa platform/Matanzas
[  925.549921] RIP: 0010:[a00790e0]  [a00790e0] btrfs_delayed_update_inode+0x2e0/0x2f0 [btrfs]
[  925.549921] RSP: 0018:88004f229be8  EFLAGS: 00010286
[  925.549921] RAX: ffe4 RBX: 880048392c70 RCX: 00018000
[  925.549921] RDX: 1b1a RSI: 0001 RDI: 88007a6f8420
[  925.549921] RBP: 88004f229c28 R08: 0004 R09: 
[  925.549921] R10:  R11:  R12: 880048393bf8
[  925.549921] R13: 880048392cb8 R14: 880050ff3540 R15: 88005294
[  925.549921] FS:  7fbf18b23700() GS:88007dc0() knlGS:
[  925.549921] CS:  0010 DS:  ES:  CR0: 8005003b
[  925.549921] CR2: 7fbcc68ba000 CR3: 4b4a8000 CR4: 06f0
[  925.549921] DR0:  DR1:  DR2: 
[  925.549921] DR3:  DR6: 0ff0 DR7: 0400
[  925.549921] Process clone-file (pid: 31167, threadinfo 88004f228000, task 88004b4e5140)
[  925.549921] Stack:
[  925.549921]  880048f7ddc0 00018000 88004f229c38 880048393bf8
[  925.549921]  880050ff3540 880048393bf8 880051a900a0 88005294
[  925.549921]  88004f229c78 a0034633 88004f229c58 a005f08b
[  925.549921] Call Trace:
[  925.549921]  [a0034633] btrfs_update_inode+0x53/0x160 [btrfs]
[  925.549921]  [a005f08b] ? btrfs_tree_unlock+0x6b/0xa0 [btrfs]
[  925.549921]  [a005b0ba] btrfs_ioctl_clone+0xa0a/0xcc0 [btrfs]
[  925.549921]  [81168c81] ? __do_fault+0x4a1/0x590
[  925.549921]  [810daa1d] ? lock_release_holdtime+0x3d/0x1c0
[  925.549921]  [81b8dc20] ? do_page_fault+0x2d0/0x580
[  925.549921]  [a005dfcb] btrfs_ioctl+0x2db/0xda0 [btrfs]
[  925.549921]  [81b8dc20] ? do_page_fault+0x2d0/0x580
[  925.549921]  [810e1467] ? debug_check_no_locks_freed+0x177/0x180
[  925.549921]  [811863c5] ? kmem_cache_free+0xb5/0x1b0
[  925.549921]  [811a5db8] do_vfs_ioctl+0x98/0x570
[  925.549921]  [8119476d] ? fget_light+0x2fd/0x3c0
[  925.549921]  [811a62df] sys_ioctl+0x4f/0x80
[  925.549921]  [81b92882] system_call_fastpath+0x16/0x1b
[  925.549921] Code: e8 06 00 00 8d 0c 49 48 89 ca 48 89 4d c8 e8 c8 0f fa ff 85 c0 48 8b 4d c8 75 10 48 89 4b 08 e9 8e fd ff ff 0f 1f 80 00 00 00 00 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
[  925.549921] RIP  [a00790e0] btrfs_delayed_update_inode+0x2e0/0x2f0 [btrfs]
[  925.549921]  RSP 88004f229be8
[  925.876182] ---[ end trace 8b4c2031e1394913 ]---

the patch has been applied on top of current linus which contains patches from
both pull requests (ed8f37370d83).

The filesystem consists of 5 devices 23G each, about 100G of usable space,
mkfs.btrfs with defaults. The kernel tree has about 6G:

$ btrfs fi df .
Data, RAID0: total=10.00GB, used=5.55GB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=1.50GB, used=121.75MB
Metadata: total=8.00MB, used=0.00

$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5       110G  5.8G   82G   7% /mnt/sda5

ie. plenty of free space.

It's possible that I've omitted some important bits in the patch itself, or
this exposes a bug of ENOSPC or delayed-inode.

david
---

From: David Sterba <dste...@suse.cz>

Lift the EXDEV condition and allow different root trees for files being
cloned, then pass source inode's root when searching for extents.

Signed-off-by: David Sterba <dste...@suse.cz>
---
 fs/btrfs/ioctl.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0b980af..58eb0ef 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2183,7 +2183,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 		goto out_fput;
 
 	ret = -EXDEV;
-	if (src->i_sb != inode->i_sb || BTRFS_I(src)->root != root)
+	if (src->i_sb != inode->i_sb)
 		goto out_fput;
 
 	ret = -ENOMEM;
@@ -2247,13 +2247,14 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 	 * note the key will change type as we walk through the
   

Re: [PATCH] btrfs: do not allow mounting non-subvolumes via subvol option

2011-08-03 Thread David Sterba
On Fri, Jul 29, 2011 at 07:11:28PM +0200, Goffredo Baroncelli wrote:
>> $ btrfs subvol list -p .
>> ID 258 parent 5 top level 5 path subvol
>> ID 259 parent 5 top level 5 path subvol1
>> ID 260 parent 5 top level 5 path default-subvol1
>> ID 262 parent 5 top level 5 path p1/p1-snapshot
>> ID 263 parent 259 top level 5 path subvol1/subvol1-snap
>>
>> The problem I see is that this makes a false impression of snapshotting the
>> given subvolume but in fact snapshots the default one: a user expects outcome
>
> Not that matter too much, but the old behavior was to snapshot not
> the default one but the one which contains the directory.
> This behavior leaded to a lot of misunderstanding about the btrfs
> capability of snapshot subvolume __only__.
>
> Only one question, what happens now if an user pass subvol=dir ?

$ mount /dev/sda5 /mnt/sda5
$ cd sda5
$ mkdir p
$ cd ..
$ umount sda5
$ mount -o subvol=p /dev/sda5 /mnt/sda5
mount: wrong fs type, bad option, bad superblock on /dev/sda5,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail  or so

and dmesg says:
[ 7285.905195] device fsid 46e97521-a1c7-4509-954f-b32c90bd1d1e devid 1 transid 10 /dev/sdb5
[ 7285.954435] btrfs: disk space caching is enabled
[ 7286.600155] btrfs: 'p' is not a valid subvolume

There could be a specific error code like ENSUBVOL, and mount could be
taught to give a better description of what has happened. Otherwise, I took
the approach of being verbose in dmesg.


HTH,
david


Re: [RFC, crash][PATCH] btrfs: allow cross-subvolume file clone

2011-08-03 Thread David Sterba
On Wed, Aug 03, 2011 at 08:07:42PM +0200, David Sterba wrote:
> I'm working on a patch to fix cross-volume cloning, worked for simple cases
> like cloning a single file. When I cloned a full linux-2.6 tree there was an
> immediate BUG_ON (after third cloned file) in btrfs_delayed_update_inode
> with -ENOSPC :

oh, a similar issue was already reported on 5 Jul 2011:

[BUG] delayed inodes and reflinks
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/11763 

Jan Schmidt wrote:
> If I get back to a situation where I can reproduce the bug, I'll send
> a follow up.

I do have a reproducer:

$ mkfs.btrfs
$ mount ...
$ btrfs subvol create subvol1
$ btrfs subvol create subvol2
$ cp linux-2.6 subvol1
$ (in subvol1) find linux-2.6 -type d -exec mkdir -p ../subvol2/'{}' \;
$ (in subvol1) find linux-2.6 -type f -exec ./clone-file '{}' ../subvol2/'{}' \;
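
The clone-file helper used in the reproducer is not included in this
message. As a rough sketch only, and assuming it does nothing more than wrap
the btrfs clone ioctl, it might look like the following. The demo_* names
are hypothetical, and the ioctl number is spelled out locally (magic 0x94,
command 9, matching the kernel's BTRFS_IOC_CLONE definition) so that the
block does not depend on btrfs headers being installed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local copies of the kernel's btrfs clone ioctl definition. */
#define DEMO_BTRFS_IOCTL_MAGIC 0x94
#define DEMO_BTRFS_IOC_CLONE _IOW(DEMO_BTRFS_IOCTL_MAGIC, 9, int)

/* The ioctl is issued on the destination fd, with the source fd as arg. */
static int demo_clone_fd(int src_fd, int dst_fd)
{
	return ioctl(dst_fd, DEMO_BTRFS_IOC_CLONE, src_fd);
}

/* Clone src into dst; returns 0 on success, -1 with errno set on error. */
static int demo_clone_file(const char *src, const char *dst)
{
	int src_fd, dst_fd, ret, saved_errno;

	assert(src != NULL && dst != NULL);

	src_fd = open(src, O_RDONLY);
	if (src_fd < 0)
		return -1;

	dst_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (dst_fd < 0) {
		saved_errno = errno;
		close(src_fd);
		errno = saved_errno;
		return -1;
	}

	ret = demo_clone_fd(src_fd, dst_fd);

	/* close() may overwrite errno, so preserve the ioctl's value. */
	saved_errno = errno;
	close(dst_fd);
	close(src_fd);
	errno = saved_errno;
	return ret;
}
```

On an unpatched kernel a cross-subvolume call of this kind would be expected
to fail with EXDEV, which is the check the patch above lifts.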

and this backtrace follows ...

david
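
The clone-file helper used in the reproducer is not shown in the thread. A
minimal sketch of what such a tool might do, assuming it simply issues the
btrfs clone ioctl on the destination file with the source file descriptor as
argument, is given below. The function name clone_file is hypothetical; the
ioctl number is defined locally to match the kernel's btrfs ioctl header.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Same definition as in the kernel's btrfs ioctl header. */
#ifndef BTRFS_IOC_CLONE
#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_CLONE _IOW(BTRFS_IOCTL_MAGIC, 9, int)
#endif

/*
 * Clone the whole of src into dst (hypothetical helper).
 * Returns 0 on success, -1 on failure with errno preserved.
 */
static int clone_file(const char *src, const char *dst)
{
	int src_fd, dst_fd, ret, saved_errno;

	src_fd = open(src, O_RDONLY);
	if (src_fd < 0)
		return -1;

	dst_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (dst_fd < 0) {
		saved_errno = errno;
		close(src_fd);
		errno = saved_errno;
		return -1;
	}

	/* The ioctl is issued on the destination; the argument is the source fd. */
	ret = ioctl(dst_fd, BTRFS_IOC_CLONE, src_fd);
	saved_errno = errno;

	close(dst_fd);
	close(src_fd);
	errno = saved_errno;
	return ret < 0 ? -1 : 0;
}
```

On a kernel without the cross-subvolume change, a call with src and dst in
different subvolumes would be expected to fail with EXDEV, which is the
condition the patch lifts.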

 [  925.546266] ------------[ cut here ]------------
 [  925.549921] kernel BUG at fs/btrfs/delayed-inode.c:1693!
 [  925.549921] invalid opcode:  [#1] SMP
 [  925.549921] CPU 0
 [  925.549921] Modules linked in: btrfs
 [  925.549921]
 [  925.549921] Pid: 31167, comm: clone-file Not tainted 3.0.0-default+ #98 
 Intel Corporation Santa Rosa platform/Matanzas
 [  925.549921] RIP: 0010:[a00790e0]  [a00790e0] 
 btrfs_delayed_update_inode+0x2e0/0x2f0 [btrfs]
 [  925.549921] RSP: 0018:88004f229be8  EFLAGS: 00010286
 [  925.549921] RAX: ffe4 RBX: 880048392c70 RCX: 
 00018000
 [  925.549921] RDX: 1b1a RSI: 0001 RDI: 
 88007a6f8420
 [  925.549921] RBP: 88004f229c28 R08: 0004 R09: 
 
 [  925.549921] R10:  R11:  R12: 
 880048393bf8
 [  925.549921] R13: 880048392cb8 R14: 880050ff3540 R15: 
 88005294
 [  925.549921] FS:  7fbf18b23700() GS:88007dc0() 
 knlGS:
 [  925.549921] CS:  0010 DS:  ES:  CR0: 8005003b
 [  925.549921] CR2: 7fbcc68ba000 CR3: 4b4a8000 CR4: 
 06f0
 [  925.549921] DR0:  DR1:  DR2: 
 
 [  925.549921] DR3:  DR6: 0ff0 DR7: 
 0400
 [  925.549921] Process clone-file (pid: 31167, threadinfo 88004f228000, 
 task 88004b4e5140)
 [  925.549921] Stack:
 [  925.549921]  880048f7ddc0 00018000 88004f229c38 
 880048393bf8
 [  925.549921]  880050ff3540 880048393bf8 880051a900a0 
 88005294
 [  925.549921]  88004f229c78 a0034633 88004f229c58 
 a005f08b
 [  925.549921] Call Trace:
 [  925.549921]  [a0034633] btrfs_update_inode+0x53/0x160 [btrfs]
 [  925.549921]  [a005f08b] ? btrfs_tree_unlock+0x6b/0xa0 [btrfs]
 [  925.549921]  [a005b0ba] btrfs_ioctl_clone+0xa0a/0xcc0 [btrfs]
 [  925.549921]  [81168c81] ? __do_fault+0x4a1/0x590
 [  925.549921]  [810daa1d] ? lock_release_holdtime+0x3d/0x1c0
 [  925.549921]  [81b8dc20] ? do_page_fault+0x2d0/0x580
 [  925.549921]  [a005dfcb] btrfs_ioctl+0x2db/0xda0 [btrfs]
 [  925.549921]  [81b8dc20] ? do_page_fault+0x2d0/0x580
 [  925.549921]  [810e1467] ? debug_check_no_locks_freed+0x177/0x180
 [  925.549921]  [811863c5] ? kmem_cache_free+0xb5/0x1b0
 [  925.549921]  [811a5db8] do_vfs_ioctl+0x98/0x570
 [  925.549921]  [8119476d] ? fget_light+0x2fd/0x3c0
 [  925.549921]  [811a62df] sys_ioctl+0x4f/0x80
 [  925.549921]  [81b92882] system_call_fastpath+0x16/0x1b
 [  925.549921] Code: e8 06 00 00 8d 0c 49 48 89 ca 48 89 4d c8 e8 c8 0f fa ff 
 85 c0 48 8b 4d c8 75 10 48 89 4b 08 e9 8e fd ff ff 0f 1f 80 00 00 00 00 0f 
 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
 [  925.549921] RIP  [a00790e0] 
 btrfs_delayed_update_inode+0x2e0/0x2f0 [btrfs]
 [  925.549921]  RSP 88004f229be8
 [  925.876182] ---[ end trace 8b4c2031e1394913 ]---
 
 the patch has been applied on top of current linus which contains patches from
 both pull requests (ed8f37370d83).
 
 The filesystem consists of 5 devices 23G each, about 100G of usable space,
 mkfs.btrfs with defaults. The kernel tree has about 6G:
 
 $ btrfs fi df .
 Data, RAID0: total=10.00GB, used=5.55GB
 Data: total=8.00MB, used=0.00
 System, RAID1: total=8.00MB, used=4.00KB
 System: total=4.00MB, used=0.00
 Metadata, RAID1: total=1.50GB, used=121.75MB
 Metadata: total=8.00MB, used=0.00
 
 $ df -h .
 Filesystem      Size  Used Avail Use% Mounted on
 /dev/sda5       110G  5.8G   82G   7% /mnt/sda5
 
 ie. plenty of free space.
 
 It's possible that I've omitted some important bits in the patch itself, or
 this exposes a bug of ENOSPC or delayed-inode.
 
 david
 ---
 
 From: David Sterba dste...@suse.cz
 
 Lift the EXDEV condition and allow different root trees for files being
 cloned, then pass source inode's root when searching for extents.
 
 

Re: [PATCH] btrfs: do not allow mounting non-subvolumes via subvol option

2011-08-03 Thread David Sterba
On Sat, Jul 30, 2011 at 12:16:44AM +0800, Zhong, Xin wrote:
 I believe I submitted a similar patch months ago:
 http://marc.info/?l=linux-btrfs&m=130208585106572&w=2

You did! I was not aware of that. I believe adding a helper makes things
clearer (if it were used all over the code).

 Hope it can be integrated this time, :-).

mehopes too,
david

 
 
  -Original Message-
  From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-
  ow...@vger.kernel.org] On Behalf Of David Sterba
  Sent: Friday, July 29, 2011 6:14 PM
  To: linux-btrfs@vger.kernel.org
  Cc: chris.ma...@oracle.com; David Sterba
  Subject: [PATCH] btrfs: do not allow mounting non-subvolumes via subvol
  option
  
  There's a missing test whether the path passed to subvol=path option
  during mount is a real subvolume, allowing any directory located in
  default subvolume to be passed and accepted for mount.
  
  (current btrfs progs prevent this early)
  $ btrfs subvol snapshot . p1-snap
  ERROR: '.' is not a subvolume
  
  (with the "is subvolume?" test bypassed)
  $ btrfs subvol snapshot . p1-snap
  Create a snapshot of '.' in './p1-snap'
  
  $ btrfs subvol list -p .
  ID 258 parent 5 top level 5 path subvol
  ID 259 parent 5 top level 5 path subvol1
  ID 260 parent 5 top level 5 path default-subvol1
  ID 262 parent 5 top level 5 path p1/p1-snapshot
  ID 263 parent 259 top level 5 path subvol1/subvol1-snap
  
  The problem I see is that this gives the false impression of snapshotting
  the given subvolume, while in fact it snapshots the default one: a user
  expects an outcome like ID 263 but in fact gets ID 262.
  
  This patch makes mount fail with EINVAL with a message in syslog.
  
  Signed-off-by: David Sterba dste...@suse.cz
  ---
  
  I did not find a better errno than EINVAL; adding something like ENSUBVOL
  would probably be better, so that other filesystems with such
  functionality may use it in future.
  
   fs/btrfs/super.c |   19 +++
   1 files changed, 19 insertions(+), 0 deletions(-)
  
  diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
  index 15634d4..0c2a1d1 100644
  --- a/fs/btrfs/super.c
  +++ b/fs/btrfs/super.c
  @@ -753,6 +753,15 @@ static int btrfs_set_super(struct super_block *s,
  void *data)
  return set_anon_super(s, data);
   }
  
  +/*
  + * subvolumes are identified by ino 256
  + */
  +static inline int is_subvolume_inode(struct inode *inode)
  +{
  +   if (inode && inode->i_ino == BTRFS_FIRST_FREE_OBJECTID)
  +   return 1;
  +   return 0;
  +}
  
   /*
* Find a superblock for the given device / mount point.
  @@ -873,6 +882,16 @@ static struct dentry *btrfs_mount(struct
  file_system_type *fs_type, int flags,
  error = -ENXIO;
  goto error_free_subvol_name;
  }
  +
  +   if (!is_subvolume_inode(new_root->d_inode)) {
  +       dput(root);
  +       dput(new_root);
  +       deactivate_locked_super(s);
  +       error = -EINVAL;
  +       printk(KERN_ERR "btrfs: '%s' is not a valid subvolume\n",
  +              subvol_name);
  +       goto error_free_subvol_name;
  +   }
  dput(root);
  root = new_root;
  } else {
  --
  1.7.6
  


Re: [PATCH] btrfs: do not allow mounting non-subvolumes via subvol option

2011-08-03 Thread C Anthony Risinger
On Wed, Aug 3, 2011 at 1:47 PM, David Sterba d...@jikos.cz wrote:
 On Sat, Jul 30, 2011 at 12:16:44AM +0800, Zhong, Xin wrote:
 I believe I submitted a similar patch months ago:
 http://marc.info/?l=linux-btrfs&m=130208585106572&w=2

 You did! I was not aware of that. I believe adding a helper makes things
 clearer (if it were used all over the code).

 Hope it can be integrated this time, :-).

 mehopes too

i corrupted an FS after doing this back in Nov of last year (though i
was also --bind mounting it after the fact)

http://marc.info/?l=linux-btrfs&m=129091436915724&w=2

...and a patch proposed:

http://marc.info/?l=linux-btrfs&m=129091815217860&w=2

C Anthony


BTRFS partition won't mount

2011-08-03 Thread Adam Newby
Hello all,

I recently had a power failure and can no longer mount my /home directory. The
hard drive has two BTRFS partitions: sda7 (/) and sda8 (/home). The / partition
mounts just fine, but /home does not. I've tried btrfsck as shown below and
I've included the dmesg output pertaining to btrfs. This is on ArchLinux and
the software versions are as follows:
btrfs-progs-unstable 0.19.20101006-1
linux 3.0

I actually have quite a bit of data on this partition that I would rather not 
lose. Please help!

Thanks,
Adam

btrfsck /dev/sda8:
found 152764059648 bytes used err is 1
total csum bytes: 148756860
total tree bytes: 436686848
total fs tree bytes: 160870400
btree space waste bytes: 110424811
file data blocks allocated: 4925582483456
 referenced 114231959552
Btrfs Btrfs v0.19
root 5 inode 738980 errors 400
root 5 inode 771936 errors 400
root 5 inode 771937 errors 400
root 5 inode 771938 errors 400
root 5 inode 771939 errors 400
root 5 inode 771941 errors 400
root 5 inode 771942 errors 400

dmesg |grep btrfs
[   15.148281] btrfs: unlinked 13 orphans
[   27.156006] kernel BUG at fs/btrfs/inode.c:4586!
[   27.156124] Modules linked in: vboxnetflt vboxdrv snd_hda_codec_hdmi joydev 
usbhid hid snd_hda_codec_idt snd_hda_intel snd_hda_codec sdhci_pci sdhci 
psmouse snd_pcm snd_hwdep btusb firewire_ohci bluetooth crc16 dell_wmi sg 
nvidia(P) firewire_core arc4 snd_timer evdev serio_raw sparse_keymap i2c_i801 
iwlagn mmc_core snd battery video soundcore intel_ips wmi ppdev dell_laptop 
parport_pc container button ac mac80211 cfg80211 pcspkr rfkill parport dcdbas 
iTCO_wdt i2c_core crc_itu_t iTCO_vendor_support intel_agp snd_page_alloc 
processor intel_gtt e1000e btrfs zlib_deflate crc32c libcrc32c ext2 mbcache 
ehci_hcd usbcore sr_mod cdrom sd_mod ahci libahci libata scsi_mod
[   27.157096] RIP: 0010:[a010b781]  [a010b781] 
btrfs_add_link+0x161/0x1c0 [btrfs]
[   27.158176]  [a013482f] add_inode_ref+0x30f/0x3d0 [btrfs]
[   27.158245]  [a013567b] replay_one_buffer+0x2bb/0x3b0 [btrfs]
[   27.158318]  [a0122f27] ? alloc_extent_buffer+0x87/0x3d0 [btrfs]
[   27.158391]  [a0132cd1] walk_down_log_tree+0x391/0x540 [btrfs]
[   27.160851]  [a0132f7d] walk_log_tree+0xfd/0x270 [btrfs]
[   27.165784]  [a0136d11] btrfs_recover_log_trees+0x211/0x300 [btrfs]
[   27.168297]  [a01353c0] ? replay_one_dir_item+0xe0/0xe0 [btrfs]
[   27.170832]  [a00fd907] open_ctree+0x13e7/0x17a0 [btrfs]
[   27.178506]  [a00d879e] btrfs_mount+0x40e/0x5c0 [btrfs]
[   27.212137] RIP  [a010b781] btrfs_add_link+0x161/0x1c0 [btrfs]


Re: BTRFS partition won't mount

2011-08-03 Thread Hugo Mills
On Wed, Aug 03, 2011 at 04:46:01PM -0400, Adam Newby wrote:
 I recently had a power failure and can no longer mount my /home
 directory. The harddrive has two BTRFS partitions: sda7(/) and
 sda8(/home). The / partition loads up just fine, but /home does
 not. I've tried btrfsck as shown below and I've included dmesg
 pertaining to btrfs. This is on ArchLinux and the software versions
 are as follows:
 btrfs-progs-unstable 0.19.20101006-1
 linux 3.0

   Try the instructions on the wiki at [1]. (And please feed back
and/or fix any issues you have with the instructions -- they're still
quite new and probably have awkward corners).

 I actually have quite a bit of data on this partition that I would
 rather not lose. Please help!

   You *really* need to think about making good backups (otherwise
we'll have to set the cwillu on you).

   Hugo.

[1] https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_can.27t_mount_my_filesystem.2C_and_I_get_a_kernel_oops.21

 Thanks,
 Adam
 
 btrfsck /dev/sda8:
 found 152764059648 bytes used err is 1
 total csum bytes: 148756860
 total tree bytes: 436686848
 total fs tree bytes: 160870400
 btree space waste bytes: 110424811
 file data blocks allocated: 4925582483456
  referenced 114231959552
 Btrfs Btrfs v0.19
 root 5 inode 738980 errors 400
 root 5 inode 771936 errors 400
 root 5 inode 771937 errors 400
 root 5 inode 771938 errors 400
 root 5 inode 771939 errors 400
 root 5 inode 771941 errors 400
 root 5 inode 771942 errors 400
 
 dmesg |grep btrfs
 [   15.148281] btrfs: unlinked 13 orphans
 [   27.156006] kernel BUG at fs/btrfs/inode.c:4586!
 [   27.156124] Modules linked in: vboxnetflt vboxdrv snd_hda_codec_hdmi 
 joydev usbhid hid snd_hda_codec_idt snd_hda_intel snd_hda_codec sdhci_pci 
 sdhci psmouse snd_pcm snd_hwdep btusb firewire_ohci bluetooth crc16 dell_wmi 
 sg nvidia(P) firewire_core arc4 snd_timer evdev serio_raw sparse_keymap 
 i2c_i801 iwlagn mmc_core snd battery video soundcore intel_ips wmi ppdev 
 dell_laptop parport_pc container button ac mac80211 cfg80211 pcspkr rfkill 
 parport dcdbas iTCO_wdt i2c_core crc_itu_t iTCO_vendor_support intel_agp 
 snd_page_alloc processor intel_gtt e1000e btrfs zlib_deflate crc32c libcrc32c 
 ext2 mbcache ehci_hcd usbcore sr_mod cdrom sd_mod ahci libahci libata scsi_mod
 [   27.157096] RIP: 0010:[a010b781]  [a010b781] 
 btrfs_add_link+0x161/0x1c0 [btrfs]
 [   27.158176]  [a013482f] add_inode_ref+0x30f/0x3d0 [btrfs]
 [   27.158245]  [a013567b] replay_one_buffer+0x2bb/0x3b0 [btrfs]
 [   27.158318]  [a0122f27] ? alloc_extent_buffer+0x87/0x3d0 [btrfs]
 [   27.158391]  [a0132cd1] walk_down_log_tree+0x391/0x540 [btrfs]
 [   27.160851]  [a0132f7d] walk_log_tree+0xfd/0x270 [btrfs]
 [   27.165784]  [a0136d11] btrfs_recover_log_trees+0x211/0x300 
 [btrfs]
 [   27.168297]  [a01353c0] ? replay_one_dir_item+0xe0/0xe0 [btrfs]
 [   27.170832]  [a00fd907] open_ctree+0x13e7/0x17a0 [btrfs]
 [   27.178506]  [a00d879e] btrfs_mount+0x40e/0x5c0 [btrfs]
 [   27.212137] RIP  [a010b781] btrfs_add_link+0x161/0x1c0 [btrfs]

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- You got very nice eyes, Deedee. Never noticed them ---   
   before. They real?   




Re: Honest timeline for btrfsck

2011-08-03 Thread Chris Mason
Excerpts from Erik Jensen's message of 2011-08-03 02:57:24 -0400:
 The lack of any information on when btrfsck might be ready is a real
 headache to those deciding what to do with a corrupted file system.
 
 I am currently sitting on a btrfs array of 10 disks that has been
 reporting parent transid verify failed since last November. While
 the data on the drive is by no means irreplaceable, it would take a
 fair amount of effort. At the time I was told that a btrfsck would
 almost certainly be released by the end of the year. In January, it
 was finally almost ready, and toward the end of May it was going to
 be released in a couple of days
 (hopefully).
 
 Had I known back in November 9 months would go by with no such tool, I
 would have certainly wiped the array and started over, as it was
 certainly not worth the wait. So here I am, several assurances of
 imminent release later, still wondering whether it would be better to
 wait or cut my losses.
 
 I understand that everyone is working hard, and I deeply appreciate
 the effort being put into this filesystem. I'm not looking for an
 exact date, just a rough order of magnitude on which to base
 decisions.

This part is definitely my fault.  I've gone through a bunch of
variations on bigger and smaller tools, and had to juggle the kernel
maintenance as well.

Aside from making sure the kernel code is stable, btrfsck is all I'm
working on right now.  I do expect a release in the next two weeks that
can recover your data (and many others).

Thanks,
Chris


Hot rb_next, setup_cluster_no_bitmap

2011-08-03 Thread Simon Kirby
Hi!

Since upgrading from 2.6.35+bits to 2.6.38 and then more recently to 3.0,
our big btrfs backup box with 20 * 3 TB AOE-attached btrfs volumes
started showing more CPU usage and backups were no longer completing in a
day. I tried Linus HEAD from yesterday merged with btrfs for-linus (same
as Linus HEAD as of today), and things are better again, but perf top
output still looks pretty interesting after a night of rsync running:

 samples  pcnt function                   DSO
 _______ _____ __________________________ ________

13537.00 59.2% rb_next                    [kernel]
 3539.00 15.5% _raw_spin_lock             [kernel]
 1668.00  7.3% setup_cluster_no_bitmap    [kernel]
  799.00  3.5% tree_search_offset         [kernel]
  476.00  2.1% fill_window                [kernel]
  370.00  1.6% find_free_extent           [kernel]
  238.00  1.0% longest_match              [kernel]
  128.00  0.6% build_tree                 [kernel]
   95.00  0.4% pqdownheap                 [kernel]
   79.00  0.3% chksum_update              [kernel]
   72.00  0.3% btrfs_find_space_cluster   [kernel]
   65.00  0.3% deflate_fast               [kernel]
   61.00  0.3% memcpy                     [kernel]

With call-graphs enabled:

- 50.24%  btrfs-transacti  [kernel.kallsyms]  [k] rb_next
   - rb_next
  - 97.36% setup_cluster_no_bitmap
   btrfs_find_space_cluster
   find_free_extent
   btrfs_reserve_extent
   btrfs_alloc_free_block
   __btrfs_cow_block
 + btrfs_cow_block
  - 2.29% btrfs_find_space_cluster
   find_free_extent
   btrfs_reserve_extent
   btrfs_alloc_free_block
   __btrfs_cow_block
   btrfs_cow_block
 - btrfs_search_slot
- 56.96% lookup_inline_extent_backref
   - 97.23% __btrfs_free_extent
run_clustered_refs
  - btrfs_run_delayed_refs
 - 91.23% btrfs_commit_transaction
  transaction_kthread
  kthread
  kernel_thread_helper
 - 8.77% btrfs_write_dirty_block_groups
  commit_cowonly_roots
  btrfs_commit_transaction
  transaction_kthread
  kthread
  kernel_thread_helper
   - 2.77% insert_inline_extent_backref
__btrfs_inc_extent_ref
run_clustered_refs
btrfs_run_delayed_refs
btrfs_commit_transaction
transaction_kthread
kthread
kernel_thread_helper
- 41.03% btrfs_insert_empty_items
   - 99.89% run_clustered_refs
  - btrfs_run_delayed_refs
 + 89.93% btrfs_commit_transaction
 + 10.07% btrfs_write_dirty_block_groups
+ 1.87% btrfs_write_dirty_block_groups
-  7.41%  btrfs-transacti  [kernel.kallsyms]  [k] setup_cluster_no_bitmap
   + setup_cluster_no_bitmap
+  4.34%rsync  [kernel.kallsyms]  [k] _raw_spin_lock
+  3.68%rsync  [kernel.kallsyms]  [k] rb_next
+  3.09%  btrfs-transacti  [kernel.kallsyms]  [k] tree_search_offset
+  1.40%  btrfs-delalloc-  [kernel.kallsyms]  [k] fill_window
+  1.31%  btrfs-transacti  [kernel.kallsyms]  [k] _raw_spin_lock
+  1.19%  btrfs-delalloc-  [kernel.kallsyms]  [k] longest_match
+  1.18%  btrfs-delalloc-  [kernel.kallsyms]  [k] deflate_fast
+  1.09%  btrfs-transacti  [kernel.kallsyms]  [k] find_free_extent
+  0.90%  btrfs-delalloc-  [kernel.kallsyms]  [k] pqdownheap
+  0.67%  btrfs-delalloc-  [kernel.kallsyms]  [k] compress_block
+  0.66%  btrfs-delalloc-  [kernel.kallsyms]  [k] build_tree
+  0.61%rsync  [kernel.kallsyms]  [k] page_fault

rb_next() from setup_cluster_no_bitmap() is very hot. From the
annotated assembly output, it looks like the while (window_free <=
min_bytes) loop is where the CPU is spending most of the time.

A few thoughts:

Shouldn't (window_free <= min_bytes) be (window_free < min_bytes)?
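
To make the question concrete, here is a simplified user-space model of that
kind of window scan. It is a sketch, not the kernel code: a sorted array
stands in for the rb-tree, bitmap entries are left out, and the names
free_extent, find_window and stop_at_equal are made up for illustration. The
steps counter shows how many entries get walked before a window is found or
the scan gives up.

```c
#include <stddef.h>
#include <stdint.h>

/* One free extent; the array passed in must be sorted and non-overlapping. */
struct free_extent {
	uint64_t offset;
	uint64_t bytes;
};

/* Loop condition: "<=" keeps scanning on an exact fit, "<" stops there. */
static int window_too_small(uint64_t window_free, uint64_t min_bytes,
			    int stop_at_equal)
{
	return stop_at_equal ? window_free < min_bytes
			     : window_free <= min_bytes;
}

/*
 * Returns the index of the first extent of a suitable window, or -1 if the
 * scan runs off the end. *steps receives the number of entries visited.
 */
static long find_window(const struct free_extent *e, size_t n,
			uint64_t min_bytes, uint64_t max_gap,
			int stop_at_equal, size_t *steps)
{
	size_t i = 0, first = 0;
	uint64_t window_start, window_free;

	*steps = 0;
	if (n == 0)
		return -1;

	window_start = e[0].offset;
	window_free = e[0].bytes;
	*steps = 1;

	while (window_too_small(window_free, min_bytes, stop_at_equal)) {
		const struct free_extent *prev = &e[i];

		if (++i == n)
			return -1;
		(*steps)++;

		/* Gap or span too large: restart the window at this entry. */
		if (e[i].offset - (prev->offset + prev->bytes) > max_gap ||
		    e[i].offset - window_start > min_bytes * 2) {
			first = i;
			window_start = e[i].offset;
			window_free = e[i].bytes;
		} else {
			window_free += e[i].bytes;
		}
	}
	return (long)first;
}
```

In this model, a single extent of exactly min_bytes is rejected with the
"<=" condition and accepted with "<". With many small, scattered extents the
loop visits every entry, which is the walking pattern the profile suggests.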

I'm not really up to speed with SMP memory caching behaviour, but I'm
thinking the constant list creation of bitmap entries from the shared
free_space_cache objects might be helping bounce around these pages
between CPUs, which is why instructions that dereference the object
pointers always seem to be cache misses... Or there's just too much of
this stuff in memory for it to fit in cache. Top of slabtop -sc:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
1760061 1706286 96%   0.97K  53351   33   1707232K nfs_inode_cache
1623423 1617242 99%   0.95K  49279   33   1576928K 

Re: Hot rb_next, setup_cluster_no_bitmap

2011-08-03 Thread Simon Kirby
On Wed, Aug 03, 2011 at 03:06:55PM -0700, Simon Kirby wrote:

 I see Josef's 86d4a77ba3dc4ace238a0556541a41df2bd71d49 introduced the
 bitmaps list. I could try temporarily reverting this (some fixups needed)
 if anybody thinks my cache bouncing idea might be slightly possible.

I'll try the attached and see how the profile changes.

Simon-
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 6377713..99582f9 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2148,7 +2148,7 @@ again:
 static noinline int
 setup_cluster_no_bitmap(struct btrfs_block_group_cache *block_group,
 			struct btrfs_free_cluster *cluster,
-			struct list_head *bitmaps, u64 offset, u64 bytes,
+			u64 offset, u64 bytes,
 			u64 min_bytes)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
@@ -2171,8 +2171,6 @@ setup_cluster_no_bitmap(struct btrfs_block_group_cache *block_group,
 	 * extent entry.
 	 */
 	while (entry->bitmap) {
-		if (list_empty(&entry->list))
-			list_add_tail(&entry->list, bitmaps);
 		node = rb_next(&entry->offset_index);
 		if (!node)
 			return -ENOSPC;
@@ -2192,11 +2190,8 @@ setup_cluster_no_bitmap(struct btrfs_block_group_cache *block_group,
 			return -ENOSPC;
 		entry = rb_entry(node, struct btrfs_free_space, offset_index);
 
-		if (entry->bitmap) {
-			if (list_empty(&entry->list))
-				list_add_tail(&entry->list, bitmaps);
+		if (entry->bitmap)
 			continue;
-		}
 
 		/*
 		 * we haven't filled the empty size and the window is
@@ -2252,7 +2247,7 @@ setup_cluster_no_bitmap(struct btrfs_block_group_cache *block_group,
 static noinline int
 setup_cluster_bitmap(struct btrfs_block_group_cache *block_group,
 		 struct btrfs_free_cluster *cluster,
-		 struct list_head *bitmaps, u64 offset, u64 bytes,
+		 u64 offset, u64 bytes,
 		 u64 min_bytes)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
@@ -2263,39 +2258,10 @@ setup_cluster_bitmap(struct btrfs_block_group_cache *block_group,
 	if (ctl->total_bitmaps == 0)
 		return -ENOSPC;
 
-	/*
-	 * First check our cached list of bitmaps and see if there is an entry
-	 * here that will work.
-	 */
-	list_for_each_entry(entry, bitmaps, list) {
-		if (entry->bytes < min_bytes)
-			continue;
-		ret = btrfs_bitmap_cluster(block_group, entry, cluster, offset,
-	   bytes, min_bytes);
-		if (!ret)
-			return 0;
-	}
-
-	/*
-	 * If we do have entries on our list and we are here then we didn't find
-	 * anything, so go ahead and get the next entry after the last entry in
-	 * this list and start the search from there.
-	 */
-	if (!list_empty(bitmaps)) {
-		entry = list_entry(bitmaps->prev, struct btrfs_free_space,
-				   list);
-		node = rb_next(&entry->offset_index);
-		if (!node)
-			return -ENOSPC;
-		entry = rb_entry(node, struct btrfs_free_space, offset_index);
-		goto search;
-	}
-
 	entry = tree_search_offset(ctl, offset_to_bitmap(ctl, offset), 0, 1);
 	if (!entry)
 		return -ENOSPC;
 
-search:
 	node = &entry->offset_index;
 	do {
 		entry = rb_entry(node, struct btrfs_free_space, offset_index);
@@ -2326,8 +2292,6 @@ int btrfs_find_space_cluster(struct btrfs_trans_handle *trans,
 			 u64 offset, u64 bytes, u64 empty_size)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
-	struct list_head bitmaps;
-	struct btrfs_free_space *entry, *tmp;
 	u64 min_bytes;
 	int ret;
 
@@ -2366,17 +2330,12 @@ int btrfs_find_space_cluster(struct btrfs_trans_handle *trans,
 		goto out;
 	}
 
-	INIT_LIST_HEAD(&bitmaps);
-	ret = setup_cluster_no_bitmap(block_group, cluster, &bitmaps, offset,
+	ret = setup_cluster_no_bitmap(block_group, cluster, offset,
   bytes, min_bytes);
 	if (ret)
-		ret = setup_cluster_bitmap(block_group, cluster, &bitmaps,
+		ret = setup_cluster_bitmap(block_group, cluster,
 	   offset, bytes, min_bytes);
 
-	/* Clear our temporary list */
-	list_for_each_entry_safe(entry, tmp, &bitmaps, list)
-		list_del_init(&entry->list);
-
 	if (!ret) {
 		atomic_inc(&block_group->count);
 		list_add_tail(&cluster->block_group_list,


Re: Hot rb_next, setup_cluster_no_bitmap

2011-08-03 Thread Simon Kirby
On Wed, Aug 03, 2011 at 03:39:49PM -0700, Simon Kirby wrote:

 On Wed, Aug 03, 2011 at 03:06:55PM -0700, Simon Kirby wrote:
 
  I see Josef's 86d4a77ba3dc4ace238a0556541a41df2bd71d49 introduced the
  bitmaps list. I could try temporarily reverting this (some fixups needed)
  if anybody thinks my cache bouncing idea might be slightly possible.
 
 I'll try the attached and see how the profile changes.

Hmm, I bound the SMP affinity of all of the btrfs processes to one CPU,
and the page dirtying rate got slower, so I suspect the writes aren't
really a big deal, and the problem is just that there is way too much
walking going on after rsync has run for a while and loads everything
into memory.

Any ideas?

Simon-


Re: [RFC, crash][PATCH] btrfs: allow cross-subvolume file clone

2011-08-03 Thread Miao Xie
On Wed, 3 Aug 2011 20:07:42 +0200, David Sterba wrote:
 I'm working on a patch to fix cross-subvolume cloning; it worked for simple
 cases like cloning a single file. When I cloned a full linux-2.6 tree there
 was an immediate BUG_ON (after the third cloned file) in
 btrfs_delayed_update_inode with -ENOSPC:
 
 [  925.546266] ------------[ cut here ]------------
 [  925.549921] kernel BUG at fs/btrfs/delayed-inode.c:1693!
 [  925.549921] invalid opcode:  [#1] SMP
 [  925.549921] CPU 0
 [  925.549921] Modules linked in: btrfs
 [  925.549921]
 [  925.549921] Pid: 31167, comm: clone-file Not tainted 3.0.0-default+ #98 
 Intel Corporation Santa Rosa platform/Matanzas
 [  925.549921] RIP: 0010:[a00790e0]  [a00790e0] 
 btrfs_delayed_update_inode+0x2e0/0x2f0 [btrfs]
 [  925.549921] RSP: 0018:88004f229be8  EFLAGS: 00010286
 [  925.549921] RAX: ffe4 RBX: 880048392c70 RCX: 
 00018000
 [  925.549921] RDX: 1b1a RSI: 0001 RDI: 
 88007a6f8420
 [  925.549921] RBP: 88004f229c28 R08: 0004 R09: 
 
 [  925.549921] R10:  R11:  R12: 
 880048393bf8
 [  925.549921] R13: 880048392cb8 R14: 880050ff3540 R15: 
 88005294
 [  925.549921] FS:  7fbf18b23700() GS:88007dc0() 
 knlGS:
 [  925.549921] CS:  0010 DS:  ES:  CR0: 8005003b
 [  925.549921] CR2: 7fbcc68ba000 CR3: 4b4a8000 CR4: 
 06f0
 [  925.549921] DR0:  DR1:  DR2: 
 
 [  925.549921] DR3:  DR6: 0ff0 DR7: 
 0400
 [  925.549921] Process clone-file (pid: 31167, threadinfo 88004f228000, 
 task 88004b4e5140)
 [  925.549921] Stack:
 [  925.549921]  880048f7ddc0 00018000 88004f229c38 
 880048393bf8
 [  925.549921]  880050ff3540 880048393bf8 880051a900a0 
 88005294
 [  925.549921]  88004f229c78 a0034633 88004f229c58 
 a005f08b
 [  925.549921] Call Trace:
 [  925.549921]  [a0034633] btrfs_update_inode+0x53/0x160 [btrfs]
 [  925.549921]  [a005f08b] ? btrfs_tree_unlock+0x6b/0xa0 [btrfs]
 [  925.549921]  [a005b0ba] btrfs_ioctl_clone+0xa0a/0xcc0 [btrfs]
 [  925.549921]  [81168c81] ? __do_fault+0x4a1/0x590
 [  925.549921]  [810daa1d] ? lock_release_holdtime+0x3d/0x1c0
 [  925.549921]  [81b8dc20] ? do_page_fault+0x2d0/0x580
 [  925.549921]  [a005dfcb] btrfs_ioctl+0x2db/0xda0 [btrfs]
 [  925.549921]  [81b8dc20] ? do_page_fault+0x2d0/0x580
 [  925.549921]  [810e1467] ? debug_check_no_locks_freed+0x177/0x180
 [  925.549921]  [811863c5] ? kmem_cache_free+0xb5/0x1b0
 [  925.549921]  [811a5db8] do_vfs_ioctl+0x98/0x570
 [  925.549921]  [8119476d] ? fget_light+0x2fd/0x3c0
 [  925.549921]  [811a62df] sys_ioctl+0x4f/0x80
 [  925.549921]  [81b92882] system_call_fastpath+0x16/0x1b
 [  925.549921] Code: e8 06 00 00 8d 0c 49 48 89 ca 48 89 4d c8 e8 c8 0f fa ff 
 85 c0 48 8b 4d c8 75 10 48 89 4b 08 e9 8e fd ff ff 0f 1f 80 00 00 00 00 0f 
 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
 [  925.549921] RIP  [a00790e0] 
 btrfs_delayed_update_inode+0x2e0/0x2f0 [btrfs]
 [  925.549921]  RSP 88004f229be8
 [  925.876182] ---[ end trace 8b4c2031e1394913 ]---
 
 the patch has been applied on top of current linus which contains patches from
 both pull requests (ed8f37370d83).

I think it is because the caller didn't reserve enough space. Could you try
applying the following patch? It might fix this bug.

[PATCH v2] Btrfs: reserve enough space for file clone
http://marc.info/?l=linux-btrfs&m=131192686626576&w=2

Thanks
Miao

 
 The filesystem consists of 5 devices 23G each, about 100G of usable space,
 mkfs.btrfs with defaults. The kernel tree has about 6G:
 
 $ btrfs fi df .
 Data, RAID0: total=10.00GB, used=5.55GB
 Data: total=8.00MB, used=0.00
 System, RAID1: total=8.00MB, used=4.00KB
 System: total=4.00MB, used=0.00
 Metadata, RAID1: total=1.50GB, used=121.75MB
 Metadata: total=8.00MB, used=0.00
 
 $ df -h .
 Filesystem      Size  Used Avail Use% Mounted on
 /dev/sda5       110G  5.8G   82G   7% /mnt/sda5
 
 ie. plenty of free space.
 
 It's possible that I've omitted some important bits in the patch itself, or
 this exposes a bug of ENOSPC or delayed-inode.
 
 david
 ---
 
 From: David Sterba dste...@suse.cz
 
 Lift the EXDEV condition and allow different root trees for files being
 cloned, then pass source inode's root when searching for extents.
 
 Signed-off-by: David Sterba dste...@suse.cz
 ---
  fs/btrfs/ioctl.c |7 ---
  1 files changed, 4 insertions(+), 3 deletions(-)
 
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index 0b980af..58eb0ef 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -2183,7 +2183,7 @@ static noinline long btrfs_ioctl_clone(struct file 
 *file, 

Re: Hot rb_next, setup_cluster_no_bitmap

2011-08-03 Thread Chris Mason
Excerpts from Simon Kirby's message of 2011-08-03 19:10:59 -0400:
 On Wed, Aug 03, 2011 at 03:39:49PM -0700, Simon Kirby wrote:
 
  On Wed, Aug 03, 2011 at 03:06:55PM -0700, Simon Kirby wrote:
  
   I see Josef's 86d4a77ba3dc4ace238a0556541a41df2bd71d49 introduced the
   bitmaps list. I could try temporarily reverting this (some fixups needed)
   if anybody thinks my cache bouncing idea might be slightly possible.
  
  I'll try the attached and see how the profile changes.
 
 Hmm, I bound the SMP affinity of all of the btrfs processes to one CPU,
 and the page dirtying rate got slower, so I suspect the writes aren't
 really a big deal, and the problem is just that there is way too much
 walking going on after rsync has run for a while and loads everything
 into memory.
 
 Any ideas?

The current for-linus branch gets rid of all the bottlenecks in the
metadata blocks.  So now we're stuck with the bottlenecks in the
allocator.

There are a few simple things we can do here but Josef has a patch that
fixes delalloc reservations for inline extents that might help as a
first step.

-chris


Re: Hot rb_next, setup_cluster_no_bitmap

2011-08-03 Thread Simon Kirby
Perhaps as a further clue as to what is going on, on this same backup box
after all of the rsyncs are finished/killed and a good amount of time has
passed (no cleaner processes running in the background or anything),
sync still consistently takes ~4 minutes to run, and pushes out a
lot to disk every time it is run. Example:

echo 3 > /proc/sys/vm/drop_caches
sync
echo 3 > /proc/sys/vm/drop_caches
sync
vmstat 1 &
time sync

procs ------------memory----------- ---swap-- -----io---- -system-- -----cpu----
 r  b   swpd     free   buff  cache   si   so    bi    bo   in   cs us sy  id wa
 0  0     68 15656528   3660 258232    0    0   783   673   72   33  3 30  13 54
 0  0     68 15656612   3660 258232    0    0     0     0  439  152  0  0 100  0
 0  0     68 15656488   3660 258232    0    0     0     0  395   87  0  0 100  0
 0  0     68 15656488   3660 258232    0    0     0     0  309   89  0  0 100  0
 0  0     68 15655992   3660 258232    0    0     0     0  450  128  0  0 100  0
 0  0     68 15655984   3660 258232    0    0     0     0  446  159  0  1  99  0
 0  0     68 15655860   3660 258232    0    0     0     0  448  105  0  0 100  0
 0  0     68 15655720   3660 258276    0    0     0     0  824  238  1  2  98  0
 1  0     68 15651388   3660 258276    0    0     0     0  621  150  2  0  98  0
 0  0     68 15655860   3660 258276    0    0     0    44  886  236  2  1  98  0
 0  0     68 15656604   3660 258276    0    0     0     0  544  161  0  1  99  0
 0  0     68 15656612   3660 258276    0    0     0     0  616  328  0  1  99  0
sync started here
 0  2     68 15655764   3660 258420    0    0   328   607 1777 1450  0  1  74 25
 0  1     68 15654648   3660 259968    0    0   752     0 1978 1519  0  1  66 33
 0  1     68 15654028   3660 260520    0    0   616     0 1498 1186  0  0  75 25
 0  1     68 15653556   3660 260992    0    0   220  1545 1878 1937  0  1  75 24
 1  0     68 15652288   3660 262540    0    0   392  1976 2990 3072  0  4  75 21
 1  0     68 15652252   3660 262496    0    0     0     0 2848  164  0 27  73  0
 1  0     68 15652260   3660 262496    0    0     0     0 2586   86  0 26  74  0
 1  0     68 15652260   3660 262496    0    0     0     0 2591  148  0 25  75  0
 1  0     68 15652136   3660 262496    0    0     0     0 2544   98  0 24  76  0
 1  0     68 15652136   3660 262496    0    0     0     0 2518   75  0 26  74  0
 1  0     68 15652136   3660 262496    0    0     0     0 2676  105  0 25  75  0
 1  0     68 15652136   3660 262496    0    0     0     0 2531   83  0 25  75  0
 1  0     68 15652136   3660 262496    0    0     0     0 2595   81  0 25  75  0
 1  0     68 15652136   3660 262496    0    0     0     0 2570   89  0 25  75  0
 1  0     68 15652136   3660 262496    0    0     0     0 2539   76  0 25  75  0
 1  0     68 15652004   3660 262540    0    0     0     0 2914  166  0 25  74  0
 1  0     68 15652012   3660 262540    0    0     0     0 2596   87  0 25  75  0
 1  0     68 15652012   3660 262540    0    0     0     0 2591   82  0 25  75  0
 1  0     68 15652012   3660 262540    0    0     0     0 2607   90  0 25  75  0
 1  0     68 15652012   3660 262540    0    0     0     0 2535   89  0 25  75  0
 1  0     68 15652012   3660 262540    0    0     0     0 2629  109  0 26  74  0
 1  0     68 15652012   3660 262540    0    0     0     0 2549   81  0 25  75  0
 1  0     68 15652012   3660 262540    0    0     0     0 2757  230  0 26  73  0
 1  0     68 15652012   3660 262540    0    0     0     0 2571  105  0 24  76  0
 1  0     68 15652012   3660 262540    0    0     0     0 2568   96  0 26  74  0
 1  0     68 15651880   3660 262584    0    0     0     0 2930  173  0 28  72  0
 1  0     68 15651888   3660 262584    0    0     0     0 2564   79  0 26  74  0
 1  0     68 15651888   3660 262584    0    0     0     0 2594   84  0 22  78  0
 1  0     68 15651888   3660 262584    0    0     0     0 2568   96  0 25  75  0
 1  0     68 15651888   3660 262584    0    0     0     0 2578   91  0 25  75  0
 1  0     68 15651888   3660 262584    0    0     0     0 2660  104  0 26  74  0
 1  0     68 15651888   3660 262584    0    0     0     0 2537   84  0 25  75  0
 1  0     68 15651888   3660 262584    0    0     0     0 2553   82  0 25  75  0
 1  0     68 15651888   3660 262584    0    0     0    39 2808  204  0 26  74  0
 1  0     68 15651888   3660 262584    0    0     0     0 2573   91  0 25  75  0
 1  0     68 15651756   3660 262628    0    0     0    44 2868  153  1 29  71  0
 1  0     68 15651764   3660 262628    0    0     0     0 2569   79  0 23  76  0
 1  0     68 15651764   3660 262628    0    0     0     0 2587   79  0 27  73  0
 1  0     68 15651764   3660 262628    0    0     0     0 2509   73  0 23  77  0
 1  0     68 15651764   3660 262628    0    0     0     0 2520   81  0 25  75  0
 1  0     68 15651764   3660 262628    0    0     0     0 2664   97  0 25  75  0
 1  0     68 15651740   3660 262628    0    0   112     0 2680  146  0 25  75  0
 1  0     68 15651640   3660 262740    0    0     0     0 2627   79  0
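
A minimal sketch, in Python, of one way to summarize a trace like the one above: it averages the system CPU column and totals the block I/O columns for a group of samples, so the phases before and during sync can be compared side by side. The sample rows are copied from the trace with one value per column; the helper names (parse_row, summarize) are illustrative only and are not part of any tool mentioned in this thread.

```python
# Column order of classic `vmstat 1` output.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def parse_row(line):
    """Turn one vmstat data line into a dict of ints, or None if it is not data."""
    fields = line.split()
    if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
        return None
    return dict(zip(COLUMNS, map(int, fields)))


def summarize(lines):
    """Return (row count, mean sy, total bi, total bo) over the parsable rows."""
    rows = [r for r in map(parse_row, lines) if r is not None]
    if not rows:
        return 0, 0.0, 0, 0
    mean_sy = sum(r["sy"] for r in rows) / len(rows)
    return len(rows), mean_sy, sum(r["bi"] for r in rows), sum(r["bo"] for r in rows)


# A few rows taken from the trace above (hypothetical selection, for illustration).
before_sync = [
    "0 0 68 15656612 3660 258232 0 0 0 0 439 152 0 0 100 0",
    "0 0 68 15656488 3660 258232 0 0 0 0 395 87 0 0 100 0",
]
during_sync = [
    "1 0 68 15652252 3660 262496 0 0 0 0 2848 164 0 27 73 0",
    "1 0 68 15652260 3660 262496 0 0 0 0 2586 86 0 26 74 0",
    "1 0 68 15652260 3660 262496 0 0 0 0 2591 148 0 25 75 0",
]

for label, sample in (("before sync", before_sync), ("during sync", during_sync)):
    n, mean_sy, bi, bo = summarize(sample)
    print(f"{label}: rows={n} mean_sy={mean_sy:.1f}% bi={bi} bo={bo}")
```

Feeding the full trace through summarize() instead of the short samples gives the same kind of comparison for the whole run; lines that are not data rows, such as the headers or "sync started here", are skipped.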

Re: [RFC, crash][PATCH] btrfs: allow cross-subvolume file clone

2011-08-03 Thread Li Zefan
David Sterba wrote:
> On Wed, Aug 03, 2011 at 08:07:42PM +0200, David Sterba wrote:
>> I'm working on a patch to fix cross-volume cloning, worked for simple cases
>> like cloning a single file. When I cloned a full linux-2.6 tree there was an
>> immediate BUG_ON (after third cloned file) in btrfs_delayed_update_inode
>> with -ENOSPC :
>
> oh, a similar issue was already reported on 5 Jul 2011:
>
> [BUG] delayed inodes and reflinks
> http://permalink.gmane.org/gmane.comp.file-systems.btrfs/11763
>

We've got four reports on this bug.

The cause is that we didn't reserve enough space when starting a transaction.

We need space for:

1. btrfs_insert_empty_item()
2. btrfs_update_inode()
3. btrfs_drop_extents()

The first two are easy, but drop_extents is not: we have to calculate the
space needed for drop_extents in the worst case.
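
As a rough illustration of the arithmetic involved, here is a minimal Python sketch of a worst-case reservation for the three operations listed above. It is not the kernel's calculation. It assumes that every modified item may require copying a full root-to-leaf path, that the tree height is capped at max_level, and that the caller can supply an upper bound on the number of items drop_extents may touch (drop_extent_items). That bound is exactly the hard part described above. All names and parameters here are hypothetical.

```python
def path_bytes(node_size, max_level):
    """Assumed worst case for one item: every block on the tree path is copied."""
    return node_size * max_level


def clone_reservation(node_size, max_level, drop_extent_items):
    """Bytes to reserve for one clone under the assumptions stated above."""
    items = (
        1                     # btrfs_insert_empty_item(): one new item
        + 1                   # btrfs_update_inode(): one inode item
        + drop_extent_items   # btrfs_drop_extents(): caller-supplied upper bound
    )
    return items * path_bytes(node_size, max_level)


# Illustrative numbers only: 4 KiB tree blocks, height cap of 8, and a guess
# that drop_extents touches at most 2 items.
print(clone_reservation(4096, 8, 2))
```

The first two terms are constants, which matches the remark that they are easy; the reservation is only as good as the bound chosen for the third term.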

--
Li Zefan