Honest timeline for btrfsck
The lack of any information on when btrfsck might be ready is a real headache for those deciding what to do with a corrupted file system. I am currently sitting on a btrfs array of 10 disks that has been reporting "parent transid verify failed" since last November. While the data on the drive is by no means irreplaceable, replacing it would take a fair amount of effort.

At the time I was told that a btrfsck would almost certainly be released by the end of the year. In January, it was finally almost ready, and toward the end of May it was going to be released in a couple of days (hopefully). Had I known back in November 9 months would go by with no such tool, I would have certainly wiped the array and started over, as it was certainly not worth the wait. So here I am, several assurances of imminent release later, still wondering whether it would be better to wait or cut my losses.

I understand that everyone is working hard, and I deeply appreciate the effort being put into this filesystem. I'm not looking for an exact date, just a rough order of magnitude on which to base decisions.

Thank you very much.

--Erik Jensen

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: use plain page_address() in header fields setget functions
We've stopped using highmem for extent buffers.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ctree.h |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 365c4e1..746e6b4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1415,17 +1415,15 @@ void btrfs_set_##name(struct extent_buffer *eb, type *s, u##bits val);
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits) \
 static inline u##bits btrfs_##name(struct extent_buffer *eb) \
 { \
-	type *p = kmap_atomic(eb->first_page, KM_USER0); \
+	type *p = page_address(eb->first_page); \
 	u##bits res = le##bits##_to_cpu(p->member); \
-	kunmap_atomic(p, KM_USER0); \
 	return res; \
 } \
 static inline void btrfs_set_##name(struct extent_buffer *eb, \
 				    u##bits val) \
 { \
-	type *p = kmap_atomic(eb->first_page, KM_USER0); \
+	type *p = page_address(eb->first_page); \
 	p->member = cpu_to_le##bits(val); \
-	kunmap_atomic(p, KM_USER0); \
 }
 
 #define BTRFS_SETGET_STACK_FUNCS(name, type, member, bits) \
-- 
1.7.3.1
[PATCH] Btrfs: rewrite BTRFS_SETGET_FUNCS macro
The BTRFS_SETGET_FUNCS macro is used to generate the btrfs_set_foo() and btrfs_foo() functions, which read and write specific fields in the extent buffer.

The total number of set/get functions is ~200, but in fact we only need 8 functions: 2 for u8 fields, 2 for u16, 2 for u32 and 2 for u64.

This results in a reduction of ~22K bytes.

   text    data     bss     dec     hex filename
 528069    4328    1060  533457   823d1 fs/btrfs/btrfs.o.orig
 505997    4328    1060  511385   7cd99 fs/btrfs/btrfs.o

Comparing btrfs_set_bits() with btrfs_set_foo(), the extra runtime overhead is that we have to pass one more argument.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ctree.h        |   26 +--
 fs/btrfs/struct-funcs.c |  118 ---
 2 files changed, 83 insertions(+), 61 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 746e6b4..fae542e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1406,11 +1406,29 @@ struct btrfs_ioctl_defrag_range_args {
 				   offsetof(type, member), \
 				   sizeof(((type *)0)->member)))
 
-#ifndef BTRFS_SETGET_FUNCS
+#define DECLARE_BTRFS_SETGET_BITS(bits) \
+u##bits btrfs_get_u##bits(struct extent_buffer *eb, void *ptr, \
+			  unsigned long off); \
+void btrfs_set_u##bits(struct extent_buffer *eb, void *ptr, \
+		       unsigned long off, u##bits val)
+
+DECLARE_BTRFS_SETGET_BITS(8);
+DECLARE_BTRFS_SETGET_BITS(16);
+DECLARE_BTRFS_SETGET_BITS(32);
+DECLARE_BTRFS_SETGET_BITS(64);
+
 #define BTRFS_SETGET_FUNCS(name, type, member, bits) \
-u##bits btrfs_##name(struct extent_buffer *eb, type *s); \
-void btrfs_set_##name(struct extent_buffer *eb, type *s, u##bits val);
-#endif
+static inline u##bits btrfs_##name(struct extent_buffer *eb, type *s) \
+{ \
+	BUILD_BUG_ON(sizeof(u##bits) != sizeof(((type *)0)->member)); \
+	return btrfs_get_u##bits(eb, s, offsetof(type, member)); \
+} \
+static inline void btrfs_set_##name(struct extent_buffer *eb, type *s, \
+				    u##bits val) \
+{ \
+	BUILD_BUG_ON(sizeof(u##bits) != sizeof(((type *)0)->member)); \
+	btrfs_set_u##bits(eb, s, offsetof(type, member), val); \
+}
 
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits) \
 static inline u##bits btrfs_##name(struct extent_buffer *eb) \
diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c
index bc1f6ad..9f76745 100644
--- a/fs/btrfs/struct-funcs.c
+++ b/fs/btrfs/struct-funcs.c
@@ -17,80 +17,84 @@
  */
 
 #include <linux/highmem.h>
+#include <asm/unaligned.h>
 
-/* this is some deeply nasty code. ctree.h has a different
- * definition for this BTRFS_SETGET_FUNCS macro, behind a #ifndef
- *
- * The end result is that anyone who #includes ctree.h gets a
- * declaration for the btrfs_set_foo functions and btrfs_foo functions
- *
- * This file declares the macros and then #includes ctree.h, which results
- * in cpp creating the function here based on the template below.
- *
+#include "ctree.h"
+
+/*
  * These setget functions do all the extent_buffer related mapping
  * required to efficiently read and write specific fields in the extent
  * buffers. Every pointer to metadata items in btrfs is really just
  * an unsigned long offset into the extent buffer which has been
  * cast to a specific type. This gives us all the gcc type checking.
  *
- * The extent buffer api is used to do all the kmapping and page
- * spanning work required to get extent buffers in highmem and have
- * a metadata blocksize different from the page size.
+ * The extent buffer api is used to do the page spanning work required
+ * to have a metadata blocksize different from the page size.
  *
  * The macro starts with a simple function prototype declaration so that
  * sparse won't complain about it being static.
  */
-#define BTRFS_SETGET_FUNCS(name, type, member, bits) \
-u##bits btrfs_##name(struct extent_buffer *eb, type *s); \
-void btrfs_set_##name(struct extent_buffer *eb, type *s, u##bits val); \
-u##bits btrfs_##name(struct extent_buffer *eb, \
-		     type *s) \
+#define DEFINE_BTRFS_SETGET_BITS(bits) \
+u##bits btrfs_get_u##bits(struct extent_buffer *eb, void *ptr, \
+			  unsigned long off); \
+void btrfs_set_u##bits(struct extent_buffer *eb, void *ptr, \
+		       unsigned long off, u##bits
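The idea in the patch above (a handful of width-specific accessors that take a byte offset, with thin per-field wrappers on top) can be tried out in userspace. The following is only a sketch under stated assumptions: `struct demo_buffer`, `struct demo_item` and every `demo_*` name are hypothetical stand-ins, not btrfs code, and the endianness conversion the kernel performs is left out to keep the focus on the offset dispatch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an extent buffer: one flat byte array. */
struct demo_buffer {
	unsigned char data[64];
};

/* Hypothetical item layout, only used to exercise the accessors. */
struct demo_item {
	uint8_t  type;
	uint8_t  level;
	uint16_t flags;
	uint32_t size;
	uint64_t generation;
};

/* One getter/setter pair per field width; memcpy tolerates unaligned offsets. */
#define DEFINE_DEMO_SETGET_BITS(bits) \
static inline uint##bits##_t demo_get_u##bits(const struct demo_buffer *b, \
					      size_t base, size_t off) \
{ \
	uint##bits##_t v; \
	memcpy(&v, b->data + base + off, sizeof(v)); \
	return v; \
} \
static inline void demo_set_u##bits(struct demo_buffer *b, \
				    size_t base, size_t off, \
				    uint##bits##_t val) \
{ \
	memcpy(b->data + base + off, &val, sizeof(val)); \
}

DEFINE_DEMO_SETGET_BITS(8)
DEFINE_DEMO_SETGET_BITS(16)
DEFINE_DEMO_SETGET_BITS(32)
DEFINE_DEMO_SETGET_BITS(64)

/* Per-field wrappers add only the field offset and a compile-time width check. */
#define DEMO_SETGET_FUNCS(name, type, member, bits) \
static inline uint##bits##_t demo_##name(const struct demo_buffer *b, \
					 size_t base) \
{ \
	_Static_assert(sizeof(uint##bits##_t) == \
		       sizeof(((type *)0)->member), "width mismatch"); \
	return demo_get_u##bits(b, base, offsetof(type, member)); \
} \
static inline void demo_set_##name(struct demo_buffer *b, size_t base, \
				   uint##bits##_t val) \
{ \
	demo_set_u##bits(b, base, offsetof(type, member), val); \
}

DEMO_SETGET_FUNCS(item_flags, struct demo_item, flags, 16)
DEMO_SETGET_FUNCS(item_size, struct demo_item, size, 32)
DEMO_SETGET_FUNCS(item_generation, struct demo_item, generation, 64)
```

A caller would write `demo_set_item_size(&buf, base, 4096)` and read it back with `demo_item_size(&buf, base)`, where `base` is the byte offset of the item inside the buffer. Each wrapper compiles down to a call with one extra argument (the field offset), which is the trade-off the commit message mentions.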
[PATCH] Btrfs: fix byte order issue in free space cache
We should convert the generation number to little endian before saving it to disk.

We've just changed to use the normal checksumming infrastructure for free space cache, so it's the perfect time to fix this bug.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 6377713..9277d65 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -325,7 +325,7 @@ int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 		addr = kmap(page);
 
 		if (index == 0) {
-			u64 *gen;
+			u64 gen;
 
 			/*
 			 * We put a bogus crc in the front of the first page in
@@ -335,11 +335,11 @@ int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
 			addr += sizeof(u64);
 			offset += sizeof(u64);
 
-			gen = addr;
-			if (*gen != BTRFS_I(inode)->generation) {
+			gen = le64_to_cpu(*(__le64 *)addr);
+			if (gen != BTRFS_I(inode)->generation) {
 				printk(KERN_ERR "btrfs: space cache generation"
 				       " (%llu) does not match inode (%llu)\n",
-				       (unsigned long long)*gen,
+				       (unsigned long long)gen,
 				       (unsigned long long)
 				       BTRFS_I(inode)->generation);
 				kunmap(page);
@@ -636,7 +636,7 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 		orig = addr = kmap(page);
 
 		if (index == 0) {
-			u64 *gen;
+			__le64 *gen;
 
 			/*
 			 * We're going to put in a bogus crc for this page to
@@ -647,7 +647,7 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 			offset += sizeof(u64);
 
 			gen = addr;
-			*gen = trans->transid;
+			*gen = cpu_to_le64(trans->transid);
 			addr += sizeof(u64);
 			offset += sizeof(u64);
 		}
-- 
1.7.3.1
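For readers who want to see why the conversion matters, here is a small userspace sketch of the same idea. It is not the kernel code: `demo_put_le64()`, `demo_get_le64()` and `demo_generation_matches()` are hypothetical helpers standing in for cpu_to_le64() and le64_to_cpu(), and the page layout (one u64 crc slot followed by the generation) is an assumption taken from the comments visible in the diff.

```c
#include <stdint.h>

/* Hypothetical helper: store val at dst in little-endian byte order. */
static inline void demo_put_le64(unsigned char *dst, uint64_t val)
{
	for (int i = 0; i < 8; i++)
		dst[i] = (unsigned char)(val >> (8 * i));
}

/* Hypothetical helper: read a little-endian 64-bit value from src. */
static inline uint64_t demo_get_le64(const unsigned char *src)
{
	uint64_t val = 0;

	for (int i = 0; i < 8; i++)
		val |= (uint64_t)src[i] << (8 * i);
	return val;
}

/*
 * Assumed layout: the first u64 of the page is the crc slot and the
 * generation follows it. Returns 1 when the stored generation matches.
 */
static inline int demo_generation_matches(const unsigned char *page,
					  uint64_t inode_generation)
{
	return demo_get_le64(page + sizeof(uint64_t)) == inode_generation;
}
```

Because both directions go through explicit byte order, a cache written on a big-endian machine and read on a little-endian one (or the reverse) would still compare equal, which a raw `u64` store and load does not guarantee.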
Re: Honest timeline for btrfsck
On 03.08.2011 08:57, Erik Jensen wrote:
> Had I known back in November 9 months would go by with no such tool, I would have certainly wiped the array and started over, as it was certainly not worth the wait. So here I am, several assurances of imminent release later, still wondering whether it would be better to wait or cut my losses.

If you want to try a patch that might give you read-only access to your data, have a look at that one:

Date: Thu, 23 Jun 2011 15:54:09 -0400
From: Josef Bacik jo...@redhat.com
To: Chris Mason chris.ma...@oracle.com
Cc: Andrej Podzimek and...@podzimek.org, Josef Bacik jo...@redhat.com, linux-btrfs linux-btrfs@vger.kernel.org
Subject: Re: parent transid verify failures on 2.6.39
Message-ID: 20110623195409.ga21...@dhcp231-156.rdu.redhat.com

-Jan
WARNING: at fs/btrfs/extent-tree.c:5703
I ran the subvol balance test script on the current for-linus branch and got the following warning messages.

Thanks,
Tsutomu

Aug 3 17:54:01 luna kernel: [21310.079308] [ cut here ]
Aug 3 17:54:01 luna kernel: [21310.079326] WARNING: at fs/btrfs/extent-tree.c:5703 btrfs_alloc_free_block+0xc4/0x286 [btrfs]()
Aug 3 17:54:01 luna kernel: [21310.079329] Hardware name: PRIMERGY
Aug 3 17:54:01 luna kernel: [21310.079331] Modules linked in: btrfs zlib_deflate crc32c libcrc32c autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]
Aug 3 17:54:01 luna kernel: [21310.079374] Pid: 28048, comm: btrfs-freespace Tainted: GW 3.0.0test2+ #1
Aug 3 17:54:01 luna kernel: [21310.079377] Call Trace:
Aug 3 17:54:01 luna kernel: [21310.079383] [81045426] warn_slowpath_common+0x85/0x9d
Aug 3 17:54:01 luna kernel: [21310.079387] [81045458] warn_slowpath_null+0x1a/0x1c
Aug 3 17:54:01 luna kernel: [21310.079401] [a037d63a] btrfs_alloc_free_block+0xc4/0x286 [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079414] [a0370ebc] split_leaf+0x2d2/0x52a [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079427] [a036e4be] ? btrfs_leaf_free_space+0x3a/0x7e [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079439] [a037224c] btrfs_search_slot+0x558/0x5fc [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079455] [a037fa42] btrfs_csum_file_blocks+0x20b/0x54f [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079473] [a038a430] add_pending_csums+0x3b/0x57 [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079491] [a0391a63] btrfs_finish_ordered_io+0x239/0x2bb [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079509] [a0391b44] btrfs_writepage_end_io_hook+0x5f/0x7a [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079528] [a03a05e2] end_bio_extent_writepage+0xae/0x159 [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079545] [a0383a23] ? end_workqueue_fn+0x106/0x120 [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079549] [811370e7] bio_endio+0x2d/0x2f
Aug 3 17:54:01 luna kernel: [21310.079564] [a0383a2e] end_workqueue_fn+0x111/0x120 [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079582] [a03a8abf] worker_loop+0x18a/0x4bb [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079600] [a03a8935] ? btrfs_queue_worker+0x224/0x224 [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079618] [a03a8935] ? btrfs_queue_worker+0x224/0x224 [btrfs]
Aug 3 17:54:01 luna kernel: [21310.079621] [810608ac] kthread+0x82/0x8a
Aug 3 17:54:01 luna kernel: [21310.079625] [813ac2a4] kernel_thread_helper+0x4/0x10
Aug 3 17:54:01 luna kernel: [21310.079629] [8106082a] ? kthread_worker_fn+0x14a/0x14a
Aug 3 17:54:01 luna kernel: [21310.079633] [813ac2a0] ? gs_change+0x13/0x13
Aug 3 17:54:01 luna kernel: [21310.079635] ---[ end trace f6966ebbfde87a2f ]---

[fs/btrfs/extent-tree.c]
5699         ret = block_rsv_use_bytes(block_rsv, blocksize);
5700         if (!ret)
5701                 return block_rsv;
5702         if (ret) {
5703                 WARN_ON(1);
5704                 ret = reserve_metadata_bytes(trans, root, block_rsv, blocksize,
5705                                              0);
5706                 if (!ret) {
[PATCH] Btrfs: check if there is enough space for balancing smarter
When checking whether there is enough space for balancing a block group, we do not take raid types into consideration, so we do not account for the correct amount of space that we need. This makes us do some extra work before we get ENOSPC.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c |   41 +++++++++++++++++++++++++++++++++++------
 1 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3213c39..02216cc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6682,6 +6682,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	struct btrfs_space_info *space_info;
 	struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
 	struct btrfs_device *device;
+	u64 min_free;
+	int index;
+	int dev_nr = 0;
+	int dev_min = 1;
 	int full = 0;
 	int ret = 0;
 
@@ -6691,8 +6695,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	if (!block_group)
 		return -1;
 
+	min_free = btrfs_block_group_used(&block_group->item);
+
 	/* no bytes used, we're good */
-	if (!btrfs_block_group_used(&block_group->item))
+	if (!min_free)
 		goto out;
 
 	space_info = block_group->space_info;
@@ -6708,10 +6714,9 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	 * all of the extents from this block group. If we can, we're good
 	 */
 	if ((space_info->total_bytes != block_group->key.offset) &&
-	   (space_info->bytes_used + space_info->bytes_reserved +
-	    space_info->bytes_pinned + space_info->bytes_readonly +
-	    btrfs_block_group_used(&block_group->item) <
-	    space_info->total_bytes)) {
+	    (space_info->bytes_used + space_info->bytes_reserved +
+	     space_info->bytes_pinned + space_info->bytes_readonly +
+	     min_free < space_info->total_bytes)) {
 		spin_unlock(&space_info->lock);
 		goto out;
 	}
@@ -6728,9 +6733,29 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	if (full)
 		goto out;
 
+	/*
+	 * index:
+	 *      0: raid10
+	 *      1: raid1
+	 *      2: dup
+	 *      3: raid0
+	 *      4: single
+	 */
+	index = get_block_group_index(block_group);
+	if (index == 0) {
+		dev_min = 4;
+		min_free /= 2;
+	} else if (index == 1) {
+		dev_min = 2;
+	} else if (index == 2) {
+		min_free *= 2;
+	} else if (index == 3) {
+		dev_min = fs_devices->rw_devices;
+		min_free /= dev_min;
+	}
+
 	mutex_lock(&root->fs_info->chunk_mutex);
 	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
-		u64 min_free = btrfs_block_group_used(&block_group->item);
 		u64 dev_offset;
 
 		/*
@@ -6741,7 +6766,11 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 			ret = find_free_dev_extent(NULL, device, min_free,
 						   &dev_offset, NULL);
 			if (!ret)
+				dev_nr++;
+
+			if (dev_nr >= dev_min)
 				break;
+
 			ret = -1;
 		}
 	}
-- 
1.6.5.2
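The per-profile arithmetic in the patch above can be checked in isolation. The sketch below is a simplified userspace model, not the kernel function: all `demo_*` names are hypothetical, the profile numbering mirrors the index comment in the diff, and "a device has room" is reduced to a plain free-bytes comparison, whereas the kernel searches for a free device extent of that size. The guard against a zero device count is also an addition of this sketch.

```c
#include <stdint.h>

/* Profile indexes as listed in the patch comment. */
enum demo_profile {
	DEMO_RAID10 = 0,
	DEMO_RAID1  = 1,
	DEMO_DUP    = 2,
	DEMO_RAID0  = 3,
	DEMO_SINGLE = 4,
};

struct demo_need {
	uint64_t min_free;	/* bytes needed on each qualifying device */
	int dev_min;		/* number of devices that must qualify */
};

/* Derive the per-device space and device count needed to relocate `used` bytes. */
static inline struct demo_need demo_relocation_need(enum demo_profile profile,
						    uint64_t used,
						    int rw_devices)
{
	struct demo_need need = { .min_free = used, .dev_min = 1 };

	if (rw_devices < 1)	/* assumption of this sketch: avoid dividing by zero */
		rw_devices = 1;

	switch (profile) {
	case DEMO_RAID10:
		need.dev_min = 4;
		need.min_free /= 2;
		break;
	case DEMO_RAID1:
		need.dev_min = 2;
		break;
	case DEMO_DUP:
		need.min_free *= 2;
		break;
	case DEMO_RAID0:
		need.dev_min = rw_devices;
		need.min_free /= (uint64_t)rw_devices;
		break;
	case DEMO_SINGLE:
		break;
	}
	return need;
}

/* Returns 1 when at least dev_min devices each have min_free bytes available. */
static inline int demo_can_relocate(const uint64_t *dev_free, int ndev,
				    struct demo_need need)
{
	int dev_nr = 0;

	for (int i = 0; i < ndev; i++) {
		if (dev_free[i] >= need.min_free)
			dev_nr++;
		if (dev_nr >= need.dev_min)
			return 1;
	}
	return 0;
}
```

With four devices holding 600, 600, 100 and 600 free bytes, a raid10 block group with 1000 bytes used needs 500 bytes on each of four devices and is refused, while a raid1 block group with 500 bytes used needs only two such devices and passes.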
Re: disk-io.c:416: find_and_setup_root: Assertion `!(!root-node)' failed.
OK, so I have recovered all of my data. This was sort of a nerve-wracking experience. I'll share what I've done in case others are experiencing the same problem (I've seen other threads appear complaining of the same assertion which drew no response).

So, I filled open_ctree_fd with printf statements to find exactly where it was failing. I found (per my previous mail to this list) that the assertion was happening in the following call:

ret = find_and_setup_root(tree_root, fs_info, BTRFS_CSUM_TREE_OBJECTID, csum_root);

I also found that the fs would mount read-only on an older kernel but 85% of the files read reported I/O errors. It looks like the b-tree which stores checksums was broken. The breakage is likely high up in the tree and thus affects most, but not all, files.

Trying to determine how to get btrfs to ignore checksums led me here: http://kerneltrap.org/mailarchive/linux-btrfs/2010/2/25/6806053/thread#mid-6806053

So I grabbed a copy of 2.6.32.10 and patched compression.c and inode.c. I'm now able to read ALL of the data when mounting read-only.

This whole process has left a bit of a bad taste in my mouth. A checksum tree seems like a great way to add fault tolerance, but in this case it was another point of failure, rendering perfectly uncorrupted data inaccessible. I suppose this is something a proper fsck would have to contend with.

My questions for the developers are:

1. Would repairing or rebuilding a broken checksum tree be a trivial task for a functional fsck?
2. Does a mount option which ignores the checksum tree altogether make sense? Strictly for recovery purposes, of course. Not everyone is inclined to hack the kernel to get access to their data.

Either way I've kept the dump of the broken filesystem. If fsck ever makes it out of development purgatory I'll definitely be running it against this as a test case.

I saw an email to this list earlier today asking about the status of fsck. It seems like it would be reasonable to know approximately when something will be released to the public. Not asking for a specific day, more like which quarter of which year.
Re: [RFC] btrfs send and receive
On 02.08.2011 19:42, Goffredo Baroncelli wrote: Furthermore, receiving should not need kernel support at all (except for an optional interface to create a file with a certain inode, we'll see). Thus, replicating metadata corruptions should be very unlikely. I think that for receiving we can have three level, which may represent three level in the develop: 1) we store the information as a pax|tar|git|... file format. Then is the user that can expand this file when needed. I think that in case of backup this is more useful than having a full filesystem. No help from kernel required. 2) we expand the stream in files; so the final results would be a filesystem. How would you test your stream from 1) if you can't unpack it? 2.1) as above but preserving the inode number (small help from kernel required, may be file-system independent also) I would skip that and add it as an extention, later. 2.2) as above but preserving the COW properties: if we update an already snapshotted file, btrfs store the original one and the modified data. The same would be in the destination filesystem: if exists the previous file snapshot, in the filesystem is COW-ed the file updating only the new data. (help from kernel side. I don't know if it is possible to adapt this strategy for other filesystem than BTRFS) Again, I'd rather gather those information (possibly with help from the kernel) when generating the stream. This is what I answered and tried to explain by example in my mail yesterday. Please tell me which part was unclear and I'll try to explain better. With the algorithm outlined yesterday, you don't need any kernel support when receiving, so it should be adaptable by any filesystem that supports snapshots. 3) extracting from the source filesystem the btree structure, and injecting in the btrfs filesystem this structure. I think that this has the best performance, both in terms of CPU-power and in bandwidth. Full kernel support required. 
This is like a diff-aware dd, or did I get you wrong? If it is: do you really think we need it? What for? One more thing to add: We have to make sure our stream doesn't get corrupted. So if the file format we're choosing does not include it, we should keep in mind to add something ourselves. The best would be using the BTRFS checksum. Sounds interesting. How would you add a btrfs checksum to a stream file (no matter what format we'll use)? And how would you verify it? I'll try to make a plan how it could be implemented with git, so that we have something we can compare. I suggest to give a look to the fast-import/export format, which is de facto standard about sharing information between the new CVS system. Thanks for the hint, I will include that in my considerations. In terms of transmitting snapshot details, I always assumed we would need a snapshot tool that added extra metadata about parent relationships on the snapshots. I didn't want to enforce this in the metadata on disk, but I have no problems with saying the send/receive tool requires extra metadata to tell us about parents. Oh, right. That's something that might not only need kernel support for send to determine a parent, but also a new key representing a snapshot's parent relationship information. I think that this information already exists. In fact every snapshot has a reference to the original data, on the basis of which it is possible to obtain the snapshot's parent relationship information. How can that be done? I don't see such a link. However we need to be sure that when we send the delta between two snapshot to the receiver side, the receiver side: 1) has a copy of the previous snapshot 2) this copy is in sync to the original one I think (please Chris confirm that) that we can check this with the subvolume id and the generation-no of every snapshot, which should be unique. uuid + generation was my suggestion as well, should be unique, yes. 
-Jan
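The "uuid + generation" check discussed at the end of this message can be pictured with a few lines of C. This is purely an illustration of the proposal, assuming a snapshot is identified by a 16-byte uuid plus a generation number; `struct demo_snapshot_id` and both functions are hypothetical and do not correspond to any existing btrfs interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical identity of a snapshot as seen by a send/receive tool. */
struct demo_snapshot_id {
	unsigned char uuid[16];
	uint64_t generation;
};

/* Two snapshots are considered the same only if uuid and generation agree. */
static inline int demo_same_snapshot(const struct demo_snapshot_id *a,
				     const struct demo_snapshot_id *b)
{
	return memcmp(a->uuid, b->uuid, sizeof(a->uuid)) == 0 &&
	       a->generation == b->generation;
}

/*
 * Look for the parent of an incremental stream among the snapshots the
 * receiver already holds. Returns its index, or -1 if the receiver has
 * no in-sync copy and the delta must not be applied.
 */
static inline int demo_find_parent(const struct demo_snapshot_id *have,
				   size_t count,
				   const struct demo_snapshot_id *parent)
{
	for (size_t i = 0; i < count; i++)
		if (demo_same_snapshot(&have[i], parent))
			return (int)i;
	return -1;
}
```

A receiver built along these lines would refuse a delta whose parent has the right uuid but a different generation, which covers the "copy is in sync with the original" condition raised above.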
Re: Applications using fsync cause hangs for several seconds every few minutes
On Mon, 2011-07-18 at 14:17 -0400, Josef Bacik wrote:
> I've been looking into this and I have a suspicion. Would you run with this patch and see if the problem goes away?

Didn't help me. 2.6.39 is not usable. 3.0.0 is OK for a few hours, then it too becomes unusable. This is discussed in later threads, e.g. "Btrfs slowdown".

Dumbing out fsync (libeatmydata) only gives marginal improvements...

~mck

-- 
Bombing for peace is like F***ing for virginity
| www.semb.wever.org | www.sesat.no | tech.finn.no | http://xss-http-filter.sf.net
Re: Btrfs slowdown
> I can confirm this as well (64-bit, Core i7, single-disk).
> The issue seems to be gone in 3.0.0.

After a few hours of work, 3.0.0 slows down on me too. The performance becomes unusable and a reboot is a must. Certain applications (particularly evolution and firefox) are next to permanently greyed out.

I have had a couple of corrupted tree logs recently and had to use btrfs-zero-log (mentioned in an earlier thread). Otherwise returning to 2.6.38 is the workaround.

~mck

-- 
A mind that has been stretched will never return to its original dimension. Albert Einstein
| www.semb.wever.org | www.sesat.no | http://tech.finn.no | http://xss-http-filter.sf.net
[RFC, crash][PATCH] btrfs: allow cross-subvolume file clone
Hi, I'm working on a patch to fix cross-volume cloning, worked for simple cases like cloning a single file. When I cloned a full linux-2.6 tree there was a immediate BUG_ON (after third cloned file) in btrfs_delayed_update_inode with -ENOSPC : [ 925.546266] [ cut here ] [ 925.549921] kernel BUG at fs/btrfs/delayed-inode.c:1693! [ 925.549921] invalid opcode: [#1] SMP [ 925.549921] CPU 0 [ 925.549921] Modules linked in: btrfs [ 925.549921] [ 925.549921] Pid: 31167, comm: clone-file Not tainted 3.0.0-default+ #98 Intel Corporation Santa Rosa platform/Matanzas [ 925.549921] RIP: 0010:[a00790e0] [a00790e0] btrfs_delayed_update_inode+0x2e0/0x2f0 [btrfs] [ 925.549921] RSP: 0018:88004f229be8 EFLAGS: 00010286 [ 925.549921] RAX: ffe4 RBX: 880048392c70 RCX: 00018000 [ 925.549921] RDX: 1b1a RSI: 0001 RDI: 88007a6f8420 [ 925.549921] RBP: 88004f229c28 R08: 0004 R09: [ 925.549921] R10: R11: R12: 880048393bf8 [ 925.549921] R13: 880048392cb8 R14: 880050ff3540 R15: 88005294 [ 925.549921] FS: 7fbf18b23700() GS:88007dc0() knlGS: [ 925.549921] CS: 0010 DS: ES: CR0: 8005003b [ 925.549921] CR2: 7fbcc68ba000 CR3: 4b4a8000 CR4: 06f0 [ 925.549921] DR0: DR1: DR2: [ 925.549921] DR3: DR6: 0ff0 DR7: 0400 [ 925.549921] Process clone-file (pid: 31167, threadinfo 88004f228000, task 88004b4e5140) [ 925.549921] Stack: [ 925.549921] 880048f7ddc0 00018000 88004f229c38 880048393bf8 [ 925.549921] 880050ff3540 880048393bf8 880051a900a0 88005294 [ 925.549921] 88004f229c78 a0034633 88004f229c58 a005f08b [ 925.549921] Call Trace: [ 925.549921] [a0034633] btrfs_update_inode+0x53/0x160 [btrfs] [ 925.549921] [a005f08b] ? btrfs_tree_unlock+0x6b/0xa0 [btrfs] [ 925.549921] [a005b0ba] btrfs_ioctl_clone+0xa0a/0xcc0 [btrfs] [ 925.549921] [81168c81] ? __do_fault+0x4a1/0x590 [ 925.549921] [810daa1d] ? lock_release_holdtime+0x3d/0x1c0 [ 925.549921] [81b8dc20] ? do_page_fault+0x2d0/0x580 [ 925.549921] [a005dfcb] btrfs_ioctl+0x2db/0xda0 [btrfs] [ 925.549921] [81b8dc20] ? 
do_page_fault+0x2d0/0x580 [ 925.549921] [810e1467] ? debug_check_no_locks_freed+0x177/0x180 [ 925.549921] [811863c5] ? kmem_cache_free+0xb5/0x1b0 [ 925.549921] [811a5db8] do_vfs_ioctl+0x98/0x570 [ 925.549921] [8119476d] ? fget_light+0x2fd/0x3c0 [ 925.549921] [811a62df] sys_ioctl+0x4f/0x80 [ 925.549921] [81b92882] system_call_fastpath+0x16/0x1b [ 925.549921] Code: e8 06 00 00 8d 0c 49 48 89 ca 48 89 4d c8 e8 c8 0f fa ff 85 c0 48 8b 4d c8 75 10 48 89 4b 08 e9 8e fd ff ff 0f 1f 80 00 00 00 00 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 [ 925.549921] RIP [a00790e0] btrfs_delayed_update_inode+0x2e0/0x2f0 [btrfs] [ 925.549921] RSP 88004f229be8 [ 925.876182] ---[ end trace 8b4c2031e1394913 ]--- the patch has been applied on top of current linus which contains patches from both pull requests (ed8f37370d83). The filesystem consists of 5 devices 23G each, about 100G of usable space, mkfs.btrfs with defaults. The kernel tree has about 6G: $ btrfs fi df . Data, RAID0: total=10.00GB, used=5.55GB Data: total=8.00MB, used=0.00 System, RAID1: total=8.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, RAID1: total=1.50GB, used=121.75MB Metadata: total=8.00MB, used=0.00 $ df -h . FilesystemSize Used Avail Use% Mounted on /dev/sda5 110G 5.8G 82G 7% /mnt/sda5 ie. plenty of free space. It's possible that I've omitted some important bits in the patch itself, or this exposes a bug of ENOSPC or delayed-inode. david --- From: David Sterba dste...@suse.cz Lift the EXDEV condition and allow different root trees for files being cloned, then pass source inode's root when searching for extents. 
Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/ioctl.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0b980af..58eb0ef 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2183,7 +2183,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 		goto out_fput;
 
 	ret = -EXDEV;
-	if (src->i_sb != inode->i_sb || BTRFS_I(src)->root != root)
+	if (src->i_sb != inode->i_sb)
 		goto out_fput;
 
 	ret = -ENOMEM;
@@ -2247,13 +2247,14 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 	 * note the key will change type as we walk through the
Re: [PATCH] btrfs: do not allow mounting non-subvolumes via subvol option
On Fri, Jul 29, 2011 at 07:11:28PM +0200, Goffredo Baroncelli wrote:
> > $ btrfs subvol list -p .
> > ID 258 parent 5 top level 5 path subvol
> > ID 259 parent 5 top level 5 path subvol1
> > ID 260 parent 5 top level 5 path default-subvol1
> > ID 262 parent 5 top level 5 path p1/p1-snapshot
> > ID 263 parent 259 top level 5 path subvol1/subvol1-snap
> >
> > The problem I see is that this makes a false impression of snapshotting the given subvolume but in fact snapshots the default one: a user expects outcome
>
> Not that matter too much, but the old behavior was to snapshot not the default one but the one which contains the directory. This behavior leaded to a lot of misunderstanding about the btrfs capability of snapshot subvolume __only__.
>
> Only one question, what happens now if an user pass subvol=dir ?

$ mount /dev/sda5 /mnt/sda5
$ cd sda5
$ mkdir p
$ cd ..
$ umount sda5
$ mount -o subvol=p /dev/sda5 /mnt/sda5
mount: wrong fs type, bad option, bad superblock on /dev/sda5,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

and dmesg says:

[ 7285.905195] device fsid 46e97521-a1c7-4509-954f-b32c90bd1d1e devid 1 transid 10 /dev/sdb5
[ 7285.954435] btrfs: disk space caching is enabled
[ 7286.600155] btrfs: 'p' is not a valid subvolume

There could be a specific error code like ENSUBVOL and mount could be taught to give better description of what has happened. Otherwise, I took the approach of being verbose in dmesg.

HTH,
david
Re: [RFC, crash][PATCH] btrfs: allow cross-subvolume file clone
On Wed, Aug 03, 2011 at 08:07:42PM +0200, David Sterba wrote:
I'm working on a patch to fix cross-volume cloning, worked for simple cases like cloning a single file. When I cloned a full linux-2.6 tree there was an immediate BUG_ON (after third cloned file) in btrfs_delayed_update_inode with -ENOSPC :

oh, a similar issue was already reported on 5 Jul 2011:
[BUG] delayed inodes and reflinks
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/11763

Jan Schmidt wrote:
If I get back to a situation where I can reproduce the bug, I'll send a follow up.

I do have a reproducer:

$ mkfs.btrfs
$ mount ...
$ btrfs subvol create subvol1
$ btrfs subvol create subvol2
$ cp linux-2.6 subvol1
$ (in subvol1) find linux-2.6 -type d -exec mkdir -p ../subvol2/'{}' \;
$ (in subvol1) find linux-2.6 -type f -exec ./clone-file '{}' ../subvol2/'{}' \;

and this backtrace follows ...

david

[ 925.546266] [ cut here ]
[ 925.549921] kernel BUG at fs/btrfs/delayed-inode.c:1693!
[ 925.549921] invalid opcode: [#1] SMP
[ 925.549921] CPU 0
[ 925.549921] Modules linked in: btrfs
[ 925.549921]
[ 925.549921] Pid: 31167, comm: clone-file Not tainted 3.0.0-default+ #98 Intel Corporation Santa Rosa platform/Matanzas
[ 925.549921] RIP: 0010:[a00790e0] [a00790e0] btrfs_delayed_update_inode+0x2e0/0x2f0 [btrfs]
[ 925.549921] RSP: 0018:88004f229be8 EFLAGS: 00010286
[ 925.549921] RAX: ffe4 RBX: 880048392c70 RCX: 00018000
[ 925.549921] RDX: 1b1a RSI: 0001 RDI: 88007a6f8420
[ 925.549921] RBP: 88004f229c28 R08: 0004 R09:
[ 925.549921] R10: R11: R12: 880048393bf8
[ 925.549921] R13: 880048392cb8 R14: 880050ff3540 R15: 88005294
[ 925.549921] FS: 7fbf18b23700() GS:88007dc0() knlGS:
[ 925.549921] CS: 0010 DS: ES: CR0: 8005003b
[ 925.549921] CR2: 7fbcc68ba000 CR3: 4b4a8000 CR4: 06f0
[ 925.549921] DR0: DR1: DR2:
[ 925.549921] DR3: DR6: 0ff0 DR7: 0400
[ 925.549921] Process clone-file (pid: 31167, threadinfo 88004f228000, task 88004b4e5140)
[ 925.549921] Stack:
[ 925.549921] 880048f7ddc0 00018000 88004f229c38 880048393bf8
[ 925.549921] 880050ff3540 880048393bf8 880051a900a0 88005294
[ 925.549921] 88004f229c78 a0034633 88004f229c58 a005f08b
[ 925.549921] Call Trace:
[ 925.549921] [a0034633] btrfs_update_inode+0x53/0x160 [btrfs]
[ 925.549921] [a005f08b] ? btrfs_tree_unlock+0x6b/0xa0 [btrfs]
[ 925.549921] [a005b0ba] btrfs_ioctl_clone+0xa0a/0xcc0 [btrfs]
[ 925.549921] [81168c81] ? __do_fault+0x4a1/0x590
[ 925.549921] [810daa1d] ? lock_release_holdtime+0x3d/0x1c0
[ 925.549921] [81b8dc20] ? do_page_fault+0x2d0/0x580
[ 925.549921] [a005dfcb] btrfs_ioctl+0x2db/0xda0 [btrfs]
[ 925.549921] [81b8dc20] ? do_page_fault+0x2d0/0x580
[ 925.549921] [810e1467] ? debug_check_no_locks_freed+0x177/0x180
[ 925.549921] [811863c5] ? kmem_cache_free+0xb5/0x1b0
[ 925.549921] [811a5db8] do_vfs_ioctl+0x98/0x570
[ 925.549921] [8119476d] ? fget_light+0x2fd/0x3c0
[ 925.549921] [811a62df] sys_ioctl+0x4f/0x80
[ 925.549921] [81b92882] system_call_fastpath+0x16/0x1b
[ 925.549921] Code: e8 06 00 00 8d 0c 49 48 89 ca 48 89 4d c8 e8 c8 0f fa ff 85 c0 48 8b 4d c8 75 10 48 89 4b 08 e9 8e fd ff ff 0f 1f 80 00 00 00 00 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
[ 925.549921] RIP [a00790e0] btrfs_delayed_update_inode+0x2e0/0x2f0 [btrfs]
[ 925.549921] RSP 88004f229be8
[ 925.876182] ---[ end trace 8b4c2031e1394913 ]---

the patch has been applied on top of current linus which contains patches from both pull requests (ed8f37370d83).

The filesystem consists of 5 devices 23G each, about 100G of usable space, mkfs.btrfs with defaults. The kernel tree has about 6G:

$ btrfs fi df .
Data, RAID0: total=10.00GB, used=5.55GB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=1.50GB, used=121.75MB
Metadata: total=8.00MB, used=0.00

$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5       110G  5.8G   82G   7% /mnt/sda5

ie. plenty of free space.

It's possible that I've omitted some important bits in the patch itself, or this exposes a bug of ENOSPC or delayed-inode.

david
---
From: David Sterba dste...@suse.cz

Lift the EXDEV condition and allow different root trees for files being cloned, then pass source inode's root when searching for extents.
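The ./clone-file helper used in the reproducer is not shown in the thread. As a rough illustration only, and assuming it is a thin wrapper around the btrfs clone ioctl, it could look something like the sketch below. The function name clone_file and its error handling are hypothetical; the ioctl number is written out as _IOW(0x94, 9, int) so the block does not depend on a particular kernel header being installed.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* btrfs clone ioctl; defined here only if no header already provides it. */
#ifndef BTRFS_IOC_CLONE
#define BTRFS_IOC_CLONE _IOW(0x94, 9, int)
#endif

/*
 * Hypothetical stand-in for the clone-file helper from the reproducer.
 * Opens src read-only, creates/truncates dst, and asks the filesystem to
 * make dst share src's extents. Returns 0 on success, -1 on any failure.
 */
static int clone_file(const char *src, const char *dst)
{
	int src_fd, dst_fd, ret;

	assert(src != NULL && dst != NULL);

	src_fd = open(src, O_RDONLY);
	if (src_fd < 0)
		return -1;

	dst_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (dst_fd < 0) {
		close(src_fd);
		return -1;
	}

	/* The ioctl is issued on the destination; the argument is the source fd. */
	ret = ioctl(dst_fd, BTRFS_IOC_CLONE, src_fd);

	close(dst_fd);
	close(src_fd);
	return ret < 0 ? -1 : 0;
}
```

Calling clone_file("linux-2.6/Makefile", "../subvol2/linux-2.6/Makefile") from inside subvol1 would correspond to one step of the find loop above. Going by the patch description, a kernel without the patch would presumably reject such a cross-subvolume call with EXDEV.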
Re: [PATCH] btrfs: do not allow mounting non-subvolumes via subvol option
On Sat, Jul 30, 2011 at 12:16:44AM +0800, Zhong, Xin wrote:
I believe I have submitted a similar patch months ago:
http://marc.info/?l=linux-btrfs&m=130208585106572&w=2

You did! I was not aware of that. I believe adding a helper makes things more clear (if it were used all over the code).

Hope it can be integrated this time, :-).

mehopes too,
david

-----Original Message-----
From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of David Sterba
Sent: Friday, July 29, 2011 6:14 PM
To: linux-btrfs@vger.kernel.org
Cc: chris.ma...@oracle.com; David Sterba
Subject: [PATCH] btrfs: do not allow mounting non-subvolumes via subvol option

There's a missing test whether the path passed to subvol=path option during mount is a real subvolume, allowing any directory located in default subvolume to be passed and accepted for mount.

(current btrfs progs prevent this early)
$ btrfs subvol snapshot . p1-snap
ERROR: '.' is not a subvolume

(with "is subvolume?" test bypassed)
$ btrfs subvol snapshot . p1-snap
Create a snapshot of '.' in './p1-snap'

$ btrfs subvol list -p .
ID 258 parent 5 top level 5 path subvol
ID 259 parent 5 top level 5 path subvol1
ID 260 parent 5 top level 5 path default-subvol1
ID 262 parent 5 top level 5 path p1/p1-snapshot
ID 263 parent 259 top level 5 path subvol1/subvol1-snap

The problem I see is that this makes a false impression of snapshotting the given subvolume but in fact snapshots the default one: a user expects outcome like ID 263 but in fact gets ID 262.

This patch makes mount fail with EINVAL with a message in syslog.

Signed-off-by: David Sterba dste...@suse.cz
---
I did not find a better errno than EINVAL, probably adding something like ENSUBVOL would be better so that other filesystems with such functionality may use it in future.
 fs/btrfs/super.c | 19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 15634d4..0c2a1d1 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -753,6 +753,15 @@ static int btrfs_set_super(struct super_block *s, void *data)
 	return set_anon_super(s, data);
 }
 
+/*
+ * subvolumes are identified by ino 256
+ */
+static inline int is_subvolume_inode(struct inode *inode)
+{
+	if (inode && inode->i_ino == BTRFS_FIRST_FREE_OBJECTID)
+		return 1;
+	return 0;
+}
 
 /*
  * Find a superblock for the given device / mount point.
@@ -873,6 +882,16 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
 			error = -ENXIO;
 			goto error_free_subvol_name;
 		}
+
+		if (!is_subvolume_inode(new_root->d_inode)) {
+			dput(root);
+			dput(new_root);
+			deactivate_locked_super(s);
+			error = -EINVAL;
+			printk(KERN_ERR "btrfs: '%s' is not a valid subvolume\n",
+					subvol_name);
+			goto error_free_subvol_name;
+		}
 		dput(root);
 		root = new_root;
 	} else {
-- 
1.7.6
Re: [PATCH] btrfs: do not allow mounting non-subvolumes via subvol option
On Wed, Aug 3, 2011 at 1:47 PM, David Sterba d...@jikos.cz wrote:
On Sat, Jul 30, 2011 at 12:16:44AM +0800, Zhong, Xin wrote:
I believe I have submit a similar patch months ago:
http://marc.info/?l=linux-btrfs&m=130208585106572&w=2

You did! I was not aware of that. I believe adding a helper make things more clear (if it were used all over the code).

Hope it can be integrated this time, :-).

mehopes too

i corrupted an FS after doing this back in Nov of last year (though i was also --bind mounting it after the fact)
http://marc.info/?l=linux-btrfs&m=129091436915724&w=2

...and a patch proposed:
http://marc.info/?l=linux-btrfs&m=129091815217860&w=2

C Anthony
BTRFS partition won't mount
Hello all,

I recently had a power failure and can no longer mount my /home directory. The harddrive has two BTRFS partitions: sda7(/) and sda8(/home). The / partition loads up just fine, but /home does not. I've tried btrfsck as shown below and I've included dmesg pertaining to btrfs.

This is on ArchLinux and the software versions are as follows:
btrfs-progs-unstable 0.19.20101006-1
linux 3.0

I actually have quite a bit of data on this partition that I would rather not lose. Please help!

Thanks,
Adam

btrfsck /dev/sda8:
found 152764059648 bytes used err is 1
total csum bytes: 148756860
total tree bytes: 436686848
total fs tree bytes: 160870400
btree space waste bytes: 110424811
file data blocks allocated: 4925582483456
 referenced 114231959552
Btrfs Btrfs v0.19
root 5 inode 738980 errors 400
root 5 inode 771936 errors 400
root 5 inode 771937 errors 400
root 5 inode 771938 errors 400
root 5 inode 771939 errors 400
root 5 inode 771941 errors 400
root 5 inode 771942 errors 400

dmesg |grep btrfs
[ 15.148281] btrfs: unlinked 13 orphans
[ 27.156006] kernel BUG at fs/btrfs/inode.c:4586!
[ 27.156124] Modules linked in: vboxnetflt vboxdrv snd_hda_codec_hdmi joydev usbhid hid snd_hda_codec_idt snd_hda_intel snd_hda_codec sdhci_pci sdhci psmouse snd_pcm snd_hwdep btusb firewire_ohci bluetooth crc16 dell_wmi sg nvidia(P) firewire_core arc4 snd_timer evdev serio_raw sparse_keymap i2c_i801 iwlagn mmc_core snd battery video soundcore intel_ips wmi ppdev dell_laptop parport_pc container button ac mac80211 cfg80211 pcspkr rfkill parport dcdbas iTCO_wdt i2c_core crc_itu_t iTCO_vendor_support intel_agp snd_page_alloc processor intel_gtt e1000e btrfs zlib_deflate crc32c libcrc32c ext2 mbcache ehci_hcd usbcore sr_mod cdrom sd_mod ahci libahci libata scsi_mod
[ 27.157096] RIP: 0010:[a010b781] [a010b781] btrfs_add_link+0x161/0x1c0 [btrfs]
[ 27.158176] [a013482f] add_inode_ref+0x30f/0x3d0 [btrfs]
[ 27.158245] [a013567b] replay_one_buffer+0x2bb/0x3b0 [btrfs]
[ 27.158318] [a0122f27] ? alloc_extent_buffer+0x87/0x3d0 [btrfs]
[ 27.158391] [a0132cd1] walk_down_log_tree+0x391/0x540 [btrfs]
[ 27.160851] [a0132f7d] walk_log_tree+0xfd/0x270 [btrfs]
[ 27.165784] [a0136d11] btrfs_recover_log_trees+0x211/0x300 [btrfs]
[ 27.168297] [a01353c0] ? replay_one_dir_item+0xe0/0xe0 [btrfs]
[ 27.170832] [a00fd907] open_ctree+0x13e7/0x17a0 [btrfs]
[ 27.178506] [a00d879e] btrfs_mount+0x40e/0x5c0 [btrfs]
[ 27.212137] RIP [a010b781] btrfs_add_link+0x161/0x1c0 [btrfs]
Re: BTRFS partition won't mount
On Wed, Aug 03, 2011 at 04:46:01PM -0400, Adam Newby wrote:
I recently had a power failure and can no longer mount my /home directory. The harddrive has two BTRFS partitions: sda7(/) and sda8(/home). The / partition loads up just fine, but /home does not. I've tried btrfsck as shown below and I've included dmesg pertaining to btrfs. This is on ArchLinux and the software versions are as follows:
btrfs-progs-unstable 0.19.20101006-1
linux 3.0

Try the instructions on the wiki at [1]. (And please feed back and/or fix any issues you have with the instructions -- they're still quite new and probably have awkward corners).

I actually have quite a bit of data on this partition that I would rather not lose. Please help!

You *really* need to think about making good backups (otherwise we'll have to set the cwillu on you).

Hugo.

[1] https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_can.27t_mount_my_filesystem.2C_and_I_get_a_kernel_oops.21

Thanks,
Adam

btrfsck /dev/sda8:
found 152764059648 bytes used err is 1
total csum bytes: 148756860
total tree bytes: 436686848
total fs tree bytes: 160870400
btree space waste bytes: 110424811
file data blocks allocated: 4925582483456
 referenced 114231959552
Btrfs Btrfs v0.19
root 5 inode 738980 errors 400
root 5 inode 771936 errors 400
root 5 inode 771937 errors 400
root 5 inode 771938 errors 400
root 5 inode 771939 errors 400
root 5 inode 771941 errors 400
root 5 inode 771942 errors 400

dmesg |grep btrfs
[ 15.148281] btrfs: unlinked 13 orphans
[ 27.156006] kernel BUG at fs/btrfs/inode.c:4586!
[ 27.156124] Modules linked in: vboxnetflt vboxdrv snd_hda_codec_hdmi joydev usbhid hid snd_hda_codec_idt snd_hda_intel snd_hda_codec sdhci_pci sdhci psmouse snd_pcm snd_hwdep btusb firewire_ohci bluetooth crc16 dell_wmi sg nvidia(P) firewire_core arc4 snd_timer evdev serio_raw sparse_keymap i2c_i801 iwlagn mmc_core snd battery video soundcore intel_ips wmi ppdev dell_laptop parport_pc container button ac mac80211 cfg80211 pcspkr rfkill parport dcdbas iTCO_wdt i2c_core crc_itu_t iTCO_vendor_support intel_agp snd_page_alloc processor intel_gtt e1000e btrfs zlib_deflate crc32c libcrc32c ext2 mbcache ehci_hcd usbcore sr_mod cdrom sd_mod ahci libahci libata scsi_mod
[ 27.157096] RIP: 0010:[a010b781] [a010b781] btrfs_add_link+0x161/0x1c0 [btrfs]
[ 27.158176] [a013482f] add_inode_ref+0x30f/0x3d0 [btrfs]
[ 27.158245] [a013567b] replay_one_buffer+0x2bb/0x3b0 [btrfs]
[ 27.158318] [a0122f27] ? alloc_extent_buffer+0x87/0x3d0 [btrfs]
[ 27.158391] [a0132cd1] walk_down_log_tree+0x391/0x540 [btrfs]
[ 27.160851] [a0132f7d] walk_log_tree+0xfd/0x270 [btrfs]
[ 27.165784] [a0136d11] btrfs_recover_log_trees+0x211/0x300 [btrfs]
[ 27.168297] [a01353c0] ? replay_one_dir_item+0xe0/0xe0 [btrfs]
[ 27.170832] [a00fd907] open_ctree+0x13e7/0x17a0 [btrfs]
[ 27.178506] [a00d879e] btrfs_mount+0x40e/0x5c0 [btrfs]
[ 27.212137] RIP [a010b781] btrfs_add_link+0x161/0x1c0 [btrfs]

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You got very nice eyes, Deedee. Never noticed them --- before. They real?
Re: Honest timeline for btrfsck
Excerpts from Erik Jensen's message of 2011-08-03 02:57:24 -0400:
The lack of any information on when btrfsck might be ready is a real headache to those deciding what to do with a corrupted file system. I am currently sitting on a btrfs array of 10 disks that has been reporting parent transid verify failed since last November. While the data on the drive is by no means irreplaceable, it would take a fair amount of effort. At the time I was told that a btrfsck would almost certainly be released by the end of the year. In January, it was finally almost ready, and toward the end of May it was going to be released in a couple of days (hopefully). Had I known back in November 9 months would go by with no such tool, I would have certainly wiped the array and started over, as it was certainly not worth the wait. So here I am, several assurances of imminent release later, still wondering whether it would be better to wait or cut my losses. I understand that everyone is working hard, and I deeply appreciate the effort being put into this filesystem. I'm not looking for an exact date, just a rough order of magnitude on which to base decisions.

This part is definitely my fault. I've gone through a bunch of variations on bigger and smaller tools, and had to juggle the kernel maintenance as well.

Aside from making sure the kernel code is stable, btrfsck is all I'm working on right now. I do expect a release in the next two weeks that can recover your data (and many others).

Thanks,
Chris
Hot rb_next, setup_cluster_no_bitmap
Hi!

Since upgrading from 2.6.35+bits to 2.6.38 and then more recently to 3.0, our big btrfs backup box with 20 * 3 TB AOE-attached btrfs volumes started showing more CPU usage and backups were no longer completing in a day. I tried Linus HEAD from yesterday merged with btrfs for-linus (same as Linus HEAD as of today), and things are better again, but perf top output still looks pretty interesting after a night of rsync running:

 samples   pcnt  function                  DSO
 ________  _____ ________________________  ________

 13537.00  59.2% rb_next                   [kernel]
  3539.00  15.5% _raw_spin_lock            [kernel]
  1668.00   7.3% setup_cluster_no_bitmap   [kernel]
   799.00   3.5% tree_search_offset        [kernel]
   476.00   2.1% fill_window               [kernel]
   370.00   1.6% find_free_extent          [kernel]
   238.00   1.0% longest_match             [kernel]
   128.00   0.6% build_tree                [kernel]
    95.00   0.4% pqdownheap                [kernel]
    79.00   0.3% chksum_update             [kernel]
    72.00   0.3% btrfs_find_space_cluster  [kernel]
    65.00   0.3% deflate_fast              [kernel]
    61.00   0.3% memcpy                    [kernel]

With call-graphs enabled:

- 50.24% btrfs-transacti [kernel.kallsyms] [k] rb_next
   - rb_next
      - 97.36% setup_cluster_no_bitmap
           btrfs_find_space_cluster
           find_free_extent
           btrfs_reserve_extent
           btrfs_alloc_free_block
           __btrfs_cow_block
         + btrfs_cow_block
      - 2.29% btrfs_find_space_cluster
           find_free_extent
           btrfs_reserve_extent
           btrfs_alloc_free_block
           __btrfs_cow_block
           btrfs_cow_block
         - btrfs_search_slot
            - 56.96% lookup_inline_extent_backref
               - 97.23% __btrfs_free_extent
                    run_clustered_refs
                  - btrfs_run_delayed_refs
                     - 91.23% btrfs_commit_transaction
                          transaction_kthread
                          kthread
                          kernel_thread_helper
                     - 8.77% btrfs_write_dirty_block_groups
                          commit_cowonly_roots
                          btrfs_commit_transaction
                          transaction_kthread
                          kthread
                          kernel_thread_helper
               - 2.77% insert_inline_extent_backref
                    __btrfs_inc_extent_ref
                    run_clustered_refs
                    btrfs_run_delayed_refs
                    btrfs_commit_transaction
                    transaction_kthread
                    kthread
                    kernel_thread_helper
            - 41.03% btrfs_insert_empty_items
               - 99.89% run_clustered_refs
                  - btrfs_run_delayed_refs
                     + 89.93% btrfs_commit_transaction
                     + 10.07% btrfs_write_dirty_block_groups
            + 1.87% btrfs_write_dirty_block_groups
- 7.41% btrfs-transacti [kernel.kallsyms] [k] setup_cluster_no_bitmap
   + setup_cluster_no_bitmap
+ 4.34% rsync [kernel.kallsyms] [k] _raw_spin_lock
+ 3.68% rsync [kernel.kallsyms] [k] rb_next
+ 3.09% btrfs-transacti [kernel.kallsyms] [k] tree_search_offset
+ 1.40% btrfs-delalloc- [kernel.kallsyms] [k] fill_window
+ 1.31% btrfs-transacti [kernel.kallsyms] [k] _raw_spin_lock
+ 1.19% btrfs-delalloc- [kernel.kallsyms] [k] longest_match
+ 1.18% btrfs-delalloc- [kernel.kallsyms] [k] deflate_fast
+ 1.09% btrfs-transacti [kernel.kallsyms] [k] find_free_extent
+ 0.90% btrfs-delalloc- [kernel.kallsyms] [k] pqdownheap
+ 0.67% btrfs-delalloc- [kernel.kallsyms] [k] compress_block
+ 0.66% btrfs-delalloc- [kernel.kallsyms] [k] build_tree
+ 0.61% rsync [kernel.kallsyms] [k] page_fault

rb_next() from setup_cluster_no_bitmap() is very hot. From the annotated assembly output, it looks like the while (window_free <= min_bytes) loop is where the CPU is spending most of the time.

A few thoughts:

Shouldn't (window_free <= min_bytes) be (window_free < min_bytes)?

I'm not really up to speed with SMP memory caching behaviour, but I'm thinking the constant list creation of bitmap entries from the shared free_space_cache objects might be helping bounce around these pages between CPUs, which is why instructions that dereference the object pointers always seem to be cache misses... Or there's just too much of this stuff in memory for it to fit in cache.

Top of slabtop -sc:

   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
1760061 1706286  96%    0.97K  53351       33   1707232K nfs_inode_cache
1623423 1617242  99%    0.95K  49279       33   1576928K
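To make the question about the loop condition concrete, here is a deliberately simplified model of a window-accumulation loop, written as a sketch rather than as the kernel code. The struct free_extent type and the entries_walked function are hypothetical stand-ins; the real loop walks an rbtree with rb_next() and also resets the window on large gaps, which this model leaves out. It only shows how many entries are visited when the accumulated free space is exactly min_bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a free-space extent entry; not the kernel struct. */
struct free_extent {
	uint64_t offset;
	uint64_t bytes;
};

/*
 * Walk extents in order, accumulating free bytes, and return how many
 * entries were visited before the loop condition became false.
 * inclusive != 0 models "while (window_free <= min_bytes)",
 * inclusive == 0 models "while (window_free < min_bytes)".
 * Returns -1 if the list runs out first (the case where the kernel
 * loop gives up with -ENOSPC).
 */
static int entries_walked(const struct free_extent *ext, size_t n,
			  uint64_t min_bytes, int inclusive)
{
	uint64_t window_free;
	size_t i = 0;

	assert(ext != NULL || n == 0);
	if (n == 0)
		return -1;

	window_free = ext[0].bytes;
	for (;;) {
		int keep_going = inclusive ? window_free <= min_bytes
					   : window_free < min_bytes;

		if (!keep_going)
			return (int)(i + 1);
		if (++i >= n)
			return -1;
		window_free += ext[i].bytes;
	}
}
```

With three 4096-byte extents and min_bytes of 8192, the inclusive form visits all three entries while the strict form stops after two; with only two such extents the inclusive form runs off the end of the list. Whether that extra step matters for the profile above is not something this model can answer.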
Re: Hot rb_next, setup_cluster_no_bitmap
On Wed, Aug 03, 2011 at 03:06:55PM -0700, Simon Kirby wrote:
I see Josef's 86d4a77ba3dc4ace238a0556541a41df2bd71d49 introduced the bitmaps list. I could try temporarily reverting this (some fixups needed) if anybody thinks my cache bouncing idea might be slightly possible.

I'll try the attached and see how the profile changes.

Simon-

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 6377713..99582f9 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2148,7 +2148,7 @@ again:
 static noinline int
 setup_cluster_no_bitmap(struct btrfs_block_group_cache *block_group,
 			struct btrfs_free_cluster *cluster,
-			struct list_head *bitmaps, u64 offset, u64 bytes,
+			u64 offset, u64 bytes,
 			u64 min_bytes)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
@@ -2171,8 +2171,6 @@ setup_cluster_no_bitmap(struct btrfs_block_group_cache *block_group,
 	 * extent entry.
 	 */
 	while (entry->bitmap) {
-		if (list_empty(&entry->list))
-			list_add_tail(&entry->list, bitmaps);
 		node = rb_next(&entry->offset_index);
 		if (!node)
 			return -ENOSPC;
@@ -2192,11 +2190,8 @@ setup_cluster_no_bitmap(struct btrfs_block_group_cache *block_group,
 			return -ENOSPC;
 		entry = rb_entry(node, struct btrfs_free_space, offset_index);
 
-		if (entry->bitmap) {
-			if (list_empty(&entry->list))
-				list_add_tail(&entry->list, bitmaps);
+		if (entry->bitmap)
 			continue;
-		}
 
 		/*
 		 * we haven't filled the empty size and the window is
@@ -2252,7 +2247,7 @@ setup_cluster_no_bitmap(struct btrfs_block_group_cache *block_group,
 static noinline int
 setup_cluster_bitmap(struct btrfs_block_group_cache *block_group,
 		     struct btrfs_free_cluster *cluster,
-		     struct list_head *bitmaps, u64 offset, u64 bytes,
+		     u64 offset, u64 bytes,
 		     u64 min_bytes)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
@@ -2263,39 +2258,10 @@ setup_cluster_bitmap(struct btrfs_block_group_cache *block_group,
 	if (ctl->total_bitmaps == 0)
 		return -ENOSPC;
 
-	/*
-	 * First check our cached list of bitmaps and see if there is an entry
-	 * here that will work.
-	 */
-	list_for_each_entry(entry, bitmaps, list) {
-		if (entry->bytes < min_bytes)
-			continue;
-		ret = btrfs_bitmap_cluster(block_group, entry, cluster, offset,
-					   bytes, min_bytes);
-		if (!ret)
-			return 0;
-	}
-
-	/*
-	 * If we do have entries on our list and we are here then we didn't find
-	 * anything, so go ahead and get the next entry after the last entry in
-	 * this list and start the search from there.
-	 */
-	if (!list_empty(bitmaps)) {
-		entry = list_entry(bitmaps->prev, struct btrfs_free_space,
-				   list);
-		node = rb_next(&entry->offset_index);
-		if (!node)
-			return -ENOSPC;
-		entry = rb_entry(node, struct btrfs_free_space, offset_index);
-		goto search;
-	}
-
 	entry = tree_search_offset(ctl, offset_to_bitmap(ctl, offset), 0, 1);
 	if (!entry)
 		return -ENOSPC;
 
-search:
 	node = &entry->offset_index;
 	do {
 		entry = rb_entry(node, struct btrfs_free_space, offset_index);
@@ -2326,8 +2292,6 @@ int btrfs_find_space_cluster(struct btrfs_trans_handle *trans,
 			     u64 offset, u64 bytes, u64 empty_size)
 {
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
-	struct list_head bitmaps;
-	struct btrfs_free_space *entry, *tmp;
 	u64 min_bytes;
 	int ret;
 
@@ -2366,17 +2330,12 @@ int btrfs_find_space_cluster(struct btrfs_trans_handle *trans,
 		goto out;
 	}
 
-	INIT_LIST_HEAD(&bitmaps);
-	ret = setup_cluster_no_bitmap(block_group, cluster, &bitmaps, offset,
+	ret = setup_cluster_no_bitmap(block_group, cluster, offset,
 				      bytes, min_bytes);
 	if (ret)
-		ret = setup_cluster_bitmap(block_group, cluster, &bitmaps,
+		ret = setup_cluster_bitmap(block_group, cluster,
 					   offset, bytes, min_bytes);
 
-	/* Clear our temporary list */
-	list_for_each_entry_safe(entry, tmp, &bitmaps, list)
-		list_del_init(&entry->list);
-
 	if (!ret) {
 		atomic_inc(&block_group->count);
 		list_add_tail(&cluster->block_group_list,
Re: Hot rb_next, setup_cluster_no_bitmap
On Wed, Aug 03, 2011 at 03:39:49PM -0700, Simon Kirby wrote:
On Wed, Aug 03, 2011 at 03:06:55PM -0700, Simon Kirby wrote:
I see Josef's 86d4a77ba3dc4ace238a0556541a41df2bd71d49 introduced the bitmaps list. I could try temporarily reverting this (some fixups needed) if anybody thinks my cache bouncing idea might be slightly possible.

I'll try the attached and see how the profile changes.

Hmm, I bound the SMP affinity of all of the btrfs processes to one CPU, and the page dirtying rate got slower, so I suspect the writes aren't really a big deal, and the problem is just that there is way too much walking going on after rsync has run for a while and loads everything into memory.

Any ideas?

Simon-
Re: [RFC, crash][PATCH] btrfs: allow cross-subvolume file clone
On Wed, 3 Aug 2011 20:07:42 +0200, David Sterba wrote:
I'm working on a patch to fix cross-volume cloning, worked for simple cases like cloning a single file. When I cloned a full linux-2.6 tree there was an immediate BUG_ON (after third cloned file) in btrfs_delayed_update_inode with -ENOSPC :

[ 925.546266] [ cut here ]
[ 925.549921] kernel BUG at fs/btrfs/delayed-inode.c:1693!
[ 925.549921] invalid opcode: [#1] SMP
[ 925.549921] CPU 0
[ 925.549921] Modules linked in: btrfs
[ 925.549921]
[ 925.549921] Pid: 31167, comm: clone-file Not tainted 3.0.0-default+ #98 Intel Corporation Santa Rosa platform/Matanzas
[ 925.549921] RIP: 0010:[a00790e0] [a00790e0] btrfs_delayed_update_inode+0x2e0/0x2f0 [btrfs]
[ 925.549921] RSP: 0018:88004f229be8 EFLAGS: 00010286
[ 925.549921] RAX: ffe4 RBX: 880048392c70 RCX: 00018000
[ 925.549921] RDX: 1b1a RSI: 0001 RDI: 88007a6f8420
[ 925.549921] RBP: 88004f229c28 R08: 0004 R09:
[ 925.549921] R10: R11: R12: 880048393bf8
[ 925.549921] R13: 880048392cb8 R14: 880050ff3540 R15: 88005294
[ 925.549921] FS: 7fbf18b23700() GS:88007dc0() knlGS:
[ 925.549921] CS: 0010 DS: ES: CR0: 8005003b
[ 925.549921] CR2: 7fbcc68ba000 CR3: 4b4a8000 CR4: 06f0
[ 925.549921] DR0: DR1: DR2:
[ 925.549921] DR3: DR6: 0ff0 DR7: 0400
[ 925.549921] Process clone-file (pid: 31167, threadinfo 88004f228000, task 88004b4e5140)
[ 925.549921] Stack:
[ 925.549921] 880048f7ddc0 00018000 88004f229c38 880048393bf8
[ 925.549921] 880050ff3540 880048393bf8 880051a900a0 88005294
[ 925.549921] 88004f229c78 a0034633 88004f229c58 a005f08b
[ 925.549921] Call Trace:
[ 925.549921] [a0034633] btrfs_update_inode+0x53/0x160 [btrfs]
[ 925.549921] [a005f08b] ? btrfs_tree_unlock+0x6b/0xa0 [btrfs]
[ 925.549921] [a005b0ba] btrfs_ioctl_clone+0xa0a/0xcc0 [btrfs]
[ 925.549921] [81168c81] ? __do_fault+0x4a1/0x590
[ 925.549921] [810daa1d] ? lock_release_holdtime+0x3d/0x1c0
[ 925.549921] [81b8dc20] ? do_page_fault+0x2d0/0x580
[ 925.549921] [a005dfcb] btrfs_ioctl+0x2db/0xda0 [btrfs]
[ 925.549921] [81b8dc20] ? do_page_fault+0x2d0/0x580
[ 925.549921] [810e1467] ? debug_check_no_locks_freed+0x177/0x180
[ 925.549921] [811863c5] ? kmem_cache_free+0xb5/0x1b0
[ 925.549921] [811a5db8] do_vfs_ioctl+0x98/0x570
[ 925.549921] [8119476d] ? fget_light+0x2fd/0x3c0
[ 925.549921] [811a62df] sys_ioctl+0x4f/0x80
[ 925.549921] [81b92882] system_call_fastpath+0x16/0x1b
[ 925.549921] Code: e8 06 00 00 8d 0c 49 48 89 ca 48 89 4d c8 e8 c8 0f fa ff 85 c0 48 8b 4d c8 75 10 48 89 4b 08 e9 8e fd ff ff 0f 1f 80 00 00 00 00 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
[ 925.549921] RIP [a00790e0] btrfs_delayed_update_inode+0x2e0/0x2f0 [btrfs]
[ 925.549921] RSP 88004f229be8
[ 925.876182] ---[ end trace 8b4c2031e1394913 ]---

the patch has been applied on top of current linus which contains patches from both pull requests (ed8f37370d83).

I think it is because the caller didn't reserve enough space. Could you try to apply the following patch? It might fix this bug.

[PATCH v2] Btrfs: reserve enough space for file clone
http://marc.info/?l=linux-btrfs&m=131192686626576&w=2

Thanks
Miao

The filesystem consists of 5 devices 23G each, about 100G of usable space, mkfs.btrfs with defaults. The kernel tree has about 6G:

$ btrfs fi df .
Data, RAID0: total=10.00GB, used=5.55GB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=1.50GB, used=121.75MB
Metadata: total=8.00MB, used=0.00

$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5       110G  5.8G   82G   7% /mnt/sda5

ie. plenty of free space.

It's possible that I've omitted some important bits in the patch itself, or this exposes a bug of ENOSPC or delayed-inode.

david
---
From: David Sterba dste...@suse.cz

Lift the EXDEV condition and allow different root trees for files being cloned, then pass source inode's root when searching for extents.

Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/ioctl.c | 7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0b980af..58eb0ef 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2183,7 +2183,7 @@ static noinline long btrfs_ioctl_clone(struct file *file,
Re: Hot rb_next, setup_cluster_no_bitmap
Excerpts from Simon Kirby's message of 2011-08-03 19:10:59 -0400:
On Wed, Aug 03, 2011 at 03:39:49PM -0700, Simon Kirby wrote:
On Wed, Aug 03, 2011 at 03:06:55PM -0700, Simon Kirby wrote:
I see Josef's 86d4a77ba3dc4ace238a0556541a41df2bd71d49 introduced the bitmaps list. I could try temporarily reverting this (some fixups needed) if anybody thinks my cache bouncing idea might be slightly possible.

I'll try the attached and see how the profile changes.

Hmm, I bound the SMP affinity of all of the btrfs processes to one CPU, and the page dirtying rate got slower, so I suspect the writes aren't really a big deal, and the problem is just that there is way too much walking going on after rsync has run for a while and loads everything into memory.

Any ideas?

The current for-linus branch gets rid of all the bottlenecks in the metadata blocks. So now we're stuck with the bottlenecks in the allocator.

There are a few simple things we can do here but Josef has a patch that fixes delalloc reservations for inline extents that might help as a first step.

-chris
Re: Hot rb_next, setup_cluster_no_bitmap
Perhaps as a further clue as to what is going on, on this same backup box after all of the rsyncs are finished/killed and a good amount of time has passed (no cleaner processes running in the background or anything), sync still consistently takes ~4 minutes to run, and pushes out a lot to disk every time it is run.

Example:

echo 3 > /proc/sys/vm/drop_caches
sync
echo 3 > /proc/sys/vm/drop_caches
sync
vmstat 1
time sync

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd     free buff  cache  si  so   bi   bo   in   cs us sy  id wa
 0  0   68 15656528 3660 258232   0   0  783  673   72   33  3 30  13 54
 0  0   68 15656612 3660 258232   0   0    0    0  439  152  0  0 100  0
 0  0   68 15656488 3660 258232   0   0    0    0  395   87  0  0 100  0
 0  0   68 15656488 3660 258232   0   0    0    0  309   89  0  0 100  0
 0  0   68 15655992 3660 258232   0   0    0    0  450  128  0  0 100  0
 0  0   68 15655984 3660 258232   0   0    0    0  446  159  0  1  99  0
 0  0   68 15655860 3660 258232   0   0    0    0  448  105  0  0 100  0
 0  0   68 15655720 3660 258276   0   0    0    0  824  238  1  2  98  0
 1  0   68 15651388 3660 258276   0   0    0    0  621  150  2  0  98  0
 0  0   68 15655860 3660 258276   0   0    0   44  886  236  2  1  98  0
 0  0   68 15656604 3660 258276   0   0    0    0  544  161  0  1  99  0
 0  0   68 15656612 3660 258276   0   0    0    0  616  328  0  1  99  0
sync started here
 0  2   68 15655764 3660 258420   0   0  328  607 1777 1450  0  1  74 25
 0  1   68 15654648 3660 259968   0   0  752    0 1978 1519  0  1  66 33
 0  1   68 15654028 3660 260520   0   0  616    0 1498 1186  0  0  75 25
 0  1   68 15653556 3660 260992   0   0  220 1545 1878 1937  0  1  75 24
 1  0   68 15652288 3660 262540   0   0  392 1976 2990 3072  0  4  75 21
 1  0   68 15652252 3660 262496   0   0    0    0 2848  164  0 27  73  0
 1  0   68 15652260 3660 262496   0   0    0    0 2586   86  0 26  74  0
 1  0   68 15652260 3660 262496   0   0    0    0 2591  148  0 25  75  0
 1  0   68 15652136 3660 262496   0   0    0    0 2544   98  0 24  76  0
 1  0   68 15652136 3660 262496   0   0    0    0 2518   75  0 26  74  0
 1  0   68 15652136 3660 262496   0   0    0    0 2676  105  0 25  75  0
 1  0   68 15652136 3660 262496   0   0    0    0 2531   83  0 25  75  0
 1  0   68 15652136 3660 262496   0   0    0    0 2595   81  0 25  75  0
 1  0   68 15652136 3660 262496   0   0    0    0 2570   89  0 25  75  0
 1  0   68 15652136 3660 262496   0   0    0    0 2539   76  0 25  75  0
 1  0   68 15652004 3660 262540   0   0    0    0 2914  166  0 25  74  0
 1  0   68 15652012 3660 262540   0   0    0    0 2596   87  0 25  75  0
 1  0   68 15652012 3660 262540   0   0    0    0 2591   82  0 25  75  0
 1  0   68 15652012 3660 262540   0   0    0    0 2607   90  0 25  75  0
 1  0   68 15652012 3660 262540   0   0    0    0 2535   89  0 25  75  0
 1  0   68 15652012 3660 262540   0   0    0    0 2629  109  0 26  74  0
 1  0   68 15652012 3660 262540   0   0    0    0 2549   81  0 25  75  0
 1  0   68 15652012 3660 262540   0   0    0    0 2757  230  0 26  73  0
 1  0   68 15652012 3660 262540   0   0    0    0 2571  105  0 24  76  0
 1  0   68 15652012 3660 262540   0   0    0    0 2568   96  0 26  74  0
 1  0   68 15651880 3660 262584   0   0    0    0 2930  173  0 28  72  0
 1  0   68 15651888 3660 262584   0   0    0    0 2564   79  0 26  74  0
 1  0   68 15651888 3660 262584   0   0    0    0 2594   84  0 22  78  0
 1  0   68 15651888 3660 262584   0   0    0    0 2568   96  0 25  75  0
 1  0   68 15651888 3660 262584   0   0    0    0 2578   91  0 25  75  0
 1  0   68 15651888 3660 262584   0   0    0    0 2660  104  0 26  74  0
 1  0   68 15651888 3660 262584   0   0    0    0 2537   84  0 25  75  0
 1  0   68 15651888 3660 262584   0   0    0    0 2553   82  0 25  75  0
 1  0   68 15651888 3660 262584   0   0    0   39 2808  204  0 26  74  0
 1  0   68 15651888 3660 262584   0   0    0    0 2573   91  0 25  75  0
 1  0   68 15651756 3660 262628   0   0    0   44 2868  153  1 29  71  0
 1  0   68 15651764 3660 262628   0   0    0    0 2569   79  0 23  76  0
 1  0   68 15651764 3660 262628   0   0    0    0 2587   79  0 27  73  0
 1  0   68 15651764 3660 262628   0   0    0    0 2509   73  0 23  77  0
 1  0   68 15651764 3660 262628   0   0    0    0 2520   81  0 25  75  0
 1  0   68 15651764 3660 262628   0   0    0    0 2664   97  0 25  75  0
 1  0   68 15651740 3660 262628   0   0  112    0 2680  146  0 25  75  0
 1  0   68 15651640 3660 262740   0   0    0    0 2627   79  0
Re: [RFC, crash][PATCH] btrfs: allow cross-subvolume file clone
David Sterba wrote:
On Wed, Aug 03, 2011 at 08:07:42PM +0200, David Sterba wrote:
I'm working on a patch to fix cross-volume cloning, worked for simple cases like cloning a single file. When I cloned a full linux-2.6 tree there was an immediate BUG_ON (after third cloned file) in btrfs_delayed_update_inode with -ENOSPC :

oh, a similar issue was already reported on 5 Jul 2011:
[BUG] delayed inodes and reflinks
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/11763

We've got four reports on this bug. The cause is that we didn't reserve enough space when starting a transaction. We need space for:

1. btrfs_insert_empty_item()
2. btrfs_update_inode()
3. btrfs_drop_extents()

The first two are easy, but drop_extents is not: we have to calculate the space needed for drop_extents in the worst case.

--
Li Zefan