[PATCH] xfstests: fix 251's cp -axT problem

2012-01-10 Thread Liu Bo
When I ran xfstests, test 251 failed because cp -axT did not work as expected:
cp: cannot overwrite directory `/mnt/scratch/1' with non-directory

With this patch, 251 has passed.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 251 |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/251 b/251
index fa3d74a..b54e4c3 100755
--- a/251
+++ b/251
@@ -130,7 +130,7 @@ function run_process() {
 
# Copy content - partition.
mkdir $SCRATCH_MNT/$p
-   cp -axT $content $SCRATCH_MNT/$p
+   cp -axT $content/ $SCRATCH_MNT/$p/
export chpid=$!  wait $chpid  /dev/null
 
check_sums
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Broken btrfs filesystem

2012-01-10 Thread Peter Hjalmarsson
Hi,

I have problems with a btrfs filesystem, and am holding on to it for
a few more days before reformatting.
I am interested in two things:
1. Is there any way to restore more from the filesystem than what I have
already fetched? (It would help to get the system up faster, but there is
nothing of real value on that computer that is not already backed up.)
2. Is there anything here that resembles a bug that should be fixed
somewhere, and do you need more information to fix it?

Please CC me as I am not subscribed.

So here come the gory details:

I have a laptop on which I have stock Fedora 16 installed with an ext4
boot, and then a luks-encrypted swap partition and a luks-encrypted root
partition holding a btrfs volume.
Yesterday I hibernated my laptop, and when I resumed it, it seemed to
resume normally: it let me unlock the screensaver, but did not allow any
filesystem access and oopsed within seconds. Afterwards the
system failed to mount the root partition.

So I hooked the hard drive up to my desktop running Gentoo with a
3.2.0 kernel and the latest btrfs-progs from git.

Trying to mount the filesystem does not work:
[11353.370007] device fsid 4c86ad4c-0d71-48a3-8cd2-058cccda2a07 devid 1 transid 83234 /dev/mapper/udisks-luks-uuid-d7efe74d-ed8f-425a-942c-c6bbc44483a3-uid1000
[11353.391953] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391958] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391961] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391964] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391966] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391968] Failed to read block groups: -5
[11353.404931] btrfs: open_ctree failed


Trying with -o recovery:
[11353.370007] device fsid 4c86ad4c-0d71-48a3-8cd2-058cccda2a07 devid 1 transid 83234 /dev/mapper/udisks-luks-uuid-d7efe74d-ed8f-425a-942c-c6bbc44483a3-uid1000
[11353.391953] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391958] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391961] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391964] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391966] parent transid verify failed on 869829160960 wanted 82376 found 83320
[11353.391968] Failed to read block groups: -5
[11353.404931] btrfs: open_ctree failed


So mounting does not seem to be an option.

So I tried restore.
On the first run it restored one file, then stopped. Upon retrying, it
restored a lot more files (mostly the /var/lib/yum directory and a
couple of empty directories), but now it never gets past one particular
file, and it always fails after that with the following:

# ./restore /dev/dm-1 /home/xake/Skrivbord/ferra-rescue
parent transid verify failed on 869829160960 wanted 82376 found 83320
parent transid verify failed on 869829160960 wanted 82376 found 83320
parent transid verify failed on 869829160960 wanted 82376 found 83320
parent transid verify failed on 869829160960 wanted 82376 found 83320
Ignoring transid failure
parent transid verify failed on 869828055040 wanted 82376 found 83315
parent transid verify failed on 869828055040 wanted 82376 found 83315
parent transid verify failed on 869828055040 wanted 82376 found 83315
parent transid verify failed on 869828055040 wanted 82376 found 83315
Ignoring transid failure
parent transid verify failed on 823939305472 wanted 83180 found 83847
parent transid verify failed on 823939305472 wanted 83180 found 83847
parent transid verify failed on 823939305472 wanted 83180 found 83847
parent transid verify failed on 823939305472 wanted 83180 found 83847
Ignoring transid failure
Root objectid is 5
Skipping existing file /home/xake/Skrivbord/ferra-rescue/var/lib/rpm/.rpm.lock
If you wish to overwrite use the -o option to overwrite
parent transid verify failed on 823805370368 wanted 83121 found 83393
parent transid verify failed on 823805370368 wanted 83121 found 83393
parent transid verify failed on 823805370368 wanted 83121 found 83393
parent transid verify failed on 823805370368 wanted 83121 found 83393
Ignoring transid failure
parent transid verify failed on 823789125632 wanted 83120 found 83356
parent transid verify failed on 823789125632 wanted 83120 found 83356
parent transid verify failed on 823789125632 wanted 83120 found 83356
parent transid verify failed on 823789125632 wanted 83120 found 83356
Ignoring transid failure
parent transid verify failed on 823789125632 wanted 83120 found 83356
Ignoring transid failure
parent transid verify failed on 823784792064 wanted 81189 found 83707
parent transid verify failed on 823784792064 wanted 81189 found 83707
parent transid verify failed on 823784792064 wanted 81189 found 83707
parent transid verify failed on 823784792064 wanted 81189 found 83707
Ignoring transid failure
parent 

[PATCH V2] Btrfs: cleanup: move node-,leaf-,sectorsize to fs_info

2012-01-10 Thread Peeters Simon
Move nodesize, leafsize and sectorsize from btrfs_root to btrfs_fs_info,
since we don't intend to allow different sizes between trees.
Also remove sectorsize from btrfs_block_group_cache, because it can now
use the one in fs_info.

Update all users accordingly.

Please note the signature change in disk-io.c:
-static int __setup_root(nodesize, leafsize, sectorsize, stripesize,
- *root, *fs_info, objectid)
+static int __setup_root(stripesize, *root, *fs_info, objectid)

Signed-off-by: Simon Peeters peeters.si...@gmail.com
---
 fs/btrfs/backref.c  |2 +-
 fs/btrfs/compression.c  |8 +++---
 fs/btrfs/ctree.c|   12 +-
 fs/btrfs/ctree.h|   31 +-
 fs/btrfs/disk-io.c  |   50 ++
 fs/btrfs/extent-tree.c  |   22 --
 fs/btrfs/extent_io.c|6 ++--
 fs/btrfs/file-item.c|   20 
 fs/btrfs/file.c |   28 
 fs/btrfs/free-space-cache.c |   22 +-
 fs/btrfs/inode.c|   42 ++--
 fs/btrfs/ioctl.c|8 +++---
 fs/btrfs/ordered-data.c |4 +-
 fs/btrfs/ordered-data.h |4 +-
 fs/btrfs/relocation.c   |   16 +++---
 fs/btrfs/scrub.c|8 +++---
 fs/btrfs/super.c|2 +-
 fs/btrfs/tree-log.c |2 +-
 fs/btrfs/volumes.c  |   10 
 19 files changed, 139 insertions(+), 158 deletions(-)
---

Simon Peeters
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 22c64ff..45d9cf8 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -420,7 +420,7 @@ static int __iter_shared_inline_ref(struct btrfs_fs_info *fs_info,
 	int found = 0;
 
 	eb = read_tree_block(fs_info->tree_root, logical,
-fs_info->tree_root->leafsize, 0);
+fs_info->leafsize, 0);
 	if (!eb)
 		return -EIO;
 
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 14f1c5a..535ff98 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -88,8 +88,8 @@ static inline int compressed_bio_size(struct btrfs_root *root,
 	u16 csum_size = btrfs_super_csum_size(root->fs_info->super_copy);
 
 	return sizeof(struct compressed_bio) +
-		((disk_size + root->sectorsize - 1) / root->sectorsize) *
-		csum_size;
+		((disk_size + root->fs_info->sectorsize - 1) /
+		root->fs_info->sectorsize) * csum_size;
 }
 
 static struct bio *compressed_bio_alloc(struct block_device *bdev,
@@ -675,8 +675,8 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			comp_bio, sums);
 BUG_ON(ret);
 			}
-			sums += (comp_bio->bi_size + root->sectorsize - 1) /
-root->sectorsize;
+			sums += (comp_bio->bi_size + root->fs_info->sectorsize - 1) /
+root->fs_info->sectorsize;
 
 			ret = btrfs_map_bio(root, READ, comp_bio,
 	mirror_num, 0);
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index dede441..b72272f 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2087,13 +2087,13 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
 	else
 		btrfs_node_key(lower, lower_key, 0);
 
-	c = btrfs_alloc_free_block(trans, root, root->nodesize, 0,
+	c = btrfs_alloc_free_block(trans, root, root->fs_info->nodesize, 0,
    root->root_key.objectid, lower_key,
    level, root->node->start, 0);
 	if (IS_ERR(c))
 		return PTR_ERR(c);
 
-	root_add_used(root, root->nodesize);
+	root_add_used(root, root->fs_info->nodesize);
 
 	memset_extent_buffer(c, 0, 0, sizeof(struct btrfs_header));
 	btrfs_set_header_nritems(c, 1);
@@ -2214,13 +2214,13 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 	mid = (c_nritems + 1) / 2;
 	btrfs_node_key(c, disk_key, mid);
 
-	split = btrfs_alloc_free_block(trans, root, root->nodesize, 0,
+	split = btrfs_alloc_free_block(trans, root, root->fs_info->nodesize, 0,
 	root->root_key.objectid,
 	disk_key, level, c->start, 0);
 	if (IS_ERR(split))
 		return PTR_ERR(split);
 
-	root_add_used(root, root->nodesize);
+	root_add_used(root, root->fs_info->nodesize);
 
 	memset_extent_buffer(split, 0, 0, sizeof(struct btrfs_header));
 	btrfs_set_header_level(split, btrfs_header_level(c));
@@ -2968,13 +2968,13 @@ again:
 	else
 		btrfs_item_key(l, disk_key, mid);
 
-	right = btrfs_alloc_free_block(trans, root, root->leafsize, 0,
+	right = btrfs_alloc_free_block(trans, root, root->fs_info->leafsize, 0,
 	root->root_key.objectid,
 	disk_key, 0, l->start, 0);
 	if (IS_ERR(right))
 		return PTR_ERR(right);
 
-	root_add_used(root, root->leafsize);
+	root_add_used(root, root->fs_info->leafsize);
 
 	memset_extent_buffer(right, 0, 0, sizeof(struct btrfs_header));
 	btrfs_set_header_bytenr(right, right->start);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6738503..d5ca265 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -340,11 +340,11 @@ struct btrfs_header {
 	u8 level;
 } __attribute__ ((__packed__));
 
-#define BTRFS_NODEPTRS_PER_BLOCK(r) (((r)->nodesize - 

real free space on btrfs volume (performance impact)

2012-01-10 Thread Michal Suba

Hello

We are currently investigating a performance issue on a system running
on a btrfs filesystem. Is it possible that performance is impacted by a
lack of free space? Also, how can I get information about the real free
space on a btrfs volume?


# btrfs-show /dev/sdb1
Label: opt  uuid: 28a55827-e677-47a9-98d5-d31eb3d71436
    Total devices 1 FS bytes used 167.83GB
    devid    1 size 240.00GB used *229.25GB* path /dev/sdb1

Btrfs Btrfs v0.19

# btrfs filesystem df /opt
Data: total=213.23GB, used=165.26GB
System, DUP: total=8.00MB, used=40.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=8.00GB, used=2.57GB

# df -h /opt
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             240G  171G   59G  75% /opt

How come there is a difference between btrfs-show and df (about 40GB)? Is
the space really used, or can I claim it back? (There are no snapshots.)


# btrfs subvolume list /opt
#

 Thanks
michal



[v3.2-4874-ge4e1118 OOPS] btrfs-related kernel oops due to media error

2012-01-10 Thread Vincent Vanackere
[Note: this is a resend of a mail I sent to linux-btrfs earlier, this
time tested with the latest git kernel]


Hi,

One of my disks, partitioned into a single btrfs partition, is showing
media errors. The problem is that these errors lead to a kernel panic in
btrfs, which makes the filesystem unusable until reboot, and therefore
it is very hard for me to do a full backup of the data before replacing
the disk.
My current kernel is a vanilla kernel at the current tip (output from git
describe is v3.2-4874-ge4e1118).
I assume that the filesystem should not panic even in case of a media
error. Is there any procedure I can follow or patch I could apply to
salvage my data while ignoring media errors?


The logs and the oops are at the end of this mail; please let me know if
more information is needed.


Best regards,

Vincent

---

[ 3210.717304] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 3210.717309] ata6.00: BMDMA stat 0x24
[ 3210.717312] ata6.00: failed command: READ DMA EXT
[ 3210.717318] ata6.00: cmd 25/00:08:5f:dc:2f/00:00:70:00:00/e0 tag 0 
dma 4096 in
[ 3210.717320]  res 51/40:00:61:dc:2f/40:00:70:00:00/e0 Emask 
0x9 (media error)

[ 3210.717323] ata6.00: status: { DRDY ERR }
[ 3210.717325] ata6.00: error: { UNC }
[ 3210.732234] ata6.00: configured for UDMA/133
[ 3210.732248] sd 5:0:0:0: [sdd] Unhandled sense code
[ 3210.732250] sd 5:0:0:0: [sdd]  Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
[ 3210.732254] sd 5:0:0:0: [sdd]  Sense Key : Medium Error [current] 
[descriptor]

[ 3210.732259] Descriptor sense data with sense descriptors (in hex):
[ 3210.732261] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 3210.732270] 70 2f dc 61
[ 3210.732274] sd 5:0:0:0: [sdd]  Add. Sense: Unrecovered read error - 
auto reallocate failed
[ 3210.732278] sd 5:0:0:0: [sdd] CDB: Read(10): 28 00 70 2f dc 5f 00 00 
08 00

[ 3210.732287] end_request: I/O error, dev sdd, sector 1882184801
[ 3210.732305] ata6: EH complete
[ 3210.732322] BUG: unable to handle kernel NULL pointer dereference at 
  (null)
[ 3210.732373] IP: [a017f129] extent_range_uptodate+0x59/0xe0 
[btrfs]

[ 3210.732426] PGD 21e9b7067 PUD 21e9b6067 PMD 0
[ 3210.732455] Oops:  [#1] SMP
[ 3210.732475] CPU 3
[ 3210.732486] Modules linked in: ip6table_filter ip6_tables 
ipt_MASQUERADE bnep iptable_nat nf_nat rfcomm bluetooth 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT 
xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables 
bridge stp kvm_intel kvm parport_pc ppdev nfsd nfs lockd fscache 
binfmt_misc auth_rpcgss nfs_acl sunrpc dm_crypt snd_usb_audio 
snd_usbmidi_lib joydev snd_hda_codec_realtek snd_hda_intel snd_hda_codec 
snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq 
snd_timer snd_seq_device snd soundcore snd_page_alloc psmouse serio_raw 
cdc_acm lp parport btrfs zlib_deflate libcrc32c hid_logitech ff_memless 
usbhid hid i915 drm_kms_helper drm r8169 i2c_algo_bit video pata_jmicron

[ 3210.732870]
[ 3210.732880] Pid: 3856, comm: btrfs-endio-met Not tainted 3.2.0-custom 
#2 Gigabyte Technology Co., Ltd. G33-DS3R/G33-DS3R
[ 3210.732933] RIP: 0010:[a017f129]  [a017f129] 
extent_range_uptodate+0x59/0xe0 [btrfs]

[ 3210.732989] RSP: 0018:880006f3fde0  EFLAGS: 00010246
[ 3210.733014] RAX:  RBX: 00df57385000 RCX: 

[ 3210.733047] RDX: 0001 RSI: 0df57385 RDI: 

[ 3210.733079] RBP: 880006f3fe00 R08:  R09: 
88008bce5200
[ 3210.733111] R10: 8800299f9010 R11: 1000 R12: 
8802190f4030
[ 3210.733143] R13: 00df573853ff R14: 880006f3fe98 R15: 
880143263d88
[ 3210.733175] FS:  () GS:88022fd8() 
knlGS:

[ 3210.733212] CS:  0010 DS:  ES:  CR0: 8005003b
[ 3210.733238] CR2:  CR3: 00021f35a000 CR4: 
000406e0
[ 3210.733270] DR0:  DR1:  DR2: 

[ 3210.733302] DR3:  DR6: 0ff0 DR7: 
0400
[ 3210.74] Process btrfs-endio-met (pid: 3856, threadinfo 
880006f3e000, task 8801fa8d8000)

[ 3210.733374] Stack:
[ 3210.733385]   8800298dd838 8801f9cc9840 
88021ee05000
[ 3210.733423]  880006f3fe30 a01581f9 880143263d80 
8800298dd860
[ 3210.733461]  880143263d80 880143263d98 880006f3fee0 
a0187fef

[ 3210.733499] Call Trace:
[ 3210.733524]  [a01581f9] end_workqueue_fn+0x119/0x140 [btrfs]
[ 3210.733567]  [a0187fef] worker_loop+0x16f/0x5d0 [btrfs]
[ 3210.733608]  [a0187e80] ? btrfs_queue_worker+0x310/0x310 
[btrfs]

[ 3210.733643]  [8106fa93] kthread+0x93/0xa0
[ 3210.733668]  [8162caa4] kernel_thread_helper+0x4/0x10
[ 3210.733697]  [8106fa00] ? 

Re: [PATCH 00/21] Btrfs: restriper

2012-01-10 Thread Ilya Dryomov
On Mon, Jan 09, 2012 at 03:44:18PM +0200, Ilya Dryomov wrote:
> On Mon, Jan 09, 2012 at 01:50:34AM -0500, Marios Titas wrote:
> > I tried this for many different scenarios and it seems to work pretty
> > well. I only ran into one problematic case: If you remove a device
> > from a multidevice filesystem it crashes. Here's how to reproduce it:
> >
> > truncate -s1g /tmp/test1
> > truncate -s1g /tmp/test2
> > losetup /dev/loop1 /tmp/test1
> > losetup /dev/loop2 /tmp/test2
> > mkdir /tmp/test
> > ./mkfs.btrfs -L test -d single -m single /dev/loop1 /dev/loop2
> > mount -o noatime /dev/loop1 /tmp/test
> > ./btrfs dev del /dev/loop1 /tmp/test
> > ./btrfs fi bal start /tmp/test
> >
> > There is no actual restriping involved but the above example does work
> > correctly under 3.1+for-linus whereas it fails with your patches.
>
> Thanks for your testing.  The good news is that I put that BUG() there
> simply for debugging so it's nothing major:
>
> 2520                if (ret)
> 2521                        BUG(); /* FIXME break ? */
>
> It used to be just a break out of the loop there, so that's the reason
> it doesn't panic with 3.1+for-linus.  I'll investigate further and fix
> this.

I force-rebased my tree, removed two other BUG_ONs along with this one.

Thanks,

Ilya



diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d7c5c7d..9b3d03d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2312,7 +2312,8 @@ static int chunk_drange_filter(struct extent_buffer *leaf,
int factor;
int i;
 
-   BUG_ON(!(bargs->flags & BTRFS_BALANCE_ARGS_DEVID));
+   if (!(bargs->flags & BTRFS_BALANCE_ARGS_DEVID))
+           return 0;
 
if (btrfs_chunk_type(leaf, chunk) & (BTRFS_BLOCK_GROUP_DUP |
 BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID10))
@@ -2355,7 +2356,8 @@ static int chunk_vrange_filter(struct extent_buffer *leaf,
 static int chunk_soft_convert_filter(u64 chunk_profile,
 struct btrfs_balance_args *bargs)
 {
-   BUG_ON(!(bargs->flags & BTRFS_BALANCE_ARGS_CONVERT));
+   if (!(bargs->flags & BTRFS_BALANCE_ARGS_CONVERT))
+           return 0;
 
chunk_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
 
@@ -2518,7 +2520,7 @@ again:
ret = btrfs_previous_item(chunk_root, path, 0,
  BTRFS_CHUNK_ITEM_KEY);
if (ret)
-   BUG(); /* FIXME break ? */
+   break;
 
leaf = path->nodes[0];
slot = path->slots[0];



Re: real free space on btrfs volume (performance impact)

2012-01-10 Thread Mitch Harder
2012/1/10 Michal Suba michal.s...@pantheon.sk:
> Hello
>
> We are currently investigating a performance issue on a system running
> on a btrfs filesystem. Is it possible that performance is impacted by a
> lack of free space? Also, how can I get information about the real free
> space on a btrfs volume?
>
> # btrfs-show /dev/sdb1
> Label: opt  uuid: 28a55827-e677-47a9-98d5-d31eb3d71436
>     Total devices 1 FS bytes used 167.83GB
>     devid    1 size 240.00GB used *229.25GB* path /dev/sdb1
>
> Btrfs Btrfs v0.19
>
> # btrfs filesystem df /opt
> Data: total=213.23GB, used=165.26GB
> System, DUP: total=8.00MB, used=40.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=8.00GB, used=2.57GB
>
> # df -h /opt
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sdb1             240G  171G   59G  75% /opt
>
> How come there is a difference between btrfs-show and df (about 40GB)? Is
> the space really used, or can I claim it back? (There are no snapshots.)
>
> # btrfs subvolume list /opt
> #


The btrfs-show command is being deprecated.  Its output can be easy
to misunderstand, but it probably won't be corrected since it's going
away at some point.

Basically, what this is telling you is that 229.25GB is committed:
213.23GB Data + 2 x 8.00GB Metadata (because it's duplicated) + 2 x
8.00MB System.

However, not all of the committed space is being used (which is clearer
in the output of the 'btrfs filesystem df' command).
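
The arithmetic above can be checked with a small sketch. This is only an
illustration, not part of btrfs-progs: the struct and function names are made
up here, the figures are copied from the 'btrfs filesystem df /opt' output
quoted in this thread, and the extra 4.00MB single System chunk is included
on the assumption that it also counts toward the committed total.

```c
#include <stddef.h>

/* Figures copied from the `btrfs filesystem df /opt` output in the thread.
 * All sizes are in GB. The type and function names are hypothetical. */
struct chunk_group {
	const char *name;
	double total_gb;	/* "total=": space allocated to chunks */
	double used_gb;		/* "used=": space actually occupied */
	int copies;		/* 2 for DUP profiles, 1 otherwise */
};

static const struct chunk_group groups[] = {
	{ "Data",          213.23,       165.26,           1 },
	{ "System, DUP",   8.0 / 1024.0, 40.0 / 1048576.0, 2 },
	{ "System",        4.0 / 1024.0, 0.0,              1 },
	{ "Metadata, DUP", 8.00,         2.57,             2 },
};

/* Raw device space committed to chunks (the "used" figure of btrfs-show). */
double allocated_gb(void)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < sizeof(groups) / sizeof(groups[0]); i++)
		sum += groups[i].total_gb * groups[i].copies;
	return sum;
}

/* Raw device space actually occupied inside those chunks. */
double in_use_gb(void)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < sizeof(groups) / sizeof(groups[0]); i++)
		sum += groups[i].used_gb * groups[i].copies;
	return sum;
}
```

With these inputs allocated_gb() comes out at roughly 229.25, matching the
btrfs-show figure, while in_use_gb() is roughly 170.4, close to the 171G that
df reports as used.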


Re: real free space on btrfs volume (performance impact)

2012-01-10 Thread cwillu
On Tue, Jan 10, 2012 at 12:40 PM, Mitch Harder
mitch.har...@sabayonlinux.org wrote:
> 2012/1/10 Michal Suba michal.s...@pantheon.sk:
>> Hello
>>
>> We are currently investigating a performance issue on a system running
>> on a btrfs filesystem. Is it possible that performance is impacted by a
>> lack of free space? Also, how can I get information about the real free
>> space on a btrfs volume?
>>
>> # btrfs-show /dev/sdb1
>> Label: opt  uuid: 28a55827-e677-47a9-98d5-d31eb3d71436
>>     Total devices 1 FS bytes used 167.83GB
>>     devid    1 size 240.00GB used *229.25GB* path /dev/sdb1
>>
>> Btrfs Btrfs v0.19
>>
>> # btrfs filesystem df /opt
>> Data: total=213.23GB, used=165.26GB
>> System, DUP: total=8.00MB, used=40.00KB
>> System: total=4.00MB, used=0.00
>> Metadata, DUP: total=8.00GB, used=2.57GB
>>
>> # df -h /opt
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/sdb1             240G  171G   59G  75% /opt
>>
>> How come there is a difference between btrfs-show and df (about 40GB)? Is
>> the space really used, or can I claim it back? (There are no snapshots.)
>>
>> # btrfs subvolume list /opt
>> #
>
> The btrfs-show command is being deprecated.  Its output can be easy
> to misunderstand, but it probably won't be corrected since it's going
> away at some point.

The output of btrfs fi show /dev/whatever is identical, and isn't
going away afaik.  That said, it is easy to misinterpret, although
that's probably unavoidable while still actually presenting that
information.


[PATCH] Btrfs: release space on error in page_mkwrite

2012-01-10 Thread Josef Bacik
If updating the inode gave us an ENOSPC we were just returning in page_mkwrite,
which is a problem since we make our reservation right before trying to update
the inode, so fix the out label so that we actually free our reservation.
Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/inode.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b0d..90a32f1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6509,8 +6509,8 @@ out_unlock:
if (!ret)
return VM_FAULT_LOCKED;
unlock_page(page);
-   btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
 out:
+   btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
return ret;
 }
 
-- 
1.7.5.2



Re: revert to static snapshot on reboot

2012-01-10 Thread Kai Krakow
Hello!

bt...@spiritvideo.com wrote:

> The plan that occurs to me is to make a snapshot of the system in the
> state that I want to always boot.  Then, I would rewrite the init
> script in the initrd to (a) delete any old tmp copy of the snapshot;
> (b) copy the static snapshot to a tmp copy; (c) mount the tmp copy.

I'd suggest creating a snapshot during the initrd phase, then switching to
that snapshot as the root. Before creating the new snapshot, first delete all
old snapshots that are still there. Something like:

# sda1 = btrfs
mkdir -p /btrfs-prepare
mount /dev/sda1 /btrfs-prepare -o $REAL_ROOT_FLAGS,...
for snapshot in /btrfs-prepare/snapshots/*; do
  btrfs sub del "$snapshot"
done
snapshot=snapshots/root-$(date +%s)
# original-root has to be a subvolume
btrfs sub snap /btrfs-prepare/original-root /btrfs-prepare/$snapshot
REAL_ROOT=$snapshot
sync
umount /btrfs-prepare
# now let the rest of the initrd switch to the real root
# depending on your initrd system REAL_ROOT needs to be named
# differently: it should result in mount options like
# -o subvol=snapshots/root-123456789,...

This should be much faster than copying stuff around. I'm not sure how btrfs
behaves when it is unmounted while the btrfs-cleaner is still deleting
snapshots. It may become unstable over time. I'm sure the btrfs gurus here
can comment on this. I used a timestamp in the snapshot names so that no
naming conflicts occur during snapshot deletion and creation. I figured that
deleting and recreating the same snapshot name might confuse btrfs after an
unexpected reboot while the btrfs-cleaner was still running.

The above script expects your btrfs layout to be something like this:

$ ls -al /
./
original-root/ # system installation goes here (subvolume)
snapshots/ # normal empty directory
# nothing more

This way you can also use an alternate initrd which does no snapshotting to 
upgrade or reconfigure the system. Or you just chroot into the original root 
and update that.

HTH
Kai



Re: [RFC PATCH v2 1/3] Btrfs: add the Probabilistic Skiplist

2012-01-10 Thread David Sterba
On Tue, Jan 10, 2012 at 03:31:34PM +0800, Liu Bo wrote:
> +static inline int generate_node_level(u64 randseed)
> +{
> + int level = 0;
> +
> + while (randseed && !(randseed & 3)) {
> + randseed >>= 2;
> + level++;
> + }
> +
> + return (level > MAXLEVEL ? MAXLEVEL : level);
> +}

This is counting the number of trailing zeros / 2 (except when randseed ==
0). There's a gcc builtin for it, __builtin_ctzll, and you can turn it into
a loopless inlinable function:

static inline int generate_node_level(u64 randseed)
{
return randseed == 0 ? 0 : __builtin_ctzll(randseed) >> 1;
}

the builtin should be safe on all arches without the need of libgcc
support, there seem to be handcoded asm statements for each arch.

A microbenchmark of builtin vs while-counter showed a 2.3x speedup:

builtin: 1.866529 ns/loop
while:   4.265664 ns/loop
(132 loops, on a generic intel x86_64 box)


and if MAXLEVEL is <= 16, then you can generate just 4 random bytes and
compute the level in the same way without any loss.
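
As a rough check of the equivalence described above, the two forms can be
put side by side. This is a userspace sketch under stated assumptions, not
the code from the patch: uint64_t stands in for the kernel's u64, MAXLEVEL is
assumed to be 16 as mentioned elsewhere in the thread, the function names are
made up, and the clamp to MAXLEVEL is kept in the builtin form so that both
return the same value for every input.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXLEVEL 16	/* assumed value, taken from the discussion */

/* Loop form from the quoted patch: count pairs of trailing zero bits. */
int level_loop(uint64_t randseed)
{
	int level = 0;

	while (randseed && !(randseed & 3)) {
		randseed >>= 2;
		level++;
	}
	return level > MAXLEVEL ? MAXLEVEL : level;
}

/* Loopless form: half the trailing-zero count, with the same clamp.
 * __builtin_ctzll() is undefined for 0, so that case is handled first. */
int level_ctz(uint64_t randseed)
{
	int level;

	if (randseed == 0)
		return 0;
	level = __builtin_ctzll(randseed) >> 1;
	return level > MAXLEVEL ? MAXLEVEL : level;
}

/* Returns 1 if both forms agree on 0, on every single-bit value and on a
 * few mixed bit patterns; returns 0 on the first mismatch. */
int levels_agree(void)
{
	static const uint64_t samples[] = {
		0, 1, 2, 3, 4, 12, 16, 0x100, 0xdeadbeef00ULL
	};
	size_t i;
	int bit;

	for (bit = 0; bit < 64; bit++)
		if (level_loop(1ULL << bit) != level_ctz(1ULL << bit))
			return 0;
	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		if (level_loop(samples[i]) != level_ctz(samples[i]))
			return 0;
	return 1;
}
```

Note that without the clamp the builtin form could return up to 31 for a
64-bit seed, so dropping it is only safe when the seed is narrow enough, as
in the 4-random-bytes suggestion above.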


david


Re: [RFC PATCH v2 1/3] Btrfs: add the Probabilistic Skiplist

2012-01-10 Thread Liu Bo
On 01/11/2012 08:37 AM, David Sterba wrote:
> Hi, a few thoughts and comments below:
>
> On Tue, Jan 10, 2012 at 03:31:34PM +0800, Liu Bo wrote:
>> c) The level limit may need to be adjusted.
>>    I know it is a magic number, but now for simplicity we just keep it at 16,
>>    and then each skiplist is able to contain (2^32-1)/3 nodes at most.
>
> (2^32-1)/3 = 1,431,655,765 that's a lot, I wonder what an average member
> count of a skiplist would be and whether eg. maxlevel = 12 is not enough
> (5,592,405 members).
>


Hmm, sorry, I found I've made a mistake here.
Let me correct it (the changelog will also be updated later):

As I set the probability to 1/4, the number of members linked on the
level N+1 list will be 1/4 of those linked on the level N list.

What's more, in a skiplist a node can be linked on multiple levels, e.g. a
node with level N+1 will also be linked on the level N list.

So until the node count reaches 4^(maxlevel - 1), the skiplist can
maintain O(log n); after that it is no longer O(log n), although we can
still insert nodes into the skiplist.

That's the difference.
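
The two figures in this exchange can be checked with a few lines of
user-space C. This is only an arithmetic sketch under my reading of the
thread: skiplist_balanced_limit() is the 4^(maxlevel - 1) bound above and
skiplist_geometric_sum() is the (4^maxlevel - 1)/3 figure quoted earlier.
Both function names are hypothetical, and maxlevel is assumed to be between
1 and 31 so that the shifts stay within 64 bits:

```c
#include <stdint.h>

/* 4^(maxlevel - 1): node count up to which lookups are expected to stay O(log n). */
static uint64_t skiplist_balanced_limit(unsigned int maxlevel)
{
	return 1ULL << (2 * (maxlevel - 1));
}

/* (4^maxlevel - 1) / 3 = 4^0 + 4^1 + ... + 4^(maxlevel - 1). */
static uint64_t skiplist_geometric_sum(unsigned int maxlevel)
{
	return ((1ULL << (2 * maxlevel)) - 1) / 3;
}
```

For maxlevel = 16 the first function gives 2^30 and the second gives
1,431,655,765; for maxlevel = 12 the second gives 5,592,405, matching the
numbers quoted above.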


 or you can set the maxlevel during skiplist creation, or predefine a
 small skiplist with compile-time-set level to whatever < 16.

 this can be tuned later of course.
 

Yes, I do set the maxlevel to 16 at the creation of a skiplist.

Here 4^(16 - 1) is 2^30; I don't think this is enough for some severe
workloads which build a large number of fragments.

Maybe we should make the maxlevel self-updating.


 --- /dev/null
 +++ b/fs/btrfs/skiplist.c
 @@ -0,0 +1,98 @@
 +inline int sl_fill_node(struct sl_node *node, int level, gfp_t mask)
 
 I suggest to pick the full prefix skiplist_ instead of just sl_,
 it'll be IMHO more readable and googlable. (Out of curiosity I grepped
 for the sl_ prefix and it's used by drivers/net/slip/slip.c).


I did hesitate for a while between skiplist_ and sl_...
and I just wanna make it be similar to rb_.

Anyway, I'm ok with skiplist_.

 +{
 +struct sl_node **p;
 +struct sl_node **q;
 +int num;
 +
 +BUG_ON(level > MAXLEVEL);
 +
 +num = level + 1;
 +p = kmalloc(sizeof(*p) * num, mask);
 +BUG_ON(!p);
 
 you can drop the BUG_ON
 
 +if (!p)
 ^^^
 
 +return -ENOMEM;
 +q = kmalloc(sizeof(*q) * num, mask);
 +BUG_ON(!q);
 ^^
 

ok, just in case.

 +if (!q) {
 +kfree(p);
 +return -ENOMEM;
 +}
 +
 +node->next = p;
 +node->prev = q;
 +node->level = level;
 +return 0;
 +}
 +
 diff --git a/fs/btrfs/skiplist.h b/fs/btrfs/skiplist.h
 new file mode 100644
 index 000..3e414b5
 --- /dev/null
 +++ b/fs/btrfs/skiplist.h
 +
 +#define MAXLEVEL 16
 +/* double p = 0.25; */
 +
 +struct sl_node {
 +struct sl_node **next;
 +struct sl_node **prev;
 +unsigned int level;
 +unsigned int head:1;
 
 the bitfield will use another sizeof(int) bytes, but the level is at
 most 16, you can reduce its size eg to unsigned short.
 on the other hand, the structure has to start at address aligned to
 sizeof(void*) and the bytes after 'head' up to next sizeof(void*)
 boundary will be left unusable anyway. then, 'head' could be a full int
 or bool so the compiler is not restricted and forced to keep state of
 the single bit. if access to these items is expected to be frequent,
 the difference could be measurable.
 

I see.  Thanks a lot for your advice!
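
For reference, the layout point can be inspected from user space with
sizeof/offsetof. This is a sketch only: the two struct names are made up
for the comparison, the exact sizes depend on the ABI, and
tail_padding_compact() is a hypothetical helper that reports how many bytes
after 'head' are padding in the compact variant:

```c
#include <stdbool.h>
#include <stddef.h>

/* Layout as posted in the patch: int level plus a one-bit flag. */
struct sl_node_bitfield {
	struct sl_node_bitfield **next;
	struct sl_node_bitfield **prev;
	unsigned int level;
	unsigned int head:1;
};

/* Variant suggested in review: short level plus a plain bool. */
struct sl_node_compact {
	struct sl_node_compact **next;
	struct sl_node_compact **prev;
	unsigned short level;
	bool head;
};

/* Bytes between the end of 'head' and the end of the compact struct. */
static size_t tail_padding_compact(void)
{
	return sizeof(struct sl_node_compact) -
	       (offsetof(struct sl_node_compact, head) + sizeof(bool));
}
```

On a typical 64-bit ABI I would expect both structs to come out at the same
size because of the pointer alignment, so the gain from the smaller types
would come from simpler accesses to 'head' rather than from saved memory.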

thanks,
liubo

 +};
 
 
 david
 



Re: revert to static snapshot on reboot

2012-01-10 Thread Anand Jain



 The upcoming btrfs autosnap feature might help with your problem.
 But the main part in your case, which is to replace the root
 with its snapshot, is beyond the scope of the autosnap
 project.

 What is being developed is a set of btrfs-progs sub-commands to
 create and manage snapshots with a rule-set. The code is under
 development; if you would like to test it and provide feedback,
 I can send you a copy this week.

 Or, if you just want to know the new features relevant to you,
 they are listed below (not a complete feature list, though).

 - Create snapshots automatically based on
- ad-hoc events (package installation, boot, etc.)
  
cli eg:

# btrfs autosnap enable tag retain-policy subvol

and the cli that an init or package script should call is
# btrfs autosnap now -t tag /btrfs
which will create a snapshot and review its retention policy.

  The retention policy can be based on count, on FS % full, or on
  manually maintained snapshots.


 If you have any feedback, please let me know.

thanks, Anand

On Monday 09,January,2012 02:43 PM, bt...@spiritvideo.com wrote:

Hi all --

I just installed my first btrfs-based linux tonight, and I must say it
gives me a very warm feeling!  Congratulations on all your hard work
and your fine product.

I administer laptops for a small school, and we want to implement what
Deep Freeze (http://www.faronics.com/enterprise/deep-freeze) does for
Windows -- no matter what a student does after they log in, when they
reboot it is all forgotten and the computer has returned to a standard
state.

I would think this would be a FAQ, but I have searched the web and
mailing list for the past couple of hours.

Of course it's easy to mount a snapshot, but then if students make
changes the snapshot changes.

The plan that occurs to me is to make a snapshot of the system in the
state that I want to always boot.  Then, I would rewrite the init
script in the initrd to (a) delete any old tmp copy of the snapshot;
(b) copy the static snapshot to a tmp copy; (c) mount the tmp copy.

That's a little harder than I was hoping to work -- is there an easier
way to get this functionality?

I have a small ext4 boot partition containing grub, vmlinuz and
initramfs.  Everything else is in a big btrfs root partition.  I am
running Fedora 14, with Fedora-patched linux 2.6.35.  I could upgrade
if necessary.

Thanks,
Bob
--
I blog about my work at the school at SmallSchoolIT.wordpress.com



[PATCH 08/11] Btrfs: simplfy calculation of stripe length for discard operation

2012-01-10 Thread Li Zefan
For btrfs raid, while discarding a range of space, we'll need to know
the start offset and length to discard for each device, and it's done
in btrfs_map_block().

However the calculation is a bit complex for raid0 and raid10, so I
reimplement it based on a fact that:

	dev1       dev2       dev3      (raid0)
	----       ----       ----
	s0 s3 s6   s1 s4 s7   s2 s5

Each device has (total_stripes / nr_dev) stripes, or plus one.
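
As a stand-alone illustration of that rule (a user-space sketch, not the
kernel code), the per-device count can be written as below. The function
name stripes_on_dev is hypothetical, and device slot i is assumed to be
counted from the device that holds the first stripe of the range:

```c
#include <stdint.h>

/*
 * Number of whole stripes that land on device slot i when total_stripes
 * consecutive stripes are spread round-robin over nr_dev devices.
 * The first (total_stripes % nr_dev) slots get one extra stripe.
 */
static uint64_t stripes_on_dev(uint64_t total_stripes, uint32_t nr_dev,
			       uint32_t i)
{
	uint64_t per_dev = total_stripes / nr_dev;
	uint64_t remaining = total_stripes % nr_dev;

	return per_dev + (i < remaining ? 1 : 0);
}
```

For the 8-stripe, 3-device picture above this gives 3, 3 and 2 stripes for
dev1, dev2 and dev3. The partial first and last stripes still have to be
trimmed by the offsets afterwards, which this sketch does not model.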

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/volumes.c |   95 +---
 1 files changed, 31 insertions(+), 64 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 540fdd2..563ef65 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3024,80 +3024,47 @@ static int __btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
atomic_set(&bbio->error, 0);
 
if (rw & REQ_DISCARD) {
+   int factor = 0;
+   int sub_stripes = 0;
+   u64 stripes_per_dev = 0;
+   u32 remaining_stripes = 0;
+
+   if (map->type &
+   (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)) {
+   if (map->type & BTRFS_BLOCK_GROUP_RAID0)
+   sub_stripes = 1;
+   else
+   sub_stripes = map->sub_stripes;
+
+   factor = map->num_stripes / sub_stripes;
+   stripes_per_dev = div_u64_rem(stripe_nr_end -
+ stripe_nr_orig,
+ factor,
+ &remaining_stripes);
+   }
+
for (i = 0; i < num_stripes; i++) {
bbio->stripes[i].physical =
map->stripes[stripe_index].physical +
stripe_offset + stripe_nr * map->stripe_len;
bbio->stripes[i].dev = map->stripes[stripe_index].dev;
 
-   if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
-   u64 stripes;
-   u32 last_stripe = 0;
-   int j;
-
-   div_u64_rem(stripe_nr_end - 1,
-   map->num_stripes,
-   &last_stripe);
-
-   for (j = 0; j < map->num_stripes; j++) {
-   u32 test;
-
-   div_u64_rem(stripe_nr_end - 1 - j,
-   map->num_stripes, &test);
-   if (test == stripe_index)
-   break;
-   }
-   stripes = stripe_nr_end - 1 - j;
-   do_div(stripes, map->num_stripes);
-   bbio->stripes[i].length = map->stripe_len *
-   (stripes - stripe_nr + 1);
-
-   if (i == 0) {
+   if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
+BTRFS_BLOCK_GROUP_RAID10)) {
+   bbio->stripes[i].length = stripes_per_dev *
+ map->stripe_len;
+   if (i / sub_stripes < remaining_stripes)
+   bbio->stripes[i].length +=
+   map->stripe_len;
+   if (i < sub_stripes)
bbio->stripes[i].length -=
stripe_offset;
-   stripe_offset = 0;
-   }
-   if (stripe_index == last_stripe)
-   bbio->stripes[i].length -=
-   stripe_end_offset;
-   } else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
-   u64 stripes;
-   int j;
-   int factor = map->num_stripes /
-map->sub_stripes;
-   u32 last_stripe = 0;
-
-   div_u64_rem(stripe_nr_end - 1,
-   factor, &last_stripe);
-   last_stripe *= map->sub_stripes;
-
-   for (j = 0; j < factor; j++) {
-   u32 test;
-
-   div_u64_rem(stripe_nr_end - 1 - j,
-   factor, &test);
-
- 

[PATCH 09/11][RESEND] Btrfs: rewrite btrfs_trim_block_group()

2012-01-10 Thread Li Zefan
There are various bugs in block group trimming:

- It may trim from offset smaller than user-specified offset.
- It may trim beyond user-specified range.
- It may leak free space for extents smaller than specified minlen.
- It may truncate the last trimmed extent thus leak free space.
- With mixed extents+bitmaps, some extents may not be trimmed.
- With mixed extents+bitmaps, some bitmaps may not be trimmed (even
none will be trimmed). Even for those trimmed, not all the free space
in the bitmaps will be trimmed.

I rewrite btrfs_trim_block_group() and break it into two functions.
One is to trim extents only, and the other is to trim bitmaps only.

Before patching:

# fstrim -v /mnt/
/mnt/: 1496465408 bytes were trimmed

After patching:

# fstrim -v /mnt/
/mnt/: 2193768448 bytes were trimmed

And this matches the total free space:

# btrfs fi df /mnt
Data: total=3.58GB, used=1.79GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=205.12MB, used=97.14MB
Metadata: total=8.00MB, used=0.00

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |  235 ++-
 1 files changed, 164 insertions(+), 71 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index e4eb222..b3cbb89 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2594,17 +2594,57 @@ void btrfs_init_free_cluster(struct btrfs_free_cluster *cluster)
cluster->block_group = NULL;
 }
 
-int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
-  u64 *trimmed, u64 start, u64 end, u64 minlen)
+static int do_trimming(struct btrfs_block_group_cache *block_group,
+  u64 *total_trimmed, u64 start, u64 bytes,
+  u64 reserved_start, u64 reserved_bytes)
 {
-   struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
-   struct btrfs_free_space *entry = NULL;
+   struct btrfs_space_info *space_info = block_group->space_info;
struct btrfs_fs_info *fs_info = block_group->fs_info;
-   u64 bytes = 0;
-   u64 actually_trimmed;
-   int ret = 0;
+   int ret;
+   int update = 0;
+   u64 trimmed = 0;
 
-   *trimmed = 0;
+   spin_lock(&space_info->lock);
+   spin_lock(&block_group->lock);
+   if (!block_group->ro) {
+   block_group->reserved += reserved_bytes;
+   space_info->bytes_reserved += reserved_bytes;
+   update = 1;
+   }
+   spin_unlock(&block_group->lock);
+   spin_unlock(&space_info->lock);
+
+   ret = btrfs_error_discard_extent(fs_info->extent_root,
+start, bytes, &trimmed);
+   if (!ret)
+   *total_trimmed += trimmed;
+
+   btrfs_add_free_space(block_group, reserved_start, reserved_bytes);
+
+   if (update) {
+   spin_lock(&space_info->lock);
+   spin_lock(&block_group->lock);
+   if (block_group->ro)
+   space_info->bytes_readonly += reserved_bytes;
+   block_group->reserved -= reserved_bytes;
+   space_info->bytes_reserved -= reserved_bytes;
+   spin_unlock(&space_info->lock);
+   spin_unlock(&block_group->lock);
+   }
+
+   return ret;
+}
+
+static int trim_no_bitmap(struct btrfs_block_group_cache *block_group,
+ u64 *total_trimmed, u64 start, u64 end, u64 minlen)
+{
+   struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+   struct btrfs_free_space *entry;
+   struct rb_node *node;
+   int ret = 0;
+   u64 extent_start;
+   u64 extent_bytes;
+   u64 bytes;
 
while (start < end) {
spin_lock(&ctl->tree_lock);
@@ -2615,81 +2655,118 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
}
 
entry = tree_search_offset(ctl, start, 0, 1);
-   if (!entry)
-   entry = tree_search_offset(ctl,
-  offset_to_bitmap(ctl, start),
-  1, 1);
-
-   if (!entry || entry->offset >= end) {
+   if (!entry) {
spin_unlock(&ctl->tree_lock);
break;
}
 
-   if (entry->bitmap) {
-   ret = search_bitmap(ctl, entry, &start, &bytes);
-   if (!ret) {
-   if (start >= end) {
-   spin_unlock(&ctl->tree_lock);
-   break;
-   }
-   bytes = min(bytes, end - start);
-   bitmap_clear_bits(ctl, entry, start, bytes);
-  

[PATCH 10/11] Btrfs: update global block_rsv when creating a new block group

2012-01-10 Thread Li Zefan
A bug was triggered while using seed device:

# mkfs.btrfs /dev/loop1
# btrfstune -S 1 /dev/loop1
# mount /dev/loop1 /mnt
# btrfs dev add /dev/loop2 /mnt

btrfs: block rsv returned -28
------------[ cut here ]------------
WARNING: at fs/btrfs/extent-tree.c:5969 btrfs_alloc_free_block+0x166/0x396 [btrfs]()
...
Call Trace:
...
[<f7b7c31c>] btrfs_cow_block+0x101/0x147 [btrfs]
[<f7b7eaa6>] btrfs_search_slot+0x1b8/0x55f [btrfs]
[<f7b7f844>] btrfs_insert_empty_items+0x42/0x7f [btrfs]
[<f7b7f8c1>] btrfs_insert_item+0x40/0x7e [btrfs]
[<f7b8ac02>] btrfs_make_block_group+0x243/0x2aa [btrfs]
[<f7bb3f53>] __btrfs_alloc_chunk+0x672/0x70e [btrfs]
[<f7bb41ff>] init_first_rw_device+0x77/0x13c [btrfs]
[<f7bb5a62>] btrfs_init_new_device+0x664/0x9fd [btrfs]
[<f7bbb65a>] btrfs_ioctl+0x694/0xdbe [btrfs]
[<c04f55f7>] do_vfs_ioctl+0x496/0x4cc
[<c04f5660>] sys_ioctl+0x33/0x4f
[<c07b9edf>] sysenter_do_call+0x12/0x38
---[ end trace 906adac595facc7d ]---

Since the seed device is read-only, there's no usable space in the filesystem.
Afterwards we add a sprout device to it, and the kernel creates a METADATA
block group and a SYSTEM block group, which provide free space we can reserve,
but we still get a reservation failure because the global block_rsv hasn't
been updated accordingly.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5b53479..bf30f67 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7446,6 +7446,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
ret = update_space_info(root->fs_info, cache->flags, size, bytes_used,
&cache->space_info);
BUG_ON(ret);
+   update_global_block_rsv(root->fs_info);
 
spin_lock(&cache->space_info->lock);
cache->space_info->bytes_readonly += cache->bytes_super;
-- 
1.7.3.1


[PATCH 11/11] Btrfs: fix possible deadlock when opening a seed device

2012-01-10 Thread Li Zefan
The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex,
but when we mount a filesystem which has backing seed devices, we have
this lock chain:

    open_ctree()
        lock(chunk_mutex);
        read_chunk_tree();
            read_one_dev();
                open_seed_devices();
                    lock(uuid_mutex);

and then we hit a lockdep splat.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c |2 --
 fs/btrfs/volumes.c |9 +++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3f9d555..858ab34 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2270,9 +2270,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
   (unsigned long)btrfs_header_chunk_tree_uuid(chunk_root->node),
   BTRFS_UUID_SIZE);
 
-   mutex_lock(&fs_info->chunk_mutex);
ret = btrfs_read_chunk_tree(chunk_root);
-   mutex_unlock(&fs_info->chunk_mutex);
if (ret) {
printk(KERN_WARNING "btrfs: failed to read chunk tree on %s\n",
   sb->s_id);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 563ef65..fbb493b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3506,7 +3506,7 @@ static int open_seed_devices(struct btrfs_root *root, u8 *fsid)
struct btrfs_fs_devices *fs_devices;
int ret;
 
-   mutex_lock(&uuid_mutex);
+   BUG_ON(!mutex_is_locked(&uuid_mutex));
 
fs_devices = root->fs_info->fs_devices->seed;
while (fs_devices) {
@@ -3544,7 +3544,6 @@ static int open_seed_devices(struct btrfs_root *root, u8 *fsid)
fs_devices->seed = root->fs_info->fs_devices->seed;
root->fs_info->fs_devices->seed = fs_devices;
 out:
-   mutex_unlock(&uuid_mutex);
return ret;
 }
 
@@ -3687,6 +3686,9 @@ int btrfs_read_chunk_tree(struct btrfs_root *root)
if (!path)
return -ENOMEM;
 
+   mutex_lock(&uuid_mutex);
+   lock_chunks(root);
+
/* first we search for all of the device items, and then we
 * read in all of the chunk items.  This way we can create chunk
 * mappings that reference all of the devices that are afound
@@ -3737,6 +3739,9 @@ again:
}
ret = 0;
 error:
+   unlock_chunks(root);
+   mutex_unlock(&uuid_mutex);
+
btrfs_free_path(path);
return ret;
 }
-- 
1.7.3.1


[PATCH 05/11] Btrfs: reserve metadata space in btrfs_ioctl_setflags()

2012-01-10 Thread Li Zefan
Check and reserve space for btrfs_update_inode().

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9619fb0..fe8a60c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -254,7 +254,7 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
ip->flags &= ~(BTRFS_INODE_COMPRESS | BTRFS_INODE_NOCOMPRESS);
}
 
-   trans = btrfs_join_transaction(root);
+   trans = btrfs_start_transaction(root, 1);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
goto out_drop;
-- 
1.7.3.1


[PATCH 03/11] Btrfs: check the return value of io_ctl_init()

2012-01-10 Thread Li Zefan
It can return -ENOMEM.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 4e55af3..e4eb222 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -637,7 +637,10 @@ int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
if (!num_entries)
return 0;
 
-   io_ctl_init(&io_ctl, inode, root);
+   ret = io_ctl_init(&io_ctl, inode, root);
+   if (ret)
+   return ret;
+
ret = readahead_cache(inode);
if (ret)
goto out;
@@ -851,7 +854,9 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
if (!i_size_read(inode))
return -1;
 
-   io_ctl_init(&io_ctl, inode, root);
+   ret = io_ctl_init(&io_ctl, inode, root);
+   if (ret)
+   return -1;
 
/* Get the cluster for this block_group if it exists */
if (block_group && !list_empty(&block_group->cluster_list))
-- 
1.7.3.1


[PATCH 00/11] Btrfs: some patches for 3.3

2012-01-10 Thread Li Zefan
The biggest one is a fix for fstrim, and there's a fix for on-disk
free space cache. Others are small fixes and cleanups.

The last three were sent weeks ago.

The patchset is also available in this repo:

git://repo.or.cz/linux-btrfs-devel.git for-chris

Note there's a small conflict with Al Viro's vfs changes.

Li Zefan (11):
  Btrfs: add pinned extents to on-disk free space cache correctly
  Btrfs: avoid possible NULL deref in io_ctl_drop_pages()
  Btrfs: check the return value of io_ctl_init()
  Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags()
  Btrfs: reserve metadata space in btrfs_ioctl_setflags()
  Btrfs: don't pass a trans handle unnecessarily in volumes.c
  Btrfs: don't pre-allocate btrfs bio
  Btrfs: simplfy calculation of stripe length for discard operation
  Btrfs: rewrite btrfs_trim_block_group()
  Btrfs: update global block_rsv when creating a new block group
  Btrfs: fix possible deadlock when opening a seed device

 fs/btrfs/disk-io.c  |2 -
 fs/btrfs/extent-tree.c  |3 +-
 fs/btrfs/free-space-cache.c |  293 +--
 fs/btrfs/ioctl.c|   20 +++-
 fs/btrfs/volumes.c  |  189 ++--
 fs/btrfs/volumes.h  |3 +-
 6 files changed, 280 insertions(+), 230 deletions(-)



[PATCH 02/11] Btrfs: avoid possible NULL deref in io_ctl_drop_pages()

2012-01-10 Thread Li Zefan
If we run into some failure path in io_ctl_prepare_pages(),
the io_ctl->pages[] array may have some NULL pointers.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 01840ef..4e55af3 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -319,9 +319,11 @@ static void io_ctl_drop_pages(struct io_ctl *io_ctl)
io_ctl_unmap_page(io_ctl);
 
for (i = 0; i < io_ctl->num_pages; i++) {
-   ClearPageChecked(io_ctl->pages[i]);
-   unlock_page(io_ctl->pages[i]);
-   page_cache_release(io_ctl->pages[i]);
+   if (io_ctl->pages[i]) {
+   ClearPageChecked(io_ctl->pages[i]);
+   unlock_page(io_ctl->pages[i]);
+   page_cache_release(io_ctl->pages[i]);
+   }
}
 }
 
-- 
1.7.3.1


[PATCH 06/11] Btrfs: don't pass a trans handle unnecessarily in volumes.c

2012-01-10 Thread Li Zefan
Some functions never use the transaction handle passed to them.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c |2 +-
 fs/btrfs/volumes.c |   18 +++---
 fs/btrfs/volumes.h |3 +--
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8603ee4..5b53479 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7084,7 +7084,7 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 * space to fit our block group in.
 */
if (device->total_bytes > device->bytes_used + min_free) {
-   ret = find_free_dev_extent(NULL, device, min_free,
+   ret = find_free_dev_extent(device, min_free,
   &dev_offset, NULL);
if (!ret)
dev_nr++;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f4b839f..73f673c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -829,7 +829,6 @@ out:
 
 /*
  * find_free_dev_extent - find free space in the specified device
- * @trans: transaction handler
  * @device:the device which we search the free space in
  * @num_bytes: the size of the free space that we need
  * @start: store the start of the free space.
@@ -848,8 +847,7 @@ out:
  * But if we don't find suitable free space, it is used to store the size of
  * the max free space.
  */
-int find_free_dev_extent(struct btrfs_trans_handle *trans,
-struct btrfs_device *device, u64 num_bytes,
+int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
 u64 *start, u64 *len)
 {
struct btrfs_key key;
@@ -893,7 +891,7 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans,
key.offset = search_start;
key.type = BTRFS_DEV_EXTENT_KEY;
 
-   ret = btrfs_search_slot(trans, root, &key, path, 0, 0);
+   ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
if (ret < 0)
goto out;
if (ret > 0) {
@@ -1469,8 +1467,7 @@ error_undo:
 /*
  * does all the dirty work required for changing file system's UUID.
  */
-static int btrfs_prepare_sprout(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root)
+static int btrfs_prepare_sprout(struct btrfs_root *root)
 {
struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
struct btrfs_fs_devices *old_devices;
@@ -1695,7 +1692,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 
if (seeding_dev) {
sb->s_flags &= ~MS_RDONLY;
-   ret = btrfs_prepare_sprout(trans, root);
+   ret = btrfs_prepare_sprout(root);
BUG_ON(ret);
}
 
@@ -2323,8 +2320,7 @@ done:
return ret;
 }
 
-static int btrfs_add_system_chunk(struct btrfs_trans_handle *trans,
-  struct btrfs_root *root,
+static int btrfs_add_system_chunk(struct btrfs_root *root,
   struct btrfs_key *key,
   struct btrfs_chunk *chunk, int item_size)
 {
@@ -2496,7 +2492,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
if (total_avail == 0)
continue;
 
-   ret = find_free_dev_extent(trans, device,
+   ret = find_free_dev_extent(device,
   max_stripe_size * dev_stripes,
   &dev_offset, &max_avail);
if (ret && ret != -ENOSPC)
@@ -2687,7 +2683,7 @@ static int __finish_chunk_alloc(struct btrfs_trans_handle *trans,
BUG_ON(ret);
 
if (map->type & BTRFS_BLOCK_GROUP_SYSTEM) {
-   ret = btrfs_add_system_chunk(trans, chunk_root, &key, chunk,
+   ret = btrfs_add_system_chunk(chunk_root, &key, chunk,
 item_size);
BUG_ON(ret);
}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 78f2d4d..c1701ec 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -230,7 +230,6 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size);
 int btrfs_init_new_device(struct btrfs_root *root, char *path);
 int btrfs_balance(struct btrfs_root *dev_root);
 int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset);
-int find_free_dev_extent(struct btrfs_trans_handle *trans,
-struct btrfs_device *device, u64 num_bytes,
+int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
 u64 *start, u64 *max_avail);
 #endif
-- 
1.7.3.1


[PATCH 01/11] Btrfs: add pinned extents to on-disk free space cache correctly

2012-01-10 Thread Li Zefan
I got this while running xfstests:

[24256.836098] block group 317849600 has an wrong amount of free space
[24256.836100] btrfs: failed to load free space cache for block group 317849600

We should clamp the extent returned by find_first_extent_bit(),
so the start of the extent won't be smaller than the start of the
block group.
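
The clamping itself is just a max/min on the two ends. Below is a
user-space sketch of the idea, not the kernel code: struct range and
clamp_extent are hypothetical names, ext_last is assumed to be an inclusive
end offset (smaller than UINT64_MAX) as find_first_extent_bit() returns it,
and the block group covers [bg_start, bg_start + bg_len):

```c
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t len;	/* 0 means the extent lies outside the block group */
};

/* Clamp the extent [ext_start, ext_last] to the block group's range. */
static struct range clamp_extent(uint64_t ext_start, uint64_t ext_last,
				 uint64_t bg_start, uint64_t bg_len)
{
	uint64_t bg_end = bg_start + bg_len;	/* exclusive */
	uint64_t start = ext_start > bg_start ? ext_start : bg_start;
	uint64_t end = ext_last + 1 < bg_end ? ext_last + 1 : bg_end;
	struct range r = { start, 0 };

	if (start < end)
		r.len = end - start;
	return r;
}
```

An extent that begins before the block group is cut at bg_start, and one
that runs past the end is cut at bg_end, so the recorded length never
counts space that belongs to a neighbouring block group.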

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |   41 -
 1 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index ec23d43..01840ef 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -838,7 +838,7 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
struct io_ctl io_ctl;
struct list_head bitmap_list;
struct btrfs_key key;
-   u64 start, end, len;
+   u64 start, extent_start, extent_end, len;
int entries = 0;
int bitmaps = 0;
int ret;
@@ -857,25 +857,12 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 struct btrfs_free_cluster,
 block_group_list);
 
-   /*
-* We shouldn't have switched the pinned extents yet so this is the
-* right one
-*/
-   unpin = root->fs_info->pinned_extents;
-
/* Lock all pages first so we can lock the extent safely. */
io_ctl_prepare_pages(&io_ctl, inode, 0);
 
lock_extent_bits(&BTRFS_I(inode)->io_tree, 0, i_size_read(inode) - 1,
 0, &cached_state, GFP_NOFS);
 
-   /*
-* When searching for pinned extents, we need to start at our start
-* offset.
-*/
-   if (block_group)
-   start = block_group->key.objectid;
-
node = rb_first(&ctl->free_space_offset);
if (!node && cluster) {
node = rb_first(&cluster->root);
@@ -918,9 +905,20 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
 * We want to add any pinned extents to our free space cache
 * so we don't leak the space
 */
+
+   /*
+* We shouldn't have switched the pinned extents yet so this is the
+* right one
+*/
+   unpin = root->fs_info->pinned_extents;
+
+   if (block_group)
+   start = block_group->key.objectid;
+
while (block_group && (start < block_group->key.objectid +
   block_group->key.offset)) {
-   ret = find_first_extent_bit(unpin, start, &start, &end,
+   ret = find_first_extent_bit(unpin, start,
+   &extent_start, &extent_end,
EXTENT_DIRTY);
if (ret) {
ret = 0;
@@ -928,20 +926,21 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
}
 
/* This pinned extent is out of our range */
-   if (start >= block_group->key.objectid +
+   if (extent_start >= block_group->key.objectid +
block_group->key.offset)
break;
 
-   len = block_group->key.objectid +
-   block_group->key.offset - start;
-   len = min(len, end + 1 - start);
+   extent_start = max(extent_start, start);
+   extent_end = min(block_group->key.objectid +
+block_group->key.offset, extent_end + 1);
+   len = extent_end - extent_start;
 
entries++;
-   ret = io_ctl_add_entry(io_ctl, start, len, NULL);
+   ret = io_ctl_add_entry(&io_ctl, extent_start, len, NULL);
if (ret)
goto out_nospc;
 
-   start = end + 1;
+   start = extent_end;
}
 
/* Write out the bitmaps */
-- 
1.7.3.1


[PATCH 07/11] Btrfs: don't pre-allocate btrfs bio

2012-01-10 Thread Li Zefan
We pre-allocate a btrfs bio with fixed size, and then may re-allocate
memory if we find stripes are bigger than the fixed size. But this
pre-allocation is not necessary.

Also we don't have to calculate the stripe number twice.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/volumes.c |   67 ---
 1 files changed, 21 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 73f673c..540fdd2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2897,26 +2897,13 @@ static int __btrfs_map_block(struct btrfs_mapping_tree 
*map_tree, int rw,
u64 stripe_nr;
u64 stripe_nr_orig;
u64 stripe_nr_end;
-   int stripes_allocated = 8;
-   int stripes_required = 1;
int stripe_index;
int i;
+   int ret = 0;
int num_stripes;
int max_errors = 0;
struct btrfs_bio *bbio = NULL;
 
-   if (bbio_ret  !(rw  (REQ_WRITE | REQ_DISCARD)))
-   stripes_allocated = 1;
-again:
-   if (bbio_ret) {
-   bbio = kzalloc(btrfs_bio_size(stripes_allocated),
-   GFP_NOFS);
-   if (!bbio)
-   return -ENOMEM;
-
-   atomic_set(bbio-error, 0);
-   }
-
read_lock(em_tree-lock);
em = lookup_extent_mapping(em_tree, logical, *length);
read_unlock(em_tree-lock);
@@ -2935,32 +2922,6 @@ again:
 	if (mirror_num > map->num_stripes)
 		mirror_num = 0;
 
-	/* if our btrfs_bio struct is too small, back off and try again */
-	if (rw & REQ_WRITE) {
-		if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
-				 BTRFS_BLOCK_GROUP_DUP)) {
-			stripes_required = map->num_stripes;
-			max_errors = 1;
-		} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
-			stripes_required = map->sub_stripes;
-			max_errors = 1;
-		}
-	}
-	if (rw & REQ_DISCARD) {
-		if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
-				 BTRFS_BLOCK_GROUP_RAID1 |
-				 BTRFS_BLOCK_GROUP_DUP |
-				 BTRFS_BLOCK_GROUP_RAID10)) {
-			stripes_required = map->num_stripes;
-		}
-	}
-	if (bbio_ret && (rw & (REQ_WRITE | REQ_DISCARD)) &&
-	    stripes_allocated < stripes_required) {
-		stripes_allocated = map->num_stripes;
-		free_extent_map(em);
-		kfree(bbio);
-		goto again;
-	}
 	stripe_nr = offset;
 	/*
 	 * stripe_nr counts the total number of stripes we have to stride
@@ -3055,6 +3016,13 @@ again:
 	}
 	BUG_ON(stripe_index >= map->num_stripes);
 
+	bbio = kzalloc(btrfs_bio_size(num_stripes), GFP_NOFS);
+	if (!bbio) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	atomic_set(&bbio->error, 0);
+
 	if (rw & REQ_DISCARD) {
 		for (i = 0; i < num_stripes; i++) {
 			bbio->stripes[i].physical =
@@ -3151,15 +3119,22 @@ again:
 			stripe_index++;
 		}
 	}
-	if (bbio_ret) {
-		*bbio_ret = bbio;
-		bbio->num_stripes = num_stripes;
-		bbio->max_errors = max_errors;
-		bbio->mirror_num = mirror_num;
+
+	if (rw & REQ_WRITE) {
+		if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
+				 BTRFS_BLOCK_GROUP_RAID10 |
+				 BTRFS_BLOCK_GROUP_DUP)) {
+			max_errors = 1;
+		}
 	}
+
+	*bbio_ret = bbio;
+	bbio->num_stripes = num_stripes;
+	bbio->max_errors = max_errors;
+	bbio->mirror_num = mirror_num;
 out:
 	free_extent_map(em);
-	return 0;
+	return ret;
 }
 
 int btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
-- 
1.7.3.1


[PATCH 04/11] Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags()

2012-01-10 Thread Li Zefan
We can recover from errors and return -errno to user space.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c |   18 ++
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c04f02c..9619fb0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -176,6 +176,8 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	struct btrfs_trans_handle *trans;
 	unsigned int flags, oldflags;
 	int ret;
+	u64 ip_oldflags;
+	unsigned int i_oldflags;
 
 	if (btrfs_root_readonly(root))
 		return -EROFS;
@@ -192,6 +194,9 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 
 	mutex_lock(&inode->i_mutex);
 
+	ip_oldflags = ip->flags;
+	i_oldflags = inode->i_flags;
+
 	flags = btrfs_mask_flags(inode->i_mode, flags);
 	oldflags = btrfs_flags_to_ioctl(ip->flags);
 	if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
@@ -250,18 +255,23 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	}
 
 	trans = btrfs_join_transaction(root);
-	BUG_ON(IS_ERR(trans));
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		goto out_drop;
+	}
 
 	btrfs_update_iflags(inode);
 	inode->i_ctime = CURRENT_TIME;
 	ret = btrfs_update_inode(trans, root, inode);
-	BUG_ON(ret);
 
 	btrfs_end_transaction(trans, root);
+ out_drop:
+	if (ret) {
+		ip->flags = ip_oldflags;
+		inode->i_flags = i_oldflags;
+	}
 
 	mnt_drop_write(file->f_path.mnt);
-
-	ret = 0;
  out_unlock:
 	mutex_unlock(&inode->i_mutex);
 	return ret;
-- 
1.7.3.1