Re: Warning and BUG with btrfs and corrupted image

2009-01-21 Thread Phillip Lougher
On Tue, Jan 20, 2009 at 4:51 PM, Eric Sesterhenn  wrote:

> I already tested squashfs. One issue is basically a problem with
> the zlib-api for which i just posted a patch here
> http://marc.info/?t=12321280733&r=1&w=2
>

Thanks for testing Squashfs.  I've not ignored your emails, but I've
been busy job hunting, and so have not had time to look into this
until now.

I hardened Squashfs against fsfuzzer back in November 2006 (remember
the month of kernel bugs, or MOKB, which highlighted a number of
issues with Squashfs).  Your testing has thrown up a regression that I
inadvertently  put in last month!

> The other is an overwritten redzone (also reported in this thread
> http://marc.info/?l=linux-fsdevel&m=123212794425497&w=2)
> Looks like a length parameter is passed to squashfs_read_data
> which is bigger than ((msblk->block_size >> msblk->devblksize_log2) +
> 1), so the kcalloced buffer gets overwritten later.

As part of the mainlining effort I changed Squashfs to allocate
buffers in 4K page sizes rather than use vmalloced large buffers.   As
far as zlib goes, it means zlib_inflate now decompresses into a
sequence of 4K buffers rather than one large buffer.   What this means
is zlib_inflate is called repeatedly moving to the next 4K page
whenever zlib_inflate asks for another buffer (stream.avail_out == 0).

Your testing have thrown up the case where zlib_inflate is asking for
too many output buffers, i.e. it has returned with Z_OK,
stream.avail_in !=0 (more input data to be processed), and
stream.avail_out == 0 (I'd like another output buffer).  but it has
consumed all the output buffers.  This isn't checked (the code assumes
zlib will do the right thing on corrupt data and bomb out).  My guess
is either zlib_inflate is getting confused with corrupt data, or
fsfuzzer gets lucky sometimes and corrupts the filesystem to point to
another valid but larger compressed block (i.e. in your test
filesystems the 4K datablock is being corrupted to point to an 8K
metadata block).

This ultimately leads to an oops in zlib_inflate where it has been
passed a bogus or NULL steam.next_out pointer.

I'll create a patch and send it to you if you're happy to test it.

Phillip
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] ioctl to fetch csums of file extents

2009-01-21 Thread Yehuda Sadeh Weinraub
>
> So, I'd like to take this patch, but there is line wrapping and other
> whitespace problems.  Could you please send it with a mail client that
> doesn't mangle? ;)
>
> (Attaching it is fine too, at least on this list)


Attached both patches.

Thanks,
Yehuda
From d7768ad789685150eb5d6fc94f1133fa6089bbeb Mon Sep 17 00:00:00 2001
From: Yehuda Sadeh 
Date: Fri, 19 Dec 2008 12:56:56 -0800
Subject: [PATCH 1/1] btrfs fiemap support


Signed-off-by: Yehuda Sadeh 
---
 fs/btrfs/extent_io.c |   84 ++
 fs/btrfs/extent_io.h |2 +
 fs/btrfs/inode.c |7 
 3 files changed, 93 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 25ce2d1..a27bb39 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2900,6 +2900,89 @@ out:
 	return sector;
 }
 
+int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
+		__u64 start, __u64 len, get_extent_t *get_extent)
+{
+	int ret;
+	u64 off = start;
+	u64 max = start + len;
+	u32 flags = 0;
+	u64 disko = 0;
+	struct extent_map *em = NULL;
+	int end = 0;
+	u64 em_start = 0, em_len = 0;
+	ret = 0;
+
+	lock_extent(&BTRFS_I(inode)->io_tree, start, start + len,
+		GFP_NOFS);
+	em = get_extent(inode, NULL, 0, off, max - off, 0);
+	if (!em)
+		goto out;
+	if (IS_ERR(em)) {
+		ret = PTR_ERR(em);
+		goto out;
+	}
+	while (!end) {
+		off = em->start + em->len;
+
+		em_start = em->start;
+		em_len = em->len;
+
+		disko = 0;
+		flags = 0;
+
+		switch (em->block_start) {
+		case EXTENT_MAP_LAST_BYTE:
+			end = 1;
+			flags |= FIEMAP_EXTENT_LAST;
+			break;
+		case EXTENT_MAP_HOLE:
+			flags |= FIEMAP_EXTENT_UNWRITTEN;
+			break;
+		case EXTENT_MAP_INLINE:
+			flags |= (FIEMAP_EXTENT_DATA_INLINE |
+  FIEMAP_EXTENT_NOT_ALIGNED);
+			break;
+		case EXTENT_MAP_DELALLOC:
+			flags |= (FIEMAP_EXTENT_DELALLOC |
+  FIEMAP_EXTENT_UNKNOWN);
+			break;
+		default:
+			disko = em->block_start;
+			break;
+		}
+		if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags))
+			flags |= FIEMAP_EXTENT_ENCODED;
+
+		free_extent_map(em);
+
+		if (!end) {
+			em = get_extent(inode, NULL, 0, off, max - off, 0);
+			if (!em)
+goto out;
+			if (IS_ERR(em)) {
+ret = PTR_ERR(em);
+goto out;
+			}
+		}
+		if (test_bit(EXTENT_FLAG_VACANCY, &em->flags)) {
+			flags |= FIEMAP_EXTENT_LAST;
+			end = 1;
+		}
+
+		ret = fiemap_fill_next_extent(fieinfo, em_start, disko,
+	em_len, flags);
+		if (ret)
+			goto out_free;
+	}
+out_free:
+	free_extent_map(em);
+out:
+	unlock_extent(&BTRFS_I(inode)->io_tree, start, start + len,
+			GFP_NOFS);
+	return ret;
+}
+
 static inline struct page *extent_buffer_page(struct extent_buffer *eb,
 	  unsigned long i)
 {
@@ -3786,3 +3869,4 @@ out:
 	return ret;
 }
 EXPORT_SYMBOL(try_release_extent_buffer);
+
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index c5b483a..e80c6d9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -193,6 +193,8 @@ int extent_commit_write(struct extent_io_tree *tree,
 			unsigned from, unsigned to);
 sector_t extent_bmap(struct address_space *mapping, sector_t iblock,
 		get_extent_t *get_extent);
+int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
+		__u64 start, __u64 len, get_extent_t *get_extent);
 int set_range_dirty(struct extent_io_tree *tree, u64 start, u64 end);
 int set_state_private(struct extent_io_tree *tree, u64 start, u64 private);
 int get_state_private(struct extent_io_tree *tree, u64 start, u64 *private);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 068bad4..ff182be 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4171,6 +4171,12 @@ static sector_t btrfs_bmap(struct address_space *mapping, sector_t iblock)
 	return extent_bmap(mapping, iblock, btrfs_get_extent);
 }
 
+static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
+		__u64 start, __u64 len)
+{
+	return extent_fiemap(inode, fieinfo, start, len, btrfs_get_extent);
+}
+
 int btrfs_readpage(struct file *file, struct page *page)
 {
 	struct extent_io_tree *tree;
@@ -5023,6 +5029,7 @@ static struct inode_operations btrfs_file_inode_operations = {
 	.removexattr	= btrfs_removexattr,
 	.permission	= btrfs_permission,
 	.fallocate	= btrfs_fallocate,
+	.fiemap		= btrfs_fiemap,
 };
 static struct inode_operations btrfs_special_inode_operations = {
 	.getattr	= btrfs_getattr,
-- 
1.5.6.5

From bca297ff9a136b740054d024a527d2a2d52ff945 Mon Sep 17 00:00:00 2001
From: Yehuda Sadeh 
Date: Fri, 19 Dec 2008 12:57:33 -0800
Subject: [PATCH 1/1] ioctl to fetch csums of file extents


Signed-off-by: Yehuda Sadeh 
---
 fs/btrfs/ioctl.c |  101 ++
 fs/btrfs/ioctl.h |9 +
 2 files changed, 110 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ab429fe..18a57c1 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -990,6 +990,105 @@ out_drop_write:
 	return ret;
 }
 
+static long btrfs

Re: [PATCH 1/1] ioctl to fetch csums of file extents

2009-01-21 Thread Chris Mason
On Fri, 2008-12-19 at 13:14 -0800, Yehuda Sadeh Weinraub wrote:
> This adds a new ioctl that requests for the csums of file's extent.
> The csums of contiguous extents can also be requested. The call will
> not return anything for extents that don't have csum information for
> their data, or that the csum list is not available.
> Presumably, the user calls fiemap first in order to get the list of
> extents. The max number of requested csum items is specified in the
> request and is updated according to the actual number of returned
> items.
> 

The swapfile discussions convinced me we're better off taking fiemap
support now and disabling bmap entirely, at least until we can fix the
swapfile code to work properly with btrfs.

So, I'd like to take this patch, but there is line wrapping and other
whitespace problems.  Could you please send it with a mail client that
doesn't mangle? ;)

(Attaching it is fine too, at least on this list)

Thanks!
-chris


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch] Btrfs: use BTRFS_VOL_NAME_MAX for struct btrfs_ioctl_vol_args

2009-01-21 Thread Chris Mason
On Mon, 2009-01-19 at 21:12 +0800, Américo Wang wrote:
> On Mon, Jan 19, 2009 at 08:03:37AM -0500, Josef Bacik wrote:
> >On Mon, Jan 19, 2009 at 08:57:32PM +0800, Américo Wang wrote:
> >> 
> >> I found userspace tool, btrfsctl, uses BTRFS_VOL_NAME_MAX, and
> >> it also looks that this one is more proper.
> >> 
> >> Kill BTRFS_PATH_NAME_MAX since no one will use it.
> >> 
> >
> >Nope, BTRFS_PATH_NAME_MAX is specifically used for the ioctl stuff, makes the
> >arguments 4k aligned, this patch is incorrect.  Thanks,
> 
> Ok, then what is BTRFS_VOL_NAME_MAX? :)

Right now it is only used in the progs.  The disk format doesn't really
have a max there, it is just to keep names usable.  But, we should add a
check in the kernel ioctl side, are you interested in sending a patch
for it?

-chris


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Misleading error message in btrfsctl

2009-01-21 Thread Chris Mason
On Sun, 2009-01-18 at 09:12 -0500, Calvin Walton wrote:
> On Sun, 2009-01-18 at 14:40 +0100, Peter Klotz wrote:
> > Calling btrfsctl results in this message when module btrfs.ko is not loaded:
> > 
> > r...@asus:/root# btrfsctl -A /dev/sda
> > ioctl:: Bad file descriptor
> ..
> > The output with applied patch:
> > 
> > r...@asus:/root# btrfsctl -A /dev/sda
> > failed to open /dev/btrfs-control
> 
> It would probably be better to use perror() instead of fprintf, in order
> to give the actual error message. I often find it useful to know why the
> device failed to open - is it missing, permission issues? I've updated
> the patch:
> 

Thanks

-chris


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch] version.sh: clean up the code

2009-01-21 Thread Chris Mason
On Mon, 2009-01-19 at 21:51 +0800, Américo Wang wrote:
> - In bash, integer comparation should use '-eq', not '=='.
> - Remove code for Mercurial, since btrfs now uses git.
> 

version.sh should be removed from the mainline sources and only live in
the standalone tree.  So, I'll apply this one a bit later.

-chris


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch] bit-radix.c: fix declarations

2009-01-21 Thread Chris Mason
On Mon, 2009-01-19 at 22:19 +0800, Américo Wang wrote:
> bit-radix.c should #include "bit-radix.h", and add a missing declaration
> for find_next_bit().
> 
> Signed-off-by: WANG Cong 
> 

I fixed this a little differently, the bit-radix code wasn't being used
anymore.  It's gone now ;)

-chris


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: check return value for kthread_run() correctly

2009-01-21 Thread Chris Mason
On Tue, 2009-01-20 at 00:28 +0800, Qinghuang Feng wrote:
> kthread_run() returns the kthread or ERR_PTR(-ENOMEM), not NULL.
> 

Thanks, got it.

-chris


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Btrfs Conference call

2009-01-21 Thread Chris Mason
Hello everyone,

There will be a btrfs conference call today January 21.  Topics will
include mainline merging and testing results.

Time: 1:30pm US Eastern (10:30am Pacific)

* Dial-in Number(s):
* Toll Free: +1-888-967-2253
* Toll  +1-650-607-2253 
* Meeting id: 665734
* Passcode: 428737 (which hopefully spells 4Btrfs)


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Progs: update convert for uninitialized block groups

2009-01-21 Thread Yan Zheng
Hello,

There is a new feature 'uninitialized block groups' in ext4.
Block and inode bitmaps in uninitialized block groups are
uninitialized. This confuses the converter. The fix is call
ext2fs_new_inode for each block group at open time. It set
up uninitialized block and inode bitmaps appropriately.

This patch also contains some other minor fixes. Thank you,

Signed-off-by: Yan Zheng 

---
diff -urp btrfs-progs-unstable/convert.c btrfs-progs/convert.c
--- btrfs-progs-unstable/convert.c  2009-01-13 18:55:53.214449273 +0800
+++ btrfs-progs/convert.c   2009-01-21 16:54:37.0 +0800
@@ -54,6 +54,7 @@ static int open_ext2fs(const char *name,
 {
errcode_t ret;
ext2_filsys ext2_fs;
+   ext2_ino_t ino;
ret = ext2fs_open(name, 0, 0, 0, unix_io_manager, &ext2_fs);
if (ret) {
fprintf(stderr, "ext2fs_open: %s\n", error_message(ret));
@@ -71,6 +72,17 @@ static int open_ext2fs(const char *name,
error_message(ret));
goto fail;
}
+   /*
+* search each block group for a free inode. this set up
+* uninit block/inode bitmaps appropriately.
+*/
+   ino = 1;
+   while (ino <= ext2_fs->super->s_inodes_count) {
+   ext2_ino_t foo;
+   ext2fs_new_inode(ext2_fs, ino, 0, NULL, &foo);
+   ino += EXT2_INODES_PER_GROUP(ext2_fs->super);
+   }
+
*ret_fs = ext2_fs;
return 0;
 fail:
@@ -1929,6 +1941,7 @@ static int relocate_one_reference(struct
key.offset = 0;
}
btrfs_init_path(&path);
+recow:
ret = btrfs_search_slot(trans, root, &key, &path, -1, 1);
if (ret < 0)
goto fail;
@@ -1936,8 +1949,17 @@ static int relocate_one_reference(struct
leaf = path.nodes[0];
nritems = btrfs_header_nritems(leaf);
while (1) {
-   if (path.slots[0] >= nritems)
-   break;
+   if (path.slots[0] >= nritems) {
+   ret = btrfs_next_leaf(root, &path);
+   if (ret < 0)
+   goto fail;
+   if (ret > 0)
+   break;
+   btrfs_item_key_to_cpu(path.nodes[0], &key,
+ path.slots[0]);
+   btrfs_release_path(root, &path);
+   goto recow;
+   }
btrfs_item_key_to_cpu(leaf, &key, path.slots[0]);
if (key.objectid != objectid ||
key.type != BTRFS_EXTENT_DATA_KEY)
@@ -2048,7 +2070,7 @@ static int relocate_one_reference(struct
if (data.num_blocks > 0) {
ret = record_file_blocks(trans, root, objectid, &inode,
 data.first_block, data.disk_block,
-data.num_blocks, 0);
+data.num_blocks, datacsum);
if (ret)
goto fail;
}
@@ -2218,7 +2240,7 @@ next:
ret = btrfs_commit_transaction(trans, cur_root);
BUG_ON(ret);
 
-   if (num_refs > 0 && pass++ < 4)
+   if (num_refs > 0 && pass++ < 16)
goto again;
 
ret = (num_refs > 0) ? -1 : 0;
@@ -2537,6 +2559,7 @@ int do_rollback(const char *devname, int
 {
int fd;
int ret;
+   int i;
struct btrfs_root *root;
struct btrfs_root *ext2_root;
struct btrfs_root *chunk_root;
@@ -2654,6 +2677,10 @@ int do_rollback(const char *devname, int
struct btrfs_file_extent_item);
if (btrfs_file_extent_type(leaf, fi) != BTRFS_FILE_EXTENT_REG)
break;
+   if (btrfs_file_extent_compression(leaf, fi) ||
+   btrfs_file_extent_encryption(leaf, fi) ||
+   btrfs_file_extent_other_encoding(leaf, fi))
+   break;
 
bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
/* skip holes and direct mapped extents */
@@ -2750,6 +2777,15 @@ next_extent:
goto fail;
}
 
+   /* zero btrfs super block mirrors */
+   memset(buf, 0, sectorsize);
+   for (i = 1 ; i < BTRFS_SUPER_MIRROR_MAX; i++) {
+   bytenr = btrfs_sb_offset(i);
+   if (bytenr >= total_bytes)
+   break;
+   ret = pwrite(fd, buf, sectorsize, bytenr);
+   }
+
sb_bytenr = (u64)-1;
/* copy all relocated blocks back */
while(1) {

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix infinite loop in btrfs_extent_post_op

2009-01-21 Thread Yan Zheng
Hello,

btrfs_extent_post_op calls finish_current_insert and
del_pending_extents. They both may enter infinite loop.
finish_current_insert enters infinite loop if it only
finds some backrefs to update. The infinite loop in
del_pending_extents is due to a variable not properly
set. Thank you,

Signed-off-by: Yan Zheng 

---
diff -urp 1/fs/btrfs/extent-tree.c 2/fs/btrfs/extent-tree.c
--- 1/fs/btrfs/extent-tree.c2009-01-07 07:16:42.762364700 +0800
+++ 2/fs/btrfs/extent-tree.c2009-01-20 13:36:07.0 +0800
@@ -2159,7 +2159,8 @@ again:
ret = find_first_extent_bit(&info->extent_ins, search, &start,
&end, EXTENT_WRITEBACK);
if (ret) {
-   if (skipped && all && !num_inserts) {
+   if (skipped && all && !num_inserts &&
+   list_empty(&update_list)) {
skipped = 0;
search = 0;
continue;
@@ -2547,6 +2548,7 @@ again:
if (ret) {
if (all && skipped && !nr) {
search = 0;
+   skipped = 0;
continue;
}
mutex_unlock(&info->extent_ins_mutex);


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix locking issue in btrfs_remove_block_group

2009-01-21 Thread Yan Zheng
hello,

We should hold the block_group_cache_lock while modifying the
block groups red-black tree. Thank you,

Signed-off-by: Yan Zheng 

---
diff -urp 1/fs/btrfs/extent-tree.c 2/fs/btrfs/extent-tree.c
--- 1/fs/btrfs/extent-tree.c2009-01-07 07:16:42.762364700 +0800
+++ 2/fs/btrfs/extent-tree.c2009-01-21 10:41:38.0 +0800
@@ -5957,9 +5957,11 @@ int btrfs_remove_block_group(struct btrf
path = btrfs_alloc_path();
BUG_ON(!path);
 
-   btrfs_remove_free_space_cache(block_group);
+   spin_lock(&root->fs_info->block_group_cache_lock);
rb_erase(&block_group->cache_node,
 &root->fs_info->block_group_cache_tree);
+   spin_unlock(&root->fs_info->block_group_cache_lock);
+   btrfs_remove_free_space_cache(block_group);
down_write(&block_group->space_info->groups_sem);
list_del(&block_group->list);
up_write(&block_group->space_info->groups_sem);



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix stop searching test in replace_one_extent

2009-01-21 Thread Yan Zheng
Hello,

replace_one_extent search tree leaves for references to
a given extent. It stop searching if goes beyond the last
possible position. The last possible position is computed
by adding the starting offset of a found file extent to 
the full size of the extent. The code uses physical size
of the extent as the full size. This is incorrect since
btrfs supported compression. The fix is get the full size
from ram_bytes field of file extent item. Thank you,

Signed-off-by: Yan Zheng 

---
diff -urp 1/fs/btrfs/extent-tree.c 2/fs/btrfs/extent-tree.c
--- 1/fs/btrfs/extent-tree.c2009-01-07 07:16:42.762364700 +0800
+++ 2/fs/btrfs/extent-tree.c2009-01-08 15:14:47.0 +0800
@@ -,7 +4445,7 @@ static noinline int replace_one_extent(s
u64 lock_end = 0;
u64 num_bytes;
u64 ext_offset;
-   u64 first_pos;
+   u64 search_end = (u64)-1;
u32 nritems;
int nr_scaned = 0;
int extent_locked = 0;
@@ -4452,7 +4453,6 @@ static noinline int replace_one_extent(s
int ret;
 
memcpy(&key, leaf_key, sizeof(key));
-   first_pos = INT_LIMIT(loff_t) - extent_key->offset;
if (ref_path->owner_objectid != BTRFS_MULTIPLE_OBJECTIDS) {
if (key.objectid < ref_path->owner_objectid ||
(key.objectid == ref_path->owner_objectid &&
@@ -4501,7 +4501,7 @@ next:
if ((key.objectid > ref_path->owner_objectid) ||
(key.objectid == ref_path->owner_objectid &&
 key.type > BTRFS_EXTENT_DATA_KEY) ||
-   (key.offset >= first_pos + extent_key->offset))
+   key.offset >= search_end)
break;
}
 
@@ -4534,8 +4534,10 @@ next:
num_bytes = btrfs_file_extent_num_bytes(leaf, fi);
ext_offset = btrfs_file_extent_offset(leaf, fi);
 
-   if (first_pos > key.offset - ext_offset)
-   first_pos = key.offset - ext_offset;
+   if (search_end == (u64)-1) {
+   search_end = key.offset - ext_offset +
+   btrfs_file_extent_ram_bytes(leaf, fi);
+   }
 
if (!extent_locked) {
lock_start = key.offset;
@@ -4724,7 +4726,7 @@ next:
}
 skip:
if (ref_path->owner_objectid != BTRFS_MULTIPLE_OBJECTIDS &&
-   key.offset >= first_pos + extent_key->offset)
+   key.offset >= search_end)
break;
 
cond_resched();


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-21 Thread Nick Piggin
On Wed, Jan 21, 2009 at 10:54:18AM +0100, Andi Kleen wrote:
> > The point is that the compiler is then free to do it. If things
> > slow down after the compiler gets *more* information, then that
> > is a problem with the compiler heuristics rather than the
> > information we give it.
> 
> The point was the -Os disables it typically then.
> (not always, compiler heuristics are far from perfect)

That'd be just another gcc failing. If it can make the code
faster without a size increase, then it should (of course
if it has to start spilling registers etc. then that's a
different matter, but we're not talking about only 32-bit
x86 here).


> > > Then x86s tend to have very very fast L1 caches and
> > > if something is not in L1 on reads then the cost of fetching
> > > something for a read dwarfs the few cycles you can typically
> > > get out of this.
> > 
> > Well most architectures have L1 caches of several cycles. And
> > L2 miss typically means going to L2 which in some cases the
> > compiler is expected to attempt to cover as much as possible
> > (eg in-order architectures).
> 
> L2 cache is so much slower that scheduling a few instructions
> more doesn't help much.

I think on a lot of CPUs that is actually not the case. Including
on Nehalem and Montecito CPUs where it is what, like under 15
cycles?

Even in cases where you have a high latency LLC or go to memory,
you want to be able to start as many loads as possible before
stalling. Especially for in-order architectures, but even OOOE
can stall if it can't resolve store addresses early enough or
speculate them.

 
> > stall, so you still want to get loads out early if possible.
> > 
> > Even a lot of OOOE CPUs I think won't have the best alias
> > anaysis, so all else being equal, it wouldn't hurt them to
> > move loads earlier.
> 
> Hmm, but if the load is nearby it won't matter if a 
> store is in the middle, because the CPU will just execute
> over it.

If the address is not known or the store buffer fills up etc
then it may not be able to. It could even be hundreds of
instructions here too much for an OOOE processor window. We
have a lot of huge functions (although granted they'll often
contain barriers for other reasons like locks or function
calls anyway).
 

> The real big win is if you do some computation inbetween,
> but at least for typical list manipulation there isn't 
> really any.

Well, I have a feeling that the MLP side of it could be more
significant than ILP. But no data.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-21 Thread Andi Kleen
> The point is that the compiler is then free to do it. If things
> slow down after the compiler gets *more* information, then that
> is a problem with the compiler heuristics rather than the
> information we give it.

The point was the -Os disables it typically then.
(not always, compiler heuristics are far from perfect)

> 
>  
> > Then x86s tend to have very very fast L1 caches and
> > if something is not in L1 on reads then the cost of fetching
> > something for a read dwarfs the few cycles you can typically
> > get out of this.
> 
> Well most architectures have L1 caches of several cycles. And
> L2 miss typically means going to L2 which in some cases the
> compiler is expected to attempt to cover as much as possible
> (eg in-order architectures).

L2 cache is so much slower that scheduling a few instructions
more doesn't help much.

> stall, so you still want to get loads out early if possible.
> 
> Even a lot of OOOE CPUs I think won't have the best alias
> anaysis, so all else being equal, it wouldn't hurt them to
> move loads earlier.

Hmm, but if the load is nearby it won't matter if a 
store is in the middle, because the CPU will just execute
over it.

The real big win is if you do some computation inbetween,
but at least for typical list manipulation there isn't 
really any.

> > Also at least x86 gcc normally doesn't do scheduling 
> > beyond basic blocks, so any if () shuts it up.
> 
> I don't think any of this is a reason not to use restrict, though.
> But... there are so many places we could add it to the kernel, and
> probably so few where it makes much difference. Maybe it should be
> able to help some critical core code, though.

Frankly I think it would be another unlikely().

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Warning and BUG with btrfs and corrupted image

2009-01-21 Thread Eric Sesterhenn
* Pavel Machek (pa...@suse.cz) wrote:
> On Tue 2009-01-20 18:34:55, Eric Sesterhenn wrote:
> > * Dave Chinner (da...@fromorbit.com) wrote:
> > > On Tue, Jan 20, 2009 at 11:15:03AM +0100, Eric Sesterhenn wrote:
> > > > * Dave Chinner (da...@fromorbit.com) wrote:
> > > > > On Tue, Jan 20, 2009 at 07:31:50AM +0100, Eric Sesterhenn wrote:
> > > > > > * Pavel Machek (pa...@suse.cz) wrote:
> > > > > > > Does ext2/3 and vfat survive that kind of attacks? Those are 'in
> > > > > > > production' and should survive it...
> > > > > > 
> > > > > > I regularly (once or twice a week) test 100 corrupted images of 
> > > > > > vfat, udf, msdos, swap, iso9660, ext2, ext3, ext4, minix, bfs, befs,
> > > > > > hfs, hfs+, qnx4, affs and cramfs on each of my two test machines.
> > > > > 
> > > > > Any reason you are not testing XFS in that set?
> > > > 
> > > > So far the responses from xfs folks have been disappointing, if you are
> > > > interested in bugreports i can send you some.
> > > 
> > > Sure I am.  It would be good if you could start testing XFS along
> > > with all the other filesystems and report anything you find.
> > 
> > Ok, i wont report stuff with only xfs-internal backtraces from
> > xfs_error_report() or are they interesting to you?
> > 
> > This occurs during mount, box is dead afterwards
> > Image can be found here :
> > http://www.cccmz.de/~snakebyte/xfs.11.img.bz2
> > I see this every ~10 images, which makes further testing hard :)
> 
> BTW have you considered trying to run fsck.* on such images? I had lot
> of fun with e2fsck/e3fsck/fsck.vfat.

Not yet, but if one assumes the people writing the userspace code are
the same writing the kernel code, and probably make the same errors in
both versions this might be interesting.

For userspace a more advanced approach than random bit flipping is
possible with tools like bunny (http://code.google.com/p/bunny-the-fuzzer/)
Bunny instrumentates the code, and then tries to change the input file
based on the feedback it gets from the instrumentation. I fired up
an instance testing ext2, lets see if it finishes until evening :-)

Maybe someone is brave enough to test this approach with user mode
linux... :)

Greetings, Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-21 Thread Nick Piggin
On Wed, Jan 21, 2009 at 10:20:49AM +0100, Andi Kleen wrote:
> On Wed, Jan 21, 2009 at 09:52:08AM +0100, Nick Piggin wrote:
> > On Wed, Jan 21, 2009 at 09:54:02AM +0100, Andi Kleen wrote:
> > > > GCC 4.3.2. Maybe i missed something obvious?
> > > 
> > > The typical use case of restrict is to tell it that multiple given
> > > arrays are independent and then give the loop optimizer 
> > > more freedom to handle expressions in the loop that
> > > accesses these arrays.
> > > 
> > > Since there are no loops in the list functions nothing changed.
> > > 
> > > Ok presumably there are some other optimizations which 
> > > rely on that alias information too, but again the list_*
> > > stuff is probably too simple to trigger any of them.
> > 
> > Any function that does several interleaved loads and stores
> > through different pointers could have much more freedom to
> > move loads early and stores late. 
> 
> For once that would require more live registers. It's not
> a clear and obvious win. Especially not if you have
> only very little registers, like on 32bit x86.
> 
> Then it would typically increase code size.

The point is that the compiler is then free to do it. If things
slow down after the compiler gets *more* information, then that
is a problem with the compiler heuristics rather than the
information we give it.

 
> Then x86s tend to have very very fast L1 caches and
> if something is not in L1 on reads then the cost of fetching
> something for a read dwarfs the few cycles you can typically
> get out of this.

Well most architectures have L1 caches of several cycles. And
L2 miss typically means going to L2 which in some cases the
compiler is expected to attempt to cover as much as possible
(eg in-order architectures).

If the caches are missed completely, then especially with an
in-order architecture, you want to issue as many parallel loads
as possible during the stall. If the compiler can't resolve
aliases, then it simply won't be able to bring some of those
loads forward.


> And lastly even on a in order system stores can 
> be typically queued without stalling, so it doesn't
> hurt to do them early.

Store queues are, what? On the order of tens of entries for
big power hungry x86? I'd guess much smaller for low power
in-order x86 and ARM etc. These can definitely fill up and
stall, so you still want to get loads out early if possible.

Even a lot of OOOE CPUs I think won't have the best alias
anaysis, so all else being equal, it wouldn't hurt them to
move loads earlier.


> Also at least x86 gcc normally doesn't do scheduling 
> beyond basic blocks, so any if () shuts it up.

I don't think any of this is a reason not to use restrict, though.
But... there are so many places we could add it to the kernel, and
probably so few where it makes much difference. Maybe it should be
able to help some critical core code, though.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-21 Thread Andi Kleen
On Wed, Jan 21, 2009 at 09:52:08AM +0100, Nick Piggin wrote:
> On Wed, Jan 21, 2009 at 09:54:02AM +0100, Andi Kleen wrote:
> > > GCC 4.3.2. Maybe i missed something obvious?
> > 
> > The typical use case of restrict is to tell it that multiple given
> > arrays are independent and then give the loop optimizer 
> > more freedom to handle expressions in the loop that
> > accesses these arrays.
> > 
> > Since there are no loops in the list functions nothing changed.
> > 
> > Ok presumably there are some other optimizations which 
> > rely on that alias information too, but again the list_*
> > stuff is probably too simple to trigger any of them.
> 
> Any function that does several interleaved loads and stores
> through different pointers could have much more freedom to
> move loads early and stores late. 

For once that would require more live registers. It's not
a clear and obvious win. Especially not if you have
only very little registers, like on 32bit x86.

Then it would typically increase code size.

Then x86s tend to have very very fast L1 caches and
if something is not in L1 on reads then the cost of fetching
something for a read dwarfs the few cycles you can typically
get out of this.

And lastly even on a in order system stores can 
be typically queued without stalling, so it doesn't
hurt to do them early.

Also at least x86 gcc normally doesn't do scheduling 
beyond basic blocks, so any if () shuts it up.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-21 Thread Nick Piggin
On Wed, Jan 21, 2009 at 09:54:02AM +0100, Andi Kleen wrote:
> > GCC 4.3.2. Maybe i missed something obvious?
> 
> The typical use case of restrict is to tell it that multiple given
> arrays are independent and then give the loop optimizer 
> more freedom to handle expressions in the loop that
> accesses these arrays.
> 
> Since there are no loops in the list functions nothing changed.
> 
> Ok presumably there are some other optimizations which 
> rely on that alias information too, but again the list_*
> stuff is probably too simple to trigger any of them.

Any function that does several interleaved loads and stores
through different pointers could have much more freedom to
move loads early and stores late. Big OOOE CPUs won't care
so much, but embedded and things (including in-order x86)
are very important users of the kernel.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-21 Thread Andi Kleen
> GCC 4.3.2. Maybe i missed something obvious?

The typical use case of restrict is to tell it that multiple given
arrays are independent and then give the loop optimizer 
more freedom to handle expressions in the loop that
accesses these arrays.

Since there are no loops in the list functions nothing changed.

Ok presumably there are some other optimizations which 
rely on that alias information too, but again the list_*
stuff is probably too simple to trigger any of them.

-Andi

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html