Re: A rescue tool for btrfs

2010-12-20 Thread Bryan Østergaard
On 17/12/2010, at 00.18, Michael Niederle wrote:

 Hi!
 
 Last week I crashed a btrfs file system. I didn't lose a lot of data because I
 had current backups of most data and a full backup from a month ago.
 
 But I thought it would be a nice idea to have a rescue tool! Currently I have
 a first release of this tool (surely buggy and running on little-endian
 architectures only).
 
Thank you for writing this tool. It was able to save some 80% of all data from
a very broken btrfs filesystem. I could mount the filesystem, but it would hang
after just a few accesses with thousands of "parent transid verify failed"
messages, and btrfsck just exited immediately with some huge negative number as
the only indication of what was wrong. Using the btrfsck -s $number option also
didn't help, but your tool seemed to do the job just fine.

I still have the broken filesystem, as I'm interested to see what Chris Mason's
new btrfsck code can do with it, so if anybody is interested in further
debugging I can probably help with that.

Regards,
Bryan Østergaard

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[BUG] can not allocate space for caching data

2010-12-20 Thread Miao Xie
Hi, Chris

There is something wrong with this patch:

commit 83a50de97fe96aca82389e061862ed760ece2283
Author: Chris Mason chris.ma...@oracle.com
Date:   Mon Dec 13 15:06:46 2010 -0500

Btrfs: prevent RAID level downgrades when space is low

The extent allocator has code that allows us to fill
allocations from any available block group, even if it doesn't
match the raid level we've requested.

This was put in because adding a new drive to a filesystem
made with the default mkfs options actually upgrades the metadata from
single spindle dup to full RAID1.

But, the code also allows us to allocate from a raid0 chunk when we
really want a raid1 or raid10 chunk.  This can cause big trouble because
mkfs creates a small (4MB) raid0 chunk for data and metadata which then
goes unused for raid1/raid10 installs.

The allocator will happily wander in and allocate from that chunk when
things get tight, which is not correct.

The fix here is to make sure that we provide duplication when the
caller has asked for it.  It does allow the dups to be any raid level,
which preserves the dup->raid1 upgrade abilities.

Signed-off-by: Chris Mason chris.ma...@oracle.com

Btrfs has added the space of single chunks and raid0 chunks into the space
information, so when we use btrfs_check_data_free_space() to check if there
is some space for storing file data, this function may report success. So we
write the data into the cache successfully. But the extent allocator cannot
allocate any space to store that cached data, and then the file system
panics.

I think we should subtract that space from the space information, or split the
space information into two types: one manages the chunks with duplication,
the other manages the other chunks.
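
A minimal sketch of the mismatch described above. It uses a hypothetical,
simplified model: the `chunk` struct, the `profile` enum and both helper
names are invented for illustration and are not the kernel's data structures.
It only shows how a free-space check that sums every chunk can succeed while
an allocator restricted to one raid profile finds nothing.

```c
#include <stdint.h>

/* Hypothetical, simplified model; not the btrfs kernel structures. */
enum profile { PROFILE_SINGLE, PROFILE_RAID0, PROFILE_RAID1 };

struct chunk {
	enum profile profile;
	uint64_t free_bytes;
};

/* Models the reported accounting: every chunk counts as free space. */
static uint64_t free_space_all(const struct chunk *c, int n)
{
	uint64_t total = 0;

	for (int i = 0; i < n; i++)
		total += c[i].free_bytes;
	return total;
}

/* Models what an allocator can use when one profile is required. */
static uint64_t free_space_for(const struct chunk *c, int n, enum profile want)
{
	uint64_t total = 0;

	for (int i = 0; i < n; i++)
		if (c[i].profile == want)
			total += c[i].free_bytes;
	return total;
}
```

With a 4MB raid0 chunk that still has free space and a full raid1 chunk, the
first helper reports 4MB while the second reports 0 for raid1, which is the
gap between the check and the allocation.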


Re: [BUG] can not allocate space for caching data

2010-12-20 Thread Chris Mason
Excerpts from Miao Xie's message of 2010-12-20 07:25:10 -0500:
 Hi, Chris
 
 There is something wrong with this patch:
 
 commit 83a50de97fe96aca82389e061862ed760ece2283
 Author: Chris Mason chris.ma...@oracle.com
 Date:   Mon Dec 13 15:06:46 2010 -0500
 
 Btrfs: prevent RAID level downgrades when space is low
 
 The extent allocator has code that allows us to fill
 allocations from any available block group, even if it doesn't
 match the raid level we've requested.
 
 This was put in because adding a new drive to a filesystem
 made with the default mkfs options actually upgrades the metadata from
 single spindle dup to full RAID1.
 
 But, the code also allows us to allocate from a raid0 chunk when we
 really want a raid1 or raid10 chunk.  This can cause big trouble because
 mkfs creates a small (4MB) raid0 chunk for data and metadata which then
 goes unused for raid1/raid10 installs.
 
 The allocator will happily wander in and allocate from that chunk when
 things get tight, which is not correct.
 
 The fix here is to make sure that we provide duplication when the
 caller has asked for it.  It does allow the dups to be any raid level,
 which preserves the dup->raid1 upgrade abilities.
 
 Signed-off-by: Chris Mason chris.ma...@oracle.com
 
 Btrfs has added the space of single chunks and raid0 chunks into the space
 information, so when we use btrfs_check_data_free_space() to check if there
 is some space for storing file data, this function may return true. So we
 write the data into the cache successfully. But, the extent allocator can
 not allocate any space to store that cached data, and then the file system
 panic.
 
 I think we subtract that space from the space information, or split the space
 information into two types, one is used to manage the chunks with duplication,
 the other manages the other chunks.

Ok, do you have a test case that triggers this?  I'll work out a patch.
Yan Zheng's original idea of 'the chunks should be readonly' should help
us deduct them from the total.

-chris


Re: [BUG] can not allocate space for caching data

2010-12-20 Thread Miao Xie

On Mon, 20 Dec 2010 07:44:06 -0500, Chris Mason wrote:

Excerpts from Miao Xie's message of 2010-12-20 07:25:10 -0500:

Hi, Chris

There is something wrong with this patch:

commit 83a50de97fe96aca82389e061862ed760ece2283
Author: Chris Masonchris.ma...@oracle.com
Date:   Mon Dec 13 15:06:46 2010 -0500

 Btrfs: prevent RAID level downgrades when space is low

 The extent allocator has code that allows us to fill
 allocations from any available block group, even if it doesn't
 match the raid level we've requested.

 This was put in because adding a new drive to a filesystem
 made with the default mkfs options actually upgrades the metadata from
 single spindle dup to full RAID1.

 But, the code also allows us to allocate from a raid0 chunk when we
 really want a raid1 or raid10 chunk.  This can cause big trouble because
 mkfs creates a small (4MB) raid0 chunk for data and metadata which then
 goes unused for raid1/raid10 installs.

 The allocator will happily wander in and allocate from that chunk when
 things get tight, which is not correct.

 The fix here is to make sure that we provide duplication when the
 caller has asked for it.  It does allow the dups to be any raid level,
 which preserves the dup->raid1 upgrade abilities.

 Signed-off-by: Chris Masonchris.ma...@oracle.com

Btrfs has added the space of single chunks and raid0 chunks into the space
information, so when we use btrfs_check_data_free_space() to check if there
is some space for storing file data, this function may return true. So we
write the data into the cache successfully. But, the extent allocator can
not allocate any space to store that cached data, and then the file system
panic.

I think we subtract that space from the space information, or split the space
information into two types, one is used to manage the chunks with duplication,
the other manages the other chunks.


Ok, do you have a test case that triggers this?  I'll work out a patch.
Yan Zheng's original idea of 'the chunks should be readonly' should help
us deduct them from the total.


# mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=99
  (fill the file system)
# umount /mnt
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=1000
# sync

Thanks
Miao



Re: [BUG] can not allocate space for caching data

2010-12-20 Thread Chris Mason
Excerpts from Miao Xie's message of 2010-12-20 08:13:14 -0500:
 On Mon, 20 Dec 2010 07:44:06 -0500, Chris Mason wrote:
  Excerpts from Miao Xie's message of 2010-12-20 07:25:10 -0500:
  Hi, Chris
 
  There is something wrong with this patch:
 
  commit 83a50de97fe96aca82389e061862ed760ece2283
  Author: Chris Masonchris.ma...@oracle.com
  Date:   Mon Dec 13 15:06:46 2010 -0500
 
   Btrfs: prevent RAID level downgrades when space is low
 
   The extent allocator has code that allows us to fill
   allocations from any available block group, even if it doesn't
   match the raid level we've requested.
 
  Btrfs has added the space of single chunks and raid0 chunks into the space
  information, so when we use btrfs_check_data_free_space() to check if there
  is some space for storing file data, this function may return true. So we
  write the data into the cache successfully. But, the extent allocator can
  not allocate any space to store that cached data, and then the file system
  panic.
 
  I think we subtract that space from the space information, or split the 
  space
  information into two types, one is used to manage the chunks with 
  duplication,
  the other manages the other chunks.
 
  Ok, do you have a test case that triggers this?  I'll work out a patch.
  Yan Zheng's original idea of 'the chunks should be readonly' should help
  us deduct them from the total.
 
 # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
 # mount /dev/sda9 /mnt
 # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=99
(fill the file system)
 # umount /mnt
 # mount /dev/sda9 /mnt
 # dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=1000
 # sync

Looks like we've got an off by one bug in set_block_group_ro, which is
why our block group isn't getting set to ro.  With this patch, we're
properly setting the block group ro, and the enospc accounting is done
correctly.

It should also be able to replace my commit above.  Please take a look,
Zheng does this look correct to you?

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 227e581..6f7d758 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7970,13 +7970,14 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache)
 
 	if (sinfo->bytes_used + sinfo->bytes_reserved + sinfo->bytes_pinned +
 	    sinfo->bytes_may_use + sinfo->bytes_readonly +
-	    cache->reserved_pinned + num_bytes < sinfo->total_bytes) {
+	    cache->reserved_pinned + num_bytes <= sinfo->total_bytes) {
 		sinfo->bytes_readonly += num_bytes;
 		sinfo->bytes_reserved += cache->reserved_pinned;
 		cache->reserved_pinned = 0;
 		cache->ro = 1;
 		ret = 0;
 	}
+
 	spin_unlock(&cache->lock);
 	spin_unlock(&sinfo->lock);
 	return ret;
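
The boundary case behind this off-by-one can be sketched in isolation. The
struct below is a hypothetical, trimmed-down stand-in for the space info (only
three of the counters from the patch are kept), and both function names are
invented for illustration. The point is only the comparison: when the sum is
exactly equal to total_bytes, the strict check refuses and the patched check
accepts.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the space info counters. */
struct space_info {
	uint64_t total_bytes;
	uint64_t bytes_used;
	uint64_t bytes_readonly;
};

/* Old check: strict '<' refuses when the sum equals total_bytes. */
static int can_set_ro_old(const struct space_info *s, uint64_t num_bytes)
{
	return s->bytes_used + s->bytes_readonly + num_bytes < s->total_bytes;
}

/* Patched check: '<=' also accepts the exact-fit case. */
static int can_set_ro_new(const struct space_info *s, uint64_t num_bytes)
{
	return s->bytes_used + s->bytes_readonly + num_bytes <= s->total_bytes;
}
```

For total_bytes = 100, bytes_used = 60 and num_bytes = 40, only the patched
version returns nonzero; for num_bytes = 41 both refuse.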


mini bug report

2010-12-20 Thread Xavier Nicollet
Hi,

I use btrfs on my laptop, 2.6.37-rc2 from kernel.org, under dm-crypt, as
/. I use space_cache and compression (not forced).

Today, my computer froze. At reboot, the kernel could not mount the filesystem.
The dmesg output, which I did not save, mentioned a null dereference.

After that I rebooted on 2.6.34, which was not very happy: mount errors
(space cache?).

Now I am on 2.6.37-rc2 again, which seems to work.
So I guess it might come from space_cache.

I will write it down on paper if this comes back. However, if anybody has any
clue about which specific part of the dmesg should be reported, it would be
very helpful!

Cheers,

-- 
Xavier Nicollet


Re: mini bug report

2010-12-20 Thread Chris Mason
Excerpts from Xavier Nicollet's message of 2010-12-20 10:58:01 -0500:
 Hi,
 
 I use btrfs on my laptop, 2.6.37-rc2 from kernel.org, under dm-crypt, as
 /. I use space_cache and compression (not forced).
 
 Today, my computer froze. At reboot, the kernel could not mount.
 The dmesg output, which I haven't saved was speaking of a null
 dereference.
 
 After that I rebooted on 2.6.34, which was not very happy: mount errors
 (space cache ?).
 
 Now I am on 2.6.37-rc2 again, which seems to work.
 So I guess it might come from space_cache.
 
 I would note on a paper if this comes back. However if anybody has any
 clue on what specific part of the dmesg should be reported, it would be
 very helpfull !

These sound like the free space caching bugs that josef fixed.  If you
pull down the latest I think we've got it nailed.

-chris


[TRIVIAL][PATCH] Improve error handling in the btrfs command

2010-12-20 Thread Goffredo Baroncelli
Hi Chris,

enclosed below is a trivial patch that aims to improve the error reporting
of the btrfs command.

You can pull from

http://cassiopea.homelinux.net/git/btrfs-progs-unstable.git

branch

strerror

I changed every printf(some-error) to something like:

e = errno;
fprintf(stderr, "ERROR:  - %s", strerror(e));

so:

1) all errors are reported to standard error
2) the error as returned by the system is printed at the end of the message.

The change is quite simple: I replaced every printf(some-error) with the lines
above. I didn't touch anything else.
I also added a missing printf, based on Ben's patch.

This patch makes the btrfs command more user friendly :-)
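
A minimal, self-contained sketch of the pattern described above, for readers
who want to try it outside btrfs-progs. The helper name, the message text and
the use of open(2) as the failing call are assumptions made for illustration;
the patch itself applies the same idea to ioctl calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open a path and, on failure, report the
 * problem on stderr with the system's error description appended.
 * Returns 0 on success, or the saved errno value on failure.
 */
static int open_and_report(const char *path)
{
	int fd = open(path, O_RDONLY);
	int e = errno;	/* save right away, before any other library call */

	if (fd < 0) {
		fprintf(stderr, "ERROR: can't access '%s' - %s\n",
			path, strerror(e));
		return e;
	}
	close(fd);
	return 0;
}
```

Saving errno into a local variable first matters because fprintf() itself is
allowed to modify errno.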

Regards
G.Baroncelli

 btrfs-list.c |   40 ++
 btrfs_cmds.c |   77 -
 utils.c      |    6
 3 files changed, 89 insertions(+), 34 deletions(-)


-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreij...@inwind.it
Key fingerprint = 4769 7E51 5293 D36C 814E  C054 BF04 F161 3DC5 0512
diff --git a/btrfs-list.c b/btrfs-list.c
index 93766a8..abcc2f4 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -265,7 +265,7 @@ static int resolve_root(struct root_lookup *rl, struct root_info *ri)
 static int lookup_ino_path(int fd, struct root_info *ri)
 {
 	struct btrfs_ioctl_ino_lookup_args args;
-	int ret;
+	int ret, e;
 
 	if (ri->path)
 		return 0;
@@ -275,9 +275,11 @@ static int lookup_ino_path(int fd, struct root_info *ri)
 	args.objectid = ri->dir_id;
 
 	ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, &args);
+	e = errno;
 	if (ret) {
-		fprintf(stderr, "ERROR: Failed to lookup path for root %llu\n",
-			(unsigned long long)ri->ref_tree);
+		fprintf(stderr, "ERROR: Failed to lookup path for root %llu - %s\n",
+			(unsigned long long)ri->ref_tree,
+			strerror(e));
 		return ret;
 	}
 
@@ -320,15 +322,18 @@ static u64 find_root_gen(int fd)
 	unsigned long off = 0;
 	u64 max_found = 0;
 	int i;
+	int e;
 
 	memset(&ino_args, 0, sizeof(ino_args));
 	ino_args.objectid = BTRFS_FIRST_FREE_OBJECTID;
 
 	/* this ioctl fills in ino_args->treeid */
 	ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, &ino_args);
+	e = errno;
 	if (ret) {
-		fprintf(stderr, "ERROR: Failed to lookup path for dirid %llu\n",
-			(unsigned long long)BTRFS_FIRST_FREE_OBJECTID);
+		fprintf(stderr, "ERROR: Failed to lookup path for dirid %llu - %s\n",
+			(unsigned long long)BTRFS_FIRST_FREE_OBJECTID,
+			strerror(e));
 		return 0;
 	}
 
@@ -351,8 +356,10 @@ static u64 find_root_gen(int fd)
 
 	while (1) {
 		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+		e = errno;
 		if (ret < 0) {
-			fprintf(stderr, "ERROR: can't perform the search\n");
+			fprintf(stderr, "ERROR: can't perform the search - %s\n",
+				strerror(e));
 			return 0;
 		}
 		/* the ioctl returns the number of item it found in nr_items */
@@ -407,14 +414,16 @@ static char *__ino_resolve(int fd, u64 dirid)
 	struct btrfs_ioctl_ino_lookup_args args;
 	int ret;
 	char *full;
+	int e;
 
 	memset(&args, 0, sizeof(args));
 	args.objectid = dirid;
 
 	ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, &args);
+	e = errno;
 	if (ret) {
-		fprintf(stderr, "ERROR: Failed to lookup path for dirid %llu\n",
-			(unsigned long long)dirid);
+		fprintf(stderr, "ERROR: Failed to lookup path for dirid %llu - %s\n",
+			(unsigned long long)dirid, strerror(e) );
 		return ERR_PTR(ret);
 	}
 
@@ -472,6 +481,7 @@ static char *ino_resolve(int fd, u64 ino, u64 *cache_dirid, char **cache_name)
 	struct btrfs_ioctl_search_header *sh;
 	unsigned long off = 0;
 	int namelen;
+	int e;
 
 	memset(&args, 0, sizeof(args));
 
@@ -490,8 +500,10 @@ static char *ino_resolve(int fd, u64 ino, u64 *cache_dirid, char **cache_name)
 	sk->nr_items = 1;
 
 	ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+	e = errno;
 	if (ret < 0) {
-		fprintf(stderr, "ERROR: can't perform the search\n");
+		fprintf(stderr, "ERROR: can't perform the search - %s\n",
+			strerror(e));
 		return NULL;
 	}
 	/* the ioctl returns the number of item it found in nr_items */
@@ -550,6 +562,7 @@ int list_subvols(int fd)
 	char *name;
 	u64 dir_id;
 	int i;
+	int e;
 
 	root_lookup_init(&root_lookup);
 
@@ -578,8 +591,10 @@ int list_subvols(int fd)
 
 	while(1) {
 		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+		e = errno;
 		if (ret < 0) {
-			fprintf(stderr, "ERROR: can't perform the search\n");
+			fprintf(stderr, "ERROR: can't perform the search - %s\n",
+				strerror(e));
 			return ret;
 		}
 		/* the ioctl returns the number of item it found in nr_items */
@@ -747,6 +762,7 @@ int find_updated_files(int fd, u64 root_id, u64 oldest_gen)
 	u64 found_gen;
 	u64 max_found = 0;
 	int i;
+	int e;
 	u64 cache_dirid = 0;
 	u64 cache_ino = 0;
 	char *cache_dir_name = NULL;
@@ -773,8 +789,10 @@ int find_updated_files(int fd, u64 root_id, u64 oldest_gen)
 	max_found = find_root_gen(fd);
 	while(1) {
 		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+		e = errno;
 		if (ret < 0) {
-			fprintf(stderr, 

Re: [PATCH] Improve error handling in filesystem df

2010-12-20 Thread Goffredo Baroncelli
Hi Ben,

I integrated your patch into mine (see my next email). However, I changed
the argument of the strerror function from the ioctl return code to
the errno variable.
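
A small sketch of why the two arguments differ. ioctl(2) returns -1 on failure
and stores the actual error code in errno, so strerror(-ret) would describe
error number 1 (EPERM on Linux) regardless of what really went wrong. The
helper name below is invented for illustration, and the invalid descriptor and
the request number 0 are placeholders chosen only to force a failure.

```c
#include <errno.h>
#include <sys/ioctl.h>

/*
 * Issue an ioctl on a descriptor that is known to be invalid and record
 * both the return value and errno. The request number is a placeholder;
 * the call fails on the descriptor before the request is looked at.
 */
static void failing_ioctl(int *ret_out, int *errno_out)
{
	errno = 0;
	*ret_out = ioctl(-1, 0UL, (void *)0);
	*errno_out = errno;
}
```

Here the return value is -1 while errno is EBADF, so only strerror(errno)
yields a message about the bad file descriptor.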

Regards
G.Baroncelli

On Sunday, 19 December, 2010, Ben Gamari wrote:
 The return values of ioctl weren't being printed to stderr on failure,
 causing the command to silently fail, resulting in a very confused
 user.
 
 Signed-off-by: Ben Gamari bgamari.f...@gmail.com
 ---
  btrfs_cmds.c |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)
 
 diff --git a/btrfs_cmds.c b/btrfs_cmds.c
 index 8031c58..45da2bd 100644
 --- a/btrfs_cmds.c
 +++ b/btrfs_cmds.c
 @@ -857,6 +857,7 @@ int do_df_filesystem(int nargs, char **argv)
  
   ret = ioctl(fd, BTRFS_IOC_SPACE_INFO, sargs);
   if (ret) {
 + fprintf(stderr, "ERROR: can't query '%s' for free space (%s)\n", path, strerror(-ret));
   free(sargs);
   return ret;
   }
 @@ -875,6 +876,7 @@ int do_df_filesystem(int nargs, char **argv)
  
   ret = ioctl(fd, BTRFS_IOC_SPACE_INFO, sargs);
   if (ret) {
 + fprintf(stderr, "ERROR: can't query '%s' for free space (%s)\n", path, strerror(-ret));
   free(sargs);
   return ret;
   }
 -- 
 1.7.1
 
 


-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreij...@inwind.it
Key fingerprint = 4769 7E51 5293 D36C 814E  C054 BF04 F161 3DC5 0512


Re: [TRIVIAL][PATCH] Improve error handling in the btrfs command

2010-12-20 Thread Goffredo Baroncelli
On Monday, 20 December, 2010, you (Chris Samuel) wrote:
 On 21/12/10 07:06, Goffredo Baroncelli wrote:
 
  below is enclosed a trivial patch, which has the aim to
  improve the error reporting of the btrfs command.
 
 Any reason to not just use perror() ?

Sometimes I need to add other information, so perror(3) may not be sufficient.
 

-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreij...@inwind.it
Key fingerprint = 4769 7E51 5293 D36C 814E  C054 BF04 F161 3DC5 0512


Re: [TRIVIAL][PATCH] Improve error handling in the btrfs command

2010-12-20 Thread Chris Samuel
On 21/12/10 07:06, Goffredo Baroncelli wrote:

 below is enclosed a trivial patch, which has the aim to
 improve the error reporting of the btrfs command.

Any reason to not just use perror() ?

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


Re: btrfs: 21 minutes to read 1.2M file directory

2010-12-20 Thread Andy Isaacson
Sigh, wrong btrfs address on the original.  Apologies for the
double-post.

On Mon, Dec 20, 2010 at 02:24:46PM -0800, Andy Isaacson wrote:
 I have a directory with 1.2M files in it, which makes readdir very slow
 on btrfs with cold caches (although it's reasonably fast with hot caches
 as in the first example below):
 
 % time find /btr/foo > /btr/foo.list
 find /btr/foo > /btr/foo.list  4.10s user 7.97s system 36% cpu 33.275 total
 % head /btr/foo.list
 /btr/foo
 /btr/foo/1281373625.777.fg.jpg
 /btr/foo/1281373625.777.bg.jpg
 /btr/foo/1281373625.948.fg.jpg
 /btr/foo/1281373625.948.bg.jpg
 /btr/foo/1281373626.096.fg.jpg
 /btr/foo/1281373626.096.bg.jpg
 /btr/foo/1281373626.218.fg.jpg
 /btr/foo/1281373626.218.bg.jpg
 /btr/foo/1281373626.350.fg.jpg
 % wc !$
 wc /btr/foo.list
  1216  1216 401499940 /btr/foo.list
 % wc -l /btr/foo.list
 1216 /btr/foo.list
 % sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0
 vm.drop_caches = 3
 vm.drop_caches = 0
 % time find /btr/foo > /btr/foo.list.2
 find /btr/foo > /btr/foo.list.2  5.62s user 24.54s system 2% cpu 21:40.90 total
 % uname -a
 Linux pyron 2.6.36-rc7-00149-g29979aa #71 SMP Wed Oct 13 09:42:57 PDT 2010 x86_64 GNU/Linux
 
 Interestingly, while readdir is busy I'm only seeing IO on sdb even
 though the btrfs is on 3 targets:
 
 Label: btr  uuid: 1271de53-b3d2-4d68-9d48-b19487e1c982
     Total devices 3 FS bytes used 555.13GB
     devid    1 size 18.65GB used 18.64GB path /dev/sda2
     devid    3 size 512.00GB used 44.13GB path /dev/sdc1
     devid    2 size 512.00GB used 511.76GB path /dev/sdb1
 
 iostat -k 1 | grep sdb tells me:
 
 Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
 sdb             173.00       692.00         0.00        692          0
 sdb             185.00       740.00         0.00        740          0
 sdb             198.00       792.00         0.00        792          0
 sdb             177.00       712.00         0.00        712          0
 
 I updated to a recent git and it's still slow (my test hasn't completed
 yet 19 minutes in):
 
 Linux pyron 2.6.37-rc6-11882-g55ec86f #72 SMP Mon Dec 20 13:34:38 PST 2010 x86_64 GNU/Linux
 
 The devices are:
 
 [1.834527] ata1.00: ATA-7: INTEL SSDSA2M040G2GC, 2CV102HD, max UDMA/133
 [1.834816] ata1.00: 78165360 sectors, multi 1: LBA48 NCQ (depth 31/32)
 [1.835369] ata1.00: configured for UDMA/133
 [1.835776] scsi 0:0:0:0: Direct-Access ATA  INTEL SSDSA2M040 2CV1 PQ: 0 ANSI: 5
 ...
 [2.904919] ata3.00: ATA-8: ST31500341AS, CC1H, max UDMA/133
 [2.905206] ata3.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
 [2.947393] ata3.00: configured for UDMA/133
 [2.947850] scsi 2:0:0:0: Direct-Access ATA  ST31500341AS CC1H PQ: 0 ANSI: 5
 ...
 [3.989664] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 [4.018524] ata5.00: ATA-8: ST31500341AS, CC1H, max UDMA/133
 [4.018811] ata5.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
 [4.060838] ata5.00: configured for UDMA/133
 [4.061205] scsi 4:0:0:0: Direct-Access ATA  ST31500341AS CC1H PQ: 0 ANSI: 5
 
 The host is a Intel(R) Core(TM) i7 CPU 930 @2.80GHz with 12GB RAM.
 
 Thanks,
 -andy
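
The iostat samples in the message above can be turned into an average request
size with one division. The sketch below is only arithmetic on the quoted
numbers; the struct and function names are invented for illustration.

```c
/* One sample taken from the iostat output quoted above. */
struct iostat_sample {
	double tps;        /* transfers per second */
	double kb_read_s;  /* kB read per second */
};

/* Average size of one read request, in kB. */
static double avg_request_kb(struct iostat_sample s)
{
	return s.kb_read_s / s.tps;
}
```

For the first sample (173 tps, 692 kB/s) this gives 4 kB per request. Reads of
about 4 kB at under 200 requests per second would be consistent with small
scattered reads rather than streaming, although the numbers alone do not prove
the cause.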


Re: [TRIVIAL][PATCH] Improve error handling in the btrfs command

2010-12-20 Thread Chris Samuel
On 21/12/10 09:53, Goffredo Baroncelli wrote:

 Some time I needed to add other info, so perror(3) may not be sufficient..

Ah, of course, and you cannot safely rely on snprintf()'ing
something into the string that would get passed to perror(), because
that could easily change errno if something went wrong internally.
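
One way around the concern above, sketched under assumptions: save errno
before formatting, then restore it just before calling perror(). The helper
name, the buffer size and the message layout are invented for illustration and
are not taken from btrfs-progs.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: format extra context into a buffer, then hand it
 * to perror(). errno is saved first and restored just before perror(),
 * so a failure inside snprintf() cannot change what gets reported.
 * Returns the errno value that was reported.
 */
static int perror_with_context(const char *what, unsigned long long id)
{
	int saved = errno;
	char msg[128];

	snprintf(msg, sizeof(msg), "ERROR: %s %llu", what, id);
	errno = saved;
	perror(msg);
	return saved;
}
```

The fprintf() plus strerror() form used in the patch avoids the extra buffer,
which may be why it was preferred there.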

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


Re: [BUG] can not allocate space for caching data

2010-12-20 Thread Yan, Zheng
On Mon, Dec 20, 2010 at 11:41 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Miao Xie's message of 2010-12-20 08:13:14 -0500:
 On Mon, 20 Dec 2010 07:44:06 -0500, Chris Mason wrote:
  Excerpts from Miao Xie's message of 2010-12-20 07:25:10 -0500:
  Hi, Chris
 
  There is something wrong with this patch:
 
  commit 83a50de97fe96aca82389e061862ed760ece2283
  Author: Chris Masonchris.ma...@oracle.com
  Date:   Mon Dec 13 15:06:46 2010 -0500
 
       Btrfs: prevent RAID level downgrades when space is low
 
       The extent allocator has code that allows us to fill
       allocations from any available block group, even if it doesn't
       match the raid level we've requested.
 
  Btrfs has added the space of single chunks and raid0 chunks into the space
  information, so when we use btrfs_check_data_free_space() to check if 
  there
  is some space for storing file data, this function may return true. So we
  write the data into the cache successfully. But, the extent allocator can
  not allocate any space to store that cached data, and then the file system
  panic.
 
  I think we subtract that space from the space information, or split the 
  space
  information into two types, one is used to manage the chunks with 
  duplication,
  the other manages the other chunks.
 
  Ok, do you have a test case that triggers this?  I'll work out a patch.
  Yan Zheng's original idea of 'the chunks should be readonly' should help
  us deduct them from the total.

 # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
 # mount /dev/sda9 /mnt
 # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=99
    (fill the file system)
 # umount /mnt
 # mount /dev/sda9 /mnt
 # dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=1000
 # sync

 Looks like we've got an off by one bug in set_block_group_ro, which is
 why our block group isn't getting set to ro.  With this patch, we're
 properly setting the block group ro, and the enospc accounting is done
 correctly.

 It should also be able to replace my commit above.  Please take a look,
 Zheng does this look correct to you?

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 227e581..6f7d758 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -7970,13 +7970,14 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache)

         if (sinfo->bytes_used + sinfo->bytes_reserved + sinfo->bytes_pinned +
             sinfo->bytes_may_use + sinfo->bytes_readonly +
 -           cache->reserved_pinned + num_bytes < sinfo->total_bytes) {
 +           cache->reserved_pinned + num_bytes <= sinfo->total_bytes) {
                 sinfo->bytes_readonly += num_bytes;
                 sinfo->bytes_reserved += cache->reserved_pinned;
                 cache->reserved_pinned = 0;
                 cache->ro = 1;
                 ret = 0;
         }
 +
         spin_unlock(&cache->lock);
         spin_unlock(&sinfo->lock);
         return ret;


Looks good for me,

Yan, Zheng


Re: btrfs: 21 minutes to read 1.2M file directory

2010-12-20 Thread Felipe Contreras
On Tue, Dec 21, 2010 at 12:24 AM, Andy Isaacson a...@hexapodia.org wrote:
 I have a directory with 1.2M files in it, which makes readdir very slow
 on btrfs with cold caches (although it's reasonably fast with hot caches
 as in the first example below):

Sounds like:

Bug 21562 - btrfs is dead slow due to fragmentation
https://bugzilla.kernel.org/show_bug.cgi?id=21562

-- 
Felipe Contreras


Scary OOPS when playing with --bind, --move, and friends

2010-12-20 Thread C Anthony Risinger
hello,

i really need to stop recklessly doing this stuff to my laptop... i'm
finishing a new initramfs hook to support many features of btrfs; when
considering how i was going to mount the target subvol as / for the
booting system, i decided to play with --bind and --move.

in short, everything works fine until you --bind across a subvol via
the special folders created when one takes a snapshot, or --bind the
special folder itself.  the --bind succeeds, and everything initially
appears to work fine...

this is nearly the exact process i did; should reproduce :-( (i'm scared
to do it again...):

-
# mkdir -p sand/root sand/bind
# cd sand
# mount -o subvolid=0 /dev/sda root
# mount --bind root/<subvol of my current root>/home/anthony bind
# touch bind/TEST

you can now see TEST at ~/TEST and bind/TEST

# vim bind/TEST
  did it work?
:wq

you can see the edited version ONLY in the one you edited... the
other is still 0 bytes

# vim ~/anthony/TEST
1 wtf, why not?
:wq

machine panics, X is instantly replaced by an oopsie screen; machine locked
-

i don't know why i decided to stupidly edit the bad version, even
though something was clearly wrong.  at any rate, this was about 15
minutes ago... the machine booted back up alright after a hard reboot,
hooray for that, but methinks there is probably some corruptions in
there now... meh.

i don't know what it means, but when the two versions desynced (it
could have been like this, but i didn't notice until after the
desync), `ls -l` reported a `0` right after the permissions:


-rw-r--r-- 0 anthony users 8 Dec 20 21:41 TEST


all other files report `1`.  since /dev and /proc etc. have different
numbers, this appears to have something to do with the mount or
device?
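
For reference, the number that `ls -l` prints right after the permissions is
the hard link count (st_nlink from stat). The sketch below shows one ordinary
way a count of 0 can be observed: a file that has been unlinked while it is
still open. Whether that is what happened here is not established; the helper
name and the /tmp template are assumptions made for illustration.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a temporary file, optionally unlink it while keeping it open,
 * and return the link count that fstat() reports (-1 on error).
 * The /tmp template is an assumption about the environment.
 */
static long link_count_demo(int unlink_first)
{
	char path[] = "/tmp/nlink-demo-XXXXXX";
	struct stat st;
	int fd = mkstemp(path);
	long n = -1;

	if (fd < 0)
		return -1;
	if (unlink_first)
		unlink(path);
	if (fstat(fd, &st) == 0)
		n = (long)st.st_nlink;
	if (!unlink_first)
		unlink(path);	/* clean up the file in the other case too */
	close(fd);
	return n;
}
```

On common Linux filesystems this reports 1 for the freshly created file and 0
once its only name has been removed.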

i panicked when the kernel did, and i forgot to write down the message,
but the trace had `vfs_rename` and `tomoyo_???`... sorry for the bad
memory.  vim was attempting to move a temporary file over the top of
the misbehaving file, hence the rename.

i'm on 2.6.36.2

the `directory as a subvol` thing seems to be a little finicky :-) did
i do something incorrect?  should this kind of operation be supported?
 it seems to work fine so long as i stay on the same subvol.

thanks,

C Anthony


Re: Scary OOPS when playing with --bind, --move, and friends

2010-12-20 Thread Fajar A. Nugraha
On Tue, Dec 21, 2010 at 10:51 AM, C Anthony Risinger anth...@extof.me wrote:
 in short, everything works fine until you --bind across a subvol via
 the special folders created when one takes a snapshot,

 # mount --bind root/<subvol of my current root>/home/anthony bind
 # touch bind/TEST

 you can now see TEST at ~/TEST and bind/TEST

bind/ is a mounted snapshot, right? if yes, then when you touch
bind/TEST, it should also appear in root/<subvol of my current
root>/home/anthony/TEST, and NOT in root/home/anthony/TEST or
/home/anthony/TEST

 i'm on 2.6.36.2

Try 2.6.35 or later. I tested something similar under ubuntu maverick
(2.6.35-24-generic) and it works just fine.

-- 
Fajar


Re: Scary OOPS when playing with --bind, --move, and friends

2010-12-20 Thread cwillu
On Mon, Dec 20, 2010 at 10:16 PM, Fajar A. Nugraha l...@fajar.net wrote:
 On Tue, Dec 21, 2010 at 10:51 AM, C Anthony Risinger anth...@extof.me wrote:
 in short, everything works fine until you --bind across a subvol via
 the special folders created when one takes a snapshot,

 # mount --bind root/subvol of my current root/home/anthony bind
 # touch bind/TEST

 you can now see TEST at ~/TEST and bind/TEST

 bind/ is a mounted snapshot, right? if yes, then when you touch
 bind/TEST, it should also appear in root/subvol of my current
 root/home/anthony/TEST, and NOT in root/home/anthony/TEST or
 /home/anthony/TEST

 i'm on 2.6.36.2

 Try 2.6.35 or later. I tested something similar under ubuntu maverick
 (2.6.35-24-generic) and it works just fine.

Last I checked, 2.6.36 came after 2.6.35.  :)


Re: Scary OOPS when playing with --bind, --move, and friends

2010-12-20 Thread Fajar A. Nugraha
On Tue, Dec 21, 2010 at 11:16 AM, Fajar A. Nugraha l...@fajar.net wrote:
 On Tue, Dec 21, 2010 at 10:51 AM, C Anthony Risinger anth...@extof.me wrote:
 i'm on 2.6.36.2

 Try 2.6.35 or later. I tested something similar under ubuntu maverick
 (2.6.35-24-generic) and it works just fine.

Sorry, hit send too soon. I thought you wrote 2.6.32 :P

Still curious about your test scenario though. Can you double check
it? A write on the snapshot should not appear on the parent
filesystem.

-- 
Fajar


Re: Scary OOPS when playing with --bind, --move, and friends

2010-12-20 Thread C Anthony Risinger
On Mon, Dec 20, 2010 at 10:19 PM, Fajar A. Nugraha l...@fajar.net wrote:
 On Tue, Dec 21, 2010 at 11:16 AM, Fajar A. Nugraha l...@fajar.net wrote:
 On Tue, Dec 21, 2010 at 10:51 AM, C Anthony Risinger anth...@extof.me wrote:
 i'm on 2.6.36.2

 Try 2.6.35 or later. I tested something similar under ubuntu maverick
 (2.6.35-24-generic) and it works just fine.

 Sorry, hit send too soon. I thought you wrote 2.6.32 :P

 Still curious about your test scenario though. Can you double check
 it? A write on the snapshot should not appear on the parent
 filesystem.

sorry maybe i wasn't very clear; my current root is a subvol... the
directory i was --bind mounting corresponded to /home/anthony:

/

and

root/subvol of my current root

are the same; so it should show up in my /home/anthony directory.  if
i mount the subvol by id, then --bind mount, it works as expected; only
when crossing the magic barrier doesn't things seem to freak out.

i actually reproduced it twice, but this time i didn't write to the files :-)

C Anthony


Re: Scary OOPS when playing with --bind, --move, and friends

2010-12-20 Thread C Anthony Risinger
On Mon, Dec 20, 2010 at 10:25 PM, C Anthony Risinger anth...@extof.me wrote:
 On Mon, Dec 20, 2010 at 10:19 PM, Fajar A. Nugraha l...@fajar.net wrote:

 Still curious about your test scenario though. Can you double check
 it? A write on the snapshot should not appear on the parent
 filesystem.

 sorry maybe i wasn't very clear; my current root is a subvol... the
 directory i was --bind mounting corresponded to /home/anthony:

 /

 and

 root/subvol of my current root

 are the same; so it should show up in my /home/anthony directory.  if
 i mount the subvol by id, then --bind mount, it works as expected; only
 when crossing the magic barrier doesn't things seem to freak out.

s/doesn't/do/g

to be exact, it looks like this:

-
(subvolid)  source                                         mount                    [options]
(262)       /dev/sda                                       /
(__0)       /dev/sda                                       /home/anthony/sand/root  [subvolid=0]
(???)       /home/anthony/sand/root/vols/262/home/anthony  /home/anthony/sand/bind  [--bind]
-

all my subvolumes are kept in a vols directory in the btrfs root, so
my / and the --bind mount were supposed to be referencing the same
location.  additionally, TEST showed up in both locations... it was
the editing part that blew up.  NOTE however, that the subvol (id 262)
itself was _never_ actually mounted, it was accessed thru the btrfs
root mounted at `root`.  i think this is the crux of the problem;
--bind doesn't seem to know that the directory it was binding isn't
100% within the mount point it resides under.
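
one way to check whether the two views really are the same file is to compare the device and inode numbers stat reports for each path. a minimal sketch, assuming the file is reachable as both `/home/anthony/TEST` and `/home/anthony/sand/bind/TEST`; the helper names are made up for illustration:

```python
import os


def file_identity(path):
    """Return the (device, inode) pair that identifies a file to the kernel."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_underlying_file(a, b):
    """True if both paths resolve to the same inode on the same device."""
    return file_identity(a) == file_identity(b)


if __name__ == "__main__":
    import sys

    # usage: python samefile.py PATH_A PATH_B
    a, b = sys.argv[1], sys.argv[2]
    print(a, file_identity(a))
    print(b, file_identity(b))
    print("same file" if same_underlying_file(a, b) else "different files")
```

if the two paths show the same contents but report different device numbers, that would fit the idea that the --bind view and the mounted subvol are being treated as separate objects.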

C Anthony