btrfs for files >10GB = random spontaneous CRC failure.

2013-01-14 Thread Tomasz Kusmierz

Hi,

Since I had some free time over Christmas, I decided to conduct a few 
tests on btrfs to see how it copes with real-life storage for 
ordinary users, and I've found that the filesystem will always mess up 
your files that are larger than 10GB.


Long story:
I've used a set of data that I've got nicely backed up on a personal 
raid 5 to populate the btrfs volumes: music, SLR pics and video (and just a 
few documents). The disks used in the test are all 2TB WD Green disks.


1. First I started by creating btrfs (4k blocks) on one disk, filling 
it up and then adding a second disk - convert to raid1 through balance - 
convert to raid10 through balance. Unfortunately converting to raid1 
failed - because of CRC errors in 49 files that were bigger than 10GB. At 
this point I was a bit spooked that my controllers were failing or 
that the drives had some bad sectors. Tested everything (took a few days) and 
it turns out that there is no apparent issue with the hardware (no bad 
sectors or I/O errors down to the disks).
2. At this point I thought: cool, this will be a perfect test case for 
scrub to show its magical power! Created raid1 over two volumes - 
tried scrubbing - FAIL ... It turns out that magically I've got corrupted 
CRCs in exactly the same logical locations on two different disks (~34 
files >10GB affected), hence scrub can't do anything with it. It only 
reports them as uncorrectable errors.
3. Performed the same test on a raid10 setup (still 4k blocks). Same results 
(just a different file count). The conversion sequence is sketched below.
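
For reference, the conversion path in test 1 boils down to roughly the 
following (a sketch only; the device names and mount point are hypothetical, 
and the raid10 conversion of course needs four devices in total):

  # single disk first, 4k sectors by default
  mkfs.btrfs /dev/sdb
  mount /dev/sdb /mnt/test
  cp -a /backup/testdata/. /mnt/test/
  # add a second disk and convert via balance
  btrfs device add /dev/sdc /mnt/test
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/test
  # after adding two more devices, convert again
  btrfs balance start -dconvert=raid10 /mnt/test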


Ok, time to dig more into this because it starts to get intriguing. I'm 
running Ubuntu Server 12.10 (64-bit) with the stock kernel, so my next step 
was to get the 3.7.1 kernel + new btrfs tools straight from the git repo.
Unfortunately tests 1, 2 & 3 still give the same results: corrupt CRCs only 
in files >10GB.
At this point I thought: fine, maybe if I expand the allocation block 
it will take fewer blocks for a big file to fit, resulting in those 
being stored properly - time for 16K leaves :) (-n 16K -l 16K; 
sectors are still 4K for known reasons :P). Well, it does exactly the 
same thing - tests 1, 2 & 3 give the same results, big files get 
automagically corrupted.
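
For the record, the 16K variant was created along these lines (a sketch; 
device name hypothetical):

  # 16K nodes and leaves, sector size left at the 4K default
  mkfs.btrfs -n 16K -l 16K /dev/sdb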



Something about test data:
music - files not more than 200MB (typical mix of mp3 & aac), 10K files 
give or take.
pics - not more than 20MB (typical point & shoot + DSLR), 6K files give or 
take.
video1 - collection of small ones, more than 300MB but less than 
1.5GB each, ~400 files

video2 - collection of 5GB - 18GB files, ~400 files

I guess that stating that only files >10GB are affected is a long 
shot, but so far I've not seen a file smaller than 10GB affected (I was not 
really thorough about checking sizes, but all the affected files I've 
checked were more than 10GB).
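
If anyone wants to check the same size correlation, a sketch of the kind of 
sweep I mean (mount point hypothetical; a file whose checksum fails comes 
back as a read error):

  # read every file over 10GB; csum failures surface as I/O errors
  find /mnt/test -type f -size +10G -exec md5sum {} \; 2> csum-errors.log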


ps. As a footnote I'll add that I've tried shuffling tests 1, 2 & 3 
without video2 and it all works just fine.


If you've got any ideas for a workaround (other than zfs :D) I'm happy 
to try them out.


Tom.


Re: btrfs for files >10GB = random spontaneous CRC failure.

2013-01-14 Thread Roman Mamedov
Hello,

On Mon, 14 Jan 2013 11:17:17 +
Tomasz Kusmierz tom.kusmi...@gmail.com wrote:

 this point I was a bit spooked that my controllers were failing or 

Which controller manufacturer/model?

-- 
With respect,
Roman

~~~
Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free.




Re: btrfs for files >10GB = random spontaneous CRC failure.

2013-01-14 Thread Tomasz Kusmierz

On 14/01/13 11:25, Roman Mamedov wrote:

Hello,

On Mon, 14 Jan 2013 11:17:17 +
Tomasz Kusmierz tom.kusmi...@gmail.com wrote:


this point I was a bit spooked that my controllers were failing or

Which controller manufacturer/model?

Well, this is a home server (which I prefer to tinker on). Two controllers 
were used: the motherboard's built-in one, and a crappy Adaptec PCIe one.


00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI 
SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]

02:00.0 RAID bus controller: Adaptec Serial ATA II RAID 1430SA (rev 02)


ps. MoBo is: ASUS M4A79T Deluxe


Re: [PATCH 2/3] btrfs-progs: libify some parts of btrfs-progs

2013-01-14 Thread Arvin Schnell

Hi,

please find attached a patch to make the new libbtrfs usable from
C++ (at least for the parts snapper will likely need).

Regards,
  Arvin

-- 
Arvin Schnell, aschn...@suse.de
Senior Software Engineer, Research & Development
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
16746 (AG Nürnberg)
Maxfeldstraße 5
90409 Nürnberg
Germany
diff --git a/cmds-send.c b/cmds-send.c
index 9b47e70..c51310a 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -40,6 +40,10 @@
 #include "send.h"
 #include "send-utils.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static int g_verbose = 0;
 
 struct btrfs_send {
@@ -654,3 +658,7 @@ int cmd_send(int argc, char **argv)
 {
 	return cmd_send_start(argc, argv);
 }
+
+#ifdef __cplusplus
+}
+#endif
diff --git a/extent_io.c b/extent_io.c
index ebb35b2..70ecc48 100644
--- a/extent_io.c
+++ b/extent_io.c
@@ -48,7 +48,7 @@ static struct extent_state *alloc_extent_state(void)
 		return NULL;
 	state->refs = 1;
 	state->state = 0;
-	state->private = 0;
+	state->xprivate = 0;
 	return state;
 }
 
@@ -509,7 +509,7 @@ int set_state_private(struct extent_io_tree *tree, u64 start, u64 private)
 		ret = -ENOENT;
 		goto out;
 	}
-	state->private = private;
+	state->xprivate = private;
 out:
 	return ret;
 }
@@ -530,7 +530,7 @@ int get_state_private(struct extent_io_tree *tree, u64 start, u64 *private)
 		ret = -ENOENT;
 		goto out;
 	}
-	*private = state->private;
+	*private = state->xprivate;
 out:
 	return ret;
 }
diff --git a/extent_io.h b/extent_io.h
index 4553859..6d8404d 100644
--- a/extent_io.h
+++ b/extent_io.h
@@ -54,7 +54,7 @@ struct extent_state {
 	u64 end;
 	int refs;
 	unsigned long state;
-	u64 private;
+	u64 xprivate;
 };
 
 struct extent_buffer {
@@ -93,8 +93,8 @@ int extent_buffer_uptodate(struct extent_buffer *eb);
 int set_extent_buffer_uptodate(struct extent_buffer *eb);
 int clear_extent_buffer_uptodate(struct extent_io_tree *tree,
 struct extent_buffer *eb);
-int set_state_private(struct extent_io_tree *tree, u64 start, u64 private);
-int get_state_private(struct extent_io_tree *tree, u64 start, u64 *private);
+int set_state_private(struct extent_io_tree *tree, u64 start, u64 xprivate);
+int get_state_private(struct extent_io_tree *tree, u64 start, u64 *xprivate);
 struct extent_buffer *find_extent_buffer(struct extent_io_tree *tree,
 	 u64 bytenr, u32 blocksize);
 struct extent_buffer *find_first_extent_buffer(struct extent_io_tree *tree,
diff --git a/ioctl.h b/ioctl.h
index b7f1ce3..56de39f 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -22,6 +22,10 @@
 #include <linux/ioctl.h>
 #include <time.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define BTRFS_IOCTL_MAGIC 0x94
 #define BTRFS_VOL_NAME_MAX 255
 
@@ -439,4 +443,9 @@ struct btrfs_ioctl_clone_range_args {
 	struct btrfs_ioctl_qgroup_create_args)
 #define BTRFS_IOC_QGROUP_LIMIT _IOR(BTRFS_IOCTL_MAGIC, 43, \
 	struct btrfs_ioctl_qgroup_limit_args)
+
+#ifdef __cplusplus
+}
+#endif
+
 #endif
diff --git a/list.h b/list.h
index d31090c..50f4619 100644
--- a/list.h
+++ b/list.h
@@ -19,8 +19,8 @@
 #ifndef _LINUX_LIST_H
 #define _LINUX_LIST_H
 
-#define LIST_POISON1  ((void *) 0x00100100)
-#define LIST_POISON2  ((void *) 0x00200200)
+#define LIST_POISON1  ((struct list_head *) 0x00100100)
+#define LIST_POISON2  ((struct list_head *) 0x00200200)
 
 /*
  * Simple doubly linked list implementation.
@@ -54,17 +54,17 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
  * the prev/next entries already!
  */
 #ifndef CONFIG_DEBUG_LIST
-static inline void __list_add(struct list_head *new,
+static inline void __list_add(struct list_head *xnew,
 			  struct list_head *prev,
 			  struct list_head *next)
 {
-	next->prev = new;
-	new->next = next;
-	new->prev = prev;
-	prev->next = new;
+	next->prev = xnew;
+	xnew->next = next;
+	xnew->prev = prev;
+	prev->next = xnew;
 }
 #else
-extern void __list_add(struct list_head *new,
+extern void __list_add(struct list_head *xnew,
 			  struct list_head *prev,
 			  struct list_head *next);
 #endif
@@ -78,12 +78,12 @@ extern void __list_add(struct list_head *new,
  * This is good for implementing stacks.
  */
 #ifndef CONFIG_DEBUG_LIST
-static inline void list_add(struct list_head *new, struct list_head *head)
+static inline void list_add(struct list_head *xnew, struct list_head *head)
 {
-	__list_add(new, head, head->next);
+	__list_add(xnew, head, head->next);
 }
 #else
-extern void list_add(struct list_head *new, struct list_head *head);
+extern void list_add(struct list_head *xnew, struct list_head *head);
 #endif
 
 
@@ -95,9 +95,9 @@ extern void list_add(struct list_head *new, struct list_head *head);
  * Insert a new entry before the specified head.
  * This is useful for implementing queues.
  */
-static inline void list_add_tail(struct list_head *new, struct list_head *head)
+static inline void list_add_tail(struct list_head *xnew, struct list_head *head)
 {
-	__list_add(new, head->prev, head);
+	__list_add(xnew, head->prev, head);
 }
 
 /*
@@ 

Re: btrfs for files >10GB = random spontaneous CRC failure.

2013-01-14 Thread Chris Mason
On Mon, Jan 14, 2013 at 04:09:47AM -0700, Tomasz Kusmierz wrote:
 Hi,
 
 Since I had some free time over Christmas, I decided to conduct a few 
 tests on btrfs to see how it copes with real-life storage for 
 ordinary users, and I've found that the filesystem will always mess up 
 your files that are larger than 10GB.

Hi Tom,

I'd like to nail down the test case a little better.

1) Create on one drive, fill with data
2) Add a second drive, convert to raid1
3) find corruptions?

What happens if you start with two drives in raid1?  In other words, I'm
trying to see if this is a problem with the conversion code.

-chris


Re: btrfs for files >10GB = random spontaneous CRC failure.

2013-01-14 Thread Tomasz Kusmierz

On 14/01/13 14:59, Chris Mason wrote:

On Mon, Jan 14, 2013 at 04:09:47AM -0700, Tomasz Kusmierz wrote:

Hi,

Since I had some free time over Christmas, I decided to conduct a few
tests on btrfs to see how it copes with real-life storage for
ordinary users, and I've found that the filesystem will always mess up
your files that are larger than 10GB.

Hi Tom,

I'd like to nail down the test case a little better.

1) Create on one drive, fill with data
2) Add a second drive, convert to raid1
3) find corruptions?

What happens if you start with two drives in raid1?  In other words, I'm
trying to see if this is a problem with the conversion code.

-chris
Ok, my description might be a bit enigmatic, so to cut a long story short 
the tests are:
1) create a single-drive default btrfs volume on a single partition - 
fill with test data - scrub - admire errors.
2) create a raid1 (-d raid1 -m raid1) volume with two partitions on 
separate disks, each the same size etc. - fill with test data - scrub - 
admire errors.
3) create a raid10 (-d raid10 -m raid1) volume with four partitions on 
separate disks, each the same size etc. - fill with test data - scrub - 
admire errors.


all disks are the same age + size + model ... two different batches to avoid 
simultaneous failure.
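
In command form the three setups are roughly (a sketch; device names 
hypothetical):

  # test 1: single disk, defaults
  mkfs.btrfs /dev/sdb1
  # test 2: two disks, mirrored data and metadata
  mkfs.btrfs -d raid1 -m raid1 /dev/sdb1 /dev/sdc1
  # test 3: four disks, raid10 data, mirrored metadata
  mkfs.btrfs -d raid10 -m raid1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
  # then for each: mount, copy the data set in, and
  btrfs scrub start -B /mnt/test   # -B waits and prints the error counts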



Re: kernel call trace when deleting a huge file (which used all the space) with btrfs

2013-01-14 Thread Chris Murphy

On Jan 14, 2013, at 12:11 AM, Reartes Guillermo rtgui...@gmail.com wrote:

 
 
  Recently I have found some issues with BTRFS; I am using F18.


For list reference, Fedora 18 ships with 
btrfs-progs-0.20.rc1.20121017git91d9eec-1.fc18, and 
kernel-3.6.10-4.fc18.x86_64, which is what your bug report shows.

Is this reproducible with either kernel-3.7.2-201.fc18 or 
kernel-3.8.0-0.rc3.git0.1.fc19? Both are in koji.


Chris Murphy


Re: btrfs for files >10GB = random spontaneous CRC failure.

2013-01-14 Thread Chris Mason
On Mon, Jan 14, 2013 at 08:22:36AM -0700, Tomasz Kusmierz wrote:
 On 14/01/13 14:59, Chris Mason wrote:
  On Mon, Jan 14, 2013 at 04:09:47AM -0700, Tomasz Kusmierz wrote:
  Hi,
 
  Since I had some free time over Christmas, I decided to conduct a few
  tests on btrfs to see how it copes with real-life storage for
  ordinary users, and I've found that the filesystem will always mess up
  your files that are larger than 10GB.
  Hi Tom,
 
  I'd like to nail down the test case a little better.
 
  1) Create on one drive, fill with data
  2) Add a second drive, convert to raid1
  3) find corruptions?
 
  What happens if you start with two drives in raid1?  In other words, I'm
  trying to see if this is a problem with the conversion code.
 
  -chris
 Ok, my description might be a bit enigmatic, so to cut a long story short 
 the tests are:
 1) create a single-drive default btrfs volume on a single partition - 
 fill with test data - scrub - admire errors.
 2) create a raid1 (-d raid1 -m raid1) volume with two partitions on 
 separate disks, each the same size etc. - fill with test data - scrub - 
 admire errors.
 3) create a raid10 (-d raid10 -m raid1) volume with four partitions on 
 separate disks, each the same size etc. - fill with test data - scrub - 
 admire errors.
 
 all disks are the same age + size + model ... two different batches to avoid 
 simultaneous failure.

Ok, so we have two possible causes.  #1 btrfs is writing garbage to your
disks.  #2 something in your kernel is corrupting your data.

Since you're able to see this 100% of the time, let's assume that if #2
were true, we'd be able to trigger it on other filesystems.

So, I've attached an old friend, stress.sh.  Use it like this:

stress.sh -n 5 -c <your source directory> -s <your btrfs mount point>

It will run in a loop with 5 parallel processes and make 5 copies of
your data set into the destination.  It will run forever until there are
errors.  You can use a higher process count (-n) to force more
concurrency and use more ram.  It may help to pin down all but 2 or 3 GB
of your memory.
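
For example, assuming the data set lives under /data/video2 and the btrfs 
filesystem is mounted at /mnt/test (both made-up paths), following the usage 
line above:

  ./stress.sh -n 5 -c /data/video2 -s /mnt/test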

What I'd like you to do is find a data set and command line that make
the script find errors on btrfs.  Then, try the same thing on xfs or
ext4 and let it run at least twice as long.  Then report back ;)

-chris

#!/bin/bash -
# -*- Shell-script -*-
#
# Copyright (C) 1999 Bibliotech Ltd., 631-633 Fulham Rd., London SW6 5UQ.
#
# $Id: stress.sh,v 1.2 1999/02/10 10:58:04 rich Exp $
#
# Change log:
#
# $Log: stress.sh,v $
# Revision 1.2  1999/02/10 10:58:04  rich
# Use cp instead of tar to copy.
#
# Revision 1.1  1999/02/09 15:13:38  rich
# Added first version of stress test program.
#

# Stress-test a file system by doing multiple
# parallel disk operations. This does everything
# in MOUNTPOINT/stress.

nconcurrent=50
content=/usr/doc
stagger=yes

while getopts c:n:s c; do
case $c in
c)
content=$OPTARG
;;
n)
nconcurrent=$OPTARG
;;
s)
stagger=no
;;
*)
echo 'Usage: stress.sh [-options] MOUNTPOINT'
echo 'Options: -c Content directory'
echo ' -n Number of concurrent accesses (default: 4)'
echo ' -s Avoid staggerring start times'
exit 1
;;
esac
done

shift $(($OPTIND-1))
if [ $# -ne 1 ]; then
echo 'For usage: stress.sh -?'
exit 1
fi

mountpoint=$1

echo 'Number of concurrent processes:' $nconcurrent
echo 'Content directory:' $content '(size:' `du -s $content | awk '{print $1}'` 'KB)'

# Check the mount point is really a mount point.

#if [ `df | awk '{print $6}' | grep ^$mountpoint\$ | wc -l` -lt 1 ]; then
#echo $mountpoint: This doesn\'t seem to be a mountpoint. Try not
#echo to use a trailing / character.
#exit 1
#fi

# Create the directory, if it doesn't exist.

if [ ! -d $mountpoint/stress ]; then
rm -rf $mountpoint/stress
if ! mkdir $mountpoint/stress; then
echo Problem creating $mountpoint/stress directory. Do you have sufficient
echo access permissions\?
exit 1
fi
fi

echo Created $mountpoint/stress directory.

# Construct MD5 sums over the content directory.

echo -n Computing MD5 sums over content directory: 
( cd $content && find . -type f -print0 | xargs -0 md5sum | sort -o $mountpoint/stress/content.sums )
echo done.

# Start the stressing processes.

echo -n Starting stress test processes: 

pids=

p=1
while [ $p -le $nconcurrent ]; do
echo -n $p 

(

# Wait for all processes to start up.
if [ $stagger = yes ]; then
sleep $((10*$p))
else
sleep 10
fi

while true; do

# Remove old directories.
echo -n D$p 
rm -rf $mountpoint/stress/$p

# Copy content -> partition.
echo -n W$p 
mkdir $mountpoint/stress/$p
base=`basename $content`

#( cd $content && tar cf - . ) | ( cd $mountpoint/stress/$p && tar xf - )
cp -dRx $content 

Re: btrfs for files >10GB = random spontaneous CRC failure.

2013-01-14 Thread Roman Mamedov
On Mon, 14 Jan 2013 15:22:36 +
Tomasz Kusmierz tom.kusmi...@gmail.com wrote:

 1) create a single drive default btrfs volume on single partition - 
 fill with test data - scrub - admire errors.

Did you try ruling out btrfs as the cause of the problem? Maybe something else
in your system is corrupting data, and btrfs just lets you know about that.

I.e. on the same drive, create an Ext4 filesystem, copy some data to it which
has known checksums (use md5sum or cfv to generate them in advance for data
that is on another drive and is waiting to be copied); copy to that drive,
flush caches, verify checksums of files at the destination.
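
A sketch of that loop (paths and device name hypothetical):

  # checksum the known-good source
  cd /source/data && find . -type f -print0 | xargs -0 md5sum > /tmp/known.md5

  # fresh ext4 on the same drive, copy the data over
  mkfs.ext4 /dev/sdb1 && mount /dev/sdb1 /mnt/ext4test
  cp -a /source/data/. /mnt/ext4test/

  # flush caches so we re-read from disk, then verify
  sync && echo 3 > /proc/sys/vm/drop_caches
  cd /mnt/ext4test && md5sum -c /tmp/known.md5 | grep -v ': OK$'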

-- 
With respect,
Roman

~~~
Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free.




Re: btrfs for files >10GB = random spontaneous CRC failure.

2013-01-14 Thread Tomasz Kusmierz

On 14/01/13 15:57, Chris Mason wrote:

On Mon, Jan 14, 2013 at 08:22:36AM -0700, Tomasz Kusmierz wrote:

On 14/01/13 14:59, Chris Mason wrote:

On Mon, Jan 14, 2013 at 04:09:47AM -0700, Tomasz Kusmierz wrote:

Hi,

Since I had some free time over Christmas, I decided to conduct a few
tests on btrfs to see how it copes with real-life storage for
ordinary users, and I've found that the filesystem will always mess up
your files that are larger than 10GB.

Hi Tom,

I'd like to nail down the test case a little better.

1) Create on one drive, fill with data
2) Add a second drive, convert to raid1
3) find corruptions?

What happens if you start with two drives in raid1?  In other words, I'm
trying to see if this is a problem with the conversion code.

-chris

Ok, my description might be a bit enigmatic, so to cut a long story short
the tests are:
1) create a single-drive default btrfs volume on a single partition -
fill with test data - scrub - admire errors.
2) create a raid1 (-d raid1 -m raid1) volume with two partitions on
separate disks, each the same size etc. - fill with test data - scrub -
admire errors.
3) create a raid10 (-d raid10 -m raid1) volume with four partitions on
separate disks, each the same size etc. - fill with test data - scrub -
admire errors.

all disks are the same age + size + model ... two different batches to avoid
simultaneous failure.

Ok, so we have two possible causes.  #1 btrfs is writing garbage to your
disks.  #2 something in your kernel is corrupting your data.

Since you're able to see this 100% of the time, let's assume that if #2
were true, we'd be able to trigger it on other filesystems.

So, I've attached an old friend, stress.sh.  Use it like this:

stress.sh -n 5 -c <your source directory> -s <your btrfs mount point>

It will run in a loop with 5 parallel processes and make 5 copies of
your data set into the destination.  It will run forever until there are
errors.  You can use a higher process count (-n) to force more
concurrency and use more ram.  It may help to pin down all but 2 or 3 GB
of your memory.

What I'd like you to do is find a data set and command line that make
the script find errors on btrfs.  Then, try the same thing on xfs or
ext4 and let it run at least twice as long.  Then report back ;)

-chris


Chris,

Will do, just please remember that 2TB of test data on consumer-grade 
SATA drives will take a while to test :)






Re: btrfs for files >10GB = random spontaneous CRC failure.

2013-01-14 Thread Tomasz Kusmierz

On 14/01/13 16:20, Roman Mamedov wrote:

On Mon, 14 Jan 2013 15:22:36 +
Tomasz Kusmierz tom.kusmi...@gmail.com wrote:


1) create a single drive default btrfs volume on single partition -
fill with test data - scrub - admire errors.

Did you try ruling out btrfs as the cause of the problem? Maybe something else
in your system is corrupting data, and btrfs just lets you know about that.

I.e. on the same drive, create an Ext4 filesystem, copy some data to it which
has known checksums (use md5sum or cfv to generate them in advance for data
that is on another drive and is waiting to be copied); copy to that drive,
flush caches, verify checksums of files at the destination.


Hi Roman,

Chris just provided his good old friend stress.sh that should do that. 
So I'll dive into more testing :)


Tom.


Re: btrfs for files >10GB = random spontaneous CRC failure.

2013-01-14 Thread Chris Mason
On Mon, Jan 14, 2013 at 09:32:25AM -0700, Tomasz Kusmierz wrote:
 On 14/01/13 15:57, Chris Mason wrote:
  On Mon, Jan 14, 2013 at 08:22:36AM -0700, Tomasz Kusmierz wrote:
  On 14/01/13 14:59, Chris Mason wrote:
  On Mon, Jan 14, 2013 at 04:09:47AM -0700, Tomasz Kusmierz wrote:
  Hi,
 
  Since I had some free time over Christmas, I decided to conduct a few
  tests on btrfs to see how it copes with real-life storage for
  ordinary users, and I've found that the filesystem will always mess up
  your files that are larger than 10GB.
  Hi Tom,
 
  I'd like to nail down the test case a little better.
 
  1) Create on one drive, fill with data
  2) Add a second drive, convert to raid1
  3) find corruptions?
 
  What happens if you start with two drives in raid1?  In other words, I'm
  trying to see if this is a problem with the conversion code.
 
  -chris
  Ok, my description might be a bit enigmatic, so to cut a long story short
  the tests are:
  1) create a single-drive default btrfs volume on a single partition -
  fill with test data - scrub - admire errors.
  2) create a raid1 (-d raid1 -m raid1) volume with two partitions on
  separate disks, each the same size etc. - fill with test data - scrub -
  admire errors.
  3) create a raid10 (-d raid10 -m raid1) volume with four partitions on
  separate disks, each the same size etc. - fill with test data - scrub -
  admire errors.
 
  all disks are the same age + size + model ... two different batches to avoid
  simultaneous failure.
  Ok, so we have two possible causes.  #1 btrfs is writing garbage to your
  disks.  #2 something in your kernel is corrupting your data.
 
  Since you're able to see this 100% of the time, let's assume that if #2
  were true, we'd be able to trigger it on other filesystems.
 
  So, I've attached an old friend, stress.sh.  Use it like this:
 
  stress.sh -n 5 -c <your source directory> -s <your btrfs mount point>
 
  It will run in a loop with 5 parallel processes and make 5 copies of
  your data set into the destination.  It will run forever until there are
  errors.  You can use a higher process count (-n) to force more
  concurrency and use more ram.  It may help to pin down all but 2 or 3 GB
  of your memory.
 
  What I'd like you to do is find a data set and command line that make
  the script find errors on btrfs.  Then, try the same thing on xfs or
  ext4 and let it run at least twice as long.  Then report back ;)
 
  -chris
 
 Chris,
 
 Will do, just please remember that 2TB of test data on consumer-grade 
 SATA drives will take a while to test :)

Many thanks.  You might want to start with a smaller data set, 20GB or
so total.

-chris



Re: [PATCH 2/3] btrfs-progs: libify some parts of btrfs-progs

2013-01-14 Thread Mark Fasheh
On Mon, Jan 14, 2013 at 11:43:44AM +0800, Anand Jain wrote:

 Mark,

 Good to create a man page for libbtrfs?

Are you asking if you should do this? If so yeah for sure, I won't complain
about sharing the work!

If you're asking whether I should, I'm not sure. I suppose it's probably a
good idea :) I'll have to look at other libfs manpages first to get an idea
of what goes in there.

At any rate, thanks :)
--Mark

--
Mark Fasheh


Re: Can moving data to a subvolume not take as long as a full copy?

2013-01-14 Thread Hugo Mills
On Mon, Jan 14, 2013 at 10:00:41AM -0800, Marc MERLIN wrote:
 On Mon, Jan 14, 2013 at 05:41:55PM +, Hugo Mills wrote:
  On Mon, Jan 14, 2013 at 09:32:50AM -0800, Marc MERLIN wrote:
   I made a mistake and copied data in the root of a new btrfs filesystem.
   I created a subvolume, and used mv to put everything in there.
   Something like:
   cd /mnt
   btrfs subvolume create dir
   mv * dir
   
   Except it's been running for over a day now (ok, it's 5TB of data)
   
   Looks like mv is really copying all the data as if it were an entirely
   different filesystem.
   
   Is there not a way to short-circuit this and only update the metadata?
  
 I guess the best way of doing this in this case is to teach mv to
  do cp --reflink=always then unlink the origin.
  
 Clearly that won't work over mount boundaries (where a copy of the
  data is the best you're going to get), but that's not what you've got
  here.
 
 Mmmh, this made me think:
 It seems that I could have done cp --reflink without duplicating the data
 and running out of space.
 Then, I could have deleted the originals?
 
 Is that correct?

   Yup, exactly what I just said above. :)
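
   In shell terms, that's roughly (a sketch, reusing the names from Marc's
example; note the loop must skip the target subvolume itself):

  cd /mnt
  btrfs subvolume create dir
  for f in *; do
      [ "$f" = dir ] && continue
      # --reflink=always shares the data extents instead of copying them
      cp -a --reflink=always "$f" dir/ && rm -rf "$f"
  done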

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Your problem is that you've got too much taste to be ---   
a web developer. 




Re: [PATCH 2/3] btrfs-progs: libify some parts of btrfs-progs

2013-01-14 Thread Mark Fasheh
Hi Arvin!

On Mon, Jan 14, 2013 at 03:18:14PM +0100, Arvin Schnell wrote:
 please find attached a patch to make the new libbtrfs usable from
 C++ (at least for the parts snapper will likely need).

Thanks, that looks great. I'll integrate it into my stack of patches and
send it out with the next set.

Can I put:

Signed-off-by: Arvin Schnell aschn...@suse.de

at the bottom of the patch to indicate that it started with you?

Thanks,
--Mark

--
Mark Fasheh


Re: [PATCH 2/3] btrfs-progs: libify some parts of btrfs-progs

2013-01-14 Thread Mark Fasheh
On Mon, Jan 14, 2013 at 11:42:05AM +0800, Anand Jain wrote:


 Mark,

  It's a bit strange: the steps given before can still
  reproduce the problem on my older workspace. However when
  I try with a fresh clone, it can reproduce the issue
  (4 times, consistently) only with the following (new)
  steps..

Ok.


 # git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

 # cd btrfs-progs/
 # make version; make install

Ahh, this *does* reproduce for me. I think we just need to add libbtrfs as a
target somewhere. I'll fix it up. Thanks so much for drilling down to a test
case!
--Mark

--
Mark Fasheh


Re: [PATCH 00/11 V3] add show command to the subvol sub command

2013-01-14 Thread Goffredo Baroncelli
On 01/14/2013 05:04 AM, Anand Jain wrote:
 
 
 Any comments on this new sub-command, please. ?
 
 Thanks, Anand
 
I am trying to use this new command. Very nice. However I tried to use
it against the root of the filesystem, without success:

The root of the filesystem is under /var/btrfs; I used a subvolume as root:

$ cat /proc/self/mountinfo  | grep sdc3
19 1 0:15 /__active / rw,noatime,nodiratime - btrfs /dev/sdc3 rw,space_cache
25 19 0:15 / /var/btrfs rw,noatime,nodiratime - btrfs /dev/sdc3
rw,space_cache


If I do:

$ #test 1
$ sudo ./btrfs su show /

I got nothing

If I do

$ #test 2
$ sudo ./btrfs su show /var/btrfs/

still, I got nothing

$ #test 3
$  sudo ./btrfs su show /var/btrfs/__active

I, finally, got:

/var/btrfs/__active
uuid:   835c96b8-c066-554b-9230-1c531e831ff6
Parent uuid:-
Creation time:  -
Object ID:  256
Generation (Gen):   75774
Gen at creation:0
Parent: 5
Top Level:  5
Snapshot(s):


I expected something from test 1 and test 2 as well.

BR
G.Baroncelli





-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it)
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH 2/2] On a diff-send, avoid sending PREALLOC extent

2013-01-14 Thread Josef Bacik
On Wed, Jan 09, 2013 at 10:41:10AM -0700, Alex Lyakas wrote:
 Subject: [PATCH 2/2] On a diff-send, avoid sending PREALLOC extents, if the
  parent root has only PREALLOC extents on an appropriate
  file range.
 
 This does not fully avoid sending PREALLOC extents, because on full-send or
 new inode we need a new send command to do that. But this patch improves
 the situation by handling diff-sends.
 
 Signed-off-by: Alex Lyakas alex.bt...@zadarastorage.com
 ---

Malformed patch and it confused patchwork, please resend.  Thanks,

Josef


Re: [PATCH 2/3] btrfs-progs: libify some parts of btrfs-progs

2013-01-14 Thread Arvin Schnell
On Mon, Jan 14, 2013 at 10:14:03AM -0800, Mark Fasheh wrote:
 Hi Arvin!
 
 On Mon, Jan 14, 2013 at 03:18:14PM +0100, Arvin Schnell wrote:
  please find attached a patch to make the new libbtrfs usable from
  C++ (at least for the parts snapper will likely need).
 
 Thanks, that looks great. I'll integrate it into my stack of patches and
 send it out with the next set.
 
 Can I put:
 
 Signed-off-by: Arvin Schnell aschn...@suse.de
 
 at the bottom of the patch to indicate that it started with you?

Sure.

Regards,
  Arvin

-- 
Arvin Schnell, aschn...@suse.de
Senior Software Engineer, Research  Development
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
16746 (AG Nürnberg)
Maxfeldstraße 5
90409 Nürnberg
Germany


Re: [PATCH 2/3] btrfs-progs: libify some parts of btrfs-progs

2013-01-14 Thread Mark Fasheh
On Mon, Jan 14, 2013 at 11:42:05AM +0800, Anand Jain wrote:
  It's a bit strange: the steps given before can still
  reproduce the problem on my older workspace. However when
  I try with a fresh clone, it can reproduce the issue
  (4 times, consistently) only with the following (new)
  steps..

 -
 # git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

 # cd btrfs-progs/
 # make version; make install

Ok, I was able to fix this by adding the libs target to the various other
targets.

A series of patches with all the fixes (build and whitespace) and Arvin's
patch can be found at:

https://github.com/markfasheh/btrfs-progs-patches/tree/no-data-and-libify

If nothing else comes up, I will send them out tomorrow. Thanks again to
you all for looking at this.
--Mark

--
Mark Fasheh


Re: [PATCH] Btrfs-progs: add UUID switches to mkfs and convert

2013-01-14 Thread David Sterba
On Sun, Jan 06, 2013 at 04:32:11PM +0100, Florian Albrechtskirchinger wrote:
 Add the following switches to mkfs.btrfs and update man page:
 * -U UUID, --uuid UUID
 
 Add the following switches to btrfs-convert:
 * -U UUID
 * -U new
   Generates a random UUID (default behavior).
 * -U copy
   Copies the UUID from the ext2fs.

Sounds useful, thanks. There are some minor comments on the
documentation/help strings.

More than one filesystem with the same UUID brings trouble when mounting
by UUID; a check that would at least warn that there are more such
filesystems would be good. This should catch easy & silly copy-paste
errors.
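
Usage would presumably look like this (a sketch based on the patch's help 
text; device name hypothetical):

  # mkfs with an explicit UUID
  mkfs.btrfs -U 12345678-1234-1234-1234-123456789abc /dev/sdb1

  # convert keeping the ext2fs UUID, or generating a fresh one (default)
  btrfs-convert -U copy /dev/sdb1
  btrfs-convert -U new /dev/sdb1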

 --- a/convert.c
 +++ b/convert.c
 @@ -2752,23 +2759,28 @@ fail:
  
  static void print_usage(void)
  {
 -	printf("usage: btrfs-convert [-d] [-i] [-n] [-r] device\n");
 +	printf("usage: btrfs-convert [-d] [-i] [-n] [-r] "
 +	       "[-U UUID | new | copy] device\n");
 	printf("\t-d disable data checksum\n");
 	printf("\t-i ignore xattrs and ACLs\n");
 	printf("\t-n disable packing of small files\n");
 	printf("\t-r roll back to ext2fs\n");
 +	printf("\t-U UUID specify FS UUID\n");
 +	printf("\t-U new generate random FS UUID (default)\n");
 +	printf("\t-U copy copy FS UUID from ext2fs\n");

From this it is not clear that 'new' and 'copy' should be typed verbatim
(i.e. that they're not part of the help text), nor the expected format of
the UUID, e.g. 'specify FS UUID in canonical format'.

  }
  
 --- a/mkfs.c
 +++ b/mkfs.c
 @@ -347,6 +347,7 @@ static void print_usage(void)
 	fprintf(stderr, "\t -M --mixed mix metadata and data together\n");
 	fprintf(stderr, "\t -n --nodesize size of btree nodes\n");
 	fprintf(stderr, "\t -s --sectorsize min block allocation\n");
 +	fprintf(stderr, "\t -U --uuid FS UUID\n");

maybe a little bit enhanced with 'in canonical format'.

 	fprintf(stderr, "\t -r --rootdir the source directory\n");
 	fprintf(stderr, "\t -K --nodiscard do not perform whole device TRIM\n");
 	fprintf(stderr, "%s\n", BTRFS_BUILD_VERSION);


Re: kernel call trace when deleting a huge file (which used all the space) with btrfs

2013-01-14 Thread Reartes Guillermo
I am afraid it is reproducible with kernel 3.7.2-201.fc18.x86_64, with
the same procedure.

Cheers.

On Mon, Jan 14, 2013 at 12:25 PM, Chris Murphy li...@colorremedies.com wrote:

 On Jan 14, 2013, at 12:11 AM, Reartes Guillermo rtgui...@gmail.com
 wrote:



  Recently I have found some issues with BTRFS; I am using F18.


 For list reference, Fedora 18 ships with
 btrfs-progs-0.20.rc1.20121017git91d9eec-1.fc18, and
 kernel-3.6.10-4.fc18.x86_64, which is what your bug report shows.

 Is this reproducible with either kernel-3.7.2-201.fc18 or
 kernel-3.8.0-0.rc3.git0.1.fc19? Both are in koji.


 Chris Murphy


Re: partition question

2013-01-14 Thread dima

I don't know this area of the code at all well, but as I understand
it, there's been some work in the kernel (swap over NFS) which lays
down some of the underlying infrastructure we'd need to support
swapfiles on btrfs, but we don't have anything beyond that. I don't
know of anyone working on it, either.

Hugo.



You can use a swapfile on btrfs by mounting it via a loop device. It 
won't be incredibly fast, but it will work.
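
A sketch of that workaround (file and loop device names hypothetical):

  # fixed-size file on the btrfs filesystem, hidden behind a loop device
  dd if=/dev/zero of=/mnt/btrfs/swapfile bs=1M count=2048
  losetup /dev/loop0 /mnt/btrfs/swapfile
  mkswap /dev/loop0
  swapon /dev/loop0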



Re: [PATCH 10/11] Btrfs: use bit operation for ->fs_state

2013-01-14 Thread Liu Bo
On Mon, Jan 14, 2013 at 03:50:31PM +0800, Miao Xie wrote:
 On Thu, 10 Jan 2013 18:57:35 +0100, David Sterba wrote:
  On Thu, Jan 10, 2013 at 08:51:59PM +0800, Miao Xie wrote:
  There is no lock to protect fs_info->fs_state; it will introduce some
  problems, such as the value being overwritten by another task when several
  tasks modify it. Now we use bit operations for it to fix the above problem.
  
  Can you please describe in more detail how that happens and to what
  problems it leads?
 
 For example:
   Task0 - CPU0                    Task1 - CPU1
   mov %fs_state rax
   or $0x1 rax
                                   mov %fs_state rax
                                   or $0x2 rax
   mov rax %fs_state
                                   mov rax %fs_state
 
 The expected value is 3, but in fact, it is 2.

The code shows that fs_state is only set by open_ctree() and
save_error_info(), so how could the above race happen?

Although I'm ok with this as a harmless cleanup patch, I'm afraid the commit log
is not persuasive anyway.

thanks,
liubo

 
 Thanks
 Miao
 
  
  thanks,
  david
  
  Signed-off-by: Miao Xie mi...@cn.fujitsu.com
  ---
   fs/btrfs/ctree.h   | 4 +++-
   fs/btrfs/disk-io.c | 5 +++--
   fs/btrfs/file.c| 2 +-
   fs/btrfs/scrub.c   | 2 +-
   fs/btrfs/super.c   | 4 ++--
   fs/btrfs/transaction.c | 9 -
   6 files changed, 14 insertions(+), 12 deletions(-)
 
  diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
  index c95b539..c34e36e 100644
  --- a/fs/btrfs/ctree.h
  +++ b/fs/btrfs/ctree.h
  @@ -338,7 +338,9 @@ static inline unsigned long btrfs_chunk_item_size(int 
  num_stripes)
   /*
* File system states
*/
  +#define BTRFS_FS_STATE_ERROR  0
   
  +/* Super block flags */
   /* Errors detected */
   #define BTRFS_SUPER_FLAG_ERROR	(1ULL << 2)
   
  @@ -1540,7 +1542,7 @@ struct btrfs_fs_info {
 u64 qgroup_seq;
   
 /* filesystem state */
  -  u64 fs_state;
  +  unsigned long fs_state;
   
 struct btrfs_delayed_root *delayed_root;
   
  diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
  index cf03a45..d06e50c 100644
  --- a/fs/btrfs/disk-io.c
  +++ b/fs/btrfs/disk-io.c
  @@ -2196,7 +2196,8 @@ int open_ctree(struct super_block *sb,
 goto fail_alloc;
   
 /* check FS state, whether FS is broken. */
  -	fs_info->fs_state |= btrfs_super_flags(disk_super);
  +	if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_ERROR)
  +		set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
   
 	ret = btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
 if (ret) {
  @@ -3354,7 +3355,7 @@ int close_ctree(struct btrfs_root *root)
 	printk(KERN_ERR "btrfs: commit super ret %d\n", ret);
 }
   
  -	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
  +	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
 btrfs_error_commit_super(root);
   
 btrfs_put_block_group_cache(fs_info);
  diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
  index 3e9fa0e..ec87b69 100644
  --- a/fs/btrfs/file.c
  +++ b/fs/btrfs/file.c
  @@ -1531,7 +1531,7 @@ static ssize_t btrfs_file_aio_write(struct kiocb 
  *iocb,
  * although we have opened a file as writable, we have
  * to stop this write operation to ensure FS consistency.
  */
  -	if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
  +	if (test_bit(BTRFS_FS_STATE_ERROR, &root->fs_info->fs_state)) {
 		mutex_unlock(&inode->i_mutex);
 err = -EROFS;
 goto out;
  diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
  index af0b566..2e91b56 100644
  --- a/fs/btrfs/scrub.c
  +++ b/fs/btrfs/scrub.c
  @@ -2700,7 +2700,7 @@ static noinline_for_stack int scrub_supers(struct 
  scrub_ctx *sctx,
 int ret;
 	struct btrfs_root *root = sctx->dev_root;
   
  -	if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
  +	if (test_bit(BTRFS_FS_STATE_ERROR, &root->fs_info->fs_state))
 return -EIO;
   
 	gen = atomic64_read(&root->fs_info->last_trans_committed);
  diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
  index 6f0524d..f714379 100644
  --- a/fs/btrfs/super.c
  +++ b/fs/btrfs/super.c
  @@ -98,7 +98,7 @@ static void __save_error_info(struct btrfs_fs_info 
  *fs_info)
  * today we only save the error info into ram.  Long term we'll
  * also send it down to the disk
  */
  -	fs_info->fs_state = BTRFS_SUPER_FLAG_ERROR;
  +	set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
   }
   
   static void save_error_info(struct btrfs_fs_info *fs_info)
  @@ -114,7 +114,7 @@ static void btrfs_handle_error(struct btrfs_fs_info 
  *fs_info)
 	if (sb->s_flags & MS_RDONLY)
 return;
   
  -	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
  +	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
 		sb->s_flags |= MS_RDONLY;
 		printk(KERN_INFO "btrfs is forced readonly\n");
 /*
  diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
  index 7999bf8..a950d48 

Re: [PATCH 10/11] Btrfs: use bit operation for ->fs_state

2013-01-14 Thread Miao Xie
On Tue, 15 Jan 2013 12:03:03 +0800, Liu Bo wrote:
 On Mon, Jan 14, 2013 at 03:50:31PM +0800, Miao Xie wrote:
 On Thu, 10 Jan 2013 18:57:35 +0100, David Sterba wrote:
 On Thu, Jan 10, 2013 at 08:51:59PM +0800, Miao Xie wrote:
 There is no lock to protect fs_info->fs_state; it will introduce some
 problems, such as the value being overwritten by another task when several
 tasks modify it. Now we use bit operations for it to fix the above problem.

 Can you please describe in more detail how that happens and to what
 problems it leads?

 For example:
  Task0 - CPU0                    Task1 - CPU1
  mov %fs_state rax
  or $0x1 rax
                                  mov %fs_state rax
                                  or $0x2 rax
  mov rax %fs_state
                                  mov rax %fs_state
 
 The expected value is 3, but in fact, it is 2.
 
 The code shows that fs_state is only set by open_ctree() and
 save_error_info(), so how could the above race happen?

The reason the above race cannot happen is that there is only one flag
currently. But as we know, ->fs_state can be accessed and updated by
multiple tasks, so the current code is error prone; if we add other flags,
the above problem will happen with certainty. (Adding new flags is very
likely.) So why not write correct and robust code from the beginning?

 Although I'm ok with this as a harmless cleanup patch, I'm afraid the commit 
 log is not persuasive anyway.

I think the changelog is right, since ->fs_state can actually be accessed and 
updated by multiple tasks.

Thanks
Miao

 
 thanks,
 liubo
 

 Thanks
 Miao


 thanks,
 david

 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/ctree.h   | 4 +++-
  fs/btrfs/disk-io.c | 5 +++--
  fs/btrfs/file.c| 2 +-
  fs/btrfs/scrub.c   | 2 +-
  fs/btrfs/super.c   | 4 ++--
  fs/btrfs/transaction.c | 9 -
  6 files changed, 14 insertions(+), 12 deletions(-)

 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index c95b539..c34e36e 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -338,7 +338,9 @@ static inline unsigned long btrfs_chunk_item_size(int 
 num_stripes)
  /*
   * File system states
   */
 +#define BTRFS_FS_STATE_ERROR  0
  
 +/* Super block flags */
  /* Errors detected */
  #define BTRFS_SUPER_FLAG_ERROR	(1ULL << 2)
  
 @@ -1540,7 +1542,7 @@ struct btrfs_fs_info {
u64 qgroup_seq;
  
/* filesystem state */
 -  u64 fs_state;
 +  unsigned long fs_state;
  
struct btrfs_delayed_root *delayed_root;
  
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index cf03a45..d06e50c 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -2196,7 +2196,8 @@ int open_ctree(struct super_block *sb,
goto fail_alloc;
  
/* check FS state, whether FS is broken. */
 -	fs_info->fs_state |= btrfs_super_flags(disk_super);
 +	if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_ERROR)
 +		set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
  
 	ret = btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
if (ret) {
 @@ -3354,7 +3355,7 @@ int close_ctree(struct btrfs_root *root)
 	printk(KERN_ERR "btrfs: commit super ret %d\n", ret);
}
  
 -	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
 +	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
btrfs_error_commit_super(root);
  
btrfs_put_block_group_cache(fs_info);
 diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
 index 3e9fa0e..ec87b69 100644
 --- a/fs/btrfs/file.c
 +++ b/fs/btrfs/file.c
 @@ -1531,7 +1531,7 @@ static ssize_t btrfs_file_aio_write(struct kiocb 
 *iocb,
 * although we have opened a file as writable, we have
 * to stop this write operation to ensure FS consistency.
 */
 -	if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
 +	if (test_bit(BTRFS_FS_STATE_ERROR, &root->fs_info->fs_state)) {
 		mutex_unlock(&inode->i_mutex);
err = -EROFS;
goto out;
 diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
 index af0b566..2e91b56 100644
 --- a/fs/btrfs/scrub.c
 +++ b/fs/btrfs/scrub.c
 @@ -2700,7 +2700,7 @@ static noinline_for_stack int scrub_supers(struct 
 scrub_ctx *sctx,
int ret;
 	struct btrfs_root *root = sctx->dev_root;
  
 -	if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
 +	if (test_bit(BTRFS_FS_STATE_ERROR, &root->fs_info->fs_state))
 		return -EIO;
  
 	gen = atomic64_read(&root->fs_info->last_trans_committed);
 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index 6f0524d..f714379 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -98,7 +98,7 @@ static void __save_error_info(struct btrfs_fs_info 
 *fs_info)
 * today we only save the error info into ram.  Long term we'll
 * also send it down to the disk
 */
 -	fs_info->fs_state = BTRFS_SUPER_FLAG_ERROR;
 +	set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
  }
  
  static void save_error_info(struct btrfs_fs_info *fs_info)
 @@ -114,7 +114,7 @@ static void 

[PATCH V2 11/11] Btrfs: Add ACCESS_ONCE() to transaction->abort accesses

2013-01-14 Thread Miao Xie
We may access and update transaction->aborted on different CPUs without a
lock, so we need the ACCESS_ONCE() wrapper to prevent the compiler from
creating unsolicited accesses and to make sure we get the right value.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1 - v2:
- modify the changelog
- split the old patch into two patches, this is the first one - just add
  ACCESS_ONCE() wrapper.
---
 fs/btrfs/super.c   | 2 +-
 fs/btrfs/transaction.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f714379..0d88513 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -267,7 +267,7 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle 
*trans,
 function, line, errstr);
return;
}
-	trans->transaction->aborted = errno;
+	ACCESS_ONCE(trans->transaction->aborted) = errno;
	__btrfs_std_error(root->fs_info, function, line, errno, NULL);
 }
 /*
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index a950d48..7dbfe2c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1468,7 +1468,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans,
goto cleanup_transaction;
}
 
-	if (cur_trans->aborted) {
+	/* Stop the commit early if ->aborted is set */
+	if (unlikely(ACCESS_ONCE(cur_trans->aborted))) {
		ret = cur_trans->aborted;
goto cleanup_transaction;
}
-- 
1.7.11.7



[PATCH] Btrfs: fix missed transaction->aborted check

2013-01-14 Thread Miao Xie
First, though the current transaction->aborted check can stop the commit early
and avoid unnecessary operations, it is too early: some transaction handles
have not ended yet, and those handles may set transaction->aborted after the
check.

Second, when we commit the transaction, we will wake up some worker threads to
flush the space cache and inode cache. Those threads also allocate some
transaction handles and may set transaction->aborted if some serious error
happens.

So we need more checks of ->aborted when committing the transaction. Fix it.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
This patch is against the following patch:
[PATCH V2 11/11] Btrfs: Add ACCESS_ONCE() to transaction->abort accesses
---
 fs/btrfs/transaction.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 7dbfe2c..50437b4 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1575,6 +1575,11 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans,
	wait_event(cur_trans->writer_wait,
		   atomic_read(&cur_trans->num_writers) == 1);
 
+	/* ->aborted might be set after the previous check, so check it */
+	if (unlikely(ACCESS_ONCE(cur_trans->aborted))) {
+		ret = cur_trans->aborted;
+   goto cleanup_transaction;
+   }
/*
 * the reloc mutex makes sure that we stop
 * the balancing code from coming in and moving
@@ -1658,6 +1663,17 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans,
goto cleanup_transaction;
}
 
+   /*
+* The tasks which save the space cache and inode cache may also
+	 * update ->aborted, check it.
+	 */
+	if (unlikely(ACCESS_ONCE(cur_trans->aborted))) {
+		ret = cur_trans->aborted;
+		mutex_unlock(&root->fs_info->tree_log_mutex);
+		mutex_unlock(&root->fs_info->reloc_mutex);
+   goto cleanup_transaction;
+   }
+
btrfs_prepare_extent_commit(trans, root);
 
	cur_trans = root->fs_info->running_transaction;
-- 
1.7.11.7



Re: Can moving data to a subvolume not take as long as a full copy?

2013-01-14 Thread David Brown
Marc MERLIN m...@merlins.org writes:

 I made a mistake and copied data in the root of a new btrfs filesystem.
 I created a subvolume, and used mv to put everything in there.
 Something like:
 cd /mnt
 btrfs subvolume create dir
 mv * dir

 Except it's been running for over a day now (ok, it's 5TB of data)

 Looks like mv is really copying all the data as if it were an entirely
 different filesystem.

 Is there not a way to short-circuit this and only update the metadata?

Why not make a snapshot of the root volume, and then delete the files
you want to move from the original root, and delete the rest of root
from the snapshot?
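
In btrfs commands, roughly (a sketch; reusing Marc's /mnt and dir names):

  # snapshot the top level; this shares extents, so it's nearly instant
  btrfs subvolume snapshot /mnt /mnt/dir
  # then prune both sides: delete the moved files from the original top
  # level, and delete everything that shouldn't stay from inside dir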

David


Re: [PATCH 00/11 V3] add show command to the subvol sub command

2013-01-14 Thread Anand Jain


Goffredo,

 Thanks for the review.

 I expected also from test1 and test2 something.

 Actually it is working as intended, which is
 to show more info for any item in the btrfs su list output,
 and root itself won't be in the btrfs su list (unless
 the -a option is used). Otherwise, any suggestion what
 would be good to show for btrfs su show / ?

Anand

On 01/15/2013 02:25 AM, Goffredo Baroncelli wrote:

On 01/14/2013 05:04 AM, Anand Jain wrote:



Any comments on this new sub-command, please. ?

Thanks, Anand


I am trying to use this new command. Very nice. However I tried to use
it against the root of the filesystem, without success:

The root of the filesystem is under /var/btrfs; I used a subvolume as root:

$ cat /proc/self/mountinfo  | grep sdc3
19 1 0:15 /__active / rw,noatime,nodiratime - btrfs /dev/sdc3 rw,space_cache
25 19 0:15 / /var/btrfs rw,noatime,nodiratime - btrfs /dev/sdc3
rw,space_cache


If I do:

$ #test 1
$ sudo ./btrfs su show /

I got nothing

If I do

$ #test 2
$ sudo ./btrfs su show /var/btrfs/

still, I got nothing

$ #test 3
$  sudo ./btrfs su show /var/btrfs/__active

I, finally, got:

/var/btrfs/__active
uuid:   835c96b8-c066-554b-9230-1c531e831ff6
Parent uuid:-
Creation time:  -
Object ID:  256
Generation (Gen):   75774
Gen at creation:0
Parent: 5
Top Level:  5
Snapshot(s):


I expected something from test 1 and test 2 as well.

BR
G.Baroncelli





