getdents - ext4 vs btrfs performance



To: chris.ma...@oracle.com, Al Viro v...@zeniv.linux.org.uk, Ted Ts'o ty...@mit.edu

2012-02-29 Thread Jacek Luczak
Hi All,

/*Sorry for sending incomplete email, hit wrong button :) */

Long story short: We've found that operations on a directory structure
holding many dirs take ages on ext4.

The Question: Why is there such a huge difference between ext4 and
btrfs? See the test results below for real values.

Background: I had to back up a Jenkins directory holding workspaces for
a few projects that were checked out from svn (which implies a lot of
extra .svn dirs). The copy took a lot of time (more than I expected, at
least), and the process was mostly in D state (disk sleep). I dug
deeper and ran some extra tests to see whether this is a regression on
the block/fs side. To isolate the issue I also performed the same tests
on btrfs.

Test environment configuration:
1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6-core Intel X5670
with HT enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs.
2) Kernels: All tests were done on the following kernels:
 - 2.6.39.4-3 -- the build ID (3) is used here mostly for internal
tracking of config changes. In -3 we've introduced the "fix readahead
pipeline break caused by block plug" patch. Otherwise it's pure 2.6.39.4.
 - 3.2.7 -- the latest kernel at the time of testing (3.2.8 has been
released recently).
3) The subject of the tests, a directory holding:
 - 54 GB of data (measured on ext4)
 - 1978149 files
 - 844008 directories
4) Mount options:
 - ext4 -- errors=remount-ro,noatime,data=writeback
 - btrfs -- noatime,nodatacow and, for a later investigation of the
compression effect: noatime,nodatacow,compress=lzo
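
For completeness, a sketch of how such a test filesystem might be set up. The device and mountpoint names are placeholders, not from the thread; the mount options are exactly those listed above:

```shell
# Hypothetical device/mountpoint; destructive -- shown for illustration only.
mkfs.ext4 /dev/vg0/bench
mount -o errors=remount-ro,noatime,data=writeback /dev/vg0/bench /mnt/bench

# ... and for the btrfs runs:
mkfs.btrfs /dev/vg0/bench
mount -o noatime,nodatacow /dev/vg0/bench /mnt/bench
# compression variant used for the later comparison:
mount -o noatime,nodatacow,compress=lzo /dev/vg0/bench /mnt/bench
```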

In all tests I measured execution time. The following tests were
performed:
- find . -type d
- find . -type f
- cp -a
- rm -rf
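
The runs above can be reproduced with a harness along these lines. This is a sketch: the tree built here is a tiny synthetic stand-in for the 54 GB Jenkins workspace, and the root-only cache-drop/remount step done between the real runs is omitted:

```shell
#!/bin/bash
# Build a small svn-checkout-like tree (dirs with .svn subdirs and files)
# and time the four operations from the test list above.
set -e
SRC=$(mktemp -d)
DST=$(mktemp -d)

for i in $(seq 1 50); do
    mkdir -p "$SRC/proj$i/.svn"
    for j in $(seq 1 20); do
        echo "data $i.$j" > "$SRC/proj$i/file$j"
        echo "meta $i.$j" > "$SRC/proj$i/.svn/entry$j"
    done
done

# The four timed operations:
time find "$SRC" -type d > /dev/null
time find "$SRC" -type f > /dev/null
time cp -a "$SRC/." "$DST/"
time rm -rf "$DST"

# Tree size check: 1 root + 50 project dirs + 50 .svn dirs
echo "dirs: $(find "$SRC" -type d | wc -l)"
```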

Ext4 results:
| Type     | 2.6.39.4-3 | 3.2.7     |
| Dir cnt  | 17m 40sec  | 11m 20sec |
| File cnt | 17m 36sec  | 11m 22sec |
| Copy     | 1h 28m     | 1h 27m    |
| Remove   | 3m 43sec   | 3m 38sec  |

Btrfs results (without lzo compression):
| Type     | 2.6.39.4-3 | 3.2.7     |
| Dir cnt  | 2m 22sec   | 2m 21sec  |
| File cnt | 2m 26sec   | 2m 23sec  |
| Copy     | 36m 22sec  | 39m 35sec |
| Remove   | 7m 51sec   | 10m 43sec |

From the above one can see that the copy takes close to 1h less on
btrfs. I've done strace runs counting time per syscall; the results are
as follows (from 3.2.7):
1) Ext4 (only the top entries):
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------
 57.01   13.257850           1  15082163           read
 23.40    5.440353           3   1687702           getdents
  6.15    1.430559           0   3672418           lstat
  3.80    0.883767           0  13106961           write
  2.32    0.539959           0   4794099           open
  1.69    0.393589           0    843695           mkdir
  1.28    0.296700           0   5637802           setxattr
  0.80    0.186539           0   7325195           stat

2) Btrfs:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------
 53.38    9.486210           1  15179751           read
 11.38    2.021662           1   1688328           getdents
 10.64    1.890234           0   4800317           open
  6.83    1.213723           0  13201590           write
  4.85    0.862731           0   5644314           setxattr
  3.50    0.621194           1    844008           mkdir
  2.75    0.489059           0   3675992         1 lstat
  1.71    0.303544           0   5644314           llistxattr
  1.50    0.265943           0   1978149           utimes
  1.02    0.180585           0   5644314    844008 getxattr
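
For reference, per-syscall summaries of this shape are what strace's counting mode produces. A sketch of the invocation, hedged to run anywhere by tracing a one-file copy (the real runs traced cp -a over the whole tree; paths here are temporary):

```shell
# -c aggregates time/calls per syscall, -f follows forked children,
# -o writes the summary table to a file instead of stderr.
src=$(mktemp -d); dst=$(mktemp -d)
echo hello > "$src/f"
if command -v strace >/dev/null 2>&1; then
    strace -c -f -o "$dst/summary.txt" cp -a "$src/." "$dst/"
else
    cp -a "$src/." "$dst/"   # strace not installed: plain copy
fi
test -f "$dst/f" && echo copied
```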

On btrfs getdents takes much less time, which proves that this syscall
is the bottleneck in the ext4 copy time. On 2.6.39.4 it shows even less
time in getdents:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------
 50.77   10.978816           1  15033132           read
 14.46    3.125996           1   4733589           open
  7.15    1.546311           0   5566988           setxattr
  5.89    1.273845           0   3626505           lstat
  5.81    1.255858           1   1667050           getdents
  5.66    1.224403           0  13083022           write
  3.40    0.735114           1    833371           mkdir
  1.96    0.424881           0   5566988           llistxattr


Why is there such a huge difference in the getdents timings?

-Jacek
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: getdents - ext4 vs btrfs performance

2012-02-29 Thread Chris Mason
On Wed, Feb 29, 2012 at 02:31:03PM +0100, Jacek Luczak wrote:
 [...]
 4) Mount options:
  - ext4 -- errors=remount-ro,noatime,data=writeback
  - btrfs -- noatime,nodatacow and for later investigation on
 compression effect: noatime,nodatacow,compress=lzo

For btrfs, nodatacow and compression don't really mix.  The compression
will just override it. (Just FYI, not really related to these results).

 
 [...]
 Ext4 results:
 | Type | 2.6.39.4-3   | 3.2.7
 | Dir cnt  | 17m 40sec  | 11m 20sec
 | File cnt |  17m 36sec | 11m 22sec
 | Copy| 1h 28m| 1h 27m
 | Remove| 3m 43sec

Are the btrfs numbers missing? ;)

In order for btrfs to be faster for cp -a, the files probably didn't
change much since creation.  Btrfs maintains extra directory indexes
that help in sequential backup scans, but this usually means slower
delete performance.

But, how exactly did you benchmark it?  If you compare a fresh
mkfs.btrfs where you just copied all the data over with an ext4 FS that
has been on the disk for a long time, it isn't quite fair to ext4.

-chris




Re: getdents - ext4 vs btrfs performance

2012-02-29 Thread Jacek Luczak
Hi Chris,

the last one was borked :) Please check this one.

-jacek

2012/2/29 Jacek Luczak difrost.ker...@gmail.com:
 [...]


Re: Btrfs Storage Array Corrupted

2012-02-29 Thread Chris Mason
On Tue, Feb 28, 2012 at 09:36:35PM -0600, Travis Shivers wrote:
 I upgraded my kernel so my version is now:
 Linux server 3.3.0-030300rc5-generic #201202251535 SMP Sat Feb 25
 20:36:29 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 
 The problem has not been solved and I still get the previous errors.

Ok,

Step one is to grab the development version of btrfs-progs, which
currently sits in the dangerdonteveruse branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git 
dangerdonteveruse

Run btrfs-debug-tree -R /dev/sdh

and then run btrfs-debug-tree -b 5568194695168 /dev/sdh

and then run btrfsck /dev/sdh

Send the results of all three here, it should tell us which tree that
block belongs to, and from there we'll figure out the best way to fix
it.

-chris

 
 # mount /dev/sdh /mnt/main
 mount: wrong fs type, bad option, bad superblock on /dev/sdh,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so
 
 # dmesg
 [  232.985248] device fsid 2c11a326-5630-484e-9f1d-9dab777a1028 devid
 4 transid 43477 /dev/sdi
 [  232.985434] device fsid 2c11a326-5630-484e-9f1d-9dab777a1028 devid
 3 transid 43477 /dev/sdh
 [  233.027881] device fsid 2c11a326-5630-484e-9f1d-9dab777a1028 devid
 2 transid 43477 /dev/sdg
 [  233.065675] device fsid 2c11a326-5630-484e-9f1d-9dab777a1028 devid
 1 transid 43476 /dev/sdf
 [  284.384320] device fsid 2c11a326-5630-484e-9f1d-9dab777a1028 devid
 3 transid 43477 /dev/sdh
 [  284.427076] btrfs: disk space caching is enabled
 [  284.442565] verify_parent_transid: 2 callbacks suppressed
 [  284.442572] parent transid verify failed on 5568194695168 wanted
 43477 found 43151
 [  284.442834] parent transid verify failed on 5568194695168 wanted
 43477 found 43151
 [  284.443151] parent transid verify failed on 5568194695168 wanted
 43477 found 43151
 [  284.443159] parent transid verify failed on 5568194695168 wanted
 43477 found 43151
 [  284.445740] btrfs: open_ctree failed
 
 
 On Tue, Feb 28, 2012 at 9:16 PM, cwillu cwi...@cwillu.com wrote:
  On Tue, Feb 28, 2012 at 9:00 PM, Travis Shivers ttshiv...@gmail.com wrote:
  Where should I grab the source from? The main repo that you have
  listed on your main wiki page
  (https://btrfs.wiki.kernel.org/articles/b/t/r/Btrfs_source_repositories.html)
  is down: 
  git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
 
  The btrfs wiki is at http://btrfs.ipv5.de .  The kernel.org one is a
  static snapshot of the contents made nearly a year ago, prior to the
  kernel.org break-in, and should be ignored.
 
  git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git is
  the development tree, although the above patch is in mainline as of
  3.3rc5, which probably makes that the easiest way to try it.


Re: getdents - ext4 vs btrfs performance

2012-02-29 Thread Lukas Czerner
On Wed, 29 Feb 2012, Chris Mason wrote:

 On Wed, Feb 29, 2012 at 02:31:03PM +0100, Jacek Luczak wrote:
  [...]
  Ext4 results:
  | Type | 2.6.39.4-3   | 3.2.7
  | Dir cnt  | 17m 40sec  | 11m 20sec
  | File cnt |  17m 36sec | 11m 22sec
  | Copy| 1h 28m| 1h 27m
  | Remove| 3m 43sec
 
 Are the btrfs numbers missing? ;)
 
 In order for btrfs to be faster for cp -a, the files probably didn't
 change much since creation.  Btrfs maintains extra directory indexes
 that help in sequential backup scans, but this usually means slower
 delete performance.

Exactly, and IIRC ext4 has directory entries stored in hash order,
which does not really help sequential access.
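
A quick, generic way to observe this hash-order effect (not a command from the thread; DIR is a placeholder): list a directory in raw readdir order next to inode-number order. On ext4 with dir_index the raw order follows the name hash, so the inode numbers jump around, while btrfs's readdir follows a sequential directory index:

```shell
# Compare raw readdir order with inode-number order for a directory.
# ls -f = do not sort (raw readdir order); ls -i = prepend inode numbers.
DIR=${DIR:-.}
echo "readdir order:"; ls -fi "$DIR" | head -5
echo "inode order:";   ls -fi "$DIR" | sort -n | head -5
```

Processing entries sorted by inode number rather than raw readdir order is a common workaround for this, since it makes the follow-up stat()/open() calls touch the disk more sequentially.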

 
 But, how exactly did you benchmark it?  If you compare a fresh
 mkfs.btrfs where you just copied all the data over with an ext4 FS that
 has been on the disk for a long time, it isn't quite fair to ext4.

I have the same question. Note that if the files on ext4 have been
worked with, it may very well be that the directory hash trees are not
in very good shape. You can attempt to optimize that with e2fsck (just
run fsck.ext4 -f device); it may take quite some time and memory, but
it is worth trying.

Thanks!
-Lukas

 
 -chris


Re: getdents - ext4 vs btrfs performance

2012-02-29 Thread Jacek Luczak
2012/2/29 Jacek Luczak difrost.ker...@gmail.com:
 [...]

I will try to answer the questions from the broken email I've sent.

@Lukas, it was always a fresh FS on top of an LVM logical volume. I've
been clearing the cache and remounting to sync all data before
(re)doing tests.

-Jacek

BTW: Sorry for the email mixture. I just can't get this gmail thing to
work (why force top posting? :/). Please use this thread.

Re: getdents - ext4 vs btrfs performance

2012-02-29 Thread Jacek Luczak
2012/2/29 Jacek Luczak difrost.ker...@gmail.com:
 2012/2/29 Jacek Luczak difrost.ker...@gmail.com:
 Hi Chris,

 the last one was borked :) Please check this one.

 -jacek

 2012/2/29 Jacek Luczak difrost.ker...@gmail.com:
 Hi All,

 /*Sorry for sending incomplete email, hit wrong button :) I guess I
 can't use Gmail */

 Long story short: We've found that operations on a directory structure
 holding many dirs takes ages on ext4.

 The Question: Why there's that huge difference in ext4 and btrfs? See
 below test results for real values.

 Background: I had to backup a Jenkins directory holding workspace for
 few projects which were co from svn (implies lot of extra .svn dirs).
 The copy takes lot of time (at least more than I've expected) and
 process was mostly in D (disk sleep). I've dig more and done some
 extra test to see if this is not a regression on block/fs site. To
 isolate the issue I've also performed same tests on btrfs.

 Test environment configuration:
 1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6 core Intel X5670 HT
 enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs.
 2) Kernels: All tests were done on following kernels:
  - 2.6.39.4-3 -- the build ID (3) is used here for internal tacking of
 config changes mostly. In -3 we've introduced ,,fix readahead pipeline
 break caused by block plug'' patch. Otherwise it's pure 2.6.39.4.
  - 3.2.7 -- latest kernel at the time of testing (3.2.8 has been
 release recently).
 3) A subject of tests, directory holding:
  - 54GB of data (measured on ext4)
  - 1978149 files
  - 844008 directories
 4) Mount options:
  - ext4 -- errors=remount-ro,noatime,
 data=writeback
  - btrfs -- noatime,nodatacow and for later investigation on
 copression effect: noatime,nodatacow,compress=lzo

 In all tests I've been measuring time of execution. Following tests
 were performed:
 - find . -type d
 - find . -type f
 - cp -a
 - rm -rf

 Ext4 results:
 | Type     | 2.6.39.4-3   | 3.2.7
 | Dir cnt  | 17m 40sec  | 11m 20sec
 | File cnt |  17m 36sec | 11m 22sec
 | Copy    | 1h 28m        | 1h 27m
 | Remove| 3m 43sec    | 3m 38sec

 Btrfs results (without lzo comression):
 | Type     | 2.6.39.4-3   | 3.2.7
 | Dir cnt  | 2m 22sec  | 2m 21sec
 | File cnt |  2m 26sec | 2m 23sec
 | Copy    | 36m 22sec | 39m 35sec
 | Remove| 7m 51sec   | 10m 43sec

 From above one can see that copy takes close to 1h less on btrfs. I've
 done strace counting times of calls, results are as follows (from
 3.2.7):
 1) Ext4 (only to elements):
 % time     seconds  usecs/call     calls    errors syscall
 -- --- --- - - 
  57.01   13.257850           1  15082163           read
  23.40    5.440353           3   1687702           getdents
  6.15    1.430559           0   3672418           lstat
  3.80    0.883767           0  13106961           write
  2.32    0.539959           0   4794099           open
  1.69    0.393589           0    843695           mkdir
  1.28    0.296700           0   5637802           setxattr
  0.80    0.186539           0   7325195           stat

 2) Btrfs:
 % time     seconds  usecs/call     calls    errors syscall
 -- --- --- - - 
 53.38    9.486210           1  15179751           read
 11.38    2.021662           1   1688328           getdents
  10.64    1.890234           0   4800317           open
  6.83    1.213723           0  13201590           write
  4.85    0.862731           0   5644314           setxattr
  3.50    0.621194           1    844008           mkdir
  2.75    0.489059           0   3675992         1 lstat
  1.71    0.303544           0   5644314           llistxattr
  1.50    0.265943           0   1978149           utimes
  1.02    0.180585           0   5644314    844008 getxattr

 On btrfs getdents takes much less time, which proves that the bottleneck in
copy time on ext4 is this syscall. On 2.6.39.4 it shows even less time
for getdents:
 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------------
  50.77   10.978816           1  15033132           read
  14.46    3.125996           1   4733589           open
  7.15    1.546311           0   5566988           setxattr
  5.89    1.273845           0   3626505           lstat
  5.81    1.255858           1   1667050           getdents
  5.66    1.224403           0  13083022           write
  3.40    0.735114           1    833371           mkdir
  1.96    0.424881           0   5566988           llistxattr


 Why is there such a huge difference in the getdents timings?

 -Jacek

 I will try to answer the question from the broken email I've sent.

 @Lukas, it was always a fresh FS on top of an LVM logical volume. I've
been dropping caches/remounting to sync all data before (re)doing
tests.

 -Jacek

 BTW: Sorry for the email mixture. I just can't get this Gmail thing to
work (why force top posting? :/). Please use this thread.


Re: getdents - ext4 vs btrfs performance

2012-02-29 Thread Chris Mason
On Wed, Feb 29, 2012 at 08:51:58AM -0500, Chris Mason wrote:
 On Wed, Feb 29, 2012 at 02:31:03PM +0100, Jacek Luczak wrote:
  Ext4 results:
  | Type | 2.6.39.4-3   | 3.2.7
  | Dir cnt  | 17m 40sec  | 11m 20sec
  | File cnt |  17m 36sec | 11m 22sec
  | Copy| 1h 28m| 1h 27m
  | Remove| 3m 43sec
 
 Are the btrfs numbers missing? ;)

[ answered in a different reply, btrfs is faster in everything except
delete ]

The btrfs readdir uses an index that is much more likely to be
sequential on disk than ext.  This makes the readdir more sequential and
it makes the actual file IO more sequential because we're reading things
in the order they were created instead of (random) htree index order.
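
The effect Chris describes can be eyeballed from userspace: os.scandir (which
uses getdents under the hood) hands back names in whatever order the
filesystem's directory index keeps them -- hash order on ext4 with htree,
creation order on btrfs. A rough sketch, not a rigorous benchmark, that counts
how often consecutive entries go backwards in inode number, as a crude proxy
for how random the subsequent stat() traffic will be:

```python
import os
import tempfile

def inode_inversions(path):
    """Count adjacent readdir pairs whose inode numbers are out of
    ascending order; high counts mean the stat() phase that follows a
    plain readdir loop will seek all over the inode tables."""
    inos = [e.inode() for e in os.scandir(path)]
    return sum(1 for a, b in zip(inos, inos[1:]) if a > b), len(inos)

if __name__ == "__main__":
    # Demo on a throwaway directory; on ext4 with many entries the
    # inversion count grows with directory size as htree hashing kicks in.
    with tempfile.TemporaryDirectory() as d:
        for i in range(1000):
            open(os.path.join(d, "f%04d" % i), "w").close()
        bad, total = inode_inversions(d)
        print("%d of %d adjacent pairs out of inode order" % (bad, total - 1))
```

The absolute count depends on the filesystem the temporary directory lives
on; the interesting comparison is the same directory on ext4 versus btrfs.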

 
 In order for btrfs to be faster for cp -a, the files probably didn't
 change much since creation.  Btrfs maintains extra directory indexes
 that help in sequential backup scans, but this usually means slower
 delete performance.
 
 But, how exactly did you benchmark it?  If you compare a fresh
 mkfs.btrfs where you just copied all the data over with an ext4 FS that
 has been on the disk for a long time, it isn't quite fair to ext4.
 

But, the consistent benchmarking part is really important.  We shouldn't
put an aged ext4 up against a fresh mkfs.btrfs.  Did you do ext4
comparisons on a fresh copy?

-chris

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: getdents - ext4 vs btrfs performance

2012-02-29 Thread Chris Mason
On Wed, Feb 29, 2012 at 03:07:45PM +0100, Jacek Luczak wrote:

[ btrfs faster than ext for find and cp -a ]

 2012/2/29 Jacek Luczak difrost.ker...@gmail.com:
 
 I will try to answer the question from the broken email I've sent.
 
 @Lukas, it was always a fresh FS on top of LVM logical volume. I've
 been cleaning cache/remounting to sync all data before (re)doing
 tests.

The next step is to get cp -a out of the picture; in this case you're
benchmarking both the read speed and the write speed (what are you
copying to, btw?).

Using tar cf /dev/zero some_dir is one way to get a consistent picture
of the read speed.

You can confirm the theory that it is directory order causing problems
by using acp to read the data.

http://oss.oracle.com/~mason/acp/acp-0.6.tar.bz2

-chris



Re: getdents - ext4 vs btrfs performance

2012-02-29 Thread Jacek Luczak
2012/2/29 Chris Mason chris.ma...@oracle.com:
 On Wed, Feb 29, 2012 at 03:07:45PM +0100, Jacek Luczak wrote:

 [ btrfs faster than ext for find and cp -a ]

 2012/2/29 Jacek Luczak difrost.ker...@gmail.com:

 I will try to answer the question from the broken email I've sent.

 @Lukas, it was always a fresh FS on top of LVM logical volume. I've
 been cleaning cache/remounting to sync all data before (re)doing
 tests.

 The next step is to get cp -a out of the picture, in this case you're
 benchmarking both the read speed and the write speed (what are you
 copying to btw?).

It's a simple cp -a Jenkins{,.bak}, so a dir-to-dir copy on the same volume.

 Using tar cf /dev/zero some_dir is one way to get a consistent picture
 of the read speed.

IMO the problem is not only in read speed. The directory order hits
here. There's a difference in the sequential tests that places btrfs as
the winner, but that alone should not have such a huge influence on
getdents. I know a bit about the differences between ext4 and btrfs
directory handling and I would not expect that huge a difference. On the
production system where the issue has been observed, with some real
work in the background, copy takes up to 4h.

For me btrfs looks perfect here. What could be worth checking is the
change in syscall timings between 2.6.39.4 and 3.2.7: before, getdents
was not that high on the list, while now it jumps to second position,
though without a huge impact on the timings.

 You can confirm the theory that it is directory order causing problems
 by using acp to read the data.

 http://oss.oracle.com/~mason/acp/acp-0.6.tar.bz2

Will check this still today and report back.

-jacek


RE: Set nodatacow per file?

2012-02-29 Thread Kyle Gates


  Actually it is possible. Check out David's response to my question from
  some time ago:
  http://permalink.gmane.org/gmane.comp.file-systems.btrfs/14227

 this was a quick aid, please see the attached file for an updated tool to set
 the file flags; 'z' for the NOCOMPRESS flag has now been added, and it
 supports chattr syntax plus all of the standard file flags.

 Setting and unsetting nocow is done like 'fileflags +C file', or -C for
 unsetting. Without any + or - options it prints the current state.


I get the following errors when running fileflags on large (2GB) database 
files:

open(): No such file or directory

open(): Value too large for defined data type




Re: Set nodatacow per file?

2012-02-29 Thread cwillu
 I get the following errors when running fileflags on large (2GB) database 
 files:

 open(): No such file or directory

 open(): Value too large for defined data type

http://www.gnu.org/software/coreutils/faq/#Value-too-large-for-defined-data-type

The message "Value too large for defined data type" is a system
error message reported when an operation on a large file is attempted
using a non-large file data type. Large files are defined as anything
larger than a signed 32-bit integer, or stated differently, larger
than 2GB.

Many system calls that deal with files return values in a long int
data type. On 32-bit hardware a long int is 32-bits and therefore this
imposes a 2GB limit on the size of files. When this was invented that
was HUGE and it was hard to conceive of needing anything that large.
Time has passed and files can be much larger today. On native 64-bit
systems the file size limit is usually 2GB * 2GB. Which we will again
think is huge.
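
The boundary in question is 2^31 - 1 bytes, the largest value a signed 32-bit
off_t can hold. A quick way to poke at it on any modern system, using a sparse
file so no real 2 GB of data is written (a sketch, assuming a Linux host whose
filesystem supports sparse files):

```python
import os
import tempfile

LIMIT_32BIT = 2**31 - 1  # largest size representable in a signed 32-bit off_t

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
# Sparse file: truncate only sets the size, no data blocks are allocated.
os.truncate(path, LIMIT_32BIT + 1)
size = os.stat(path).st_size
print(size > LIMIT_32BIT)  # prints True: this size needs a 64-bit off_t
os.unlink(path)
```

A C program compiled without large-file support (no _FILE_OFFSET_BITS=64 on a
32-bit build) gets EOVERFLOW, the "Value too large" errno, when it opens or
stats such a file, which matches the fileflags failure reported above.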


[PATCH] Btrfs: stop silently switching single chunks to raid0 on balance

2012-02-29 Thread Ilya Dryomov
This has been causing a lot of confusion for quite a while now and a lot
of users were surprised by this (some of them were even stuck in a
ENOSPC situation which they couldn't easily get out of).  The addition
of restriper gives users a clear choice between raid0 and drive concat
setup so there's absolutely no excuse for us to keep doing this.

Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 fs/btrfs/extent-tree.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 37e0a80..e0969eb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7029,7 +7029,6 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
if (flags & (BTRFS_BLOCK_GROUP_RAID1 |
 BTRFS_BLOCK_GROUP_RAID10))
return stripped | BTRFS_BLOCK_GROUP_DUP;
-   return flags;
} else {
/* they already had raid on here, just return */
if (flags & stripped)
@@ -7042,9 +7041,9 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
if (flags  BTRFS_BLOCK_GROUP_DUP)
return stripped | BTRFS_BLOCK_GROUP_RAID1;
 
-   /* turn single device chunks into raid0 */
-   return stripped | BTRFS_BLOCK_GROUP_RAID0;
+   /* this is drive concat, leave it alone */
}
+
return flags;
 }
 
-- 
1.7.6.3



Re: Btrfs Storage Array Corrupted

2012-02-29 Thread Travis Shivers
Here is the output from the commands:

# ./btrfs-debug-tree -R /dev/sdh
failed to read /dev/sr0: No medium found
failed to read /dev/sde: No medium found
failed to read /dev/sdd: No medium found
failed to read /dev/sdc: No medium found
failed to read /dev/sdb: No medium found
failed to read /dev/sda: No medium found
parent transid verify failed on 5568194695168 wanted 43477 found 43151
parent transid verify failed on 5568194695168 wanted 43477 found 43151
parent transid verify failed on 5568194695168 wanted 43477 found 43151
parent transid verify failed on 5568194695168 wanted 43477 found 43151
Ignoring transid failure
parent transid verify failed on 5568194748416 wanted 43477 found 43151
parent transid verify failed on 5568194748416 wanted 43477 found 43151
parent transid verify failed on 5568194748416 wanted 43477 found 43151
parent transid verify failed on 5568194748416 wanted 43477 found 43151
Ignoring transid failure
root tree: 5568194412544 level 1
chunk tree: 20979712 level 1
extent tree key (EXTENT_TREE ROOT_ITEM 0) 5568194416640 level 3
device tree key (DEV_TREE ROOT_ITEM 0) 4895076519936 level 1
fs tree key (FS_TREE ROOT_ITEM 0) 4895092506624 level 2
checksum tree key (CSUM_TREE ROOT_ITEM 0) 5568194695168 level 0
parent transid verify failed on 5568194801664 wanted 43477 found 43151
parent transid verify failed on 5568194801664 wanted 43477 found 43151
parent transid verify failed on 5568194801664 wanted 43477 found 43151
parent transid verify failed on 5568194801664 wanted 43477 found 43151
Ignoring transid failure
parent transid verify failed on 5568194674688 wanted 43477 found 43151
parent transid verify failed on 5568194674688 wanted 43477 found 43151
parent transid verify failed on 5568194674688 wanted 43477 found 43151
parent transid verify failed on 5568194674688 wanted 43477 found 43151
Ignoring transid failure
parent transid verify failed on 5568194678784 wanted 43477 found 43151
parent transid verify failed on 5568194678784 wanted 43477 found 43151
parent transid verify failed on 5568194678784 wanted 43477 found 43151
parent transid verify failed on 5568194678784 wanted 43477 found 43151
Ignoring transid failure
parent transid verify failed on 5568194809856 wanted 43477 found 43151
parent transid verify failed on 5568194809856 wanted 43477 found 43151
parent transid verify failed on 5568194809856 wanted 43477 found 43151
parent transid verify failed on 5568194809856 wanted 43477 found 43151
Ignoring transid failure
parent transid verify failed on 5568194875392 wanted 43477 found 42983
parent transid verify failed on 5568194875392 wanted 43477 found 42983
parent transid verify failed on 5568194875392 wanted 43477 found 42983
parent transid verify failed on 5568194875392 wanted 43477 found 42983
Ignoring transid failure
parent transid verify failed on 5568195104768 wanted 43477 found 43151
parent transid verify failed on 5568195104768 wanted 43477 found 43151
parent transid verify failed on 5568195104768 wanted 43477 found 43151
parent transid verify failed on 5568195104768 wanted 43477 found 43151
Ignoring transid failure
parent transid verify failed on 5568195043328 wanted 43477 found 43151
parent transid verify failed on 5568195162112 wanted 43477 found 43175
parent transid verify failed on 5568195162112 wanted 43477 found 43175
parent transid verify failed on 5568195162112 wanted 43477 found 43175
parent transid verify failed on 5568195162112 wanted 43477 found 43175
Ignoring transid failure
parent transid verify failed on 5568195166208 wanted 43477 found 43175
parent transid verify failed on 5568195166208 wanted 43477 found 43175
parent transid verify failed on 5568195166208 wanted 43477 found 43175
parent transid verify failed on 5568195166208 wanted 43477 found 43175
Ignoring transid failure
btrfs root backup slot 0
tree root gen 9799893461141291008 block 0
extent root gen 67174399 block 976369115086847
chunk root gen 18446605274118684671 block 9799972705260863487
device root gen 977658994114559 block 18446638534628474880
csum root gen 94490787839 block 18446638559949619199
fs root gen 262144 block 1048576
974850661629952 used 0 total 977659432419327 devices
btrfs root backup slot 1
tree root gen 16777216 block 38655295488
extent root gen 1179648 block 6989415099341275135
chunk root gen 18446605285113004031 block 977659432353792
device root gen 9223372036861329408 block 0
csum root gen 65535 block 977659424489472
fs root gen 4295032832 block 25769803776
282399669551104 used 282400664715264 total
9799892621752008704 devices
btrfs root backup slot 2
tree root gen 65535 block 18446744073709551615
extent root gen 977659447099391 block 977659447033856
chunk root gen 0 block 0
device root gen 

Re: Btrfs Storage Array Corrupted

2012-02-29 Thread Chris Mason
On Wed, Feb 29, 2012 at 03:57:19PM -0600, Travis Shivers wrote:
 Here is the output from the commands:
 
 # ./btrfs-debug-tree -R /dev/sdh
 failed to read /dev/sr0: No medium found
 failed to read /dev/sde: No medium found
 failed to read /dev/sdd: No medium found
 failed to read /dev/sdc: No medium found
 failed to read /dev/sdb: No medium found
 failed to read /dev/sda: No medium found
 parent transid verify failed on 5568194695168 wanted 43477 found 43151

So far all the blocks that have come up look like they are in the extent
allocation tree.  This helps because it is the easiest to recover.

I can also make a patch for you against 3.3-rc that skips reading it
entirely, which should make it possible to copy things off.

But before I do that, could you describe the raid array?  Was it
mirrored or raid10?  What exactly happened when it stopped working?

-chris


Re: Btrfs Storage Array Corrupted

2012-02-29 Thread Travis Shivers
Thank you all for helping. My btrfs array consists of 4 disks: two 2 TB
disks and two 500 GB disks. Since I have disks of different sizes, I have
the array mirrored so that there are two copies of each file on two
separate disks. Both data and metadata are mirrored.

I originally made the array by using this command:

# mkfs.btrfs -m raid1 -d raid1 /dev/sd[abcd]
(The drives were originally those letters)


All of the disks sit in an external 4 bay ESATA enclosure going into a
PCI-E RAID card set up as JBOD, so I can use btrfs' software
mirroring. This is the enclosure that I have:
http://www.newegg.com/Product/Product.aspx?Item=N82E16816132029

The corruption was unexpected. I am not entirely sure what caused it,
but a few days before the corruption, there were several power
outages. I do not think that the problem is with the actual hard drive
hardware since they are fairly new (6 months old) and they pass all
SMART tests. After a reboot, the btrfs array refused to mount and
started giving off errors. I do weekly scrubs, balances, and
defragmentation.

Here is what btrfs filesystem show says:

# btrfs filesystem show
Label: none  uuid: 2c11a326-5630-484e-9f1d-9dab777a1028
Total devices 4 FS bytes used 1.08TB
devid1 size 1.82TB used 1.08TB path /dev/sdf
devid2 size 1.82TB used 1.08TB path /dev/sdg
devid3 size 465.76GB used 8.00MB path /dev/sdh
devid4 size 465.76GB used 8.00MB path /dev/sdi

Btrfs Btrfs v0.19

This is my normal mount line for the array in /etc/fstab:

UUID=2c11a326-5630-484e-9f1d-9dab777a1028 /mnt/main btrfs
noatime,nodiratime,compress=lzo,space_cache,inode_cache 0 1


On Wed, Feb 29, 2012 at 4:14 PM, Chris Mason chris.ma...@oracle.com wrote:
 [ full quote of the previous messages snipped ]


Re: Btrfs Storage Array Corrupted

2012-02-29 Thread Chris Mason
On Wed, Feb 29, 2012 at 05:11:24PM -0600, Travis Shivers wrote:
 Thank you all for helping. My btrfs array consists of 4 disks: 2 (2
 TB) disks and 2(500 GB) disks. Since I have disks of different sizes,
 I have the array being mirrored so that there are two copies of a file
 on two separate disks. The data and metadata are mirrored.
 
 I originally made the array by using this command:
 
 # mkfs.btrfs -m raid1 -d raid1 /dev/sd[abcd]
 (The drives were originally those letters)
 
 
 All of the disks sit in an external 4 bay ESATA enclosure going into a
 PCI-E RAID card set up as JBOD, so I can use btrfs' software
 mirroring. This is the enclosure that I have:
 http://www.newegg.com/Product/Product.aspx?Item=N82E16816132029
 
 The corruption was unexpected. I am not entirely sure what caused it,
 but a few days before the corruption, there were several power
 outages. I do not think that the problem is with the actual hard drive
 hardware since they are fairly new (6 months old) and they pass all
 SMART tests. After a reboot, the btrfs array refused to mount and
 started giving off errors. I do weekly scrubs, balances, and
 defragmentation.

Ok, all of this should have worked.  Which kernel were you running when
you had the power outages?

I'm testing out the patch to skip the extent allocation tree at mount.
That will be the easiest way to get to the data (readonly, but it'll
work).

-chris


Re: Btrfs Storage Array Corrupted

2012-02-29 Thread Travis Shivers
I was running a fairly old version of the kernel:
Linux server 3.0.0-16-generic #28-Ubuntu SMP Fri Jan 27 17:44:39 UTC
2012 x86_64 x86_64 x86_64 GNU/Linux

On Wed, Feb 29, 2012 at 5:44 PM, Chris Mason chris.ma...@oracle.com wrote:
 [ full quote of the previous messages snipped ]


Re: [BUG] Kernel Bug at fs/btrfs/volumes.c:3638

2012-02-29 Thread David Sterba
I just noticed that there's a bugreport from opensuse user tripping over
the same BUG() during log replay (and his problem was solved by
btrfs-zero-log), probably after some crash. The kernel version was 3.1
ie. without the corruption fixes, so while it happened during normal use
(and not via a crafted fs image), I'm not sure if this is still the case
with recent kernels.

Turning the BUG in __btrfs_map_block into an error return means checking the
value in not-so-few callers and from various callpaths; it's not
straightforward to do e.g. a quick return during mount, as in your case.

Good that Jeff Mahoney's error handling series reduces the number of
callers to update.


david

[ cut here ]
WARNING: at 
/home/abuild/rpmbuild/BUILD/kernel-desktop-3.1.0/linux-3.1/fs/btrfs/tree-log.c:1729
 walk_down_log_tree+0x
15a/0x3e0 [btrfs]()
Pid: 8978, comm: mount Not tainted 3.1.0-1.2-desktop #1
Call Trace:
 [810043fa] dump_trace+0xaa/0x2b0
 [81582a4a] dump_stack+0x69/0x6f
 [8105386b] warn_slowpath_common+0x7b/0xc0
 [a0573cba] walk_down_log_tree+0x15a/0x3e0 [btrfs]
 [a0574267] walk_log_tree+0xc7/0x1f0 [btrfs]
 [a057803c] btrfs_recover_log_trees+0x1ec/0x2d0 [btrfs]
 [a0544303] open_ctree+0x13c3/0x1740 [btrfs]
 [a0522733] btrfs_fill_super.isra.36+0x73/0x150 [btrfs]
 [a0523b29] btrfs_mount+0x359/0x3e0 [btrfs]
 [81156465] mount_fs+0x45/0x1d0
 [8116fdb6] vfs_kern_mount+0x66/0xd0
 [81171383] do_kern_mount+0x53/0x120
 [81172e35] do_mount+0x1a5/0x260
 [811732da] sys_mount+0x9a/0xf0
 [815a3292] system_call_fastpath+0x16/0x1b
 [7fc524137daa] 0x7fc524137da9
---[ end trace 2bf4520d35da960f ]---
unable to find logical 5493736079360 len 4096
[ cut here ]

1728 if (btrfs_header_level(cur) != *level)
1729 WARN_ON(1);


kernel BUG at 
/home/abuild/rpmbuild/BUILD/kernel-desktop-3.1.0/linux-3.1/fs/btrfs/volumes.c:2891!
invalid opcode:  [#1] PREEMPT SMP
CPU 1

Pid: 8978, comm: mount Tainted: GW   3.1.0-1.2-desktop #1
RIP: 0010:[a0568e28]  [a0568e28] 
__btrfs_map_block+0x7c8/0x890 [btrfs]
RSP: 0018:8801b7507798  EFLAGS: 00010296
RAX: 0043 RBX: 04ff1c30 RCX: 2a82
RDX: 723a RSI: 0046 RDI: 0202
RBP: 8801b7507860 R08: 000a R09: 
R10:  R11: 0001 R12: 8801dcd10cc0
R13: 0001 R14:  R15: 0001
FS:  7fc524c587e0() GS:88021fd0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7faea5cb8000 CR3: 0001b74f4000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process mount (pid: 8978, threadinfo 8801b7506000, task 8801b0d9c740)
Call Trace:
 [a056baa7] btrfs_map_bio+0x57/0x210 [btrfs]
 [a05600d4] submit_one_bio+0x64/0xa0 [btrfs]
 [a05653c7] read_extent_buffer_pages+0x367/0x4a0 [btrfs]
 [a053fd10] btree_read_extent_buffer_pages.isra.63+0x80/0xc0 [btrfs]
 [a0542b3a] btrfs_read_buffer+0x2a/0x40 [btrfs]
 [a0576d56] replay_one_buffer+0x46/0x360 [btrfs]
 [a0573d6d] walk_down_log_tree+0x20d/0x3e0 [btrfs]
 [a0574267] walk_log_tree+0xc7/0x1f0 [btrfs]
 [a057803c] btrfs_recover_log_trees+0x1ec/0x2d0 [btrfs]
 [a0544303] open_ctree+0x13c3/0x1740 [btrfs]
 [a0522733] btrfs_fill_super.isra.36+0x73/0x150 [btrfs]
 [a0523b29] btrfs_mount+0x359/0x3e0 [btrfs]
 [81156465] mount_fs+0x45/0x1d0
 [8116fdb6] vfs_kern_mount+0x66/0xd0
 [81171383] do_kern_mount+0x53/0x120
 [81172e35] do_mount+0x1a5/0x260
 [811732da] sys_mount+0x9a/0xf0
 [815a3292] system_call_fastpath+0x16/0x1b
 [7fc524137daa] 0x7fc524137da9



Re: LABEL only 1 device

2012-02-29 Thread Duncan
Karel Zak posted on Tue, 28 Feb 2012 23:35:57 +0100 as excerpted:

 On Sun, Feb 26, 2012 at 06:07:31PM +, Duncan wrote:
 Unfortunately, since gpt is reasonably new in terms of filesystem and
 partitioning tools, there isn't really anything (mount, etc) that makes
 /use/ of that label yet,
 
 udev exports GPT labels and uuids by symlinks, see
 
   ls /dev/disk/by-partlabel/
   ls /dev/disk/by-partuuid/

So it does. =:^)  I knew about the /dev/disk/by-*/ dirs in general and 
had no doubt browsed past them before without actually noting the 
significance, but hadn't actually noticed the by-part* until you pointed 
it out specifically.  Either that or exporting these is relatively new to 
udev, tho it's probably been there and I simply didn't see it.

Either way, thanks! =:^)

 you can use these links in your fstab.

Yes.  Now that I know they are there, using them in fstab makes sense, 
since I remember seeing the note in the mount manpage that it uses the 
udev symlinks internally already, so whatever udev does in this regard 
should just work with mount, and thus in fstab.

Useful indeed! It seems modern Linux (or more properly, a modern udev and 
mount, along with the kernel of course) has rather more use for partition-
labels than I was aware and thus than I was giving it credit for! =:^)

Thanks!

 And if I remember correctly, the kernel supports PARTUUID for the
 root= command line option.

That wouldn't surprise me at all.

That leaves grub2 (and other bootloaders).  I already know grub2 prefers 
UUIDs to /dev/* device names.  But I don't know if it handles labels, 
either the gpt-partlabel or the fs-label version.  I'll have to try that 
too.  Fortunately for me my device ordering is quite stable (and I hand-
edit grub.cfg, no mkgrub-config here), so that just works.  But UUIDs 
are designed for computer use, not human use, while labels work well for 
both, so if grub2 handles labels and I can use either fs or partition/
device labels there too, I'll be a happy camper indeed! =:^)


But just knowing mount/fstab supports partlabels is going to be a boon 
for me!  My current setup (pending multi-way raid1 mirroring, and perhaps 
a bit more stability, in btrfs) has multiple partitions and partitioned 
md/raids, with working and backup copies of nearly all of them.  When I 
update the backup, I often mkfs and start with a clean filesystem, then 
copy all the data over from the working copy.  The mkfs step of course 
changes filesystem UUID and my labeling scheme includes the date the 
filesystem and backup image was made, so it changes too.  So while I've 
been using (filesystem) labels in fstab for some time, I've had to update 
them when I update my backups.

Now I should be able to use the partlabels in fstab instead, and those 
only change if I repartition, a much less frequent occurrence, meaning I 
can update my backups without having to update the fstab for mounting 
them, at the same time. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: getdents - ext4 vs btrfs performance

2012-02-29 Thread Theodore Tso
You might try sorting the entries returned by readdir by inode number before
you stat them. This is a long-standing weakness in ext3/ext4, and it has to
do with how we added hashed tree indexes to directories in a way that is (a)
backwards compatible and (b) POSIX compliant with respect to adding and
removing directory entries concurrently with reading all of the directory
entries using readdir.

You might try compiling spd_readdir from the e2fsprogs source tree (in the 
contrib directory):

http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=blob;f=contrib/spd_readdir.c;h=f89832cd7146a6f5313162255f057c5a754a4b84;hb=d9a5d37535794842358e1cfe4faa4a89804ed209

… and then using that as a LD_PRELOAD, and see how that changes things.

The short version is that we can't easily do this in the kernel since it's a
problem that primarily shows up with very big directories, and using
non-swappable kernel memory to store all of the directory entries and then sort
them so they can be returned in inode number order just isn't practical. It is
something which can easily be done in userspace, though, and a number of
programs (including mutt for its Maildir support) do so, and it helps greatly
for workloads where you are calling readdir() followed by something that needs
to access the inode (i.e., stat, unlink, etc.)
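
Ted's suggestion can also be applied directly in an application instead of via
the spd_readdir LD_PRELOAD: read the whole directory first, sort the entries by
inode number, then stat in that order so the inode-table reads become mostly
sequential. A minimal sketch of that pattern (not the spd_readdir code itself):

```python
import os

def scan_sorted_by_inode(path):
    """Stat directory entries in inode-number order instead of raw
    readdir (on ext4/htree: hash) order, mimicking the spd_readdir
    trick in plain userspace code."""
    # Materialize and sort the listing before touching any inodes.
    entries = sorted(os.scandir(path), key=lambda e: e.inode())
    results = []
    for e in entries:
        # follow_symlinks=False matches what lstat()-based scanners do.
        st = os.stat(e.path, follow_symlinks=False)
        results.append((e.name, st.st_ino, st.st_size))
    return results
```

For a recursive walk the same sort is applied per directory; the cost is
holding one directory's worth of entries in memory, which is exactly the
trade-off Ted notes is fine in userspace but not in the kernel.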

-- Ted


On Feb 29, 2012, at 8:52 AM, Jacek Luczak wrote:

 Hi All,
 
 /*Sorry for sending incomplete email, hit wrong button :) I guess I
 can't use Gmail */
 
 Long story short: We've found that operations on a directory structure
 holding many dirs takes ages on ext4.
 
 The Question: Why there's that huge difference in ext4 and btrfs? See
 below test results for real values.
 
 Background: I had to backup a Jenkins directory holding workspace for
 few projects which were co from svn (implies lot of extra .svn dirs).
 The copy takes lot of time (at least more than I've expected) and
 process was mostly in D (disk sleep). I've dig more and done some
 extra test to see if this is not a regression on block/fs site. To
 isolate the issue I've also performed same tests on btrfs.
 
 Test environment configuration:
 1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6 core Intel X5670 HT
 enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs.
 2) Kernels: All tests were done on following kernels:
 - 2.6.39.4-3 -- the build ID (3) is used here for internal tacking of
 config changes mostly. In -3 we've introduced ,,fix readahead pipeline
 break caused by block plug'' patch. Otherwise it's pure 2.6.39.4.
 - 3.2.7 -- latest kernel at the time of testing (3.2.8 has been
 release recently).
 3) A subject of tests, directory holding:
 - 54GB of data (measured on ext4)
 - 1978149 files
 - 844008 directories
 4) Mount options:
 - ext4 -- errors=remount-ro,noatime,
 data=writeback
 - btrfs -- noatime,nodatacow and for later investigation on
 copression effect: noatime,nodatacow,compress=lzo
 
 In all tests I measured execution time. The following tests
 were performed:
 - find . -type d
 - find . -type f
 - cp -a
 - rm -rf
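The post doesn't show how the timings were taken; a minimal sketch of such a harness follows (the command lists and the cache-dropping step are assumptions, not the author's script).

```python
import subprocess
import time

def timed(cmd):
    """Wall-clock one external command, discarding its output.

    The four tests would be invoked roughly as:
        timed(["find", ".", "-type", "d"])
        timed(["find", ".", "-type", "f"])
        timed(["cp", "-a", "src", "dst"])
        timed(["rm", "-rf", "dst"])
    For cold-cache numbers, caches should be dropped between runs
    (sync; echo 3 > /proc/sys/vm/drop_caches, as root).
    """
    t0 = time.monotonic()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.monotonic() - t0
```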
 
 Ext4 results:
 | Type     | 2.6.39.4-3 | 3.2.7     |
 | Dir cnt  | 17m 40sec  | 11m 20sec |
 | File cnt | 17m 36sec  | 11m 22sec |
 | Copy     | 1h 28m     | 1h 27m    |
 | Remove   | 3m 43sec   | 3m 38sec  |
 
 Btrfs results (without lzo compression):
 | Type     | 2.6.39.4-3 | 3.2.7     |
 | Dir cnt  | 2m 22sec   | 2m 21sec  |
 | File cnt | 2m 26sec   | 2m 23sec  |
 | Copy     | 36m 22sec  | 39m 35sec |
 | Remove   | 7m 51sec   | 10m 43sec |
 
 From the above one can see that the copy takes close to 1h less on btrfs.
 I've used strace to count calls and their times; the results are as
 follows (from 3.2.7):
 1) Ext4 (only to elements):
 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------------
  57.01   13.257850           1  15082163           read
  23.40    5.440353           3   1687702           getdents
   6.15    1.430559           0   3672418           lstat
   3.80    0.883767           0  13106961           write
   2.32    0.539959           0   4794099           open
   1.69    0.393589           0    843695           mkdir
   1.28    0.296700           0   5637802           setxattr
   0.80    0.186539           0   7325195           stat
 
 2) Btrfs:
 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------------
  53.38    9.486210           1  15179751           read
  11.38    2.021662           1   1688328           getdents
  10.64    1.890234           0   4800317           open
   6.83    1.213723           0  13201590           write
   4.85    0.862731           0   5644314           setxattr
   3.50    0.621194           1    844008           mkdir
   2.75    0.489059           0   3675992         1 lstat
   1.71    0.303544           0   5644314           llistxattr
   1.50    0.265943           0   1978149           utimes
   1.02    0.180585           0   5644314    844008 getxattr
 
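Summaries in the shape shown above come from `strace -c` (e.g. `strace -f -c -o summary.txt cp -a src dst`). A small parser makes two such summaries easy to diff between filesystems; the regex and the inlined sample rows here are illustrative, not part of the original post.

```python
import re

# Two rows from the ext4 summary above, inlined as sample input.
SAMPLE = """\
 57.01   13.257850           1  15082163           read
 23.40    5.440353           3   1687702           getdents
"""

def parse_strace_summary(text):
    """Parse `strace -c` rows into {syscall: (time_pct, seconds, calls)}."""
    row = re.compile(
        r"\s*(\d+\.\d+)\s+(\d+\.\d+)\s+\d+\s+(\d+)(?:\s+\d+)?\s+(\w+)\s*$")
    out = {}
    for line in text.splitlines():
        m = row.match(line)
        if m:
            pct, secs, calls, name = m.groups()
            out[name] = (float(pct), float(secs), int(calls))
    return out
```

With both filesystems parsed this way, per-syscall deltas (getdents being the interesting one here) fall out of a simple dictionary comparison.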
 On btrfs 

[PATCH] btrfs: fix locking issues in find_parent_nodes()

2012-02-29 Thread Li Zefan
- We might unlock head->mutex while it was not locked
- We might leave the function without unlocking delayed_refs->lock

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/backref.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 98f6bf10..0436c12 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -583,7 +583,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
struct btrfs_path *path;
struct btrfs_key info_key = { 0 };
struct btrfs_delayed_ref_root *delayed_refs = NULL;
-   struct btrfs_delayed_ref_head *head = NULL;
+   struct btrfs_delayed_ref_head *head;
int info_level = 0;
int ret;
struct list_head prefs_delayed;
@@ -607,6 +607,8 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
 * at a specified point in time
 */
 again:
+   head = NULL;
+
	ret = btrfs_search_slot(trans, fs_info->extent_root, &key, path, 0, 0);
	if (ret < 0)
goto out;
@@ -635,8 +637,10 @@ again:
goto again;
}
		ret = __add_delayed_refs(head, seq, &info_key, &prefs_delayed);
-		if (ret)
+		if (ret) {
+			spin_unlock(&delayed_refs->lock);
 			goto out;
+		}
}
	spin_unlock(&delayed_refs->lock);
 
-- 
1.7.3.1
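Both problems the patch fixes are instances of one rule: every exit path taken while a lock is held must release it exactly once. A minimal sketch of the fixed shape (with Python's `threading.Lock` standing in for the kernel spinlock; the C code cannot use try/finally and must instead unlock explicitly on each error path, which is exactly what the hunk above adds):

```python
import threading

lock = threading.Lock()  # stands in for delayed_refs->lock

def find_parent_nodes_sketch(fail):
    """Return refs, releasing the lock on both the error and success paths."""
    lock.acquire()
    try:
        if fail:
            # Before the fix, the C code jumped to 'out' here with the
            # spinlock still held.
            return None
        return "refs"
    finally:
        lock.release()  # runs on every exit path
```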
