Raid0 with btrfs

2010-08-05 Thread Leonidas Spyropoulos
Hi all,

I want to make a btrfs RAID0 on 2 partitions of my PC.
Until now I have been using the mdadm tools to make a software RAID of the 2
partitions /dev/sde2 and /dev/sdd2,
and then running mkfs.ext4 on the newly created /dev/md0 device.
From a performance point of view, is it better to keep the mdadm configuration
and just format the /dev/md0 device as btrfs, OR
to delete the RAID device and format the 2 partitions /dev/sde2 and /dev/sdd2
as a single btrfs filesystem with 2 devices?
mkfs.btrfs /dev/sde2 /dev/sdd2

On a side note:
If I decide to go for RAID5, which is currently not supported by mkfs.btrfs,
I have to use the mdadm tool anyway, right?
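
For concreteness, these are the two setups I am comparing (just a sketch;
device names as above, and the mdadm chunk size is only an example value):

Option A - mdadm RAID0, btrfs on top of the md device:
# mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=64 /dev/sde2 /dev/sdd2
# mkfs.btrfs /dev/md0

Option B - let btrfs stripe the data across the two partitions itself:
# mkfs.btrfs -d raid0 /dev/sde2 /dev/sdd2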

-- 
Caution: breathing may be hazardous to your health.


Re: Raid0 with btrfs

2010-08-05 Thread Hubert Kario
On Thursday 05 August 2010 16:15:22 Leonidas Spyropoulos wrote:
 Hi all,
 
 I want to make a btrfs RAID0 on 2 partitions of my PC.
 Until now I have been using the mdadm tools to make a software RAID of the 2
 partitions /dev/sde2 and /dev/sdd2,
 and then running mkfs.ext4 on the newly created /dev/md0 device.
 From a performance point of view, is it better to keep the mdadm configuration
 and just format the /dev/md0 device as btrfs, OR
 to delete the RAID device and format the 2 partitions /dev/sde2 and /dev/sdd2
 as a single btrfs filesystem with 2 devices?
 mkfs.btrfs /dev/sde2 /dev/sdd2

Btrfs already supports mirroring the metadata while the data is striped. What
this means is that while the performance should be more-or-less identical to
MD RAID0 (if it isn't, it's a bug), your data is a bit more secure, as the
metadata describing it resides on both drives. Later on it will be possible to
select which directories/files should have what level of redundancy. This will
allow you to have ~/work RAID1-ed and ~/videos RAID0-ed while keeping both
directories on the same partition and filesystem.
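
A minimal sketch of the commands for that setup (device names taken from your
mail; the explicit profiles just spell out what should already be the
multi-device defaults):

# mkfs.btrfs -d raid0 -m raid1 /dev/sde2 /dev/sdd2   (data striped, metadata mirrored)
# btrfs device scan
# mount /dev/sde2 /mnt                               (either member device can be named here)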

 On a side note:
 If I decide to go for RAID5, which is currently not supported by mkfs.btrfs,
 I have to use the mdadm tool anyway, right?

Yes, the RAID5 code is not in trunk yet.
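
A sketch of what that mdadm-based RAID5 setup would look like (device and
array names are placeholders only):

# mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sde2 /dev/sdd2 /dev/sdc2
# mkfs.btrfs /dev/md0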

-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl


Re: wanted X found X-1 but you got X-2

2010-08-05 Thread adi
Hi Miao,

OK, I'll try:

mkfs.ext3 /dev/sdaN
»used it as root file system for quite a while«
tune2fs -O extents,uninit_bg,dir_index /dev/sdaN  (converted to ext4)
»used it as root file system for quite another while«
btrfs-convert /dev/sdaN  (converted to btrfs)
»used it as root file system with -o compress for quite another while«
installed the TuxOnIce patches on the kernel
restart - still works
suspend-to-disk
and from here on the bug appeared
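
Or, as a consolidated command sequence (the partition, the mount point and the
exact compress option are my placeholders for the steps above):

# mkfs.ext3 /dev/sdaN
  ... used as root file system for a while ...
# tune2fs -O extents,uninit_bg,dir_index /dev/sdaN
  ... used as root file system (now ext4) for another while ...
# btrfs-convert /dev/sdaN
# mount -o compress /dev/sdaN /mnt
  ... used as root file system, installed the TuxOnIce patches ...
  ... suspend-to-disk - from here on the error appears ...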

Hope this helps,
Ciao, Adi


On 03.08.2010 15:07, Miao Xie wrote:
 Hi, Adi

 I don't know the detailed reproduction steps. Could you tell us the steps,
 just like this?
 # mkfs.btrfs /dev/sdaN
 # mount /dev/sdaN /mnt
 # dd ..
 # (suspend-to-disk)
 ...

 It is very useful for our analysis.

 Thanks
 Miao



Re: Poor read performance on high-end server

2010-08-05 Thread Mathieu Chouquet-Stringer
Hello,

freek.dijks...@sara.nl (Freek Dijkstra) writes:
 [...]

 Here are the exact settings:
 ~# mkfs.btrfs -d raid0 /dev/sdd /dev/sde /dev/sdf /dev/sdg \
  /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm \
  /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds
 nodesize 4096 leafsize 4096 sectorsize 4096 size 2.33TB
 Btrfs Btrfs v0.19

Don't you need to stripe the metadata too (with -m raid0)?  Otherwise you may
be limited by the device(s) holding the metadata.
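
Something along these lines, i.e. the same device list with the metadata
profile made explicit (whether it helps here is only a guess):

# mkfs.btrfs -d raid0 -m raid0 /dev/sdd /dev/sde /dev/sdf /dev/sdg \
  /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm \
  /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds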

-- 
Mathieu Chouquet-Stringer   mchou...@free.fr
The sun itself sees not till heaven clears.
 -- William Shakespeare --


Re: Poor read performance on high-end server

2010-08-05 Thread Freek Dijkstra
Chris, Daniel and Mathieu,

Thanks for your constructive feedback!

 On Thu, Aug 05, 2010 at 04:05:33PM +0200, Freek Dijkstra wrote:
            ZFS             BtrFS
   1 SSD    256 MiByte/s    256 MiByte/s
   2 SSDs   505 MiByte/s    504 MiByte/s
   3 SSDs   736 MiByte/s    756 MiByte/s
   4 SSDs   952 MiByte/s    916 MiByte/s
   5 SSDs  1226 MiByte/s    986 MiByte/s
   6 SSDs  1450 MiByte/s    978 MiByte/s
   8 SSDs  1653 MiByte/s    932 MiByte/s
  16 SSDs  2750 MiByte/s    919 MiByte/s

[...]
 The above results were for Ubuntu 10.04.1 server, with BtrFS v0.19,
 
 Which kernels are those?

For BtrFS: Linux 2.6.32-21-server #32-Ubuntu SMP x86_64 GNU/Linux
For ZFS: FreeBSD 8.1-RELEASE (GENERIC)

(Note that we currently cannot upgrade easily, due to binary drivers for
the SAS+SATA controllers :(. I'd be happy to push the vendor, though, if
you think it would make a difference.)


Daniel J Blueman wrote:

 Perhaps create a new filesystem and mount with 'nodatasum'

I get an improvement: 919 MiByte/s just became 1580 MiByte/s. Not as
fast as it could be, but most certainly an improvement.

 existing extents which were previously created will be checked, so you
 need to start fresh.

Indeed, and also the other way around. I created two test files, one while
mounted with and one without the -o nodatasum option:

write w/o nodatasum; read w/o nodatasum:  919 ±  43 MiByte/s
write w/o nodatasum; read w/  nodatasum:  922 ±  72 MiByte/s
write w/  nodatasum; read w/o nodatasum: 1082 ±  46 MiByte/s
write w/  nodatasum; read w/  nodatasum: 1586 ± 126 MiByte/s

So even if I remount the disk in the normal way and read a file that was
created without checksums, I still get a small improvement :)

(PS: the above tests were repeated 4 times, the last even 8 times. As you
can see from the standard deviations, the results are not always very
consistent. The cause is unknown; CPU load is low.)
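
For reference, a sketch of how such a comparison can be run (the device, mount
point, file name and size are placeholders, not my exact commands):

# mount -o nodatasum /dev/sdd /mnt
# dd if=/dev/zero of=/mnt/test-nosum bs=1M count=32768 oflag=direct
# umount /mnt
# mount /dev/sdd /mnt                      (normal mount, checksums enabled)
# echo 3 > /proc/sys/vm/drop_caches        (make sure we read from disk)
# dd if=/mnt/test-nosum of=/dev/null bs=1M iflag=direct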


Chris Mason wrote:

 Basically we have two different things to tune.  First the block layer
 and then btrfs.


 And then we need to set up a fio job file that hammers on all the SSDs at
 once.  I'd have it use AIO/DIO and talk directly to the drives.
 
 [global]
 size=32g
 direct=1
 iodepth=8
 bs=20m
 rw=read
 
 [f1]
 filename=/dev/sdd
 [f2]
 filename=/dev/sde
 [f3]
 filename=/dev/sdf
[...]
 [f16]
 filename=/dev/sds

Thanks. First, one disk:

 f1: (groupid=0, jobs=1): err= 0: pid=6273
   read : io=32780MB, bw=260964KB/s, iops=12, runt=128626msec
     clat (usec): min=74940, max=80721, avg=78449.61, stdev=923.24
     bw (KB/s) : min=240469, max=269981, per=100.10%, avg=261214.77, stdev=2765.91
   cpu  : usr=0.01%, sys=2.69%, ctx=1747, majf=0, minf=5153
   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=1639/0, short=0/0

     lat (msec): 100=100.00%

 Run status group 0 (all jobs):
    READ: io=32780MB, aggrb=260963KB/s, minb=267226KB/s, maxb=267226KB/s, mint=128626msec, maxt=128626msec

 Disk stats (read/write):
   sdd: ios=261901/0, merge=0/0, ticks=10135270/0, in_queue=10136460, util=99.30%

So 255 MiByte/s.
Out of curiosity, what is the distinction between the reported figures
of 260964 KiB/s, 261214.77 KiB/s, 267226 KiB/s and 260963 KiB/s?


Now 16 disks (abbreviated):

 ~/fio# ./fio ssd.fio
 Starting 16 processes
 f1: (groupid=0, jobs=1): err= 0: pid=4756
   read : io=32780MB, bw=212987KB/s, iops=10, runt=157600msec
     clat (msec): min=75, max=138, avg=96.15, stdev= 4.47
      lat (msec): min=75, max=138, avg=96.15, stdev= 4.47
     bw (KB/s) : min=153121, max=268968, per=6.31%, avg=213181.15, stdev=9052.26
   cpu  : usr=0.00%, sys=1.71%, ctx=2737, majf=0, minf=5153
   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=1639/0, short=0/0

     lat (msec): 100=97.99%, 250=2.01%

[..similar for f2 to f16..]

 f1:  read : io=32780MB, bw=212987KB/s, iops=10, runt=157600msec
      bw (KB/s) : min=153121, max=268968, per=6.31%, avg=213181.15, stdev=9052.26
 f2:  read : io=32780MB, bw=213873KB/s, iops=10, runt=156947msec
      bw (KB/s) : min=151143, max=251508, per=6.33%, avg=213987.34, stdev=8958.86
 f3:  read : io=32780MB, bw=214613KB/s, iops=10, runt=156406msec
      bw (KB/s) : min=149216, max=219037, per=6.35%, avg=214779.89, stdev=9332.99
 f4:  read : io=32780MB, bw=214388KB/s, iops=10, runt=156570msec
      bw (KB/s) : min=148675, max=226298, per=6.35%, avg=214576.51, stdev=8985.03
 f5:  read : io=32780MB, bw=213848KB/s, iops=10, runt=156965msec
      bw (KB/s) : min=144479, max=241414, per=6.33%, avg=213935.81, stdev=10023.68
 f6:  read : io=32780MB, bw=213514KB/s, iops=10,