Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-16 Thread Marc MERLIN
To close this thread.

I gave up on the drive after it showed abysmal benchmarking results in
Windows too.

I owed everyone an update, which I just finished typing:
http://marc.merlins.org/perso/linux/post_2012-08-15_The-tale-of-SSDs_-Crucial-C300-early-Death_-Samsung-830-extreme-random-IO-slowness_-and-settling-with-OCZ-Vertex-4.html

I'm not sure how I could have gotten 2 bad drives from Samsung in 2
different shipments, so I'm afraid the entire line may be bad. At least, it
was for me after extensive benchmarking, and even when using their own Windows
benchmarking tool.

In the end, I got an OCZ Vertex 4 and it's superfast, as per the benchmarks I
posted in the link above.

What is interesting is that ext4 is somewhat faster than btrfs in the tests
I posted above, but not in ways that should be truly worrisome.
I'm still happy to have COW and snapshots, even if that costs me
performance.

Sorry for spamming the list with what apparently just ended up being a crappy
drive (well, 2, since I got two just to make sure) from a vendor who
shouldn't have been selling crappy SSDs.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-03 Thread Martin Steigerwald
On Thursday, 2 August 2012, Marc MERLIN wrote:
   I'll try plugging this SSD in a totally different PC and see what
   happens. This may say if it's an AHCI/intel sata driver problem.
 
  
 
  Seems we will continue until someone starts to complain here. Maybe
  another list will be more appropriate? But then this thread has it all
  in one ;). Adding a CC with some introductory note might be
  appropriate. It's your problem, so you decide ;). I´d suggest the fio
  mailing list, there are other performance people who may want to
  chime in.
 
  
 Actually you know the lists and people involved more than me. I'd be
 happy if you added a Cc to another list you think is best, and we can
 move there.

I have no more ideas for the moment.

I suggest you try the fio mailing list at fio@vger.kernel.org.

I currently have no ideas for other lists. I bet there is some block layer 
/ libata / scsi list that might match. Just browse the lists from 
kernel.org and pick the one that sounds most suitable to you [1]. Probably 
fsdevel, but the thing does not seem to be filesystem related beside 
some alignment effect (stripe in Ext4). Hmmm, probably linux-scsi then, but 
look there first, whether block layer and libata related things are 
discussed there.

I would start a new thread. Tell the problem, summarize what you 
have tried and what the outcome was, and ask for advice. Add a link to 
the original thread here. (It might be a good idea to have one more post 
here with a link to the new thread so that other curious people can follow 
there – in case there are any. Otherwise I would end it on this thread. 
The thread view already looks ridiculous in KMail here ;-))

And put me on Cc ;). I´d really like to know what's going on here. As 
written, I have no more ideas right now. It seems to be getting too low 
level for me.

[1] http://vger.kernel.org/vger-lists.html

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-02 Thread Martin Steigerwald
On Thursday, 2 August 2012, Marc MERLIN wrote:
 On Wed, Aug 01, 2012 at 11:57:39PM +0200, Martin Steigerwald wrote:
  Its getting quite strange.
  
 I would agree :)
 
 Before I paste a bunch of things, I wanted to thank you for not giving up 
 on me and offering your time to help me figure this out :)

You are welcome.

Well, I hold Linux performance analysis and tuning trainings and I am
really interested in issues like this ;)

I will take care of myself and take my time to respond, or even stop
responding altogether if I run out of ideas ;).

  I lost track of whether you did that already or not, but if you didn´t
  please post some
  
  vmstat 1
  iostat -xd 1
  on the device while it is being slow.
  
 Sure thing, here's the 24 second du -s run:
 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
  r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
  2  1      0 2747264     44 348388    0    0    28    50   242   184 19  6 74  1
  1  0      0 2744128     44 351700    0    0   144     0  2758 32115 30  5 61  4
  2  1      0 2743100     44 351992    0    0   792     0  2616 30613 28  4 50 18
  1  1      0 2741592     44 352668    0    0   776     0  2574 31551 29  4 45 21
  1  1      0 2740720     44 353432    0    0   692     0  2734 32891 30  4 45 22
  1  1      0 2740104     44 354284    0    0   460     0  2639 31585 30  4 45 21
  3  1      0 2738520     44 354692    0    0   544   264  2834 30302 32  5 42 21
  1  1      0 2736936     44 355476    0    0  1064  2012  2867 31172 28  4 45 23

A bit more wait I/O with not even 10% of the throughput as compared to
the Intel SSD 320 figures. Seems that Intel SSD is running circles around
your Samsung SSD while not – as expected for that use case – being fully
utilized.

 Linux 3.5.0-amd64-preempt-noide-20120410 (gandalfthegreat)  08/01/2012  _x86_64_  (4 CPU)

   rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
     2.18     0.68     6.45    17.77    78.12   153.39    19.12     0.18    7.51    8.52    7.15   4.46  10.81
     0.00     0.00   118.00     0.00   540.00     0.00     9.15     1.18    9.93    9.93    0.00   4.98  58.80
     0.00     0.00   217.00     0.00   868.00     0.00     8.00     1.90    8.77    8.77    0.00   4.44  96.40
     0.00     0.00   192.00     0.00   768.00     0.00     8.00     1.63    8.44    8.44    0.00   5.10  98.00
     0.00     0.00   119.00     0.00   476.00     0.00     8.00     1.06    9.01    9.01    0.00   8.20  97.60
     0.00     0.00   125.00     0.00   500.00     0.00     8.00     1.08    8.67    8.67    0.00   7.55  94.40
     0.00     0.00   165.00     0.00   660.00     0.00     8.00     1.50    9.12    9.12    0.00   5.87  96.80
     0.00     0.00   195.00    13.00   780.00   272.00    10.12     1.68    8.10    7.94   10.46   4.65  96.80
     0.00     0.00   173.00     0.00   692.00     0.00     8.00     1.72    9.87    9.87    0.00   5.71  98.80
     0.00     0.00   171.00     0.00   684.00     0.00     8.00     1.62    9.33    9.33    0.00   5.75  98.40
     0.00     0.00   161.00     0.00   644.00     0.00     8.00     1.52    9.57    9.57    0.00   6.14  98.80
     0.00     0.00   136.00     0.00   544.00     0.00     8.00     1.26    9.29    9.29    0.00   7.24  98.40
     0.00     0.00   199.00     0.00   796.00     0.00     8.00     1.94    9.73    9.73    0.00   4.94  98.40
     0.00     0.00   201.00     0.00   804.00     0.00     8.00     1.70    8.54    8.54    0.00   4.80  96.40
     0.00     0.00   272.00    15.00  1088.00   272.00     9.48     2.35    8.21    8.46    3.73   3.39  97.20
[…]
  I am interested in wait I/O and latencies and disk utilization.
  
 Cool tool, I didn't know about iostat.
 My r_await numbers don't look good obviously and yet %util is pretty much
 100% the entire time.
 
 Does that show that it's indeed the device that is unable to deliver the 
 requests any quicker, despite
 being an ssd, or are you reading this differently?

That, or…

  Also I am interested in
  merkaba:~ hdparm -I /dev/sda | grep -i queue
  Queue depth: 32
 *Native Command Queueing (NCQ)
  output for your SSD.
 
 gandalfthegreat:/var/local# hdparm -I /dev/sda | grep -i queue   
   Queue depth: 32
  *Native Command Queueing (NCQ)
 gandalfthegreat:/var/local# 
 
 I've run the fio tests in:
 /dev/mapper/cryptroot /var btrfs 
 rw,noatime,compress=lzo,nossd,discard,space_cache 0 0

… you are still using dm_crypt?

Please test without dm_crypt. My figures are from within LVM, but no
dm_crypt. Its good to have a comparable base for the measurements.

 (discard is there, so fstrim shouldn't be needed)

I can´t imagine why it should matter, but maybe it's worth having some
tests without „discard“. 
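For example (untested sketch, reusing the mount options you posted; an
occasional manual fstrim then replaces what the online discard option would
have done):

mount -o remount,noatime,compress=lzo,nossd,space_cache /dev/mapper/cryptroot /var
fstrim -v /var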

  I also suggest to use fio with the ssd-test example on the SSD. I
  have some comparison data available for my setup. Heck, it should be
  publicly available 

Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-02 Thread Martin Steigerwald
On Thursday, 2 August 2012, Marc MERLIN wrote:
 So, doctor, is it bad? :)
 
 randomwrite: (g=0): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, 
 iodepth=64
 sequentialwrite: (g=1): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, 
 iodepth=64
 randomread: (g=2): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
 sequentialread: (g=3): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
 2.0.8
 Starting 4 processes
 randomwrite: Laying out IO file(s) (1 file(s) / 2048MB)
 Jobs: 1 (f=1): [___R] [100.0% done] [558.8M/0K /s] [63.8K/0  iops] [eta 
 00m:00s]  
 randomwrite: (groupid=0, jobs=1): err= 0: pid=7193
   write: io=102048KB, bw=1700.8KB/s, iops=189 , runt= 60003msec
 slat (usec): min=21 , max=219834 , avg=5250.91, stdev=5936.55
 clat (usec): min=25 , max=738932 , avg=329339.45, stdev=106004.63
  lat (msec): min=4 , max=751 , avg=334.59, stdev=107.57
 clat percentiles (msec):

Heck, I didn´t look at the IOPS figure!

189 IOPS for a SATA-600 SSD. That's pathetic.

So again, please test this without dm_crypt. I can´t believe that this
is the maximum the hardware is able to achieve.

A really fast 15000 rpm SAS harddisk might top that.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-02 Thread Marc MERLIN
On Thu, Aug 02, 2012 at 01:18:07PM +0200, Martin Steigerwald wrote:
  I've run the fio tests in:
  /dev/mapper/cryptroot /var btrfs 
  rw,noatime,compress=lzo,nossd,discard,space_cache 0 0
 
 … you are still using dm_crypt?
 
That was my biggest partition and so far I've found no performance impact
on file access between unencrypted and dm_crypt.
I just took out my swap partition and made a smaller btrfs there:
/dev/sda3 /mnt/mnt3 btrfs rw,noatime,ssd,space_cache 0 0

I mounted without discard.

  lat (usec) : 50=0.01%
  lat (msec) : 10=0.02%, 20=0.02%, 50=0.05%, 100=0.14%, 250=12.89%
  lat (msec) : 500=72.44%, 750=14.43%
 
 Gosh, look at these latencies!
 
 72,44% of all requests above 500 (in words: five hundred) milliseconds!
 And 14,43% above 750 msecs. The percentage of requests served at 100 msecs
 or less was below one percent! Hey, is this an SSD or what?
 
Yeah, that's kind of what I've been complaining about since the beginning :)
Once I'm reading sequentially, it goes fast, but random access/latency is
indeed abysmal.

 Still even with iodepth 64 totally different picture. And look at the IOPS
 and throughput.
 
Yep. I know mine are bad :(

 For reference, this refers to
 
 [global]
 ioengine=libaio
 direct=1
 iodepth=64

Since it's slightly different than the first job file you gave me, I re-ran
with this one this time.
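
(For reference, the whole job file presumably looked something like this; a
sketch reconstructed from the group names and parameters fio prints below.
Only the [global] part quoted above is verbatim; bsrange, size and runtime
are inferred from the output.)

[global]
ioengine=libaio
direct=1
iodepth=64
bsrange=2k-16k
size=2048m
runtime=60

[zufälliglesen]
rw=randread
stonewall

[sequentielllesen]
rw=read
stonewall

[zufälligschreiben]
rw=randwrite
stonewall

[sequentiellschreiben]
rw=write
stonewall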

gandalfthegreat:~# /sbin/mkfs.btrfs -L test /dev/sda2
gandalfthegreat:~# mount -o noatime /dev/sda2 /mnt/mnt2
gandalfthegreat:~# grep sda2 /proc/mounts
/dev/sda2 /mnt/mnt2 btrfs rw,noatime,ssd,space_cache 0 0

here's the btrfs test (ext4 is lower down):
gandalfthegreat:/mnt/mnt2# fio ~/fio.job2
zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
zufälligschreiben: (g=2): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, 
iodepth=64
sequentiellschreiben: (g=3): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, 
iodepth=64
2.0.8
Starting 4 processes
zufälliglesen: Laying out IO file(s) (1 file(s) / 2048MB)
sequentielllesen: Laying out IO file(s) (1 file(s) / 2048MB)
zufälligschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
sequentiellschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
Jobs: 1 (f=1): [___W] [59.5% done] [0K/1800K /s] [0 /193  iops] [eta 02m:10s]   
  
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=30318
  read : io=73682KB, bw=1227.1KB/s, iops=137 , runt= 60004msec
slat (usec): min=3 , max=37432 , avg=7252.52, stdev=5717.70
clat (usec): min=13 , max=981927 , avg=454046.13, stdev=110527.92
 lat (msec): min=5 , max=999 , avg=461.30, stdev=112.00
clat percentiles (msec):
 |  1.00th=[  145],  5.00th=[  269], 10.00th=[  371], 20.00th=[  408],
 | 30.00th=[  424], 40.00th=[  437], 50.00th=[  449], 60.00th=[  457],
 | 70.00th=[  474], 80.00th=[  490], 90.00th=[  570], 95.00th=[  644],
 | 99.00th=[  865], 99.50th=[  922], 99.90th=[  963], 99.95th=[  979],
 | 99.99th=[  979]
bw (KB/s)  : min=8, max= 2807, per=100.00%, avg=1227.75, stdev=317.57
lat (usec) : 20=0.01%
lat (msec) : 10=0.01%, 20=0.02%, 50=0.04%, 100=0.46%, 250=3.82%
lat (msec) : 500=79.48%, 750=13.57%, 1000=2.58%
  cpu  : usr=0.12%, sys=1.13%, ctx=12186, majf=0, minf=276
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.2%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
 issued: total=r=8262/w=0/d=0, short=r=0/w=0/d=0
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=30340
  read : io=2048.0MB, bw=211257KB/s, iops=23473 , runt=  9927msec
slat (usec): min=1 , max=56321 , avg=20.51, stdev=424.44
clat (usec): min=0 , max=57987 , avg=2695.98, stdev=6624.00
 lat (usec): min=1 , max=58015 , avg=2716.75, stdev=6642.09
clat percentiles (usec):
 |  1.00th=[1],  5.00th=[   10], 10.00th=[   30], 20.00th=[  100],
 | 30.00th=[  217], 40.00th=[  362], 50.00th=[  494], 60.00th=[  636],
 | 70.00th=[  892], 80.00th=[ 1656], 90.00th=[ 7392], 95.00th=[21632],
 | 99.00th=[29056], 99.50th=[29568], 99.90th=[43776], 99.95th=[46848],
 | 99.99th=[57600]
bw (KB/s)  : min=166675, max=260984, per=99.83%, avg=210892.26, 
stdev=22433.65
lat (usec) : 2=2.16%, 4=0.43%, 10=2.33%, 20=2.80%, 50=5.72%
lat (usec) : 100=6.47%, 250=12.35%, 500=18.29%, 750=15.52%, 1000=5.44%
lat (msec) : 2=13.04%, 4=3.59%, 10=3.83%, 20=1.88%, 50=6.11%
lat (msec) : 100=0.04%
  cpu  : usr=4.51%, sys=35.70%, ctx=11480, majf=0, minf=278
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
 issued: total=r=233025/w=0/d=0, short=r=0/w=0/d=0
zufälligschreiben: (groupid=2, jobs=1): err= 0: 

Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-02 Thread Martin Steigerwald
On Thursday, 2 August 2012, Marc MERLIN wrote:
 On Thu, Aug 02, 2012 at 01:18:07PM +0200, Martin Steigerwald wrote:
   I've run the fio tests in:
   /dev/mapper/cryptroot /var btrfs 
   rw,noatime,compress=lzo,nossd,discard,space_cache 0 0
  
  … you are still using dm_crypt?
[…]
 I just took out my swap partition and made a smaller btrfs there:
 /dev/sda3 /mnt/mnt3 btrfs rw,noatime,ssd,space_cache 0 0
 
 I mounted without discard.
[…]
  For reference, this refers to
  
  [global]
  ioengine=libaio
  direct=1
  iodepth=64
 
 Since it's slightly different than the first job file you gave me, I re-ran
 with this one this time.
 
 gandalfthegreat:~# /sbin/mkfs.btrfs -L test /dev/sda2
 gandalfthegreat:~# mount -o noatime /dev/sda2 /mnt/mnt2
 gandalfthegreat:~# grep sda2 /proc/mounts
 /dev/sda2 /mnt/mnt2 btrfs rw,noatime,ssd,space_cache 0 0
 
 here's the btrfs test (ext4 is lower down):
 gandalfthegreat:/mnt/mnt2# fio ~/fio.job2
 zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, 
 iodepth=64
 sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, 
 iodepth=64
 zufälligschreiben: (g=2): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, 
 iodepth=64
 sequentiellschreiben: (g=3): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, 
 iodepth=64
 2.0.8

Still abysmal except for sequential reads.

 Starting 4 processes
 zufälliglesen: Laying out IO file(s) (1 file(s) / 2048MB)
 sequentielllesen: Laying out IO file(s) (1 file(s) / 2048MB)
 zufälligschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
 sequentiellschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
 Jobs: 1 (f=1): [___W] [59.5% done] [0K/1800K /s] [0 /193  iops] [eta 02m:10s] 
 
 zufälliglesen: (groupid=0, jobs=1): err= 0: pid=30318
   read : io=73682KB, bw=1227.1KB/s, iops=137 , runt= 60004msec
[…]
 lat (usec) : 20=0.01%
 lat (msec) : 10=0.01%, 20=0.02%, 50=0.04%, 100=0.46%, 250=3.82%
 lat (msec) : 500=79.48%, 750=13.57%, 1000=2.58%

1 second latency? Oh well…


 Run status group 3 (all jobs):
   WRITE: io=84902KB, aggrb=1414KB/s, minb=1414KB/s, maxb=1414KB/s, 
 mint=60005msec, maxt=60005msec
 gandalfthegreat:/mnt/mnt2#
 (no lines beyond this, fio 2.0.8)
[…]
  Can you also post the last lines:
  
  Disk stats (read/write):
dm-2: ios=616191/613142, merge=0/0, ticks=1300820/2565384, 
  in_queue=3867448, util=98.81%, aggrios=504829/504643, 
  aggrmerge=111362/111451, aggrticks=1058320/2164664, aggrin_queue=3223048, 
  aggrutil=98.78%
  sda: ios=504829/504643, merge=111362/111451, ticks=1058320/2164664, 
  in_queue=3223048, util=98.78%
  martin@merkaba:~/Artikel/LinuxNewMedia/fio/Recherche/Messungen/merkaba
  
 I didn't get these lines.

Hmmm, back then I guess this was fio 1.5.9 or so.

 I tried deadline and noop, and indeed I'm not seeing much of a difference for my 
 basic tests.
 For now I have deadline.
  
  So my recommendation of now:
  
  Remove as much factors as possible and in order to compare results with
  what I posted try with plain logical volume with Ext4.
 
 gandalfthegreat:~# mkfs.ext4 -O extent -b 4096 -E stride=128,stripe-width=128 
 /dev/sda2
 /dev/sda2 /mnt/mnt2 ext4 rw,noatime,stripe=128,data=ordered 0 0
 
 gandalfthegreat:/mnt/mnt2# fio ~/fio.job2
 zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, 
 iodepth=64
 sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, 
 iodepth=64
 zufälligschreiben: (g=2): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, 
 iodepth=64
 sequentiellschreiben: (g=3): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, 
 iodepth=64
 2.0.8
 Starting 4 processes
 zufälliglesen: Laying out IO file(s) (1 file(s) / 2048MB)
 sequentielllesen: Laying out IO file(s) (1 file(s) / 2048MB)
 zufälligschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
 sequentiellschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
 Jobs: 1 (f=1): [___W] [63.8% done] [0K/2526K /s] [0 /280  iops] [eta 01m:21s] 
  
 zufälliglesen: (groupid=0, jobs=1): err= 0: pid=30077
   read : io=2048.0MB, bw=276232KB/s, iops=50472 , runt=  7592msec
 slat (usec): min=2 , max=2276 , avg= 6.87, stdev=12.01
 clat (usec): min=249 , max=52128 , avg=1258.87, stdev=1714.63
  lat (usec): min=260 , max=52134 , avg=1266.00, stdev=1715.36
 clat percentiles (usec):
  |  1.00th=[  450],  5.00th=[  548], 10.00th=[  620], 20.00th=[  724],
  | 30.00th=[  820], 40.00th=[  908], 50.00th=[ 1004], 60.00th=[ 1096],
  | 70.00th=[ 1208], 80.00th=[ 1368], 90.00th=[ 1640], 95.00th=[ 2040],
  | 99.00th=[ 8256], 99.50th=[14912], 99.90th=[21120], 99.95th=[23168],
  | 99.99th=[33024]
 bw (KB/s)  : min=76463, max=385328, per=100.00%, avg=277313.20, 
 stdev=94661.29
 lat (usec) : 250=0.01%, 500=2.46%, 750=19.82%, 1000=27.70%
 lat (msec) : 2=44.79%, 4=3.55%, 10=0.72%, 20=0.78%, 50=0.17%
 lat (msec) : 100=0.01%
   cpu  : usr=11.91%, sys=51.64%, ctx=91337, majf=0, minf=277
   IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, 

Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-02 Thread Marc MERLIN
On Thu, Aug 02, 2012 at 10:20:07PM +0200, Martin Steigerwald wrote:
 Hey, whats this? With Ext4 you have really good random read performance
 now! Way better than the Intel SSD 320 and…

Yep, my du -sh tests do show that ext4 is 2x faster than btrfs.
Obviously it's sending IO in a way that either the IO subsystem, linux
driver, or drive prefer.

 The performance of your Samsung SSD seems to be quite erratic. It seems
 that the device is capable of being fast, but only sometimes shows this
 capability.
 
That's indeed exactly what I'm seeing in real life :)

   Have the IOPS run on the device it self. That will remove any filesystem
   layer. But only the read only tests, to make sure I suggest to use fio
   with the --readonly option as safety guard. Unless you have a spare SSD
   that you can afford to use for write testing which will likely destroy
   every filesystem on it. Or let it run on just one logical volume.
   
  Can you send me a recommended job config you'd like me to run if the runs
  above haven't already answered your questions?
 
 [global]
(...)

I used this and just changed filename to /dev/sda. Since I'm reading
from the beginning of the drive, reads have to be aligned.
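
(So the raw-device job presumably ended up looking roughly like this; a
sketch inferred from the output below rather than the literal fio.job3, with
size and bsrange as assumptions, and run with fio --readonly as shown:)

[global]
filename=/dev/sda
ioengine=libaio
direct=1
iodepth=1
bsrange=2k-16k
size=2048m
runtime=60

[zufälliglesen]
rw=randread
stonewall

[sequentielllesen]
rw=read
stonewall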

 I won´t expect much of a difference, but then the random read performance
 is quite different between Ext4 and BTRFS on this disk. That would make
 it interesting to test without any filesystem in between and over the
 whole device.

Here is the output:
gandalfthegreat:~# fio --readonly ./fio.job3
zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
2.0.8
Starting 2 processes
Jobs: 1 (f=1): [_R] [66.9% done] [966K/0K /s] [108 /0  iops] [eta 01m:00s] 
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=2172
  read : io=59036KB, bw=983.93KB/s, iops=108 , runt= 60002msec
slat (usec): min=5 , max=158 , avg=27.62, stdev=10.64
clat (usec): min=45 , max=27348 , avg=9150.78, stdev=4452.66
 lat (usec): min=53 , max=27370 , avg=9179.05, stdev=4454.88
clat percentiles (usec):
 |  1.00th=[  126],  5.00th=[  235], 10.00th=[ 5216], 20.00th=[ 5920],
 | 30.00th=[ 5920], 40.00th=[ 5984], 50.00th=[ 7712], 60.00th=[12480],
 | 70.00th=[12608], 80.00th=[12736], 90.00th=[12864], 95.00th=[16768],
 | 99.00th=[18560], 99.50th=[18816], 99.90th=[20352], 99.95th=[22656],
 | 99.99th=[27264]
bw (KB/s)  : min=  423, max= 5776, per=100.00%, avg=986.48, stdev=480.68
lat (usec) : 50=0.11%, 100=0.64%, 250=4.47%, 500=1.65%, 750=0.02%
lat (usec) : 1000=0.02%
lat (msec) : 2=0.06%, 4=0.03%, 10=43.31%, 20=49.51%, 50=0.18%
  cpu  : usr=0.17%, sys=0.45%, ctx=6534, majf=0, minf=26
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued: total=r=6532/w=0/d=0, short=r=0/w=0/d=0
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=2199
  read : io=54658KB, bw=932798 B/s, iops=101 , runt= 60002msec
slat (usec): min=5 , max=140 , avg=28.63, stdev= 9.91
clat (usec): min=39 , max=34210 , avg=9799.18, stdev=4471.32
 lat (usec): min=45 , max=34228 , avg=9828.50, stdev=4472.06
clat percentiles (usec):
 |  1.00th=[   61],  5.00th=[ 5088], 10.00th=[ 5856], 20.00th=[ 5920],
 | 30.00th=[ 5984], 40.00th=[ 6048], 50.00th=[11840], 60.00th=[12608],
 | 70.00th=[12608], 80.00th=[12736], 90.00th=[16512], 95.00th=[17536],
 | 99.00th=[18816], 99.50th=[19584], 99.90th=[24960], 99.95th=[29568],
 | 99.99th=[34048]
bw (KB/s)  : min=  405, max= 2680, per=100.00%, avg=912.92, stdev=261.62
lat (usec) : 50=0.41%, 100=1.77%, 250=1.20%, 500=0.23%, 750=0.02%
lat (usec) : 1000=0.03%
lat (msec) : 2=0.02%, 10=43.06%, 20=52.91%, 50=0.36%
  cpu  : usr=0.15%, sys=0.45%, ctx=6103, majf=0, minf=28
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued: total=r=6101/w=0/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=59036KB, aggrb=983KB/s, minb=983KB/s, maxb=983KB/s, mint=60002msec, 
maxt=60002msec

Run status group 1 (all jobs):
   READ: io=54658KB, aggrb=910KB/s, minb=910KB/s, maxb=910KB/s, mint=60002msec, 
maxt=60002msec

Disk stats (read/write):
  sda: ios=12660/2072, merge=5/34, ticks=119452/22496, in_queue=141936, 
util=99.30%

 … or get yourself another SSD. Its your decision.
 
 I admire your endurance. ;)

Since I've gotten 2 SSDs to make sure I didn't get one bad one, and the
company is getting great reviews for them, I'm now pretty sure that
it's either a problem with a Linux driver, which is interesting for us
all to debug :)
If I go buy another brand, the next guy 

Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-02 Thread Martin Steigerwald
On Thursday, 2 August 2012, Marc MERLIN wrote:
 On Thu, Aug 02, 2012 at 10:20:07PM +0200, Martin Steigerwald wrote:
  Hey, whats this? With Ext4 you have really good random read performance
  now! Way better than the Intel SSD 320 and…
 
 Yep, my du -sh tests do show that ext4 is 2x faster than btrfs.
 Obviously it's sending IO in a way that either the IO subsystem, linux
 driver, or drive prefer.

But only on reads.

Have the IOPS run on the device itself. That will remove any filesystem
layer. But only the read only tests, to make sure I suggest to use fio
with the --readonly option as safety guard. Unless you have a spare SSD
that you can afford to use for write testing which will likely destroy
every filesystem on it. Or let it run on just one logical volume.

   Can you send me a recommended job config you'd like me to run if the runs
   above haven't already answered your questions?
  
  [global]
 (...)
 
 I used this and just changed filename to /dev/sda. Since I'm reading
 from the beginning of the drive, reads have to be aligned.
 
  I won´t expect much of a difference, but then the random read performance
  is quite different between Ext4 and BTRFS on this disk. That would make
  it interesting to test without any filesystem in between and over the
  whole device.
 
 Here is the output:
 gandalfthegreat:~# fio --readonly ./fio.job3
 zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, 
 iodepth=1
 sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
 2.0.8
 Starting 2 processes
 Jobs: 1 (f=1): [_R] [66.9% done] [966K/0K /s] [108 /0  iops] [eta 01m:00s] 
 zufälliglesen: (groupid=0, jobs=1): err= 0: pid=2172
   read : io=59036KB, bw=983.93KB/s, iops=108 , runt= 60002msec

WTF?

Hey, did you adapt the size= keyword? It seems fio 2.0.8 can do completely
without it.

Also I noticed that I had iodepth 1 in there to circumvent any in drive
cache / optimization.

 slat (usec): min=5 , max=158 , avg=27.62, stdev=10.64
 clat (usec): min=45 , max=27348 , avg=9150.78, stdev=4452.66
  lat (usec): min=53 , max=27370 , avg=9179.05, stdev=4454.88
 clat percentiles (usec):
  |  1.00th=[  126],  5.00th=[  235], 10.00th=[ 5216], 20.00th=[ 5920],
  | 30.00th=[ 5920], 40.00th=[ 5984], 50.00th=[ 7712], 60.00th=[12480],
  | 70.00th=[12608], 80.00th=[12736], 90.00th=[12864], 95.00th=[16768],
  | 99.00th=[18560], 99.50th=[18816], 99.90th=[20352], 99.95th=[22656],
  | 99.99th=[27264]
 bw (KB/s)  : min=  423, max= 5776, per=100.00%, avg=986.48, stdev=480.68
 lat (usec) : 50=0.11%, 100=0.64%, 250=4.47%, 500=1.65%, 750=0.02%
 lat (usec) : 1000=0.02%
 lat (msec) : 2=0.06%, 4=0.03%, 10=43.31%, 20=49.51%, 50=0.18%
   cpu  : usr=0.17%, sys=0.45%, ctx=6534, majf=0, minf=26
   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  issued: total=r=6532/w=0/d=0, short=r=0/w=0/d=0

Latency is still way too high even with iodepth 1, 10 milliseconds for
43% of requests. And the throughput and IOPS are still abysmal even
for iodepth 1 (see below for Intel SSD 320 values).

Okay, one further idea: Remove the bsrange to just test with 4k blocks.

Additionally test these are aligned with

   blockalign=int[,int], ba=int[,int]
  At what boundary to align random IO offsets. Defaults to
  the  same  as  'blocksize'  the minimum blocksize given.
  Minimum alignment is typically 512b for using direct IO,
  though  it  usually  depends on the hardware block size.
  This option is mutually exclusive with  using  a  random
  map for files, so it will turn off that option.

I would first test with 4k blocks as is. And then do something like:

blocksize=4k
blockalign=4k

And then raise

blockalign to some values that may matter like 8k, 128k, 512k, 1m or so.

But that's just guesswork. I do not even know exactly if it works this
way in fio.
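
Something like this is what I have in mind (untested sketch, reusing the
read-only raw-device setup from before, again run with fio --readonly):

[global]
filename=/dev/sda
ioengine=libaio
direct=1
iodepth=1
runtime=60

[randread-4k]
rw=randread
blocksize=4k
blockalign=4k
# then repeat with blockalign=8k, 128k, 512k, 1m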

There is something pretty weird going on. But I am not sure what it is.
Maybe an alignment issue, since Ext4 with stripe alignment was able to
do so much faster on reads.

 sequentielllesen: (groupid=1, jobs=1): err= 0: pid=2199
   read : io=54658KB, bw=932798 B/s, iops=101 , runt= 60002msec

Ey, whats this?

 slat (usec): min=5 , max=140 , avg=28.63, stdev= 9.91
 clat (usec): min=39 , max=34210 , avg=9799.18, stdev=4471.32
  lat (usec): min=45 , max=34228 , avg=9828.50, stdev=4472.06
 clat percentiles (usec):
  |  1.00th=[   61],  5.00th=[ 5088], 10.00th=[ 5856], 20.00th=[ 5920],
  | 30.00th=[ 5984], 40.00th=[ 6048], 50.00th=[11840], 60.00th=[12608],
  | 70.00th=[12608], 80.00th=[12736], 90.00th=[16512], 95.00th=[17536],
  | 99.00th=[18816], 99.50th=[19584], 99.90th=[24960], 

Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-01 Thread Marc MERLIN
On Fri, Jul 27, 2012 at 11:42:39AM -0700, Marc MERLIN wrote:
  https://oss.oracle.com/~mason/latencytop.patch
 
 Thanks for the patch, and yes I can confirm I'm definitely not pegged on CPU 
 (not even close and I get the same problem with an unencrypted filesystem; 
 actually du -sh is exactly the same speed on encrypted and unencrypted).
 
 Here's the result I think you were looking for. I'm not good at reading this,
 but hopefully it tells you something useful :)
 
 The full run is here if that helps:
 http://marc.merlins.org/tmp/latencytop.txt
 
I did some other tests since last week since my laptop is hard to use
considering how slow the SSD is.

(TL;DR: ntfs on linux via fuse is 33% faster than ext4, which is 2x faster
than btrfs, but 3x slower than the same filesystem on spinning disk :( )


Ok, just to help with debugging this,
1) I put my Samsung 830 SSD into another Thinkpad and it wasn't faster or
slower.

2) Then I put in a Crucial 256GB C300 SSD (the replacement for the one I had that
just died and killed all my data), and du took 0.3 seconds on both my old
and new Thinkpads.
The old Thinkpad is running Ubuntu 32bit, the new one Debian testing 64bit,
both with kernel 3.4.4.

So, clearly, there is something wrong with the Samsung 830 SSD with Linux,
but I have no clue what :(
In raw speed (dd) the Samsung is faster than the Crucial (350MB/s vs
500MB/s).
If it were a random crappy SSD from a random vendor, I'd blame the SSD, but
I have a hard time believing that Samsung is selling SSDs that are slower
than hard drives at random IO and 'seeks'.

3) I just got a 2nd SSD from Samsung (same kind), just to make sure the one
I had wasn't bad. It's brand new, and I formatted it carefully on 512
boundaries:
/dev/sda1        2048    502271     250112   83  Linux
/dev/sda2      502272  52930559   26214144    7  HPFS/NTFS/exFAT
/dev/sda3    52930560  73902079   10485760   82  Linux swap / Solaris
/dev/sda4    73902080 1000215215  463156568   83  Linux

I also upgraded to 3.5.0 in the meantime but unfortunately the results are
similar.

First: btrfs is the slowest:
gandalfthegreat:/mnt/ssd/var/local# time du -sh src/
514M    src/
real    0m25.741s
gandalfthegreat:/mnt/ssd/var/local# grep /mnt/ssd/var /proc/mounts 
/dev/mapper/ssd /mnt/ssd/var btrfs 
rw,noatime,compress=lzo,ssd,discard,space_cache 0 0


Second: ext4 is 2x faster than btrfs with mkfs.ext4 -O extent -b 4096 /dev/sda3
gandalfthegreat:/mnt/mnt3# reset_cache
gandalfthegreat:/mnt/mnt3# time du -sh src/
519M    src/
real    0m12.459s
gandalfthegreat:~# grep mnt3 /proc/mounts
/dev/sda3 /mnt/mnt3 ext4 rw,noatime,discard,data=ordered 0 0
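
(reset_cache above is a small local wrapper for the usual drop_caches dance,
something like

reset_cache() { sync; echo 3 > /proc/sys/vm/drop_caches; }

so every timed run starts with cold caches.)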

Third, A freshly made ntfs filesystem through fuse is actually FASTER!
gandalfthegreat:/mnt/mnt2# reset_cache 
gandalfthegreat:/mnt/mnt2# time du -sh src/
506M    src/
real    0m8.928s
gandalfthegreat:/mnt/mnt2# grep mnt2 /proc/mounts
/dev/sda2 /mnt/mnt2 fuseblk 
rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096 0 0

How can ntfs via fuse be the fastest and btrfs so slow?
Of course, all 3 are slower than the same filesystem on a spinning disk too, but
I'm wondering if there is a scheduling issue that is somehow causing the
extreme slowness I'm seeing.

Did the latencytop trace I got help in any way?

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-01 Thread Fajar A. Nugraha
On Wed, Aug 1, 2012 at 1:01 PM, Marc MERLIN m...@merlins.org wrote:

 So, clearly, there is something wrong with the samsung 830 SSD with linux


 It it were a random crappy SSD from a random vendor, I'd blame the SSD, but
 I have a hard time believing that samsung is selling SSDs that are slower
 than hard drives at random IO and 'seeks'.

You'd be surprised on how badly some vendors can screw up :)


 First: btrfs is the slowest:

 gandalfthegreat:/mnt/ssd/var/local# grep /mnt/ssd/var /proc/mounts
 /dev/mapper/ssd /mnt/ssd/var btrfs 
 rw,noatime,compress=lzo,ssd,discard,space_cache 0 0

Just checking, did you explicitly activate discard? Cause on my
setup (with a Corsair SSD) it made things MUCH slower. Also, try adding
noatime (just in case the slowdown was because du causes many
access time updates)

-- 
Fajar


Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-01 Thread Marc MERLIN
On Wed, Aug 01, 2012 at 01:08:46PM +0700, Fajar A. Nugraha wrote:
  It it were a random crappy SSD from a random vendor, I'd blame the SSD, but
  I have a hard time believing that samsung is selling SSDs that are slower
  than hard drives at random IO and 'seeks'.
 
 You'd be surprised on how badly some vendors can screw up :)
 
At some point, it may come down to that indeed :-/
I'm still hopeful that Samsung didn't, but we'll see.
 
  First: btrfs is the slowest:
 
  gandalfthegreat:/mnt/ssd/var/local# grep /mnt/ssd/var /proc/mounts
  /dev/mapper/ssd /mnt/ssd/var btrfs 
  rw,noatime,compress=lzo,ssd,discard,space_cache 0 0
 
 Just checking, did you explicitly activate discard? Cause on my

Yes. Note that it should be a no-op when all you're doing is stating inodes and
not writing (I'm using noatime).

 setup (with corsair SSD) it made things MUCH slower. Also, try adding
 noatime (just in case the slow down was because du cause many
 access time updates)

I have noatime in there already :)

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-01 Thread Chris Samuel
On 01/08/12 16:01, Marc MERLIN wrote:

 Third, A freshly made ntfs filesystem through fuse is actually FASTER!

Could it be that Samsung's FTL has optimisations in it for NTFS?

A horrible thought, but not impossible..

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-01 Thread Marc MERLIN
On Wed, Aug 01, 2012 at 04:36:22PM +1000, Chris Samuel wrote:
 On 01/08/12 16:01, Marc MERLIN wrote:
 
  Third, A freshly made ntfs filesystem through fuse is actually FASTER!
 
 Could it be that Samsungs FTL has optimisations in it for NTFS ?
 
 A horrible thought, but not impossible..

Not impossible, but it can't be the main reason, since it still clocks 2x
slower with ntfs than a spinning disk with encrypted btrfs.
Since SSDs should seek 10-100x faster than spinning disks, that can't be
the only reason.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-08-01 Thread Martin Steigerwald
Hi Marc,

On Wednesday, 1 August 2012, Marc MERLIN wrote:
 On Wed, Aug 01, 2012 at 01:08:46PM +0700, Fajar A. Nugraha wrote:
   It it were a random crappy SSD from a random vendor, I'd blame the
   SSD, but I have a hard time believing that samsung is selling SSDs
   that are slower than hard drives at random IO and 'seeks'.
  
  You'd be surprised on how badly some vendors can screw up :)
 
 At some point, it may come down to that indeed :-/
 I'm still hopefully that Samsung didn't, but we'll see.

It's getting quite strange.

I lost track of whether you did that already or not, but if you didn´t
please post some

vmstat 1

iostat -xd 1

on the device while it is being slow.

I am interested in wait I/O and latencies and disk utilization.
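
Something along these lines would capture both while you reproduce the slow
du (the device name and output paths are just examples):

vmstat 1 > /tmp/vmstat.log &
iostat -xd 1 /dev/sda > /tmp/iostat.log &
echo 3 > /proc/sys/vm/drop_caches; time du -sh src/
kill %1 %2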


Comparison data of Intel SSD 320 in ThinkPad T520 during

merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; du -sch /usr

on BTRFS with Kernel 3.5:


martin@merkaba:~ vmstat 1
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 2  1  21556 4442668   2056 50235200   19485  247  120 11  2 87  0
 1  2  21556 440   2448 51488400 11684   328 4975 24585  5 16 65 14
 1  0  21556 4389880   2448 52806000 13400 0 4574 23452  2 16 68 14
 3  1  21556 4370068   2448 54505200 18132 0 5499 27220  1 18 64 16
 1  0  21556 4350228   2448 8000 10856 0 4122 25339  3 16 67 14
 1  1  21556 4315604   2448 56975600 12648 0 4647 31153  5 14 66 15
 0  1  21556 4295652   2456 58148000 1154856 4093 24618  2 13 69 16
 0  1  21556 4286720   2456 59158000 10824 0 3750 21445  1 12 71 16
 0  1  21556 4266308   2456 60362000 12932 0 4841 26447  4 12 68 17
 1  0  21556 4248228   2456 61380800 10264 4 3703 22108  1 13 71 15
 5  1  21556 4231976   2456 62435600 10540 0 3581 20436  1 10 72 17
 0  1  21556 4197168   2456 63910800 12952 0 4738 28223  4 15 66 15
 4  1  21556 4178456   2456 65055200 11656 0 4234 23480  2 14 68 16
 0  1  21556 4163616   2456 66299200 13652 0 4619 26580  1 16 70 13
 4  1  21556 4138288   2456 67569600 13352 0 4422 22254  1 16 70 13
 1  0  21556 4113204   2456 68906000 13232 0 4312 21936  1 15 70 14
 0  1  21556 4085532   2456 70416000 14972 0 4820 24238  1 16 69 14
 2  0  21556 4055740   2456 71964400 15736 0 5099 25513  3 17 66 14
 0  1  21556 4028612   2456 73438000 14504 0 4795 25052  3 15 68 14
 2  0  21556 3999108   2456 74904000 1465616 4672 21878  1 17 69 13
 1  1  21556 3972732   2456 76210800 12972 0 4717 22411  1 17 70 13
 5  0  21556 3949684   2584 77348400 1152852 4837 24107  3 15 67 15
 1  0  21556 3912504   2584 78742000 12156 0 4883 25201  4 15 67 14


martin@merkaba:~ iostat -xd 1 /dev/sda
Linux 3.5.0-tp520 (merkaba) 01.08.2012  _x86_64_(4 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               1,29     1,44   11,58   12,78   684,74   299,75    80,81     0,24    9,86    0,95   17,93   0,29   0,71

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00  2808,00    0,00 11232,00     0,00     8,00     0,57    0,21    0,21    0,00   0,19  54,50

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00  2967,00    0,00 11868,00     0,00     8,00     0,63    0,21    0,21    0,00   0,21  60,90

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00    11,00  2992,00    4,00 11968,00    56,00     8,03     0,64    0,22    0,22    0,25   0,21  62,00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00  2680,00    0,00 10720,00     0,00     8,00     0,70    0,26    0,26    0,00   0,25  66,70

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00  3153,00    0,00 12612,00     0,00     8,00     0,72    0,23    0,23    0,00   0,22  69,30

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00  2769,00    0,00 11076,00     0,00     8,00     0,63    0,23    0,23    0,00   0,21  58,00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00  2523,00    1,00 10092,00     4,00     8,00     0,74    0,29    0,29    0,00   0,28  

Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-07-27 Thread Chris Mason
On Mon, Jul 23, 2012 at 12:42:03AM -0600, Marc MERLIN wrote:
 
 22 seconds for 15K files on an SSD is super slow and being 5 times
 slower than a spinning disk with the same data.
 What's going on?

Hi Marc,

The easiest way to figure it out is with latencytop.  I'd either run the
latencytop gui or use the latencytop -c patch, which sends a text dump to
the console.

This is assuming that you're not pegged at 100% CPU...

https://oss.oracle.com/~mason/latencytop.patch
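
A rough way to capture it while reproducing the problem (just a sketch;
adjust the path to whatever directory is slow):

# with a latencytop binary built against the patch above
latencytop -c > /tmp/latencytop.txt &
echo 3 > /proc/sys/vm/drop_caches; time du -sh src/
kill %1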

-chris



Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-07-27 Thread Marc MERLIN
On Fri, Jul 27, 2012 at 07:08:35AM -0400, Chris Mason wrote:
 On Mon, Jul 23, 2012 at 12:42:03AM -0600, Marc MERLIN wrote:
  
  22 seconds for 15K files on an SSD is super slow and being 5 times
  slower than a spinning disk with the same data.
  What's going on?
 
 Hi Marc,
 
 The easiest way to figure out is with latencytop.  I'd either run the
 latencytop gui or use the latencytop -c patch which sends a text dump to
 the console.
 
 This is assuming that you're not pegged at 100% CPU...
 
 https://oss.oracle.com/~mason/latencytop.patch

Thanks for the patch, and yes I can confirm I'm definitely not pegged on CPU 
(not even close and I get the same problem with an unencrypted filesystem; actually
du -sh is exactly the same speed on encrypted and unencrypted).

Here's the result I think you were looking for. I'm not good at reading this,
but hopefully it tells you something useful :)

The full run is here if that helps:
http://marc.merlins.org/tmp/latencytop.txt

Process du (6748) Total: 4280.5 msec
Reading directory content    15.2 msec 11.0 %
sleep_on_page wait_on_page_bit read_extent_buffer_pages 
btree_read_extent_buffer_pages.constprop.110 read_tree_block 
read_block_for_search.isra.32 btrfs_next_leaf 
btrfs_real_readdir vfs_readdir sys_getdents 
system_call_fastpath 
[sleep_on_page]  13.5 msec 88.2 %
sleep_on_page wait_on_page_bit read_extent_buffer_pages 
btree_read_extent_buffer_pages.constprop.110 read_tree_block 
read_block_for_search.isra.32 btrfs_search_slot 
btrfs_lookup_csum __btrfs_lookup_bio_sums 
btrfs_lookup_bio_sums btrfs_submit_compressed_read 
btrfs_submit_bio_hook 
Page fault   12.9 msec  0.6 %
sleep_on_page_killable wait_on_page_bit_killable 
__lock_page_or_retry filemap_fault __do_fault handle_pte_fault 
handle_mm_fault do_page_fault page_fault 
Executing a program   7.1 msec  0.2 %
sleep_on_page_killable __lock_page_killable 
generic_file_aio_read do_sync_read vfs_read kernel_read 
prepare_binprm do_execve_common.isra.27 do_execve sys_execve 
stub_execve 

Process du (6748) Total: 9517.4 msec
[sleep_on_page]  23.0 msec 82.8 %
sleep_on_page wait_on_page_bit read_extent_buffer_pages 
btree_read_extent_buffer_pages.constprop.110 read_tree_block 
read_block_for_search.isra.32 btrfs_search_slot 
btrfs_lookup_inode btrfs_iget btrfs_lookup_dentry btrfs_lookup 
__lookup_hash 
Reading directory content    13.2 msec 17.2 %
sleep_on_page wait_on_page_bit read_extent_buffer_pages 
btree_read_extent_buffer_pages.constprop.110 read_tree_block 
read_block_for_search.isra.32 btrfs_search_slot 
btrfs_real_readdir vfs_readdir sys_getdents 
system_call_fastpath 

Process du (6748) Total: 9524.0 msec
[sleep_on_page]  17.1 msec 88.5 %
sleep_on_page wait_on_page_bit read_extent_buffer_pages 
btree_read_extent_buffer_pages.constprop.110 read_tree_block 
read_block_for_search.isra.32 btrfs_search_slot 
btrfs_lookup_inode btrfs_iget btrfs_lookup_dentry btrfs_lookup 
__lookup_hash 
Reading directory content    16.0 msec 11.5 %
sleep_on_page wait_on_page_bit read_extent_buffer_pages 
btree_read_extent_buffer_pages.constprop.110 read_tree_block 
read_block_for_search.isra.32 btrfs_search_slot 
btrfs_real_readdir vfs_readdir sys_getdents 
system_call_fastpath 

-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/


Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-07-26 Thread Marc MERLIN
On Tue, Jul 24, 2012 at 09:56:26AM +0200, Martin Steigerwald wrote:
 find is fast, du is much slower:
 
 merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time ( find /usr | wc -l )
 404166
 ( find /usr | wc -l; )  0,03s user 0,07s system 1% cpu 9,212 total
 merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time ( du -sh /usr )  
 11G /usr
 ( du -sh /usr; )  1,00s user 19,07s system 41% cpu 48,886 total
 
You're right.

gandalfthegreat [mc]# time du -sh src
514M    src
real    0m25.159s
gandalfthegreat [mc]# reset_cache 
gandalfthegreat [mc]# time bash -c find src | wc -l
15261
real    0m7.614s

But find is still slower than du on my spinning disk for the same tree.
 
 Anyway thats still much faster than your measurements.
 
Right. Your numbers look reasonable for an SSD.

Since then, I did some more tests and I'm also getting slower than normal
speeds with ext4, an indications that it's a problem with the block layer.
I'm working with some folks try to pin down the core problem, but it looks
like it's not an easy one.

Thanks for your data.
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-07-24 Thread Marc MERLIN
On Sun, Jul 22, 2012 at 11:42:03PM -0700, Marc MERLIN wrote:
 I just realized that the older thread got a bit confusing, so I'll keep
 problems separate and make things simpler :)
 
Since yesterday, I tried other kernels, including nopreempt, volpreempt and
preempt for 3.4.4.
I also tried a default 3.2.0 kernel from debian (all amd64), but that did
not help. I'm still seeing close to 25 seconds to scan 15K files.

How can it possibly be so slow?
More importantly how I can provide useful debug information.

- I don't think it's a problem with the kernel since I tried 4 kernels,
  including a default debian one.

- Alignment seems ok, I made sure cylinders were divisible by 512:
/dev/sda2      502272  52930559  26214144   83  Linux
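  (Sanity check: 502272 / 512 = 981, so the partition starts on a multiple of
  512 sectors, i.e. a 256KiB boundary with 512-byte sectors.)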

- I tried another brand new btrfs, and things are even slower now.
gandalfthegreat:/mnt/mnt2# mount -o ssd,discard,noatime /dev/sda2 /mnt/mnt2
gandalfthegreat:/mnt/mnt2# reset_cache 
gandalfthegreat:/mnt/mnt2# time du -sh src/
514M    src/
real    0m29.584s
gandalfthegreat:/mnt/mnt2# find src/| wc -l
15261

This is bad enough that there ought to be a way to debug this, right?

Can you suggest something?

Thanks,
Marc

 On an _unencrypted_ partition on the SSD, running du -sh on a directory
 with 15K files, takes 23 seconds on unencrypted SSD and 4 secs on
 encrypted spinning drive, both with a similar btrfs filesystem, and 
 the same kernel (3.4.4).
 
 Unencrypted btrfs on SSD:
 gandalfthegreat:~# mount -o compress=lzo,discard,nossd,space_cache,noatime 
 /dev/sda2 /mnt/mnt2
 gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
 514M  src
 real  0m22.667s
 
 Encrypted btrfs on spinning drive of the same src directory:
 gandalfthegreat:/var/local# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
 514M  src
 real  0m3.881s
 
 I've run this many times and get the same numbers.
 I've tried deadline and noop on /dev/sda (the SSD) and du is just as slow.  
 
 I also tried with:
 - space_cache and nospace_cache
 - ssd and nossd
 - noatime didn't seem to help even though I was hopeful on this one.
 
 In all cases, I get:
 gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
 514M  src
 real  0m22.537s
 
 
 I'm having the same slow speed on 2 btrfs filesystems on the same SSD.
 One is encrypted, the other one isnt:
 Label: 'btrfs_pool1'  uuid: d570c40a-4a0b-4d03-b1c9-cff319fc224d
   Total devices 1 FS bytes used 144.74GB
   devid1 size 441.70GB used 195.04GB path /dev/dm-0
 
 Label: 'boot'  uuid: 84199644-3542-430a-8f18-a5aa58959662
   Total devices 1 FS bytes used 2.33GB
   devid1 size 25.00GB used 5.04GB path /dev/sda2
 
 If instead of stating a bunch of files, I try reading a big file, I do get 
 speeds
 that are quite fast (253MB/s and 423MB/s).
 
 22 seconds for 15K files on an SSD is super slow and being 5 times
 slower than a spinning disk with the same data.
 What's going on?
 
 Thanks,
 Marc

-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: How can btrfs take 23sec to stat 23K files from an SSD?

2012-07-24 Thread Martin Steigerwald
On Monday, 23 July 2012, Marc MERLIN wrote:
 I just realized that the older thread got a bit confusing, so I'll keep
 problems separate and make things simpler :)
 
 On an _unencrypted_ partition on the SSD, running du -sh on a directory
 with 15K files, takes 23 seconds on unencrypted SSD and 4 secs on
 encrypted spinning drive, both with a similar btrfs filesystem, and
 the same kernel (3.4.4).
 
 Unencrypted btrfs on SSD:
 gandalfthegreat:~# mount -o
 compress=lzo,discard,nossd,space_cache,noatime /dev/sda2 /mnt/mnt2
 gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
 514M  src
 real  0m22.667s
 
 Encrypted btrfs on spinning drive of the same src directory:
 gandalfthegreat:/var/local# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
 514M  src
 real  0m3.881s

find is fast, du is much slower:

merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time ( find /usr | wc -l )
404166
( find /usr | wc -l; )  0,03s user 0,07s system 1% cpu 9,212 total
merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time ( du -sh /usr )  
11G /usr
( du -sh /usr; )  1,00s user 19,07s system 41% cpu 48,886 total


Now I try to find something with less files.

merkaba:~ find /usr/share/doc | wc -l   
50715
merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time ( find /usr/share/doc | wc -l )
50715
( find /usr/share/doc | wc -l; )  0,00s user 0,02s system 1% cpu 1,398 
total
merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time ( du -sh /usr/share/doc )  
606M/usr/share/doc
( du -sh /usr/share/doc; )  0,20s user 3,63s system 35% cpu 10,691 total

merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time du -sh /usr/share/doc   
606M/usr/share/doc
du -sh /usr/share/doc  0,19s user 3,54s system 35% cpu 10,386 total


Anyway that's still much faster than your measurements.
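
If you want to see where the difference comes from, comparing the syscall
mix of the two is instructive (sketch; GNU find can usually get by on
getdents() plus d_type, while du has to stat() every entry, which is what
forces the extra btree lookups):

echo 3 > /proc/sys/vm/drop_caches
strace -c -o /tmp/find.summary find /usr/share/doc > /dev/null
echo 3 > /proc/sys/vm/drop_caches
strace -c -o /tmp/du.summary du -sh /usr/share/doc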



merkaba:~ df -hT /usr
Dateisystem    Typ   Größe Benutzt Verf. Verw% Eingehängt auf
/dev/dm-0      btrfs   19G     11G  5,6G   67% /
merkaba:~ btrfs fi sh  
 
failed to read /dev/sr0
Label: 'debian'  uuid: […]
Total devices 1 FS bytes used 10.25GB
devid1 size 18.62GB used 18.62GB path /dev/dm-0

Btrfs Btrfs v0.19
merkaba:~ btrfs fi df /   
Data: total=15.10GB, used=9.59GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=670.43MB
Metadata: total=8.00MB, used=0.00


merkaba:~ grep btrfs /proc/mounts
/dev/dm-0 / btrfs rw,noatime,compress=lzo,ssd,space_cache,inode_cache 0 0


Somewhat aged BTRFS filesystem on ThinkPad T520, Intel SSD 320, kernel 
3.5.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


How can btrfs take 23sec to stat 23K files from an SSD?

2012-07-23 Thread Marc MERLIN
I just realized that the older thread got a bit confusing, so I'll keep
problems separate and make things simpler :)

On an _unencrypted_ partition on the SSD, running du -sh on a directory
with 15K files, takes 23 seconds on unencrypted SSD and 4 secs on
encrypted spinning drive, both with a similar btrfs filesystem, and 
the same kernel (3.4.4).

Unencrypted btrfs on SSD:
gandalfthegreat:~# mount -o compress=lzo,discard,nossd,space_cache,noatime /dev/sda2 /mnt/mnt2
gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
514M    src
real    0m22.667s

Encrypted btrfs on spinning drive of the same src directory:
gandalfthegreat:/var/local# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
514M    src
real    0m3.881s

I've run this many times and get the same numbers.
I've tried deadline and noop on /dev/sda (the SSD) and du is just as slow.  

I also tried with:
- space_cache and nospace_cache
- ssd and nossd
- noatime didn't seem to help even though I was hopeful on this one.

In all cases, I get:
gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
514M    src
real    0m22.537s


I'm having the same slow speed on 2 btrfs filesystems on the same SSD.
One is encrypted, the other one isn't:
Label: 'btrfs_pool1'  uuid: d570c40a-4a0b-4d03-b1c9-cff319fc224d
Total devices 1 FS bytes used 144.74GB
devid1 size 441.70GB used 195.04GB path /dev/dm-0

Label: 'boot'  uuid: 84199644-3542-430a-8f18-a5aa58959662
Total devices 1 FS bytes used 2.33GB
devid1 size 25.00GB used 5.04GB path /dev/sda2

If instead of stating a bunch of files, I try reading a big file, I do get 
speeds
that are quite fast (253MB/s and 423MB/s).
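(For anyone reproducing this: a cold-cache sequential read along these lines
gives that kind of number; just a sketch, bigfile is a placeholder:
echo 3 > /proc/sys/vm/drop_caches
dd if=bigfile of=/dev/null bs=1M
)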

22 seconds for 15K files on an SSD is super slow, and 5 times
slower than a spinning disk with the same data.
What's going on?

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/