Re: How can btrfs take 23sec to stat 23K files from an SSD?
To close this thread: I gave up on the drive after it showed abysmal benchmarking results in Windows too. I owed everyone an update, which I just finished typing:
http://marc.merlins.org/perso/linux/post_2012-08-15_The-tale-of-SSDs_-Crucial-C300-early-Death_-Samsung-830-extreme-random-IO-slowness_-and-settling-with-OCZ-Vertex-4.html

I'm not sure how I could have gotten 2 bad drives from Samsung in 2 different shipments, so I'm afraid the entire line may be bad. At least it was for me, after extensive benchmarking, even using their own Windows benchmarking tool.

In the end I got an OCZ Vertex 4, and it's super fast as per the benchmarks I posted in the link above. What is interesting is that ext4 is somewhat faster than btrfs in the tests I posted, but not in ways that should be truly worrisome. I'm still happy to have COW and snapshots, even if that costs me some performance.

Sorry for spamming the list with what apparently just ended up being a crappy drive (well, 2, since I got two just to make sure) from a vendor that shouldn't have been selling crappy SSDs.

Marc
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Thursday, 2 August 2012, Marc MERLIN wrote:
> > > I'll try plugging this SSD in a totally different PC and see what
> > > happens. This may say if it's an AHCI/Intel SATA driver problem.
> > Seems we will continue until someone starts to complain here. Maybe
> > another list will be more appropriate? But then this thread has it
> > all in one ;). Adding a CC with some introductory note might be
> > appropriate. It's your problem, so you decide ;). I'd suggest the fio
> > mailing list, there are other performance people who may want to
> > chime in.
> Actually you know the lists and people involved more than me. I'd be
> happy if you added a Cc to another list you think is best, and we can
> move there.

I have no more ideas for the moment. I suggest you try the fio mailing list at fio lalala vger.kernel.org. I currently have no ideas for other lists. I bet there is some block layer / libata / scsi list that might match. Just browse the lists from kernel.org and pick the one that sounds most suitable [1]. Probably not fsdevel, since the thing does not seem to be filesystem related aside from some alignment effect (stripe in Ext4). Hmmm, probably linux-scsi, but look there first to see whether block layer and libata related things are discussed there.

I would start a new thread. Describe the problem, summarize what you have tried and what the outcome was, and ask for advice. Add a link to the original thread. (It might be a good idea to have one more post here with a link to the new thread so that other curious people can follow there – in case there are any. Otherwise I would end it on this thread. The thread view already looks ridiculous in KMail here ;-).)

And put me on Cc ;). I'd really like to know what's going on here. As written, I have no more ideas right now. It seems to be getting too low level for me.

[1] http://vger.kernel.org/vger-lists.html

Ciao,
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Thursday, 2 August 2012, Marc MERLIN wrote:
> On Wed, Aug 01, 2012 at 11:57:39PM +0200, Martin Steigerwald wrote:
> > It's getting quite strange.
> I would agree :)
> Before I paste a bunch of things, I wanted to thank you for not giving
> up on me and offering your time to help me figure this out :)

You are welcome. Well, I hold Linux performance analysis and tuning trainings, and I am really interested in issues like this ;). I will take care of myself and take my time to respond – or even not respond at all anymore if I run out of ideas ;).

> > I lost track of whether you did that already or not, but if you
> > didn't, please post some
> > vmstat 1
> > iostat -xd 1
> > on the device while it is being slow.
> Sure thing, here's the 24 second du -s run:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> 2 1 0 2747264 44 348388002850 242 184 19 6 74 1
> 1 0 0 2744128 44 35170000 144 0 2758 32115 30 5 61 4
> 2 1 0 2743100 44 35199200 792 0 2616 30613 28 4 50 18
> 1 1 0 2741592 44 35266800 776 0 2574 31551 29 4 45 21
> 1 1 0 2740720 44 35343200 692 0 2734 32891 30 4 45 22
> 1 1 0 2740104 44 35428400 460 0 2639 31585 30 4 45 21
> 3 1 0 2738520 44 35469200 544 264 2834 30302 32 5 42 21
> 1 1 0 2736936 44 35547600 1064 2012 2867 31172 28 4 45 23

A bit more wait I/O, with not even 10% of the throughput compared to the Intel SSD 320 figures. It seems that the Intel SSD is running circles around your Samsung SSD while – as expected for that use case – not being fully utilized.

> Linux 3.5.0-amd64-preempt-noide-20120410 (gandalfthegreat)  08/01/2012  _x86_64_  (4 CPU)
>
> rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> 2.18 0.68 6.45 17.77 78.12 153.39 19.12 0.18 7.51 8.52 7.15 4.46 10.81
> 0.00 0.00 118.00 0.00 540.00 0.00 9.15 1.18 9.93 9.93 0.00 4.98 58.80
> 0.00 0.00 217.00 0.00 868.00 0.00 8.00 1.90 8.77 8.77 0.00 4.44 96.40
> 0.00 0.00 192.00 0.00 768.00 0.00 8.00 1.63 8.44 8.44 0.00 5.10 98.00
> 0.00 0.00 119.00 0.00 476.00 0.00 8.00 1.06 9.01 9.01 0.00 8.20 97.60
> 0.00 0.00 125.00 0.00 500.00 0.00 8.00 1.08 8.67 8.67 0.00 7.55 94.40
> 0.00 0.00 165.00 0.00 660.00 0.00 8.00 1.50 9.12 9.12 0.00 5.87 96.80
> 0.00 0.00 195.00 13.00 780.00 272.00 10.12 1.68 8.10 7.94 10.46 4.65 96.80
> 0.00 0.00 173.00 0.00 692.00 0.00 8.00 1.72 9.87 9.87 0.00 5.71 98.80
> 0.00 0.00 171.00 0.00 684.00 0.00 8.00 1.62 9.33 9.33 0.00 5.75 98.40
> 0.00 0.00 161.00 0.00 644.00 0.00 8.00 1.52 9.57 9.57 0.00 6.14 98.80
> 0.00 0.00 136.00 0.00 544.00 0.00 8.00 1.26 9.29 9.29 0.00 7.24 98.40
> 0.00 0.00 199.00 0.00 796.00 0.00 8.00 1.94 9.73 9.73 0.00 4.94 98.40
> 0.00 0.00 201.00 0.00 804.00 0.00 8.00 1.70 8.54 8.54 0.00 4.80 96.40
> 0.00 0.00 272.00 15.00 1088.00 272.00 9.48 2.35 8.21 8.46 3.73 3.39 97.20
[…]
> > I am interested in wait I/O and latencies and disk utilization.
> Cool tool, I didn't know about iostat. My r_await numbers don't look
> good obviously, and yet %util is pretty much 100% the entire time.
> Does that show that it's indeed the device that is unable to deliver
> the requests any quicker, despite being an SSD, or are you reading
> this differently?

That, or…

> > Also I am interested in
> > merkaba:~ hdparm -I /dev/sda | grep -i queue
> >         Queue depth: 32
> >         *  Native Command Queueing (NCQ)
> > output for your SSD.
> gandalfthegreat:/var/local# hdparm -I /dev/sda | grep -i queue
>         Queue depth: 32
>         *  Native Command Queueing (NCQ)
> gandalfthegreat:/var/local#
> I've run the fio tests in:
> /dev/mapper/cryptroot /var btrfs rw,noatime,compress=lzo,nossd,discard,space_cache 0 0

… you are still using dm_crypt? Please test without dm_crypt. My figures are from within LVM, but no dm_crypt.
It's good to have a comparable base for the measurements.

> (discard is there, so fstrim shouldn't be needed)

I can't imagine why it should matter, but maybe it's worth having some tests without "discard".

I also suggest using fio with the ssd-test example on the SSD. I have some comparison data available for my setup. Heck, it should be publicly available
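For anyone wanting to reproduce this kind of capture, here is a minimal sketch; the device name, mount point and log paths are my assumptions, not taken from the thread:

  # run the monitors in the background, reproduce the slow du, then stop them
  vmstat 1 > /tmp/vmstat.log &
  iostat -xd 1 /dev/sda > /tmp/iostat.log &
  sync
  echo 3 > /proc/sys/vm/drop_caches   # cold cache, as in the du timings above
  time du -sh /mnt/mnt2/src
  kill %1 %2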
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Thursday, 2 August 2012, Marc MERLIN wrote:
> So, doctor, is it bad? :)
>
> randomwrite: (g=0): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentialwrite: (g=1): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> randomread: (g=2): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentialread: (g=3): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> 2.0.8
> Starting 4 processes
> randomwrite: Laying out IO file(s) (1 file(s) / 2048MB)
> Jobs: 1 (f=1): [___R] [100.0% done] [558.8M/0K /s] [63.8K/0 iops] [eta 00m:00s]
> randomwrite: (groupid=0, jobs=1): err= 0: pid=7193
>   write: io=102048KB, bw=1700.8KB/s, iops=189, runt= 60003msec
>     slat (usec): min=21, max=219834, avg=5250.91, stdev=5936.55
>     clat (usec): min=25, max=738932, avg=329339.45, stdev=106004.63
>      lat (msec): min=4, max=751, avg=334.59, stdev=107.57
>     clat percentiles (msec):

Heck, I didn't look at the IOPS figure! 189 IOPS for a SATA-600 SSD. That's pathetic. So again, please test this without dm_crypt. I can't believe that this is the maximum the hardware is able to achieve. A really fast 15000 rpm SAS hard disk might top that.

Martin
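Taking dm_crypt out of the I/O path, as Martin asks here, only needs a spare partition. A minimal sketch of such a run (the partition choice is an assumption, and mkfs destroys whatever is on it):

  mkfs.btrfs -L test /dev/sda3      # e.g. a temporarily retired swap partition
  mkdir -p /mnt/mnt3
  mount -o noatime /dev/sda3 /mnt/mnt3
  cd /mnt/mnt3 && fio ~/fio.job2    # same job file, no dm_crypt underneath

This mirrors what Marc does a couple of messages down.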
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Thu, Aug 02, 2012 at 01:18:07PM +0200, Martin Steigerwald wrote:
> > I've run the fio tests in:
> > /dev/mapper/cryptroot /var btrfs rw,noatime,compress=lzo,nossd,discard,space_cache 0 0
> … you are still using dm_crypt?

That was my biggest partition, and so far I've found no performance impact on file access between unencrypted and dm_crypt. I just took out my swap partition and made a smaller btrfs there:
/dev/sda3 /mnt/mnt3 btrfs rw,noatime,ssd,space_cache 0 0
I mounted without discard.

> >     lat (usec) : 50=0.01%
> >     lat (msec) : 10=0.02%, 20=0.02%, 50=0.05%, 100=0.14%, 250=12.89%
> >     lat (msec) : 500=72.44%, 750=14.43%
> Gosh, look at these latencies! 72.44% of all requests above 500 (in
> words: five hundred) milliseconds! And 14.43% above 750 msecs. The
> percentage of requests served at 100 msecs or less was below one
> percent! Hey, is this an SSD or what?

Yeah, that's kind of what I've been complaining about since the beginning :) Once I'm reading sequentially it goes fast, but random access latency is indeed abysmal.

> Still, even with iodepth 64, a totally different picture. And look at
> the IOPS and throughput.

Yep. I know mine are bad :(

For reference, this refers to:

[global]
ioengine=libaio
direct=1
iodepth=64

Since it's slightly different than the first job file you gave me, I re-ran with this one this time.

gandalfthegreat:~# /sbin/mkfs.btrfs -L test /dev/sda2
gandalfthegreat:~# mount -o noatime /dev/sda2 /mnt/mnt2
gandalfthegreat:~# grep sda2 /proc/mounts
/dev/sda2 /mnt/mnt2 btrfs rw,noatime,ssd,space_cache 0 0

Here's the btrfs test (ext4 is lower down):

gandalfthegreat:/mnt/mnt2# fio ~/fio.job2
zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
zufälligschreiben: (g=2): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentiellschreiben: (g=3): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
2.0.8
Starting 4 processes
zufälliglesen: Laying out IO file(s) (1 file(s) / 2048MB)
sequentielllesen: Laying out IO file(s) (1 file(s) / 2048MB)
zufälligschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
sequentiellschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
Jobs: 1 (f=1): [___W] [59.5% done] [0K/1800K /s] [0/193 iops] [eta 02m:10s]
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=30318
  read : io=73682KB, bw=1227.1KB/s, iops=137, runt= 60004msec
    slat (usec): min=3, max=37432, avg=7252.52, stdev=5717.70
    clat (usec): min=13, max=981927, avg=454046.13, stdev=110527.92
     lat (msec): min=5, max=999, avg=461.30, stdev=112.00
    clat percentiles (msec):
     |  1.00th=[  145],  5.00th=[  269], 10.00th=[  371], 20.00th=[  408],
     | 30.00th=[  424], 40.00th=[  437], 50.00th=[  449], 60.00th=[  457],
     | 70.00th=[  474], 80.00th=[  490], 90.00th=[  570], 95.00th=[  644],
     | 99.00th=[  865], 99.50th=[  922], 99.90th=[  963], 99.95th=[  979],
     | 99.99th=[  979]
    bw (KB/s)  : min=    8, max= 2807, per=100.00%, avg=1227.75, stdev=317.57
    lat (usec) : 20=0.01%
    lat (msec) : 10=0.01%, 20=0.02%, 50=0.04%, 100=0.46%, 250=3.82%
    lat (msec) : 500=79.48%, 750=13.57%, 1000=2.58%
  cpu          : usr=0.12%, sys=1.13%, ctx=12186, majf=0, minf=276
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.2%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=8262/w=0/d=0, short=r=0/w=0/d=0
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=30340
  read : io=2048.0MB, bw=211257KB/s, iops=23473, runt=  9927msec
    slat (usec): min=1,
max=56321, avg=20.51, stdev=424.44
    clat (usec): min=0, max=57987, avg=2695.98, stdev=6624.00
     lat (usec): min=1, max=58015, avg=2716.75, stdev=6642.09
    clat percentiles (usec):
     |  1.00th=[    1],  5.00th=[   10], 10.00th=[   30], 20.00th=[  100],
     | 30.00th=[  217], 40.00th=[  362], 50.00th=[  494], 60.00th=[  636],
     | 70.00th=[  892], 80.00th=[ 1656], 90.00th=[ 7392], 95.00th=[21632],
     | 99.00th=[29056], 99.50th=[29568], 99.90th=[43776], 99.95th=[46848],
     | 99.99th=[57600]
    bw (KB/s)  : min=166675, max=260984, per=99.83%, avg=210892.26, stdev=22433.65
    lat (usec) : 2=2.16%, 4=0.43%, 10=2.33%, 20=2.80%, 50=5.72%
    lat (usec) : 100=6.47%, 250=12.35%, 500=18.29%, 750=15.52%, 1000=5.44%
    lat (msec) : 2=13.04%, 4=3.59%, 10=3.83%, 20=1.88%, 50=6.11%
    lat (msec) : 100=0.04%
  cpu          : usr=4.51%, sys=35.70%, ctx=11480, majf=0, minf=278
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=233025/w=0/d=0, short=r=0/w=0/d=0
zufälligschreiben: (groupid=2, jobs=1): err= 0:
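The full job file is never shown in the thread; for readers who want to reproduce the run, here is a plausible reconstruction. The group names and global options match the fio output above; size, runtime and the stonewall separators are assumptions inferred from the "1 file(s) / 2048MB" lay-out lines and the ~60 s run times:

  ; fio.job2 (reconstructed sketch, not the exact file from the thread)
  [global]
  ioengine=libaio
  direct=1
  iodepth=64
  bsrange=2k-16k
  size=2g        ; assumption, matches "1 file(s) / 2048MB"
  runtime=60     ; assumption, matches runt= 60004msec
  directory=/mnt/mnt2

  [zufälliglesen]
  rw=randread
  stonewall      ; run the four groups one after another

  [sequentielllesen]
  rw=read
  stonewall

  [zufälligschreiben]
  rw=randwrite
  stonewall

  [sequentiellschreiben]
  rw=write
  stonewall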
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Thursday, 2 August 2012, Marc MERLIN wrote:
> On Thu, Aug 02, 2012 at 01:18:07PM +0200, Martin Steigerwald wrote:
> > > I've run the fio tests in:
> > > /dev/mapper/cryptroot /var btrfs rw,noatime,compress=lzo,nossd,discard,space_cache 0 0
> > … you are still using dm_crypt?
[…]
> I just took out my swap partition and made a smaller btrfs there:
> /dev/sda3 /mnt/mnt3 btrfs rw,noatime,ssd,space_cache 0 0
> I mounted without discard.
[…]
> For reference, this refers to:
> [global]
> ioengine=libaio
> direct=1
> iodepth=64
>
> Since it's slightly different than the first job file you gave me, I
> re-ran with this one this time.
>
> gandalfthegreat:~# /sbin/mkfs.btrfs -L test /dev/sda2
> gandalfthegreat:~# mount -o noatime /dev/sda2 /mnt/mnt2
> gandalfthegreat:~# grep sda2 /proc/mounts
> /dev/sda2 /mnt/mnt2 btrfs rw,noatime,ssd,space_cache 0 0
>
> Here's the btrfs test (ext4 is lower down):
> gandalfthegreat:/mnt/mnt2# fio ~/fio.job2
> zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> zufälligschreiben: (g=2): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentiellschreiben: (g=3): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> 2.0.8

Still abysmal, except for sequential reads.

> Starting 4 processes
[…]
> Jobs: 1 (f=1): [___W] [59.5% done] [0K/1800K /s] [0/193 iops] [eta 02m:10s]
> zufälliglesen: (groupid=0, jobs=1): err= 0: pid=30318
>   read : io=73682KB, bw=1227.1KB/s, iops=137, runt= 60004msec
[…]
>     lat (usec) : 20=0.01%
>     lat (msec) : 10=0.01%, 20=0.02%, 50=0.04%, 100=0.46%, 250=3.82%
>     lat (msec) : 500=79.48%, 750=13.57%, 1000=2.58%

1 second latency? Oh well…

> Run status group 3 (all jobs):
>   WRITE: io=84902KB, aggrb=1414KB/s, minb=1414KB/s, maxb=1414KB/s,
>   mint=60005msec, maxt=60005msec
> gandalfthegreat:/mnt/mnt2#
> (no lines beyond this, fio 2.0.8)
[…]
> > Can you also post the last lines:
> > Disk stats (read/write):
> >   dm-2: ios=616191/613142, merge=0/0, ticks=1300820/2565384, in_queue=3867448, util=98.81%, aggrios=504829/504643, aggrmerge=111362/111451, aggrticks=1058320/2164664, aggrin_queue=3223048, aggrutil=98.78%
> >   sda: ios=504829/504643, merge=111362/111451, ticks=1058320/2164664, in_queue=3223048, util=98.78%
> > martin@merkaba:~/Artikel/LinuxNewMedia/fio/Recherche/Messungen/merkaba
> I didn't get these lines.

Hmmm, back then I guess this was fio 1.5.9 or so.

> I tried deadline and noop, and indeed I'm not seeing much of a
> difference for my basic tests. For now I have deadline.

So my recommendation for now: remove as many factors as possible, and in order to compare results with what I posted, try a plain logical volume with Ext4.
> gandalfthegreat:~# mkfs.ext4 -O extent -b 4096 -E stride=128,stripe-width=128 /dev/sda2
> /dev/sda2 /mnt/mnt2 ext4 rw,noatime,stripe=128,data=ordered 0 0
> gandalfthegreat:/mnt/mnt2# fio ~/fio.job2
> zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> zufälligschreiben: (g=2): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentiellschreiben: (g=3): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> 2.0.8
> Starting 4 processes
> zufälliglesen: Laying out IO file(s) (1 file(s) / 2048MB)
> sequentielllesen: Laying out IO file(s) (1 file(s) / 2048MB)
> zufälligschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
> sequentiellschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
> Jobs: 1 (f=1): [___W] [63.8% done] [0K/2526K /s] [0/280 iops] [eta 01m:21s]
> zufälliglesen: (groupid=0, jobs=1): err= 0: pid=30077
>   read : io=2048.0MB, bw=276232KB/s, iops=50472, runt=  7592msec
>     slat (usec): min=2, max=2276, avg= 6.87, stdev=12.01
>     clat (usec): min=249, max=52128, avg=1258.87, stdev=1714.63
>      lat (usec): min=260, max=52134, avg=1266.00, stdev=1715.36
>     clat percentiles (usec):
>      |  1.00th=[  450],  5.00th=[  548], 10.00th=[  620], 20.00th=[  724],
>      | 30.00th=[  820], 40.00th=[  908], 50.00th=[ 1004], 60.00th=[ 1096],
>      | 70.00th=[ 1208], 80.00th=[ 1368], 90.00th=[ 1640], 95.00th=[ 2040],
>      | 99.00th=[ 8256], 99.50th=[14912], 99.90th=[21120], 99.95th=[23168],
>      | 99.99th=[33024]
>     bw (KB/s)  : min=76463, max=385328, per=100.00%, avg=277313.20, stdev=94661.29
>     lat (usec) : 250=0.01%, 500=2.46%, 750=19.82%, 1000=27.70%
>     lat (msec) : 2=44.79%, 4=3.55%, 10=0.72%, 20=0.78%, 50=0.17%
>     lat (msec) : 100=0.01%
>   cpu          : usr=11.91%, sys=51.64%, ctx=91337, majf=0, minf=277
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
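A side note on that mkfs.ext4 invocation: with 4096-byte blocks, stride=128 and stripe-width=128 both come out to 512 KiB, a common SSD erase-block size, which is presumably the alignment the option is meant to match. The arithmetic, since stride and stripe-width are given in filesystem blocks:

  echo $(( 128 * 4096 ))          # 524288 bytes
  echo $(( 128 * 4096 / 1024 ))   # 512 KiB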
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Thu, Aug 02, 2012 at 10:20:07PM +0200, Martin Steigerwald wrote:
> Hey, what's this? With Ext4 you have really good random read
> performance now! Way better than the Intel SSD 320 and…

Yep, my du -sh tests do show that ext4 is 2x faster than btrfs. Obviously it's sending IO in a way that either the IO subsystem, Linux driver, or drive prefer.

> The performance of your Samsung SSD seems to be quite erratic. It seems
> that the device is capable of being fast, but only sometimes shows this
> capability.

That's indeed exactly what I'm seeing in real life :)

> Have the IOPS run on the device itself. That will remove any filesystem
> layer. But only the read-only tests; to make sure, I suggest using fio
> with the --readonly option as a safety guard. Unless you have a spare
> SSD that you can afford to use for write testing, which will likely
> destroy every filesystem on it. Or let it run on just one logical
> volume.
> > Can you send me a recommended job config you'd like me to run if the
> > runs above haven't already answered your questions?
> [global]
> (...)

I used this and just changed filename to /dev/sda. Since I'm reading from the beginning of the drive, reads have to be aligned.

> I won't expect much of a difference, but then the random read
> performance is quite different between Ext4 and BTRFS on this disk.
> That would make it interesting to test without any filesystem in
> between and over the whole device.

Here is the output:

gandalfthegreat:~# fio --readonly ./fio.job3
zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
2.0.8
Starting 2 processes
Jobs: 1 (f=1): [_R] [66.9% done] [966K/0K /s] [108/0 iops] [eta 01m:00s]
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=2172
  read : io=59036KB, bw=983.93KB/s, iops=108, runt= 60002msec
    slat (usec): min=5, max=158, avg=27.62, stdev=10.64
    clat (usec): min=45, max=27348, avg=9150.78, stdev=4452.66
     lat (usec): min=53, max=27370, avg=9179.05, stdev=4454.88
    clat percentiles (usec):
     |  1.00th=[  126],  5.00th=[  235], 10.00th=[ 5216], 20.00th=[ 5920],
     | 30.00th=[ 5920], 40.00th=[ 5984], 50.00th=[ 7712], 60.00th=[12480],
     | 70.00th=[12608], 80.00th=[12736], 90.00th=[12864], 95.00th=[16768],
     | 99.00th=[18560], 99.50th=[18816], 99.90th=[20352], 99.95th=[22656],
     | 99.99th=[27264]
    bw (KB/s)  : min=  423, max= 5776, per=100.00%, avg=986.48, stdev=480.68
    lat (usec) : 50=0.11%, 100=0.64%, 250=4.47%, 500=1.65%, 750=0.02%
    lat (usec) : 1000=0.02%
    lat (msec) : 2=0.06%, 4=0.03%, 10=43.31%, 20=49.51%, 50=0.18%
  cpu          : usr=0.17%, sys=0.45%, ctx=6534, majf=0, minf=26
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=6532/w=0/d=0, short=r=0/w=0/d=0
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=2199
  read : io=54658KB, bw=932798B/s, iops=101, runt= 60002msec
    slat (usec): min=5, max=140, avg=28.63, stdev= 9.91
    clat (usec): min=39, max=34210, avg=9799.18, stdev=4471.32
     lat (usec): min=45, max=34228, avg=9828.50, stdev=4472.06
    clat percentiles (usec):
     |  1.00th=[   61],  5.00th=[ 5088], 10.00th=[ 5856], 20.00th=[ 5920],
     | 30.00th=[ 5984], 40.00th=[ 6048], 50.00th=[11840], 60.00th=[12608],
     | 70.00th=[12608], 80.00th=[12736], 90.00th=[16512], 95.00th=[17536],
     | 99.00th=[18816], 99.50th=[19584], 99.90th=[24960], 99.95th=[29568],
     | 99.99th=[34048]
    bw (KB/s)  : min=  405, max= 2680, per=100.00%, avg=912.92,
stdev=261.62
    lat (usec) : 50=0.41%, 100=1.77%, 250=1.20%, 500=0.23%, 750=0.02%
    lat (usec) : 1000=0.03%
    lat (msec) : 2=0.02%, 10=43.06%, 20=52.91%, 50=0.36%
  cpu          : usr=0.15%, sys=0.45%, ctx=6103, majf=0, minf=28
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=6101/w=0/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=59036KB, aggrb=983KB/s, minb=983KB/s, maxb=983KB/s, mint=60002msec, maxt=60002msec

Run status group 1 (all jobs):
   READ: io=54658KB, aggrb=910KB/s, minb=910KB/s, maxb=910KB/s, mint=60002msec, maxt=60002msec

Disk stats (read/write):
  sda: ios=12660/2072, merge=5/34, ticks=119452/22496, in_queue=141936, util=99.30%

> … or get yourself another SSD. It's your decision. I admire your
> endurance. ;)

Since I've gotten 2 SSDs to make sure I didn't get one bad one, and since the company is getting great reviews for them, I'm now pretty sure that it's either a problem with a Linux driver, which is interesting for us all to debug :) If I go buy another brand, the next guy
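Again for reproducibility: fio.job3 itself is not shown, but from the output above it plausibly looked like this. The iodepth and bsrange match the headers; size and runtime are assumptions:

  ; fio.job3 (reconstructed sketch), run as: fio --readonly ./fio.job3
  [global]
  ioengine=libaio
  direct=1
  iodepth=1          ; deliberately 1, to defeat in-drive queueing
  bsrange=2k-16k
  filename=/dev/sda  ; whole device, no filesystem in between
  runtime=60         ; assumption
  size=2g            ; assumption

  [zufälliglesen]
  rw=randread
  stonewall

  [sequentielllesen]
  rw=read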
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Thursday, 2 August 2012, Marc MERLIN wrote:
> On Thu, Aug 02, 2012 at 10:20:07PM +0200, Martin Steigerwald wrote:
> > Hey, what's this? With Ext4 you have really good random read
> > performance now! Way better than the Intel SSD 320 and…
> Yep, my du -sh tests do show that ext4 is 2x faster than btrfs.
> Obviously it's sending IO in a way that either the IO subsystem, Linux
> driver, or drive prefer.

But only on reads.

> > Have the IOPS run on the device itself. That will remove any
> > filesystem layer. But only the read-only tests; to make sure, I
> > suggest using fio with the --readonly option as a safety guard.
> > Unless you have a spare SSD that you can afford to use for write
> > testing, which will likely destroy every filesystem on it. Or let it
> > run on just one logical volume.
> > > Can you send me a recommended job config you'd like me to run if
> > > the runs above haven't already answered your questions?
> > [global]
> > (...)
> I used this and just changed filename to /dev/sda. Since I'm reading
> from the beginning of the drive, reads have to be aligned.
> > I won't expect much of a difference, but then the random read
> > performance is quite different between Ext4 and BTRFS on this disk.
> > That would make it interesting to test without any filesystem in
> > between and over the whole device.
> Here is the output:
> gandalfthegreat:~# fio --readonly ./fio.job3
> zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
> sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
> 2.0.8
> Starting 2 processes
> Jobs: 1 (f=1): [_R] [66.9% done] [966K/0K /s] [108/0 iops] [eta 01m:00s]
> zufälliglesen: (groupid=0, jobs=1): err= 0: pid=2172
>   read : io=59036KB, bw=983.93KB/s, iops=108, runt= 60002msec

WTF? Hey, did you adapt the size= keyword? It seems fio 2.0.8 can do completely without it. Also I noticed that I had iodepth 1 in there, to circumvent any in-drive cache / optimization.

>     slat (usec): min=5, max=158, avg=27.62, stdev=10.64
>     clat (usec): min=45, max=27348, avg=9150.78, stdev=4452.66
>      lat (usec): min=53, max=27370, avg=9179.05, stdev=4454.88
[…]
>     lat (msec) : 2=0.06%, 4=0.03%, 10=43.31%, 20=49.51%, 50=0.18%
[…]
>      issued    : total=r=6532/w=0/d=0, short=r=0/w=0/d=0

Latency is still way too high even with iodepth 1: 10 milliseconds for 43% of requests. And the throughput and IOPS are still abysmal, even for iodepth 1 (see below for Intel SSD 320 values).

Okay, one further idea: remove the bsrange to just test with 4k blocks. Additionally test with these aligned:

blockalign=int[,int], ba=int[,int]
    At what boundary to align random IO offsets. Defaults to the same as
    'blocksize', the minimum blocksize given. Minimum alignment is
    typically 512b for using direct IO, though it usually depends on the
    hardware block size.
    This option is mutually exclusive with using a random map for files,
    so it will turn off that option.

I would first test with 4k blocks as is, and then do something like:

blocksize=4k
blockalign=4k

And then raise blockalign to some values that may matter, like 8k, 128k, 512k, 1m or so. But that's just guesswork. I do not even know exactly whether it works this way in fio.

There is something pretty weird going on, but I am not sure what it is. Maybe an alignment issue, since Ext4 with stripe alignment was able to read so much faster.

> sequentielllesen: (groupid=1, jobs=1): err= 0: pid=2199
>   read : io=54658KB, bw=932798B/s, iops=101, runt= 60002msec

Ey, what's this?

>     slat (usec): min=5, max=140, avg=28.63, stdev= 9.91
>     clat (usec): min=39, max=34210, avg=9799.18, stdev=4471.32
>      lat (usec): min=45, max=34228, avg=9828.50, stdev=4472.06
>     clat percentiles (usec):
>      |  1.00th=[   61],  5.00th=[ 5088], 10.00th=[ 5856], 20.00th=[ 5920],
>      | 30.00th=[ 5984], 40.00th=[ 6048], 50.00th=[11840], 60.00th=[12608],
>      | 70.00th=[12608], 80.00th=[12736], 90.00th=[16512], 95.00th=[17536],
>      | 99.00th=[18816], 99.50th=[19584], 99.90th=[24960],
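As a concrete sketch of Martin's alignment-sweep idea (the group layout and values are my guesses, and as he says himself it is not certain blockalign behaves this way on raw devices):

  ; hypothetical alignment sweep, read-only on the raw device
  [global]
  ioengine=libaio
  direct=1
  iodepth=1
  rw=randread
  blocksize=4k
  filename=/dev/sda
  runtime=30

  [ba-4k]
  blockalign=4k
  stonewall

  [ba-8k]
  blockalign=8k
  stonewall

  [ba-512k]
  blockalign=512k
  stonewall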
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Fri, Jul 27, 2012 at 11:42:39AM -0700, Marc MERLIN wrote:
> > https://oss.oracle.com/~mason/latencytop.patch
> Thanks for the patch, and yes I can confirm I'm definitely not pegged
> on CPU (not even close, and I get the same problem with an unencrypted
> filesystem; actually du -sh is exactly the same speed on encrypted and
> unencrypted).
> Here's the result I think you were looking for. I'm not good at
> reading this, but hopefully it tells you something useful :)
> The full run is here if that helps:
> http://marc.merlins.org/tmp/latencytop.txt
> I did some other tests since last week, since my laptop is hard to use
> considering how slow the SSD is.
> (TL;DR: ntfs on linux via fuse is 33% faster than ext4, which is 2x
> faster than btrfs, but 3x slower than the same filesystem on spinning
> disk :( )

Ok, just to help with debugging this:

1) I put my Samsung 830 SSD into another ThinkPad and it wasn't faster or slower.

2) Then I put in a Crucial C300 256GB SSD (the replacement for the one I had that just died and killed all my data), and du took 0.3 seconds on both my old and new ThinkPads. The old ThinkPad is running Ubuntu 32bit, the new one Debian testing 64bit, both with kernel 3.4.4.

So, clearly, there is something wrong with the Samsung 830 SSD with Linux, but I have no clue what :( In raw speed (dd), the Samsung is faster than the Crucial (500MB/s vs 350MB/s). If it were a random crappy SSD from a random vendor, I'd blame the SSD, but I have a hard time believing that Samsung is selling SSDs that are slower than hard drives at random IO and 'seeks'.

3) I just got a 2nd SSD from Samsung (same kind), just to make sure the one I had wasn't bad. It's brand new, and I formatted it carefully on 512-sector boundaries:

/dev/sda1      2048     502271    250112  83  Linux
/dev/sda2    502272   52930559  26214144   7  HPFS/NTFS/exFAT
/dev/sda3  52930560   73902079  10485760  82  Linux swap / Solaris
/dev/sda4  73902080 1000215215 463156568  83  Linux

I also upgraded to 3.5.0 in the meantime, but unfortunately the results are similar.

First: btrfs is the slowest:
gandalfthegreat:/mnt/ssd/var/local# time du -sh src/
514M    src/
real    0m25.741s
gandalfthegreat:/mnt/ssd/var/local# grep /mnt/ssd/var /proc/mounts
/dev/mapper/ssd /mnt/ssd/var btrfs rw,noatime,compress=lzo,ssd,discard,space_cache 0 0

Second: ext4 is 2x faster than btrfs, with mkfs.ext4 -O extent -b 4096 /dev/sda3:
gandalfthegreat:/mnt/mnt3# reset_cache
gandalfthegreat:/mnt/mnt3# time du -sh src/
519M    src/
real    0m12.459s
gandalfthegreat:~# grep mnt3 /proc/mounts
/dev/sda3 /mnt/mnt3 ext4 rw,noatime,discard,data=ordered 0 0

Third: a freshly made ntfs filesystem through fuse is actually FASTER!
gandalfthegreat:/mnt/mnt2# reset_cache
gandalfthegreat:/mnt/mnt2# time du -sh src/
506M    src/
real    0m8.928s
gandalfthegreat:/mnt/mnt2# grep mnt2 /proc/mounts
/dev/sda2 /mnt/mnt2 fuseblk rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096 0 0

How can ntfs via fuse be the fastest, and btrfs so slow? Of course, all 3 are slower than the same filesystem on spinning disk too, but I'm wondering if there is a scheduling issue that is somehow causing the extreme slowness I'm seeing.

Did the latencytop trace I got help in any way?

Thanks,
Marc
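The reset_cache command used above is not a standard tool; presumably it is a small wrapper around the same drop_caches write shown elsewhere in the thread. A minimal sketch of what it would contain (the actual script on gandalfthegreat may differ):

  #!/bin/sh
  # reset_cache: drop page cache, dentries and inodes so the next
  # du/find run hits the device instead of RAM
  sync
  echo 3 > /proc/sys/vm/drop_caches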
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Wed, Aug 1, 2012 at 1:01 PM, Marc MERLIN m...@merlins.org wrote:
> So, clearly, there is something wrong with the samsung 830 SSD with
> linux
[…]
> If it were a random crappy SSD from a random vendor, I'd blame the SSD,
> but I have a hard time believing that samsung is selling SSDs that are
> slower than hard drives at random IO and 'seeks'.

You'd be surprised at how badly some vendors can screw up :)

> First: btrfs is the slowest:
> gandalfthegreat:/mnt/ssd/var/local# grep /mnt/ssd/var /proc/mounts
> /dev/mapper/ssd /mnt/ssd/var btrfs rw,noatime,compress=lzo,ssd,discard,space_cache 0 0

Just checking: did you explicitly activate discard? Because on my setup (with a Corsair SSD) it made things MUCH slower. Also, try adding noatime (just in case the slowdown was because du caused many access time updates).

-- 
Fajar
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Wed, Aug 01, 2012 at 01:08:46PM +0700, Fajar A. Nugraha wrote:
> > If it were a random crappy SSD from a random vendor, I'd blame the
> > SSD, but I have a hard time believing that samsung is selling SSDs
> > that are slower than hard drives at random IO and 'seeks'.
> You'd be surprised at how badly some vendors can screw up :)

At some point, it may come down to that indeed :-/ I'm still hopeful that Samsung didn't, but we'll see.

> > First: btrfs is the slowest:
> > gandalfthegreat:/mnt/ssd/var/local# grep /mnt/ssd/var /proc/mounts
> > /dev/mapper/ssd /mnt/ssd/var btrfs rw,noatime,compress=lzo,ssd,discard,space_cache 0 0
> Just checking: did you explicitly activate discard? Because on my

Yes. Note that it should be a noop when all you're doing is stating inodes and not writing (I'm using noatime).

> setup (with a Corsair SSD) it made things MUCH slower. Also, try adding
> noatime (just in case the slowdown was because du caused many access
> time updates)

I have noatime in there already :)

Marc
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On 01/08/12 16:01, Marc MERLIN wrote:
> Third: a freshly made ntfs filesystem through fuse is actually FASTER!

Could it be that Samsung's FTL has optimisations in it for NTFS? A horrible thought, but not impossible.

-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Wed, Aug 01, 2012 at 04:36:22PM +1000, Chris Samuel wrote:
> On 01/08/12 16:01, Marc MERLIN wrote:
> > Third: a freshly made ntfs filesystem through fuse is actually FASTER!
> Could it be that Samsung's FTL has optimisations in it for NTFS?
> A horrible thought, but not impossible.

Not impossible, but it can't be the main reason, since ntfs on this SSD still clocks 2x slower than a spinning disk with encrypted btrfs. Since SSDs should seek 10-100x faster than spinning disks, that can't be the only reason.

Marc
Re: How can btrfs take 23sec to stat 23K files from an SSD?
Hi Marc,

On Wednesday, 1 August 2012, Marc MERLIN wrote:
> On Wed, Aug 01, 2012 at 01:08:46PM +0700, Fajar A. Nugraha wrote:
> > > If it were a random crappy SSD from a random vendor, I'd blame the
> > > SSD, but I have a hard time believing that samsung is selling SSDs
> > > that are slower than hard drives at random IO and 'seeks'.
> > You'd be surprised at how badly some vendors can screw up :)
> At some point, it may come down to that indeed :-/ I'm still hopeful
> that Samsung didn't, but we'll see.

It's getting quite strange.

I lost track of whether you did that already or not, but if you didn't, please post some

vmstat 1
iostat -xd 1

on the device while it is being slow. I am interested in wait I/O, latencies and disk utilization.

Comparison data of an Intel SSD 320 in a ThinkPad T520 during
merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; du -sch /usr
on BTRFS with kernel 3.5:

martin@merkaba:~ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
2 1 21556 4442668 2056 50235200 19485 247 120 11 2 87 0
1 2 21556 440 2448 51488400 11684 328 4975 24585 5 16 65 14
1 0 21556 4389880 2448 52806000 13400 0 4574 23452 2 16 68 14
3 1 21556 4370068 2448 54505200 18132 0 5499 27220 1 18 64 16
1 0 21556 4350228 2448 8000 10856 0 4122 25339 3 16 67 14
1 1 21556 4315604 2448 56975600 12648 0 4647 31153 5 14 66 15
0 1 21556 4295652 2456 58148000 1154856 4093 24618 2 13 69 16
0 1 21556 4286720 2456 59158000 10824 0 3750 21445 1 12 71 16
0 1 21556 4266308 2456 60362000 12932 0 4841 26447 4 12 68 17
1 0 21556 4248228 2456 61380800 10264 4 3703 22108 1 13 71 15
5 1 21556 4231976 2456 62435600 10540 0 3581 20436 1 10 72 17
0 1 21556 4197168 2456 63910800 12952 0 4738 28223 4 15 66 15
4 1 21556 4178456 2456 65055200 11656 0 4234 23480 2 14 68 16
0 1 21556 4163616 2456 66299200 13652 0 4619 26580 1 16 70 13
4 1 21556 4138288 2456 67569600 13352 0 4422 22254 1 16 70 13
1 0 21556 4113204 2456 68906000 13232 0 4312 21936 1 15 70 14
0 1 21556 4085532 2456 70416000 14972 0 4820 24238 1 16 69 14
2 0 21556 4055740 2456 71964400 15736 0 5099 25513 3 17 66 14
0 1 21556 4028612 2456 73438000 14504 0 4795 25052 3 15 68 14
2 0 21556 3999108 2456 74904000 1465616 4672 21878 1 17 69 13
1 1 21556 3972732 2456 76210800 12972 0 4717 22411 1 17 70 13
5 0 21556 3949684 2584 77348400 1152852 4837 24107 3 15 67 15
1 0 21556 3912504 2584 78742000 12156 0 4883 25201 4 15 67 14

martin@merkaba:~ iostat -xd 1 /dev/sda
Linux 3.5.0-tp520 (merkaba)  01.08.2012  _x86_64_  (4 CPU)

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 1,29 1,44 11,58 12,78 684,74 299,75 80,81 0,24 9,86 0,95 17,93 0,29 0,71

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 2808,00 0,00 11232,00 0,00 8,00 0,57 0,21 0,21 0,00 0,19 54,50

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 2967,00 0,00 11868,00 0,00 8,00 0,63 0,21 0,21 0,00 0,21 60,90

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 11,00 2992,00 4,00 11968,00 56,00 8,03 0,64 0,22 0,22 0,25 0,21 62,00

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 2680,00 0,00 10720,00 0,00 8,00 0,70 0,26 0,26 0,00 0,25 66,70

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 3153,00 0,00 12612,00 0,00 8,00 0,72 0,23 0,23 0,00 0,22 69,30

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 2769,00 0,00 11076,00 0,00 8,00 0,63 0,23 0,23 0,00 0,21 58,00

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 2523,00 1,00 10092,00 4,00 8,00 0,74 0,29 0,29 0,00 0,28
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Mon, Jul 23, 2012 at 12:42:03AM -0600, Marc MERLIN wrote:
> 22 seconds for 15K files on an SSD is super slow, and 5 times slower
> than a spinning disk with the same data. What's going on?

Hi Marc,

The easiest way to figure it out is with latencytop. I'd either run the latencytop gui or use the latencytop -c patch, which sends a text dump to the console. This is assuming that you're not pegged at 100% CPU...

https://oss.oracle.com/~mason/latencytop.patch

-chris
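For anyone following along, the workflow Chris describes would look roughly like this; a sketch that assumes his patch applied to the latencytop sources and a kernel built with CONFIG_LATENCYTOP=y:

  echo 1 > /proc/sys/kernel/latencytop    # enable latency accounting
  latencytop -c > /tmp/latencytop.txt &   # patched: text dump instead of the gui
  echo 3 > /proc/sys/vm/drop_caches
  time du -sh src/                        # reproduce the slow stat run
  kill %1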
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Fri, Jul 27, 2012 at 07:08:35AM -0400, Chris Mason wrote:
> On Mon, Jul 23, 2012 at 12:42:03AM -0600, Marc MERLIN wrote:
> > 22 seconds for 15K files on an SSD is super slow, and 5 times slower
> > than a spinning disk with the same data. What's going on?
> Hi Marc,
> The easiest way to figure it out is with latencytop. I'd either run
> the latencytop gui or use the latencytop -c patch, which sends a text
> dump to the console. This is assuming that you're not pegged at 100%
> CPU...
> https://oss.oracle.com/~mason/latencytop.patch

Thanks for the patch, and yes I can confirm I'm definitely not pegged on CPU (not even close, and I get the same problem with an unencrypted filesystem; actually du -sh is exactly the same speed on encrypted and unencrypted).

Here's the result I think you were looking for. I'm not good at reading this, but hopefully it tells you something useful :)
The full run is here if that helps: http://marc.merlins.org/tmp/latencytop.txt

Process du (6748) Total: 4280.5 msec
Reading directory content    15.2 msec   11.0 %
    sleep_on_page wait_on_page_bit read_extent_buffer_pages
    btree_read_extent_buffer_pages.constprop.110 read_tree_block
    read_block_for_search.isra.32 btrfs_next_leaf btrfs_real_readdir
    vfs_readdir sys_getdents system_call_fastpath
[sleep_on_page]              13.5 msec   88.2 %
    sleep_on_page wait_on_page_bit read_extent_buffer_pages
    btree_read_extent_buffer_pages.constprop.110 read_tree_block
    read_block_for_search.isra.32 btrfs_search_slot btrfs_lookup_csum
    __btrfs_lookup_bio_sums btrfs_lookup_bio_sums
    btrfs_submit_compressed_read btrfs_submit_bio_hook
Page fault                   12.9 msec    0.6 %
    sleep_on_page_killable wait_on_page_bit_killable __lock_page_or_retry
    filemap_fault __do_fault handle_pte_fault handle_mm_fault
    do_page_fault page_fault
Executing a program           7.1 msec    0.2 %
    sleep_on_page_killable __lock_page_killable generic_file_aio_read
    do_sync_read vfs_read kernel_read prepare_binprm
    do_execve_common.isra.27 do_execve sys_execve stub_execve

Process du (6748) Total: 9517.4 msec
[sleep_on_page]              23.0 msec   82.8 %
    sleep_on_page wait_on_page_bit read_extent_buffer_pages
    btree_read_extent_buffer_pages.constprop.110 read_tree_block
    read_block_for_search.isra.32 btrfs_search_slot btrfs_lookup_inode
    btrfs_iget btrfs_lookup_dentry btrfs_lookup __lookup_hash
Reading directory content    13.2 msec   17.2 %
    sleep_on_page wait_on_page_bit read_extent_buffer_pages
    btree_read_extent_buffer_pages.constprop.110 read_tree_block
    read_block_for_search.isra.32 btrfs_search_slot btrfs_real_readdir
    vfs_readdir sys_getdents system_call_fastpath

Process du (6748) Total: 9524.0 msec
[sleep_on_page]              17.1 msec   88.5 %
    sleep_on_page wait_on_page_bit read_extent_buffer_pages
    btree_read_extent_buffer_pages.constprop.110 read_tree_block
    read_block_for_search.isra.32 btrfs_search_slot btrfs_lookup_inode
    btrfs_iget btrfs_lookup_dentry btrfs_lookup __lookup_hash
Reading directory content    16.0 msec   11.5 %
    sleep_on_page wait_on_page_bit read_extent_buffer_pages
    btree_read_extent_buffer_pages.constprop.110 read_tree_block
    read_block_for_search.isra.32 btrfs_search_slot btrfs_real_readdir
    vfs_readdir sys_getdents system_call_fastpath

Marc
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Tue, Jul 24, 2012 at 09:56:26AM +0200, Martin Steigerwald wrote:
> find is fast, du is much slower:
> merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time ( find /usr | wc -l )
> 404166
> ( find /usr | wc -l; )  0,03s user 0,07s system 1% cpu 9,212 total
> merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time ( du -sh /usr )
> 11G     /usr
> ( du -sh /usr; )  1,00s user 19,07s system 41% cpu 48,886 total

You're right.

gandalfthegreat [mc]# time du -sh src
514M    src
real    0m25.159s
gandalfthegreat [mc]# reset_cache
gandalfthegreat [mc]# time bash -c "find src | wc -l"
15261
real    0m7.614s

But find is still slower than du on my spinning disk for the same tree.

> Anyway, that's still much faster than your measurements.

Right. Your numbers look reasonable for an SSD.

Since then, I did some more tests and I'm also getting slower than normal speeds with ext4, an indication that it's a problem with the block layer. I'm working with some folks to try to pin down the core problem, but it looks like it's not an easy one.

Thanks for your data.
Marc
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Sun, Jul 22, 2012 at 11:42:03PM -0700, Marc MERLIN wrote:
> I just realized that the older thread got a bit confusing, so I'll
> keep problems separate and make things simpler :)

Since yesterday, I tried other kernels, including nopreempt, volpreempt and preempt for 3.4.4. I also tried a default 3.2.0 kernel from debian (all amd64), but that did not help. I'm still seeing close to 25 seconds to scan 15K files. How can it possibly be so slow? More importantly, how can I provide useful debug information?

- I don't think it's a problem with the kernel, since I tried 4 kernels,
  including a default debian one.
- Alignment seems ok; I made sure the partition start was divisible by
  512 sectors:
  /dev/sda2  502272  52930559  26214144  83  Linux
- I tried another brand new btrfs, and things are even slower now:

gandalfthegreat:/mnt/mnt2# mount -o ssd,discard,noatime /dev/sda2 /mnt/mnt2
gandalfthegreat:/mnt/mnt2# reset_cache
gandalfthegreat:/mnt/mnt2# time du -sh src/
514M    src/
real    0m29.584s
gandalfthegreat:/mnt/mnt2# find src/ | wc -l
15261

This is bad enough that there ought to be a way to debug this, right? Can you suggest something?

Thanks,
Marc

> On an _unencrypted_ partition on the SSD, running du -sh on a directory
> with 15K files takes 23 seconds on the unencrypted SSD and 4 secs on an
> encrypted spinning drive, both with a similar btrfs filesystem and the
> same kernel (3.4.4).
[…]
> 22 seconds for 15K files on an SSD is super slow, and 5 times slower
> than a spinning disk with the same data. What's going on?
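As an aside, partition alignment can be checked directly instead of eyeballing sector numbers; a quick sketch, assuming a reasonably recent util-linux and the device names from this thread:

  cat /sys/block/sda/sda2/start                        # start sector
  echo $(( $(cat /sys/block/sda/sda2/start) % 512 ))   # 0 = on a 512-sector boundary
  blockdev --getalignoff /dev/sda2                     # 0 = well-aligned for the device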
Re: How can btrfs take 23sec to stat 23K files from an SSD?
On Monday, 23 July 2012, Marc MERLIN wrote:
> I just realized that the older thread got a bit confusing, so I'll
> keep problems separate and make things simpler :)
>
> On an _unencrypted_ partition on the SSD, running du -sh on a directory
> with 15K files takes 23 seconds on the unencrypted SSD and 4 secs on an
> encrypted spinning drive, both with a similar btrfs filesystem and the
> same kernel (3.4.4).
>
> Unencrypted btrfs on SSD:
> gandalfthegreat:~# mount -o compress=lzo,discard,nossd,space_cache,noatime /dev/sda2 /mnt/mnt2
> gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
> 514M    src
> real    0m22.667s
>
> Encrypted btrfs on spinning drive, same src directory:
> gandalfthegreat:/var/local# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
> 514M    src
> real    0m3.881s

find is fast, du is much slower:

merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time ( find /usr | wc -l )
404166
( find /usr | wc -l; )  0,03s user 0,07s system 1% cpu 9,212 total
merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time ( du -sh /usr )
11G     /usr
( du -sh /usr; )  1,00s user 19,07s system 41% cpu 48,886 total

Now I try something with fewer files:

merkaba:~ find /usr/share/doc | wc -l
50715
merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time ( find /usr/share/doc | wc -l )
50715
( find /usr/share/doc | wc -l; )  0,00s user 0,02s system 1% cpu 1,398 total
merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time ( du -sh /usr/share/doc )
606M    /usr/share/doc
( du -sh /usr/share/doc; )  0,20s user 3,63s system 35% cpu 10,691 total
merkaba:~ echo 3 > /proc/sys/vm/drop_caches ; time du -sh /usr/share/doc
606M    /usr/share/doc
du -sh /usr/share/doc  0,19s user 3,54s system 35% cpu 10,386 total

Anyway, that's still much faster than your measurements.

merkaba:~ df -hT /usr
Filesystem  Type   Size  Used  Avail  Use%  Mounted on
/dev/dm-0   btrfs   19G   11G   5,6G   67%  /

merkaba:~ btrfs fi sh
failed to read /dev/sr0
Label: 'debian'  uuid: […]
        Total devices 1 FS bytes used 10.25GB
        devid    1 size 18.62GB used 18.62GB path /dev/dm-0
Btrfs Btrfs v0.19

merkaba:~ btrfs fi df /
Data: total=15.10GB, used=9.59GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=670.43MB
Metadata: total=8.00MB, used=0.00

merkaba:~ grep btrfs /proc/mounts
/dev/dm-0 / btrfs rw,noatime,compress=lzo,ssd,space_cache,inode_cache 0 0

Somewhat aged BTRFS filesystem on a ThinkPad T520 with Intel SSD 320, kernel 3.5.

Ciao,
Martin
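The find/du gap in these numbers has a plausible mechanical explanation: find only walks directories (getdents), while du additionally stat()s every file, and on btrfs each cold stat can mean another b-tree block read. One way to see the syscall mix, as a sketch (the exact stat syscall name, e.g. newfstatat, varies by architecture and libc):

  echo 3 > /proc/sys/vm/drop_caches
  strace -c -f du -sh /usr/share/doc 2>&1 | tail -20
  echo 3 > /proc/sys/vm/drop_caches
  strace -c -f sh -c 'find /usr/share/doc | wc -l' 2>&1 | tail -20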
How can btrfs take 23sec to stat 23K files from an SSD?
I just realized that the older thread got a bit confusing, so I'll keep problems separate and make things simpler :)

On an _unencrypted_ partition on the SSD, running du -sh on a directory with 15K files takes 23 seconds on the unencrypted SSD and 4 secs on an encrypted spinning drive, both with a similar btrfs filesystem and the same kernel (3.4.4).

Unencrypted btrfs on SSD:
gandalfthegreat:~# mount -o compress=lzo,discard,nossd,space_cache,noatime /dev/sda2 /mnt/mnt2
gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
514M    src
real    0m22.667s

Encrypted btrfs on spinning drive, same src directory:
gandalfthegreat:/var/local# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
514M    src
real    0m3.881s

I've run this many times and get the same numbers. I've tried deadline and noop on /dev/sda (the SSD) and du is just as slow. I also tried:
- space_cache and nospace_cache
- ssd and nossd
- noatime didn't seem to help, even though I was hopeful on this one.

In all cases, I get:
gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
514M    src
real    0m22.537s

I'm having the same slow speed on 2 btrfs filesystems on the same SSD. One is encrypted, the other one isn't:

Label: 'btrfs_pool1'  uuid: d570c40a-4a0b-4d03-b1c9-cff319fc224d
        Total devices 1 FS bytes used 144.74GB
        devid    1 size 441.70GB used 195.04GB path /dev/dm-0

Label: 'boot'  uuid: 84199644-3542-430a-8f18-a5aa58959662
        Total devices 1 FS bytes used 2.33GB
        devid    1 size 25.00GB used 5.04GB path /dev/sda2

If instead of stating a bunch of files I try reading a big file, I do get speeds that are quite fast (253MB/s and 423MB/s).

22 seconds for 15K files on an SSD is super slow, and 5 times slower than a spinning disk with the same data. What's going on?

Thanks,
Marc
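For completeness, a sequential read check of the kind that produced those 253MB/s and 423MB/s figures would look something like this (the file name is an assumption; any large file on the filesystem works):

  echo 3 > /proc/sys/vm/drop_caches
  dd if=/mnt/mnt2/bigfile of=/dev/null bs=1M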