Re: stupid UFS behaviour on random writes
On 18.01.2013 00:01, Rick Macklem wrote:

Wojciech Puchar wrote:

create a 10GB file (on a 2GB RAM machine, with some swap used to make sure little cache would be available for the filesystem):

dd if=/dev/zero of=file bs=1m count=10k

block size is 32KB, fragment size 4k. Now test random read access to it (10 threads):

randomio test 10 0 0 4096

normal result on such a not-so-fast disk in my laptop:

118.5 | 118.5  5.8  82.3  383.2  85.6 | 0.0  inf  nan  0.0  nan
138.4 | 138.4  3.9  72.2  499.7  76.1 | 0.0  inf  nan  0.0  nan
142.9 | 142.9  5.4  69.9  297.7  60.9 | 0.0  inf  nan  0.0  nan
133.9 | 133.9  4.3  74.1  480.1  75.1 | 0.0  inf  nan  0.0  nan
138.4 | 138.4  5.1  72.1  380.0  71.3 | 0.0  inf  nan  0.0  nan
145.9 | 145.9  4.7  68.8  419.3  69.6 | 0.0  inf  nan  0.0  nan

systat shows 4kB I/O size. All is fine. BUT random 4kB writes:

randomio test 10 1 0 4096

 total |  read:  latency (ms)          |  write:  latency (ms)
  iops |  iops  min   avg    max  sdev |  iops  min   avg     max   sdev
-------+-------------------------------+---------------------------------
  38.5 |   0.0  inf   nan    0.0   nan |  38.5  9.0  166.5  1156.8  261.5
  44.0 |   0.0  inf   nan    0.0   nan |  44.0  0.1  251.2  2616.7  492.7
  44.0 |   0.0  inf   nan    0.0   nan |  44.0  7.6  178.3  1895.4  330.0
  45.0 |   0.0  inf   nan    0.0   nan |  45.0  0.0  239.8  3457.4  522.3
  45.5 |   0.0  inf   nan    0.0   nan |  45.5  0.1  249.8  5126.7  621.0

results are horrific. systat shows 32kB I/O, gstat shows half are reads, half are writes. Why does UFS need to read the full block, change one 4kB part and then write it back, instead of just writing the 4kB part?

Because that's the way the buffer cache works. It writes an entire buffer cache block (unless at the end of file), so it must read the rest of the block into the buffer, so it doesn't write garbage (the rest of the block) out.

Without having looked at the code or testing: I assume using O_DIRECT when opening the file should help for that particular test (on kernels compiled with "options DIRECTIO").

I'd argue that using an I/O size smaller than the file system block size is simply sub-optimal and that most apps don't do random I/O of blocks. OR, if you had an app that does random I/O of 4K blocks (at 4K byte offsets), then using a 4K/1K file system would be better.

A 4K/1K file system has higher overhead (more indirect blocks) and is clearly sub-optimal for most general uses, today.

NFS is the exception, in that it keeps track of a dirty byte range within a buffer cache block and writes that byte range. (NFS writes are byte granular, unlike a disk.)

It should be easy to add support for a fragment mask to the buffer cache, which allows identifying valid fragments. Such a mask should be set to 0xff for all current uses of the buffer cache (meaning the full block is valid), but a special case could then be added for writes of exactly one or multiple fragments, where only the corresponding valid flag bits are set. In addition, a possible later read from disk must obviously skip fragments for which the valid mask bits are already set. This bit mask could then be used to update the affected fragments only, without a read-modify-write of the containing block.

But I doubt that such a change would improve performance in the general case, just in random update scenarios (which might still be relevant, in case of a DBMS knowing the fragment size and using it for DB files).

Regards, Stefan
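For concreteness, the O_DIRECT suggestion above looks roughly like this as a minimal sketch. The file name, size and offset are placeholders, and the kernel has to be built with "options DIRECTIO" for O_DIRECT to take effect on UFS:

#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	void *buf;
	int fd;

	fd = open("file", O_RDWR | O_DIRECT);
	if (fd == -1) {
		perror("open");
		return (1);
	}
	/* Direct I/O wants sector-aligned buffers; 4 kB alignment is enough. */
	if (posix_memalign(&buf, 4096, 4096) != 0) {
		fprintf(stderr, "posix_memalign failed\n");
		return (1);
	}
	memset(buf, 0xa5, 4096);
	/* One 4 kB write at a 4 kB-aligned offset inside the big test file. */
	if (pwrite(fd, buf, 4096, (off_t)4096 * 123456) != 4096)
		perror("pwrite");
	free(buf);
	close(fd);
	return (0);
}

With the buffer cache bypassed there is no 32 kB read-modify-write per 4 kB update, which is exactly what the randomio write numbers above are measuring.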
Re: stupid UFS behaviour on random writes
> But I doubt that such a change would improve performance in the general case

You doubt, but I am sure it would improve it a lot. Just imagine multiple VM images on a filesystem, each running windoze with a 4kB cluster size, each writing something. No matter what is written from within the VM, it ends up as a read followed by a write, unless the blocks are already in the buffer cache.
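A rough illustration of the fragment-mask idea being discussed; every name here is invented for the sketch and nothing like it exists in the buffer cache today. With 32 kB blocks and 4 kB fragments, one byte of valid bits covers a block:

#include <stdint.h>

#define	FRAGS_PER_BLOCK	8		/* 32 kB block / 4 kB fragment */
#define	FRAG_ALL_VALID	0xffu		/* what current buffer users would set */

struct frag_buf_demo {			/* hypothetical extension of a buffer */
	uint8_t	fb_valid;		/* bit i set => fragment i holds valid data */
};

/*
 * A write that covers exactly fragments [first, first + n) can mark them
 * valid and dirty without reading the rest of the block first; a later
 * read from disk would skip fragments whose valid bits are already set,
 * and only the marked fragments would need to be written back.
 */
static void
frag_aligned_write(struct frag_buf_demo *fb, int first, int n)
{
	uint8_t mask = (uint8_t)(((1u << n) - 1) << first);

	fb->fb_valid |= mask;
}

Unaligned or partial-fragment writes would still need the old read-modify-write path, which is the general-case cost being debated above.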
[GIANT-LOCKED] even without D_NEEDGIANT
Hello list,

At $DAILY_JOB I got involved with an ASI board that didn't have any kind of FreeBSD support, so I ended up writing a driver for it. If you try to ignore the blatant style(9) violations (of which there are many, hopefully on the way to being cleaned up) it seems to work fine.

However, I noticed that when loading the driver I always get a message about the Giant lock being used, even if D_NEEDGIANT is not specified anywhere. The actual output when loading is this (FreeBSD 9-STABLE i386):

dektec0: DekTec DTA-145 mem 0xfeaff800-0xfeaf irq 16 at device 13.0 on pci0
dektec0: [GIANT-LOCKED]
dektec0: [ITHREAD]
dektec0: board model 145, firmware version 2 (tx: 0, rx: 2), tx fifo 16384 MB

Source code here: https://github.com/olgeni/freebsd-dektec/blob/master/dektec.c

Can anybody offer a clue about what could be triggering the Giant requirement? Could I be doing something that has this, and possibly other, unintended side effects?

-- jimmy
Re: [GIANT-LOCKED] even without D_NEEDGIANT
on 18/01/2013 13:39 Jimmy Olgeni said the following: Hello list, At $DAILY_JOB I got involved with an ASI board that didn't have any kind of FreeBSD support, so I ended up writing a driver for it. If you try to ignore the blatant style(9) violations (of which there are many, hopefully on the way to be cleaned up) it seems to work fine. However, I noticed that when loading the driver I always get a message about the giant lock being used, even if D_NEEDGIANT is not specified anywhere. The actual output when loading is this (FreeBSD 9-STABLE i386): dektec0: DekTec DTA-145 mem 0xfeaff800-0xfeaf irq 16 at device 13.0 on pci0 dektec0: [GIANT-LOCKED] dektec0: [ITHREAD] dektec0: board model 145, firmware version 2 (tx: 0, rx: 2), tx fifo 16384 MB Source code here: https://github.com/olgeni/freebsd-dektec/blob/master/dektec.c Can anybody offer a clue about what could be triggering the GIANT requirement? Could I be doing something that has this, and possibly other, unintended side effects? See INTR_MPSAFE in bus_setup_intr(9). -- Andriy Gapon ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
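The flag Andriy points to is passed when the driver hooks up its interrupt handler. A sketch of what that call looks like; sc, sc->irq_res, sc->intr_cookie and dektec_intr are placeholder names, not the actual code from the driver above:

/*
 * In the driver's attach routine (requires <sys/param.h> and <sys/bus.h>).
 * Without INTR_MPSAFE the handler is run with the Giant lock held and the
 * device announces itself as [GIANT-LOCKED]; with it, the driver promises
 * to do its own locking inside the handler.
 */
error = bus_setup_intr(dev, sc->irq_res, INTR_TYPE_MISC | INTR_MPSAFE,
    NULL, dektec_intr, sc, &sc->intr_cookie);
if (error != 0) {
	device_printf(dev, "cannot set up interrupt handler\n");
	return (error);
}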
Re: [GIANT-LOCKED] even without D_NEEDGIANT
On Fri, 18 Jan 2013, Andriy Gapon wrote:

> See INTR_MPSAFE in bus_setup_intr(9).

Thanks! It went away. Back to testing...

-- jimmy
Re: IBM blade server abysmal disk write performances
On Thu, 17 Jan 2013 16:12:17 -0600, Karim Fodil-Lemelin fodillemlinka...@gmail.com wrote: SAS controllers may connect to SATA devices, either directly connected using native SATA protocol or through SAS expanders using SATA Tunneled Protocol (STP). The systems is currently put in place using SATA instead of SAS although its using the same interface and backplane connectors and the drives (SATA) show as da0 in BSD _but_ with the SATA drive we get *much* better performances. I am thinking that something fancy in that SAS drive is not being handled correctly by the FreeBSD driver. I am planning to revisit the SAS drive issue at a later point (sometimes next week). Your SATA drives are connected directly not with an interposer such as the LSISS9252, correct? If so, this might be the cause of your problems. Mixing SAS and SATA drives is known to cause serious performance issues for almost every JBOD/controller/expander/what-have-you. Change your configuration so there is only one protocol being spoken on the bus (SAS) by putting your SATA drives behind interposers which translate SAS to SATA just before the disk. This will solve many problems. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
kmem_map auto-sizing and size dependencies
The autotuning work is reaching into many places of the kernel, and while trying to tie up all loose ends I've got stuck in the kmem_map and how it works or what its limitations are.

During startup the VM is initialized and an initial kernel virtual memory map is set up in kmem_init() covering the entire KVM address range. Only the kernel itself is actually allocated within that map. A bit later on a number of other submaps are allocated (clean_map, buffer_map, pager_map, exec_map). Also in kmeminit() (in kern_malloc.c, different from kmem_init) the kmem_map is allocated.

The (initial?) size of the kmem_map is determined by some voodoo magic, a sprinkle of nmbclusters * PAGE_SIZE incrementor and lots of tunables. However it seems to work out to an effective kmem_map_size of about 58MB on my 16GB AMD64 dev machine:

vm.kvm_size: 549755809792
vm.kvm_free: 530233421824
vm.kmem_size: 16,594,300,928
vm.kmem_size_min: 0
vm.kmem_size_max: 329,853,485,875
vm.kmem_size_scale: 1
vm.kmem_map_size: 59,518,976
vm.kmem_map_free: 16,534,777,856

The kmem_map serves kernel malloc (via UMA), contigmalloc and everything else that uses UMA for memory allocation. Mbuf memory too is managed by UMA, which obtains the backing kernel memory from the kmem_map. The limits of the various mbuf memory types have been considerably raised recently and may make use of 50-75% of all physically present memory, or available KVM space, whichever is smaller.

Now my questions/comments are:

Does the kmem_map automatically extend itself if more memory is requested? Should it be set to a larger initial value based on min(physical, KVM) space available?

The use of nmbclusters for the initial kmem_map size calculation isn't appropriate anymore, due to it being set up later and nmbclusters not being the only relevant mbuf type. We make significant use of page sized mbuf clusters too.

The naming and output of the various vm.kmem_* and vm.kvm_* sysctls is confusing and not easy to reconcile. Either we need more detail on more aspects, or less. Plus perhaps sysctl subtrees to better describe the hierarchy of the maps.

Why are separate kmem submaps being used? Is it to limit memory usage of certain subsystems? Are those limits actually enforced?

-- Andre
Re: stupid UFS behaviour on random writes
Stefan Esser wrote:

On 18.01.2013 00:01, Rick Macklem wrote:

Wojciech Puchar wrote:

create a 10GB file (on a 2GB RAM machine, with some swap used to make sure little cache would be available for the filesystem):

dd if=/dev/zero of=file bs=1m count=10k

block size is 32KB, fragment size 4k. Now test random read access to it (10 threads):

randomio test 10 0 0 4096

normal result on such a not-so-fast disk in my laptop:

118.5 | 118.5  5.8  82.3  383.2  85.6 | 0.0  inf  nan  0.0  nan
138.4 | 138.4  3.9  72.2  499.7  76.1 | 0.0  inf  nan  0.0  nan
142.9 | 142.9  5.4  69.9  297.7  60.9 | 0.0  inf  nan  0.0  nan
133.9 | 133.9  4.3  74.1  480.1  75.1 | 0.0  inf  nan  0.0  nan
138.4 | 138.4  5.1  72.1  380.0  71.3 | 0.0  inf  nan  0.0  nan
145.9 | 145.9  4.7  68.8  419.3  69.6 | 0.0  inf  nan  0.0  nan

systat shows 4kB I/O size. All is fine. BUT random 4kB writes:

randomio test 10 1 0 4096

 total |  read:  latency (ms)          |  write:  latency (ms)
  iops |  iops  min   avg    max  sdev |  iops  min   avg     max   sdev
-------+-------------------------------+---------------------------------
  38.5 |   0.0  inf   nan    0.0   nan |  38.5  9.0  166.5  1156.8  261.5
  44.0 |   0.0  inf   nan    0.0   nan |  44.0  0.1  251.2  2616.7  492.7
  44.0 |   0.0  inf   nan    0.0   nan |  44.0  7.6  178.3  1895.4  330.0
  45.0 |   0.0  inf   nan    0.0   nan |  45.0  0.0  239.8  3457.4  522.3
  45.5 |   0.0  inf   nan    0.0   nan |  45.5  0.1  249.8  5126.7  621.0

results are horrific. systat shows 32kB I/O, gstat shows half are reads, half are writes. Why does UFS need to read the full block, change one 4kB part and then write it back, instead of just writing the 4kB part?

Because that's the way the buffer cache works. It writes an entire buffer cache block (unless at the end of file), so it must read the rest of the block into the buffer, so it doesn't write garbage (the rest of the block) out.

Without having looked at the code or testing: I assume using O_DIRECT when opening the file should help for that particular test (on kernels compiled with "options DIRECTIO").

I'd argue that using an I/O size smaller than the file system block size is simply sub-optimal and that most apps don't do random I/O of blocks. OR, if you had an app that does random I/O of 4K blocks (at 4K byte offsets), then using a 4K/1K file system would be better.

A 4K/1K file system has higher overhead (more indirect blocks) and is clearly sub-optimal for most general uses, today.

Yes, but if the sysadmin knows that most of the I/O is random 4K blocks, that's his specific case, not a general use. Sorry, I didn't mean to imply that a 4K file system was a good choice, in general.

NFS is the exception, in that it keeps track of a dirty byte range within a buffer cache block and writes that byte range. (NFS writes are byte granular, unlike a disk.)

It should be easy to add support for a fragment mask to the buffer cache, which allows identifying valid fragments. Such a mask should be set to 0xff for all current uses of the buffer cache (meaning the full block is valid), but a special case could then be added for writes of exactly one or multiple fragments, where only the corresponding valid flag bits are set. In addition, a possible later read from disk must obviously skip fragments for which the valid mask bits are already set. This bit mask could then be used to update the affected fragments only, without a read-modify-write of the containing block. But I doubt that such a change would improve performance in the general case, just in random update scenarios (which might still be relevant, in case of a DBMS knowing the fragment size and using it for DB files).

Regards, Stefan

Yes. And for some I/O patterns the fragment change would degrade performance. You mentioned that a later read might have to skip fragments with the valid bit set. I think this would translate to doing multiple reads for the other fragments, in practice. Also, when an app goes to write a partial fragment, that fragment would have to be read in, and this could result in several reads of fragments instead of one read for the entire block. It's the old "the OS doesn't have a crystal ball that predicts future I/O activity" problem.

Btw, although I did a dirty byte range for NFS in the buffer cache ages ago (late 1980s), it is also a performance hit for certain cases. The linkers/loaders love to write random sized chunks to files. For the NFS code, if the new write isn't contiguous with the old one, a synchronous write of the old dirty byte range is forced to the server. I have a patch that replaces the single byte range with a list in order to avoid this synchronous write, but it has not made it into head. (I hope to do so someday, after more testing and when I figure out all the implications of
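To make the contiguity rule Rick describes concrete, here is a hypothetical sketch; the structure and function names are invented for illustration and are not the real NFS client code:

#include <sys/types.h>

/* Invented for illustration only; not the actual NFS buffer structures. */
struct dirty_demo {
	off_t	d_start;	/* first dirty byte within the block */
	off_t	d_end;		/* one past the last dirty byte */
};

/*
 * Record a new write of [off, off + len) against the block's single dirty
 * byte range.  Returns 1 if the caller must first push the old range to
 * the server synchronously (the case the linkers/loaders keep hitting),
 * or 0 if the new write simply extends the existing range.
 */
static int
dirty_record(struct dirty_demo *d, off_t off, off_t len)
{
	if (d->d_end > d->d_start &&
	    (off > d->d_end || off + len < d->d_start))
		return (1);		/* not contiguous: flush old range first */

	if (d->d_end == d->d_start) {	/* nothing dirty yet */
		d->d_start = off;
		d->d_end = off + len;
	} else {			/* overlapping or adjacent: merge */
		if (off < d->d_start)
			d->d_start = off;
		if (off + len > d->d_end)
			d->d_end = off + len;
	}
	return (0);
}

Replacing the single range with a list, as the patch mentioned above does, would turn the "return 1" case into simply appending another range.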
Re: kmem_map auto-sizing and size dependencies
I'll follow up with detailed answers to your questions over the weekend. For now, I will, however, point out that you've misinterpreted the tunables. In fact, they say that your kmem map can hold up to 16GB and the current used space is about 58MB. Like other things, the kmem map is auto-sized based on the available physical memory and capped so as not to consume too much of the overall kernel address space. Regards, Alan On Fri, Jan 18, 2013 at 9:29 AM, Andre Oppermann an...@freebsd.org wrote: The autotuning work is reaching into many places of the kernel and while trying to tie up all lose ends I've got stuck in the kmem_map and how it works or what its limitations are. During startup the VM is initialized and an initial kernel virtual memory map is setup in kmem_init() covering the entire KVM address range. Only the kernel itself is actually allocated within that map. A bit later on a number of other submaps are allocated (clean_map, buffer_map, pager_map, exec_map). Also in kmeminit() (in kern_malloc.c, different from kmem_init) the kmem_map is allocated. The (inital?) size of the kmem_map is determined by some voodoo magic, a sprinkle of nmbclusters * PAGE_SIZE incrementor and lots of tunables. However it seems to work out to an effective kmem_map_size of about 58MB on my 16GB AMD64 dev machine: vm.kvm_size: 549755809792 vm.kvm_free: 530233421824 vm.kmem_size: 16,594,300,928 vm.kmem_size_min: 0 vm.kmem_size_max: 329,853,485,875 vm.kmem_size_scale: 1 vm.kmem_map_size: 59,518,976 vm.kmem_map_free: 16,534,777,856 The kmem_map serves kernel malloc (via UMA), contigmalloc and everthing else that uses UMA for memory allocation. Mbuf memory too is managed by UMA which obtains the backing kernel memory from the kmem_map. The limits of the various mbuf memory types have been considerably raised recently and may make use of 50-75% of all physically present memory, or available KVM space, whichever is smaller. Now my questions/comments are: Does the kmem_map automatically extend itself if more memory is requested? Should it be set to a larger initial value based on min(physical,KVM) space available? The use of nmbclusters for the initial kmem_map size calculation isn't appropriate anymore due to it being set up later and nmbclusters isn't the only mbuf relevant mbuf type. We make significant use of page sized mbuf clusters too. The naming and output of the various vm.kmem_* and vm.kvm_* sysctls is confusing and not easy to reconcile. Either we need some more detailing more aspects or less. Plus perhaps sysctl subtrees to better describe the hierarchy of the maps. Why are separate kmem submaps being used? Is it to limit memory usage of certain subsystems? Are those limits actually enforced? -- Andre __**_ freebsd-curr...@freebsd.org mailing list http://lists.freebsd.org/**mailman/listinfo/freebsd-**currenthttp://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscribe@** freebsd.org freebsd-current-unsubscr...@freebsd.org ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
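For completeness, the auto-sizing Alan describes can also be overridden from the loader; these are existing tunables, and the values below are only examples, not recommendations:

# /boot/loader.conf -- example values only
vm.kmem_size="8G"         # hard override of the kmem map size
vm.kmem_size_max="16G"    # upper cap applied to the auto-computed size
vm.kmem_size_scale="2"    # auto-size to roughly (physical memory / scale)

The sysctl values quoted above are the read-only view of whatever these tunables and the auto-sizing resolved to at boot.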
Re: kmem_map auto-sizing and size dependencies
On Fri, Jan 18, 2013 at 7:29 AM, Andre Oppermann an...@freebsd.org wrote: The (inital?) size of the kmem_map is determined by some voodoo magic, a sprinkle of nmbclusters * PAGE_SIZE incrementor and lots of tunables. However it seems to work out to an effective kmem_map_size of about 58MB on my 16GB AMD64 dev machine: vm.kvm_size: 549755809792 vm.kvm_free: 530233421824 vm.kmem_size: 16,594,300,928 vm.kmem_size_min: 0 vm.kmem_size_max: 329,853,485,875 vm.kmem_size_scale: 1 vm.kmem_map_size: 59,518,976 vm.kmem_map_free: 16,534,777,856 The kmem_map serves kernel malloc (via UMA), contigmalloc and everthing else that uses UMA for memory allocation. Mbuf memory too is managed by UMA which obtains the backing kernel memory from the kmem_map. The limits of the various mbuf memory types have been considerably raised recently and may make use of 50-75% of all physically present memory, or available KVM space, whichever is smaller. Now my questions/comments are: Does the kmem_map automatically extend itself if more memory is requested? Not that I recall. Should it be set to a larger initial value based on min(physical,KVM) space available? It needs to be smaller than the physical space, because the only limit on the kernel's use of (pinned) memory is the size of the map. So if it is too large there is nothing to stop the kernel from consuming all available memory. The lowmem handler is called when running out of virtual space only (i.e. a failure to allocate a range in the map). The naming and output of the various vm.kmem_* and vm.kvm_* sysctls is confusing and not easy to reconcile. Either we need some more detailing more aspects or less. Plus perhaps sysctl subtrees to better describe the hierarchy of the maps. Why are separate kmem submaps being used? Is it to limit memory usage of certain subsystems? Are those limits actually enforced? I mostly know about memguard, since I added memguard_fudge(). IIRC some of the submaps are used. The memguard_map specifically is used to know whether an allocation is guarded or not, so at free(9) it can be handled as normal malloc() or as memguard. Cheers, matthew ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Fixing grep -D skip
On Thursday, January 17, 2013 9:33:53 pm David Xu wrote:

> I am trying to fix a bug in GNU grep; the bug is that if you want to skip FIFO files, it will not work. For example: grep -D skip aaa . will get stuck on a FIFO file. Here is the patch: http://people.freebsd.org/~davidxu/patch/grep.c.diff2 Is it fine to be committed?

I think the first part definitely looks fine. My guess is the non-blocking change is also probably fine, but that should be run by the bsdgrep person at least.

-- John Baldwin
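Without reproducing the patch itself, the two parts under discussion presumably amount to something like this simplified sketch; it is not the actual GNU grep code, and the function name is made up:

#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Part 1: with -D skip, stat the name first and skip anything that is not
 * a regular file or directory (FIFOs, devices, sockets).
 * Part 2: open with O_NONBLOCK so that open() on a FIFO with no writer
 * does not block, then clear the flag again for normal reads.
 */
static int
open_for_grep(const char *name, int skip_devices)
{
	struct stat st;
	int fd;

	if (skip_devices && stat(name, &st) == 0 &&
	    !S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode))
		return (-1);		/* -D skip: ignore this entry */

	fd = open(name, O_RDONLY | O_NONBLOCK);
	if (fd == -1)
		return (-1);
	(void)fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) & ~O_NONBLOCK);
	return (fd);
}

A FIFO opened this way and skipped (or read, returning EOF when no writer exists) is what keeps grep from hanging in the reported case.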
Re: IBM blade server abysmal disk write performances
Try adding the following to /boot/loader.conf and reboot:

hw.mpt.enable_sata_wc=1

The default value, -1, instructs the driver to leave the SATA drives at their configuration default. Often this means that the MPT BIOS will turn off the write cache on every system boot sequence. IT DOES THIS FOR A GOOD REASON! An enabled write cache is counter to data reliability. Yes, it helps make benchmarks look really good, and it's acceptable if your data can be safely thrown away (for example, you're just caching from a slower source, and the cache can be rebuilt if it gets corrupted). And yes, Linux has many tricks to make this benchmark look really good. The tricks range from buffering the raw device to having 'dd' recognize the requested task and short-circuit the process of going to /dev/null or pulling from /dev/zero. I can't tell you how bogus these tests are and how completely irrelevant they are in predicting actual workload performance. But, I'm not going to stop anyone from trying, so give the above tunable a try and let me know how it works.

Btw, I'm not subscribed to the hackers mailing list, so please redistribute this email as needed.

Scott

From: Dieter BSD dieter...@gmail.com
To: freebsd-hackers@freebsd.org
Cc: mja...@freebsd.org; gi...@freebsd.org; sco...@freebsd.org
Sent: Thursday, January 17, 2013 9:03 PM
Subject: Re: IBM blade server abysmal disk write performances

> I am thinking that something fancy in that SAS drive is not being handled correctly by the FreeBSD driver.

I think so too, and I think the something fancy is tagged command queuing. The driver prints "da0: Command Queueing enabled" and yet your SAS drive is only getting 1 write per rev, and queuing should get you more than that. Your SATA drive is getting the expected performance, which means that NCQ must be working.

> Please let me know if there is anything you would like me to run on the BSD 9.1 system to help diagnose this issue?

Looking at the mpt driver, a verbose boot may give more info. Looks like you can set a debug device hint, but I don't see any documentation on what to set it to. I think it is time to ask the driver wizards why TCQ isn't working, so I'm cc-ing the authors listed on the mpt man page.
Re: IBM blade server abysmal disk write performances
The default value, -1, instructs the driver to leave the STA drives at their configuration default. Often times this means that the MPT BIOS will turn off the write cache on every system boot sequence. IT DOES THIS FOR A GOOD REASON! An enabled write cache is counter to data reliability. Yes, it helps make benchmarks look really good, and it's acceptable if your data can be safely thrown away (for example, you're just caching from a slower source, and the cache can be rebuilt if it gets corrupted). And yes, Linux has many tricks to make this benchmark look really good. The tricks range from buffering the raw device to having 'dd' recognize the requested task and short-circuit the process of going to /dev/null or pulling from /dev/zero. I can't tell you how bogus these tests are and how completely irrelevant they are in predicting actual workload performance. But, I'm not going to stop anyone from trying, so give the above tunable a try and let me know how it works. If computer have UPS then write caching is fine. even if FreeBSD crash, disk would write data___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: IBM blade server abysmal disk write performances
- Original Message - From: Wojciech Puchar woj...@wojtek.tensor.gdynia.pl To: Scott Long scott4l...@yahoo.com Cc: Dieter BSD dieter...@gmail.com; freebsd-hackers@freebsd.org freebsd-hackers@freebsd.org; gi...@freebsd.org gi...@freebsd.org; sco...@freebsd.org sco...@freebsd.org; mja...@freebsd.org mja...@freebsd.org Sent: Friday, January 18, 2013 11:10 AM Subject: Re: IBM blade server abysmal disk write performances The default value, -1, instructs the driver to leave the STA drives at their configuration default. Often times this means that the MPT BIOS will turn off the write cache on every system boot sequence. IT DOES THIS FOR A GOOD REASON! An enabled write cache is counter to data reliability. Yes, it helps make benchmarks look really good, and it's acceptable if your data can be safely thrown away (for example, you're just caching from a slower source, and the cache can be rebuilt if it gets corrupted). And yes, Linux has many tricks to make this benchmark look really good. The tricks range from buffering the raw device to having 'dd' recognize the requested task and short-circuit the process of going to /dev/null or pulling from /dev/zero. I can't tell you how bogus these tests are and how completely irrelevant they are in predicting actual workload performance. But, I'm not going to stop anyone from trying, so give the above tunable a try and let me know how it works. If computer have UPS then write caching is fine. even if FreeBSD crash, disk would write data I suspect that I'm encountering situations right now at netflix where this advice is not true. I have drives that are seeing intermittent errors, then being forced into reset after a timeout, and then coming back up with filesystem problems. It's only a suspicion at this point, not a confirmed case. Scott ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: IBM blade server abysmal disk write performances
> > disk would write data
>
> I suspect that I'm encountering situations right now at netflix where this advice is not true. I have drives that are seeing intermittent errors, then being forced into reset after a timeout, and then coming back up with filesystem problems. It's only a suspicion at this point, not a confirmed case.

True. I just assumed that anywhere it matters one would use gmirror. As for myself - I always prefer to put drives from different manufacturers into a gmirror, or at least drives not manufactured around the same time. Two failing at the same moment is rather unlikely. Of course - everything is possible, so I do proper backups to remote sites. Remote means another city.
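For reference, a minimal two-disk gmirror of the kind mentioned above is set up roughly like this; the device names are examples, and labeling writes gmirror metadata to the last sector of each disk:

# one-time setup (device names are examples)
gmirror load
gmirror label -v -b round-robin gm0 /dev/ada1 /dev/ada2
echo 'geom_mirror_load="YES"' >> /boot/loader.conf

# the mirror then appears as /dev/mirror/gm0; check health with:
gmirror status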
Re: IBM blade server abysmal disk write performances
Wojciech writes:
> If computer have UPS then write caching is fine. even if FreeBSD crash, disk would write data

That is incorrect. A UPS reduces the risk, but does not eliminate it. It is impossible to completely eliminate the risk of having the write cache on. If you care about your data you must turn the disk's write cache off. If you are using the drive in an application where the data does not matter, or can easily be regenerated (e.g. disk duplication: if it fails, just start over), then turning the write cache on for that one drive can be ok.

There is a patch that allows turning the write cache on and off on a per-drive basis. The patch is for ata(4), but should be possible with other drivers. camcontrol(8) may work for SCSI and SAS drives. I have yet to see a USB-to-*ATA bridge that allows turning the write cache off, so USB disks are useless for most applications.

But for most applications, you must have the write cache off, and you need queuing (e.g. TCQ or NCQ) for performance. If you have queuing, there is no need to turn the write cache on.

It is inexcusable that FreeBSD defaults to leaving the write cache on for SATA/PATA drives. At least the admin can easily fix this by adding hw.ata.wc=0 to /boot/loader.conf. The bigger problem is that FreeBSD does not support queuing on all controllers that support it. Not something that admins can fix, and inexcusable for an OS that claims to care about performance.
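The knobs referred to above, for anyone who wants to check or change the setting; da0 is an example device, and editing the mode page changes drive behaviour, so treat it carefully:

# ata(4)-attached SATA/PATA disks: disable the write cache at boot
# (add to /boot/loader.conf)
hw.ata.wc=0

# SCSI/SAS disks (and SATA behind a da(4) controller): inspect the caching
# mode page; WCE=1 means the write cache is on, and -e opens it in an editor
camcontrol modepage da0 -m 8
camcontrol modepage da0 -m 8 -e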
Re: IBM blade server abysmal disk write performances
On Fri, 2013-01-18 at 20:37 +0100, Wojciech Puchar wrote:

> > > disk would write data
> >
> > I suspect that I'm encountering situations right now at netflix where this advice is not true. I have drives that are seeing intermittent errors, then being forced into reset after a timeout, and then coming back up with filesystem problems. It's only a suspicion at this point, not a confirmed case.
>
> true. I just assumed that anywhere it matters one would use gmirror. As for myself - i always prefer to put different manufacturers drives for gmirror or at least - not manufactured at similar time.

That is good advice. I bought six 1TB drives at the same time a few years ago and received drives with consecutive serial numbers. They were all part of the same array, and they all failed (click of death) within a six hour timespan of each other. Luckily I noticed the clicking right away and was able to get all the data copied to another array within a few hours, before they all died.

-- Ian

> 2 fails at the same moment is rather unlikely. Of course - everything is possible so i do proper backups to remote sites. Remote means another city.
Re: IBM blade server abysmal disk write performances
> That is incorrect. A UPS reduces the risk, but does not eliminate it.

Nothing eliminates all risks.

> But for most applications, you must have the write cache off, and you need queuing (e.g. TCQ or NCQ) for performance. If you have queuing, there is no need to turn the write cache on.

Did you test the above claim? I have SATA drives everywhere, all in ahci mode, all with NCQ active.

> It is inexcusable that FreeBSD defaults to leaving the write cache on for SATA/PATA drives. At least the admin can easily fix this by adding hw.ata.wc=0 to /boot/loader.conf. The bigger problem is that FreeBSD does not support queuing on all controllers that support it.

I must be happy, as I never had a case of not seeing "adaX: Command Queueing enabled" on my machines.
Re: IBM blade server abysmal disk write performances
On Jan 18, 2013, at 1:12 PM, Dieter BSD dieter...@gmail.com wrote:

> It is inexcusable that FreeBSD defaults to leaving the write cache on for SATA/PATA drives.

This was completely driven by the need to satisfy idiotic benchmarkers, tech writers, and system administrators. It was a huge deal for FreeBSD 4.4, IIRC. It had been silently enabled, we turned it off, released 4.4, and then got murdered in the press for being slow. If I had my way, the WC would be off, everyone would be using SAS, and anyone who enabled SATA WC or complained about I/O slowness would be forced into Siberian salt mines for the remainder of their lives.

> At least the admin can easily fix this by adding hw.ata.wc=0 to /boot/loader.conf. The bigger problem is that FreeBSD does not support queuing on all controllers that support it. Not something that admins can fix, and inexcusable for an OS that claims to care about performance.

You keep saying this, but I'm unclear on what you mean. Can you explain?

Scott
Re: IBM blade server abysmal disk write performances
> and anyone who enabled SATA WC or complained about I/O slowness would be forced into Siberian salt mines for the remainder of their lives.

so reserve a place for me there.
Re: IBM blade server abysmal disk write performances
On Fri, 2013-01-18 at 22:18 +0100, Wojciech Puchar wrote: and anyone who enabled SATA WC or complained about I/O slowness would be forced into Siberian salt mines for the remainder of their lives. so reserve a place for me there. Yeah, me too. I prefer to go for all-out performance with separate risk mitigation strategies. I wouldn't set up a client datacenter that way, but it's wholly appropriate for what I do with this machine. -- Ian ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: IBM blade server abysmal disk write performances
On 2013-Jan-18 12:12:11 -0800, Dieter BSD dieter...@gmail.com wrote:

> adding hw.ata.wc=0 to /boot/loader.conf. The bigger problem is that FreeBSD does not support queuing on all controllers that support it. Not something that admins can fix, and inexcusable for an OS that claims to care about performance.

Apart from continuous whinging and whining on mailing lists, what have you done to add support for queuing?

-- Peter Jeremy
Re: IBM blade server abysmal disk write performances
On 18/01/2013 10:16 AM, Mark Felder wrote:

> On Thu, 17 Jan 2013 16:12:17 -0600, Karim Fodil-Lemelin fodillemlinka...@gmail.com wrote:
>
> > SAS controllers may connect to SATA devices, either directly connected using native SATA protocol or through SAS expanders using SATA Tunneled Protocol (STP). The system is currently put in place using SATA instead of SAS, although it is using the same interface and backplane connectors, and the drives (SATA) show as da0 in BSD _but_ with the SATA drive we get *much* better performance. I am thinking that something fancy in that SAS drive is not being handled correctly by the FreeBSD driver. I am planning to revisit the SAS drive issue at a later point (sometime next week).
>
> Your SATA drives are connected directly, not with an interposer such as the LSISS9252, correct? If so, this might be the cause of your problems. Mixing SAS and SATA drives is known to cause serious performance issues for almost every JBOD/controller/expander/what-have-you. Change your configuration so there is only one protocol being spoken on the bus (SAS) by putting your SATA drives behind interposers which translate SAS to SATA just before the disk. This will solve many problems.

Not sure what you mean by this, but isn't the mpt detecting an interposer in these lines:

mpt0: LSILogic SAS/SATA Adapter port 0x1000-0x10ff mem 0x9991-0x99913fff,0x9990-0x9990 irq 28 at device 0.0 on pci11
mpt0: MPI Version=1.5.20.0
mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 )
mpt0: 0 Active Volumes (2 Max)
mpt0: 0 Hidden Drive Members (14 Max)

Also please note: SATA speed in that same hardware setup works just fine. In any case I will have a look.

Thanks,

Karim.
Re: IBM blade server abysmal disk write performances
This is all turning into a bikeshed discussion. As far as I can tell, the basic original question was why a *SAS* (not a SATA) drive was not performing as well as expected based upon experiences with Linux. I still don't know whether reads or writes were being used for dd.

This morning, I ran a fio test with a single threaded read component and a multithreaded write component to see if there were differences. All I had connected to my MPT system were ATA drives (Seagate 500GBs), and I'm remote now and won't be back until Sunday to put in one of my 'good' SAS drives (140 GB Seagates, i.e., real SAS 15K RPM drives, not fat SATA bs drives). The numbers were pretty much the same for both FreeBSD and Linux. In fact, FreeBSD was slightly faster. I won't report the exact numbers right now, but only mention this as a piece of information that at least in my case the difference between the OS platforms involved is negligible. This would, at least in my case, rule out issues based upon different platform access methods and different drivers.

All of this other discussion, about WCE and what not, is nice, but for all the purpose it serves it could be moved to *-advocacy.
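For anyone who wants to repeat a mixed test of that shape, a fio job file along these lines should do it; the directory, sizes and job counts are placeholders, not the ones used above:

; mixed.fio -- one sequential reader plus several random writers
[global]
directory=/mnt/test
size=2g
bs=4k
ioengine=psync
runtime=60
time_based

[seq-read]
rw=read
numjobs=1

[rand-writes]
rw=randwrite
numjobs=4

Run it with "fio mixed.fio" on both systems and compare the per-job bandwidth and latency summaries.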
Re: IBM blade server abysmal disk write performances
> mpt0: LSILogic SAS/SATA Adapter port 0x1000-0x10ff mem 0x9991-0x99913fff,0x9990-0x9990 irq 28 at device 0.0 on pci11
> mpt0: MPI Version=1.5.20.0
> mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 )
> mpt0: 0 Active Volumes (2 Max)
> mpt0: 0 Hidden Drive Members (14 Max)

Ah. Historically IBM systems (the 335, for one) have been very slow with the Integrated RAID software, at least on FreeBSD.
Re: Fixing grep -D skip
On 01/18/13 08:39, John Baldwin wrote:

> On Thursday, January 17, 2013 9:33:53 pm David Xu wrote:
> > I am trying to fix a bug in GNU grep, the bug is if you want to skip FIFO file, it will not work, for example: grep -D skip aaa . it will be stuck on a FIFO file. Here is the patch: http://people.freebsd.org/~davidxu/patch/grep.c.diff2 Is it fine to be committed?
>
> I think the first part definitely looks fine. My guess is the non-blocking change is also probably fine, but that should be run by the bsdgrep person at least.

I (disclaimer: not bsdgrep person) have just tested that bsdgrep handles this case just fine. The non-blocking part is required to make the code function, otherwise the system will block on open() if the fifo doesn't have another opener.

I'd say Yes for this patch.

Cheers,

-- Xin LI delp...@delphij.net https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die
Re: IBM blade server abysmal disk write performances
Scott writes:
> If I had my way, the WC would be off, everyone would be using SAS, and anyone who enabled SATA WC or complained about I/O slowness would be forced into Siberian salt mines for the remainder of their lives.

Actually, if you are running SAS, having SATA WC on or off wouldn't matter, it would be SCSI's WC you'd care about. :-)

> > The bigger problem is that FreeBSD does not support queuing on all controllers that support it. Not something that admins can fix, and inexcusable for an OS that claims to care about performance.
>
> You keep saying this, but I'm unclear on what you mean. Can you explain?

For most applications you need the write cache to be off. Having the write cache off is fine as long as you have queuing. But with the write cache off, if you don't have queuing, performance sucks. Like getting only 6% of the performance you should be getting.

Some of the early SATA controllers didn't have NCQ. Knowing that queuing was very important, I made sure to choose a mainboard with NCQ, giving up other useful features to get it. But FreeBSD does not support NCQ on the nforce4-ultra's SATA controllers. Even the sad joke of an OS Linux has had NCQ on nforce4 since Oct 2006. But Linux is such crap it is unusable. Linux is slowly improving, but I don't expect to live long enough to see it become usable. Seriously. I've tried it several times but I have completely given up on it. Anyway, even after all these years the supposedly performance oriented FreeBSD still does not support NCQ on nforce4, which isn't some obscure chip; they sold a lot of them. I've added 3 additional SATA controllers on expansion cards, and FreeBSD supports NCQ on them, so the slow controllers limited by PCIe-x1 have much better write performance than the much faster controllers in the chipset with all the bandwidth they need. I can't add more controllers, there aren't any free slots. The nforce will remain in service for years; aside from the monetary cost, silicon has a huge amount of environmental cost: embedded energy, water, pollution, etc. And there are a lot of them.

Wojciech writes:
> > That is incorrect. A UPS reduces the risk, but does not eliminate it.
>
> nothing eliminate all risks.

Turning the write cache off eliminates the risk of having the write cache on. Yes, you can still lose data for other reasons. Backups are still a good idea.

> > But for most applications, you must have the write cache off, and you need queuing (e.g. TCQ or NCQ) for performance. If you have queuing, there is no need to turn the write cache on.
>
> did you tested the above claim? i have SATA drives everywhere, all in ahci mode, all with NCQ active.

Yes, turn the write cache off and NCQ will give you the performance. As long as you have queuing you can have the best of both worlds. Which is why Karim's problem is so odd. The driver says there is queuing, but performance (1 write per rev) looks exactly like there is no queuing. Maybe there is something else that causes only 1 write per rev, but I don't know what that might be.

Peter writes:
> Apart from continuous whinging and whining on mailing lists, what have you done to add support for queuing?

Submitted a PR, it was closed without being fixed. Looked at the code, but it's Greek to me, even though I have successfully modified a BSD-based device driver in the past, giving a major performance improvement. If I were a C-level exec of a Fortune 500 company I'd just hire some device driver wizard.
Re: IBM blade server abysmal disk write performances
On 18/01/2013 5:42 PM, Matthew Jacob wrote:

> This is all turning into a bikeshed discussion. As far as I can tell, the basic original question was why a *SAS* (not a SATA) drive was not performing as well as expected based upon experiences with Linux. I still don't know whether reads or writes were being used for dd. This morning, I ran a fio test with a single threaded read component and a multithreaded write component to see if there were differences. [...] The numbers were pretty much the same for both FreeBSD and Linux. In fact, FreeBSD was slightly faster. [...] All of this other discussion, about WCE and what not is nice, but for all intents and purposes could be moved to *-advocacy.

Thanks for the clarifications! I did mention at some point that those were write speeds and that reads were just fine, and those were either writes to the filesystem or direct access (only on SAS again). Here is what I am planning to do next week when I get the chance:

0) Focus on the SAS drive tests _only_, since SATA is working as expected, so nothing to report there.

1) Look carefully at how the drives are physically connected. Although it feels like if the SATA works fine the SAS should also, I'll check anyway.

2) Boot verbose with boot -v and send the dmesg output. The mpt driver might give us a clue.

3) Run gstat -abc in a loop for the test duration. Although I would think ctlstat(8) might be more interesting here, so I'll run it too for good measure :).

Please note that in all tests write caching was enabled, as I think this is the default with FBSD 9.1 GENERIC, but I'll confirm this with camcontrol(8).

I've also seen quite a lot of 'quirks' for tagged command queuing in the source code (/sys/cam/scsi/scsi_xpt.c), but a particular one got my attention (thanks to whomever writes good comments in source code :)):

	/*
	 * Slow when tagged queueing is enabled.  Write performance
	 * steadily drops off with more and more concurrent
	 * transactions.  Best sequential write performance with
	 * tagged queueing turned off and write caching turned on.
	 *
	 * PR: kern/10398
	 * Submitted by: Hideaki Okada hok...@isl.melco.co.jp
	 * Drive: DCAS-34330 w/ S65A firmware.
	 *
	 * The drive with the problem had the S65A firmware
	 * revision, and has also been reported (by Stephen J.
	 * Roznowski s...@home.net) for a drive with the S61A
	 * firmware revision.
	 *
	 * Although no one has reported problems with the 2 gig
	 * version of the DCAS drive, the assumption is that it
	 * has the same problems as the 4 gig version.  Therefore
	 * this quirk entries disables tagged queueing for all
	 * DCAS drives.
	 */
	{ T_DIRECT, SIP_MEDIA_FIXED, "IBM", "DCAS*", "*" },
	/*quirks*/0, /*mintags*/0, /*maxtags*/0

So I looked at the kern/10398 PR and got some feeling of 'deja vu', although the original problem was on FreeBSD 3.1, so it's most likely not that, but I thought I would mention it. The issue described is awfully familiar: basically the SAS drive (SCSI back then) is slow on writes but fast on reads with dd.

Could be a coincidence or a ghost from the past, who knows...

Cheers,

Karim.
Re: IBM blade server abysmal disk write performances
Matthew writes:
> There is also no information in the original email as to which direction the I/O was being sent.

In one of the followups, Karim reported:

# dd if=/dev/zero of=foo count=10 bs=1024000
10+0 records in
10+0 records out
10240000 bytes transferred in 19.615134 secs (522046 bytes/sec)

522 KB/s is pathetic.
Re: Getting the current thread ID without a syscall?
On 1/15/13 4:03 PM, Trent Nelson wrote: On Tue, Jan 15, 2013 at 02:33:41PM -0800, Ian Lepore wrote: On Tue, 2013-01-15 at 14:29 -0800, Alfred Perlstein wrote: On 1/15/13 1:43 PM, Konstantin Belousov wrote: On Tue, Jan 15, 2013 at 04:35:14PM -0500, Trent Nelson wrote: Luckily it's for an open source project (Python), so recompilation isn't a big deal. (I also check the intrinsic result versus the syscall result during startup to verify the same ID is returned, falling back to the syscall by default.) For you, may be. For your users, it definitely will be a problem. And worse, the problem will be blamed on the operating system and not to the broken application. Anything we can do to avoid this would be best. The reason is that we are still dealing with an optimization that perl did, it reached inside of the opaque struct FILE to do nasty things. Now it is very difficult for us to fix struct FILE. We are still paying for this years later. Any way we can make this a supported interface? -Alfred Re-reading the original question, I've got to ask why pthread_self() isn't the right answer? The requirement wasn't I need to know what the OS calls me it was I need a unique ID per thread within a process. The identity check is performed hundreds of times per second. The overhead of (Py_MainThreadId == __readgsdword(0x48) ? A() : B()) is negligible -- I can't say the same for a system/function call. (I'm experimenting with an idea I had to parallelize Python such that it can exploit all cores without impeding the performance of normal single-threaded execution (like previous-GIL-removal attempts and STM). It's very promising so far -- presuming we can get the current thread ID in a couple of instructions. If not, single-threaded performance suffers too much.) TLS? Trent. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
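One way to read the TLS suggestion, as a sketch: cache the thread id once per thread in a __thread variable so the hot-path check is a memory load plus compare instead of a syscall. Py_MainThreadId is the application's own global from the quoted message, pthread_getthreadid_np() is FreeBSD-specific, and whether the compiler emits a cheap enough TLS access here is exactly what would need measuring:

#include <sys/cdefs.h>
#include <pthread.h>
#include <pthread_np.h>

extern int Py_MainThreadId;		/* the application's own global */

static __thread int my_tid = -1;	/* cached once per thread */

static inline int
on_main_thread(void)
{
	/*
	 * The first call in each thread pays for one libc call; after
	 * that the check is a TLS load plus a compare, no syscall.
	 */
	if (__predict_false(my_tid == -1))
		my_tid = pthread_getthreadid_np();
	return (my_tid == Py_MainThreadId);
}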
Re: IBM blade server abysmal disk write performances
On 18 January 2013 19:11, Dieter BSD dieter...@gmail.com wrote:

> Matthew writes:
> > There is also no information in the original email as to which direction the I/O was being sent.
>
> In one of the followups, Karim reported:
>
> # dd if=/dev/zero of=foo count=10 bs=1024000
> 10+0 records in
> 10+0 records out
> 10240000 bytes transferred in 19.615134 secs (522046 bytes/sec)
>
> 522 KB/s is pathetic.

When this is running, use gstat and see exactly how many IOPS/sec there are and what the average I/O size is. Yes, 522 kbytes/sec is really pathetic, but there are a lot of potential reasons for that.

adrian
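For the record, a quick way to watch that while the dd is running; da0 is an example device name:

# in one terminal, filter gstat to the disk under test, refreshing every second
gstat -f '^da0$' -I 1s

# or collect similar per-device counters non-interactively
iostat -x -w 1 da0

The ops/s column together with the average transfer size shows immediately whether the drive is really completing only one small write per revolution.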