Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
Ivan Voras wrote:
Ivan Voras wrote:

Another data point - the OS in the VM in question hung today sometime after 5 AM in the following way:

* console nonresponsive (also to ctrl-alt-del)
* ssh login nonresponsive (timeout)
* ping works (!)

Judging by the last seen timestamp, the machine should have been in the process of receiving rsync backups - so IO-bound.

It looks like something really could be fishy in this area, at least with VMWare.

The same thing happened again, and I have an additional data point: I left 'top' running on the console and when I attached the VMWare console to the VM, top was still running ok, but as soon as I hit a key on the keyboard, the OS console locked up. All other symptoms were as I enumerated above.

A buildworld is enough to kill the console for me under QEMU on a slow Windows host. Never seen problems with a native install of FreeBSD, though.

--
http://www.velocityvector.com/ | http://www.classic-games.com/
Any government that can give you everything you want can take everything you have.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
Ivan Voras wrote:

Another data point - the OS in the VM in question hung today sometime after 5 AM in the following way:

* console nonresponsive (also to ctrl-alt-del)
* ssh login nonresponsive (timeout)
* ping works (!)

Judging by the last seen timestamp, the machine should have been in the process of receiving rsync backups - so IO-bound.

It looks like something really could be fishy in this area, at least with VMWare.

The same thing happened again, and I have an additional data point: I left 'top' running on the console and when I attached the VMWare console to the VM, top was still running ok, but as soon as I hit a key on the keyboard, the OS console locked up. All other symptoms were as I enumerated above.
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
On Oct 12, 2009, at 10:45 AM, Thomas Backman wrote:

Here's the original thread (not from the beginning, though):
http://lists.freebsd.org/pipermail/freebsd-performance/2009-October/003843.html

Long story short, my version: when the disk is stressed hard enough, console IO becomes COMPLETELY unbearable. 10+ seconds to switch between windows in screen(1), running (or even typing) simple commands, etc. This happens both via SSH and the serial console.

How to reproduce/test:

1) time file /etc/* > /dev/null
   a few times, or something similar that uses the disk; write down a common/average/median/whatever time.
2) cat /dev/zero > /uncompressed_fs/filename
   # please make *sure* it's uncompressed, since ZFS with lzjb/gzip enabled will squish this into a kilobyte-sized file, thus creating virtually *no* IO.
3) When cat has been running say 10 seconds, re-time command #1 and do some interactive stuff - run commands, edit files, etc.

Hi Thomas,

I'm trying to reproduce the issue, though I don't have any ZFS filesystems. I'm using neither SSH nor the serial console. My system is quite responsive. I'm using a VmWare system with 2 cpu support, 1MB RAM with FreeBSD 7.2-RELEASE.

I don't know if the issue is related to ZFS or your hardware configuration, but can you report what top(1) says during the slowdown?

--
Giovanni Trematerra
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
Ivan Voras wrote: 2009/10/13 Larry Rosenman l...@lerctr.org: note huge packet loss. It looks like it's VM fault or something like it. It sounds like the VM is failing to execute the guest during certain types of I/O. A bit of scheduler tracing in the host OS probably wouldn't go amiss to confirm that the VM really is suspending the guest It's VMWare ESXi underneath, which is *Officially Not Linux* though some ducks may disagree - anyway, I suspect tracing the host in this way is next to impossible without some kind of diamondium-level contract. What information do you need? I have a platinum VMWare contract. What version of ESXi? Hi, It is ESXi 3.5 - but if the problem is really in ESXi I presume anyone could reproduce it. My setup is nothing special - Xeon 5405, 8 GB RAM, SATA drives on ICH9. As for what data is needed, it depends on what you can get - from this discussion thread it looks like it would be enough to verify that disk IO doesn't leave VM processes waiting (i.e. that disk IO doesn't interfere with CPU-bound or idle virtual machines). Though now when I think of it - doesn't Linux ATA driver poll IO in some funky way, expecting to get lower latency that way? Another data point - the OS in the VM in question hanged today sometime after 5 AM in the following way: * console nonresponsive (also to ctrl-alt-del) * ssh login nonresponsive (timeout) * ping works (!) Judging by the last seen timestamp, the machine should have been in the process of receiving rsync backups - so IO-bound. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
Steven Hartland wrote:

We're not running 8 yet but we do have a 7.x box which is under fairly high IO load doing mrtg graphs and which has similar behaviour. When typing a command on ssh it will freeze for many seconds.

We have a FreeBSD 7.2 cacti box running on a Dell 1950 that has the same problems. RRDs are disk i/o hogs, and when disk i/o shoots up, the box becomes non-responsive for a few seconds.
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
Larry Rosenman wrote:
On Tue, 13 Oct 2009, Ivan Voras wrote:

As for what data is needed, it depends on what you can get - from this discussion thread it looks like it would be enough to verify that disk IO doesn't leave VM processes waiting (i.e. that disk IO doesn't interfere with CPU-bound or idle virtual machines). Though now when I think of it - doesn't Linux ATA driver poll IO in some funky way, expecting to get lower latency that way?

Have you looked at the information available via the performance tab(s) in the client pointing at the ESXi server?

The 20 second time resolution I get in performance charts isn't nearly enough to diagnose something like this.

I've now been running iostat in the virtual console (not ssh) during disk IO and I can't seem to get the characteristic stutter / pause behaviour so I think that it's very likely there is too much going on in the VM software to accurately conclude anything.
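One way to get finer resolution than 20-second charts, from inside the guest, is to sample a clock at a short fixed interval and record every gap that is much longer than the interval. The sketch below is illustrative only and was not used by anyone in this thread; `find_stalls` and `sample_clock` are made-up names, and the interval and threshold values are assumptions. A gap only shows that the sampling process did not run on time. By itself it cannot tell host-level suspension of the guest apart from scheduling inside the guest, and it relies on the guest clock catching up after a pause.

```python
import time


def find_stalls(timestamps, threshold):
    """Return (start, gap) pairs for consecutive samples that are more
    than `threshold` seconds apart."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold:
            stalls.append((prev, gap))
    return stalls


def sample_clock(duration, interval=0.1):
    """Record time.monotonic() roughly every `interval` seconds for
    `duration` seconds and return the list of samples."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append(time.monotonic())
        time.sleep(interval)
    return samples
```

For example, `find_stalls(sample_clock(60.0), 1.0)` run on the console during disk IO would list every pause longer than one second seen in a one-minute window.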
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
Ivan Voras wrote: 2009/10/13 Larry Rosenman l...@lerctr.org: note huge packet loss. It looks like it's VM fault or something like it. It sounds like the VM is failing to execute the guest during certain types of I/O. A bit of scheduler tracing in the host OS probably wouldn't go amiss to confirm that the VM really is suspending the guest It's VMWare ESXi underneath, which is *Officially Not Linux* though some ducks may disagree - anyway, I suspect tracing the host in this way is next to impossible without some kind of diamondium-level contract. What information do you need? I have a platinum VMWare contract. What version of ESXi? Hi, It is ESXi 3.5 - but if the problem is really in ESXi I presume anyone could reproduce it. My setup is nothing special - Xeon 5405, 8 GB RAM, SATA drives on ICH9. I recall others having various weird problems in 3.5 that went away when they upgraded to 4.0. Kris ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
Kris Kennaway wrote:
Ivan Voras wrote:

I recall others having various weird problems in 3.5 that went away when they upgraded to 4.0.

It would be a good idea, except that apparently my installation is unupgradeable because of an unsupported boot disk (a SCSI RAID volume).
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
On Tue, Oct 13, 2009 at 09:58:09AM +0200, Thomas Backman wrote: On Oct 13, 2009, at 12:35 AM, Luigi Rizzo wrote: ... hi, this issue (not specific to FreeBSD, and not new -- it has been like this forever) is discussed in some detail here http://www.bsdcan.org/2009/schedule/events/122.en.html The following code (a bit outdated) can help http://lists.freebsd.org/pipermail/freebsd-stable/2009-March/048704.html cheers luigi Hmm, how stable would you say the code is? (And/or has there been any progress since March?) I'd prefer something that I feel confident in using in production, and the warning in the README clearly says stay away!. Maybe not for production but if you want to use it on a test box to see if it helps then it is definitely reliable (as long as you don't exercise too much the plugging and unplugging schedulers on a mounted filesystems). I have been using the code for perhaps a couple of months on my desktop machine and no data loss, the only reason it is not loaded now is that it is not loaded by default at boot time. cheers luigi ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
On Oct 13, 2009, at 12:35 AM, Luigi Rizzo wrote: On Mon, Oct 12, 2009 at 09:48:42PM +0200, Thomas Backman wrote: I'm copying this over from the freebsd-performance list, as I'm looking for a few more opinions - not on the problems *I* am having, but rather to check whether the problem is universal or not, and if not, find a possible common factor. In other words: I want to hear about your experiences, *good or bad*! Here's the original thread (not from the beginning, though): http://lists.freebsd.org/pipermail/freebsd-performance/2009-October/003843.html Long story short, my version: when the disk is stressed hard enough, console IO becomes COMPLETELY unbearable. 10+ seconds to switch between windows in screen(1), running (or even typing) simple commands, etc. This happens both via SSH and the serial console. hi, this issue (not specific to FreeBSD, and not new -- it has been like this forever) is discussed in some detail here http://www.bsdcan.org/2009/schedule/events/122.en.html The following code (a bit outdated) can help http://lists.freebsd.org/pipermail/freebsd-stable/2009-March/048704.html cheers luigi Hmm, how stable would you say the code is? (And/or has there been any progress since March?) I'd prefer something that I feel confident in using in production, and the warning in the README clearly says stay away!. Regards, Thomas ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
Luigi Rizzo wrote: On Mon, Oct 12, 2009 at 09:48:42PM +0200, Thomas Backman wrote: I'm copying this over from the freebsd-performance list, as I'm looking for a few more opinions - not on the problems *I* am having, but rather to check whether the problem is universal or not, and if not, find a possible common factor. In other words: I want to hear about your experiences, *good or bad*! Here's the original thread (not from the beginning, though): http://lists.freebsd.org/pipermail/freebsd-performance/2009-October/003843.html Long story short, my version: when the disk is stressed hard enough, console IO becomes COMPLETELY unbearable. 10+ seconds to switch between windows in screen(1), running (or even typing) simple commands, etc. This happens both via SSH and the serial console. hi, this issue (not specific to FreeBSD, and not new -- it has been like this forever) is discussed in some detail here http://www.bsdcan.org/2009/schedule/events/122.en.html The following code (a bit outdated) can help http://lists.freebsd.org/pipermail/freebsd-stable/2009-March/048704.html Are you certain? The reported symptoms sound very unusual. Can you reproduce the problem with the provided instructions yourself? Kris ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
On Tue, Oct 13, 2009 at 11:13:31AM +0100, Kris Kennaway wrote:
Luigi Rizzo wrote:
On Mon, Oct 12, 2009 at 09:48:42PM +0200, Thomas Backman wrote:

I'm copying this over from the freebsd-performance list, as I'm looking for a few more opinions - not on the problems *I* am having, but rather to check whether the problem is universal or not, and if not, find a possible common factor. In other words: I want to hear about your experiences, *good or bad*!

Here's the original thread (not from the beginning, though):
http://lists.freebsd.org/pipermail/freebsd-performance/2009-October/003843.html

Long story short, my version: when the disk is stressed hard enough, console IO becomes COMPLETELY unbearable. 10+ seconds to switch between windows in screen(1), running (or even typing) simple commands, etc. This happens both via SSH and the serial console.

hi, this issue (not specific to FreeBSD, and not new -- it has been like this forever) is discussed in some detail here
http://www.bsdcan.org/2009/schedule/events/122.en.html

The following code (a bit outdated) can help
http://lists.freebsd.org/pipermail/freebsd-stable/2009-March/048704.html

Are you certain? The reported symptoms sound very unusual. Can you reproduce the problem with the provided instructions yourself?

sure -- with ATA/SATA disks, the test in the original post is enough to trigger part of the problem

    time file /etc/*    # this one is fast
    cat /dev/zero > /same_disk_as_etc/somedir/somefile
    sleep enough_to_flush_disk_cache    # 10-30 sec
    time file /etc/*    # this one takes forever

Now, getting sluggish behaviour on the console might need something more, such as dependencies between the program being run and disk activity (logging, etc.), but the 'capture effect' on the disk is completely reproducible.

Perhaps SCSI and various RAID incarnations may have a better behaviour or need a larger number of greedy clients to trigger the problem.

cheers
luigi
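The before/during comparison in the test above can be scripted so the two timings are taken the same way each time. What follows is a minimal sketch, assuming Python 3 is available on the test box. `time_command` and `background_writer` are hypothetical helper names, and the target path and byte count are placeholders that would have to point at the same disk as /etc (and, on ZFS, at an uncompressed dataset).

```python
import subprocess
import threading
import time


def time_command(argv):
    """Run argv, discard its output, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def background_writer(path, total_bytes, chunk=1 << 20):
    """Start a thread that writes total_bytes of zeros to path.

    This plays the role of the cat /dev/zero step, but with a bounded
    size so the disk is not filled. Returns the started thread.
    """
    def work():
        block = bytes(chunk)
        written = 0
        with open(path, "wb") as f:
            while written < total_bytes:
                n = min(chunk, total_bytes - written)
                f.write(block[:n])
                written += n

    thread = threading.Thread(target=work)
    thread.start()
    return thread
```

A run would call `time_command(["sh", "-c", "file /etc/*"])` once for the baseline, start `background_writer` with a size large enough to outlast the disk cache, wait 10-30 seconds, and call `time_command` again while the writer is still running.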
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
Thomas Backman wrote:

I'm copying this over from the freebsd-performance list, as I'm looking for a few more opinions - not on the problems *I* am having, but rather to check whether the problem is universal or not, and if not, find a possible common factor. In other words: I want to hear about your experiences, *good or bad*!

Here's the original thread (not from the beginning, though):
http://lists.freebsd.org/pipermail/freebsd-performance/2009-October/003843.html

Long story short, my version: when the disk is stressed hard enough, console IO becomes COMPLETELY unbearable. 10+ seconds to switch between windows in screen(1), running (or even typing) simple commands, etc. This happens both via SSH and the serial console.

Hmm, this looks familiar - I've noticed it before on the physical (VGA) console and I notice it all the time under VMWare. It sort of looks like disk IO really blocks network IO in this case - I use the VMs over ssh.
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
Ivan Voras wrote:
Thomas Backman wrote:

I'm copying this over from the freebsd-performance list, as I'm looking for a few more opinions - not on the problems *I* am having, but rather to check whether the problem is universal or not, and if not, find a possible common factor. In other words: I want to hear about your experiences, *good or bad*!

Here's the original thread (not from the beginning, though):
http://lists.freebsd.org/pipermail/freebsd-performance/2009-October/003843.html

Long story short, my version: when the disk is stressed hard enough, console IO becomes COMPLETELY unbearable. 10+ seconds to switch between windows in screen(1), running (or even typing) simple commands, etc. This happens both via SSH and the serial console.

Hmm, this looks familiar - I've noticed it before on the physical (VGA) console and I notice it all the time under VMWare. It sort of looks like disk IO really blocks network IO in this case - I use the VMs over ssh.

I've seen some similar behaviour here with my zfs/istgt backend. I resolved the issue by reducing the arc_max so that each disk-io cycle would be short enough not to cause the istgt to generate timeouts towards my two vmware hosts.

The io-backend for the box is a megaraid 8308 (mfi) card with 8 disks on it. This box is running RELENG_7 now (I tried running it with RELENG_8 but it had a nasty tendency of simply locking up, dragging down both my ESXi hosts with it).
//Svein

--
Svein Skogen | System Admin | stillbilde.net
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
On Tue, 13 Oct 2009, Ivan Voras wrote:
Thomas Backman wrote:

I'm copying this over from the freebsd-performance list, as I'm looking for a few more opinions - not on the problems *I* am having, but rather to check whether the problem is universal or not, and if not, find a possible common factor. In other words: I want to hear about your experiences, *good or bad*!

Here's the original thread (not from the beginning, though):
http://lists.freebsd.org/pipermail/freebsd-performance/2009-October/003843.html

Long story short, my version: when the disk is stressed hard enough, console IO becomes COMPLETELY unbearable. 10+ seconds to switch between windows in screen(1), running (or even typing) simple commands, etc. This happens both via SSH and the serial console.

Hmm, this looks familiar - I've noticed it before on the physical (VGA) console and I notice it all the time under VMWare. It sort of looks like disk IO really blocks network IO in this case - I use the VMs over ssh.

Real hardware and virtual hardware have vastly different performance properties, so I'd be careful not to assume that the issue described by the original reporter and the issue you're experiencing are the same. In our kernel, low level network protocols will essentially always take precedence over disk I/O activity. So on face value, "disk IO really blocks network IO" is highly unlikely.

There are two much more likely possibilities:

(1) poor VM implementation causes the virtual CPU to be suspended behind synchronous host OS I/O, or
(2) the network stack is running fine but the interactive user application is getting I/O or locks scheduled behind a bulk process.

A useful diagnostic here is to compare the behavior of three kinds of network latency tests:

(1) ping from the host OS to the guest OS
(2) netperf TCP_RR from the host OS to the guest OS
(3) ssh interactive latency

If (1) is highly variable during I/O, it's almost certainly a property of the VM technology you're using, and there's nought to be done about it in the guest OS.

If (2) but not (1) is highly variable, it may well be a scheduling issue, although under high memory pressure you couldn't rule out paging out of netserver pages/etc causing latency.

If (3) but not (1) or (2) is highly variable, it's most likely an I/O scheduling issue, perhaps caused by priority inversion on lockmgr locks on a vnode, disk I/O scheduling leading to starvation, etc.

Robert
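The three-way comparison above amounts to a small decision procedure. The sketch below only restates the reasoning of this message as code. It assumes each test has already been reduced to a yes/no judgement of "highly variable during I/O", and the function name and returned strings are illustrative rather than part of any tool.

```python
def likely_cause(ping_variable, netperf_variable, ssh_variable):
    """Map which latency tests are highly variable during disk I/O to the
    most likely explanation, following the three cases in the message."""
    if ping_variable:
        # Case (1): even ICMP echo is disturbed.
        return "VM technology; nothing to be done in the guest OS"
    if netperf_variable:
        # Case (2) but not (1).
        return "scheduling issue, or paging of netserver pages under memory pressure"
    if ssh_variable:
        # Case (3) but not (1) or (2).
        return "I/O scheduling issue, e.g. lock priority inversion or starvation"
    return "no latency variation observed in any test"
```

The order of the checks matters: a variable ping result decides the outcome regardless of the other two, because everything above the ICMP layer would be affected as well.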
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
2009/10/13 Robert Watson rwat...@freebsd.org: On Tue, 13 Oct 2009, Ivan Voras wrote: Thomas Backman wrote: I'm copying this over from the freebsd-performance list, as I'm looking for a few more opinions - not on the problems *I* am having, but rather to check whether the problem is universal or not, and if not, find a possible common factor. In other words: I want to hear about your experiences, *good or bad*! Here's the original thread (not from the beginning, though): http://lists.freebsd.org/pipermail/freebsd-performance/2009-October/003843.html Long story short, my version: when the disk is stressed hard enough, console IO becomes COMPLETELY unbearable. 10+ seconds to switch between windows in screen(1), running (or even typing) simple commands, etc. This happens both via SSH and the serial console. Hmm, this looks familiar - I've noticed it before on the physical (VGA) console and I notice it all the time under VMWare. It sort of looks like disk IO really blocks network IO in this case - I use the VMs over ssh. Real hardware and virtual hardware have vastly different performance properties, so I'd be careful not to assume that the issue described by the original reporter and the issue you're experiencing are the same. In our kernel, low level network protocols will essentially always take precedence over disk I/O activity. So on face value disk IO really blocks network IO is highly unlikely. Yes, I agree for both reasons and that is why I wasn't complaining until encountering this thread. There are two much more likely possibilities: (1) poor VM implementation causes the virtual CPU to be suspended behind synchronous host OS I/O or (2) the network stack is running fine but the interactive user application is getting I/O or locks scheduled behind a bulk process. 
A useful diagnostic here is to compare the behavior of three kinds of network latency tests:

(1) ping from the host OS to the guest OS
(2) netperf TCP_RR from the host OS to the guest OS
(3) ssh interactive latency

If (1) is highly variable during I/O, it's almost certainly a property of the VM technology you're using, and there's nought to be done about it in the guest OS.

Here's an example of a ping session with 0.1s resolution during a few seconds-stall in ssh:

64 bytes from 161.53.72.188: icmp_seq=1576 ttl=64 time=0.383 ms
64 bytes from 161.53.72.188: icmp_seq=1577 ttl=64 time=0.405 ms
64 bytes from 161.53.72.188: icmp_seq=1578 ttl=64 time=0.360 ms
64 bytes from 161.53.72.188: icmp_seq=2304 ttl=64 time=4.194 ms
64 bytes from 161.53.72.188: icmp_seq=2305 ttl=64 time=0.454 ms
64 bytes from 161.53.72.188: icmp_seq=2306 ttl=64 time=0.376 ms

note huge packet loss. It looks like it's VM fault or something like it.
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
On 13 Oct 2009, at 14:33, Ivan Voras wrote:

If (1) is highly variable during I/O, it's almost certainly a property of the VM technology you're using, and there's nought to be done about it in the guest OS.

Here's an example of a ping session with 0.1s resolution during a few seconds-stall in ssh:

64 bytes from 161.53.72.188: icmp_seq=1576 ttl=64 time=0.383 ms
64 bytes from 161.53.72.188: icmp_seq=1577 ttl=64 time=0.405 ms
64 bytes from 161.53.72.188: icmp_seq=1578 ttl=64 time=0.360 ms
64 bytes from 161.53.72.188: icmp_seq=2304 ttl=64 time=4.194 ms
64 bytes from 161.53.72.188: icmp_seq=2305 ttl=64 time=0.454 ms
64 bytes from 161.53.72.188: icmp_seq=2306 ttl=64 time=0.376 ms

note huge packet loss. It looks like it's VM fault or something like it.

It sounds like the VM is failing to execute the guest during certain types of I/O. A bit of scheduler tracing in the host OS probably wouldn't go amiss to confirm that the VM really is suspending the guest at about the same time ICMP latency goes up. However, given the above I think you can reasonably assume that the 4ms jump you're seeing there is due to global host OS/VM scheduling, and not FreeBSD scheduling.

Robert
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
Robert N. M. Watson wrote:
On 13 Oct 2009, at 14:33, Ivan Voras wrote:

If (1) is highly variable during I/O, it's almost certainly a property of the VM technology you're using, and there's nought to be done about it in the guest OS.

Here's an example of a ping session with 0.1s resolution during a few seconds-stall in ssh:

64 bytes from 161.53.72.188: icmp_seq=1576 ttl=64 time=0.383 ms
64 bytes from 161.53.72.188: icmp_seq=1577 ttl=64 time=0.405 ms
64 bytes from 161.53.72.188: icmp_seq=1578 ttl=64 time=0.360 ms
64 bytes from 161.53.72.188: icmp_seq=2304 ttl=64 time=4.194 ms
64 bytes from 161.53.72.188: icmp_seq=2305 ttl=64 time=0.454 ms
64 bytes from 161.53.72.188: icmp_seq=2306 ttl=64 time=0.376 ms

note huge packet loss. It looks like it's VM fault or something like it.

It sounds like the VM is failing to execute the guest during certain types of I/O. A bit of scheduler tracing in the host OS probably wouldn't go amiss to confirm that the VM really is suspending the guest at about the same time ICMP latency goes up. However, given the above I think you can reasonably assume that the 4ms jump you're seeing there is due to global host OS/VM scheduling, and not FreeBSD scheduling.

Btw. it's not a 4 ms jump - there are 726 lost packets in between.
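The size of the hole can be read straight off the icmp_seq values in the quoted output. A small sketch follows; the function names are made up for illustration, and the input is the six ping lines quoted in this thread.

```python
import re

SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def seq_numbers(lines):
    """Extract the icmp_seq value from each ping reply line that has one."""
    found = []
    for line in lines:
        match = SEQ_RE.search(line)
        if match:
            found.append(int(match.group(1)))
    return found


def seq_gaps(seqs):
    """Return (last_seen, next_seen, missing) for every jump in the sequence."""
    return [(a, b, b - a - 1) for a, b in zip(seqs, seqs[1:]) if b - a > 1]


sample = [
    "64 bytes from 161.53.72.188: icmp_seq=1576 ttl=64 time=0.383 ms",
    "64 bytes from 161.53.72.188: icmp_seq=1577 ttl=64 time=0.405 ms",
    "64 bytes from 161.53.72.188: icmp_seq=1578 ttl=64 time=0.360 ms",
    "64 bytes from 161.53.72.188: icmp_seq=2304 ttl=64 time=4.194 ms",
    "64 bytes from 161.53.72.188: icmp_seq=2305 ttl=64 time=0.454 ms",
    "64 bytes from 161.53.72.188: icmp_seq=2306 ttl=64 time=0.376 ms",
]
gaps = seq_gaps(seq_numbers(sample))
print(gaps)  # [(1578, 2304, 725)]
```

The sequence number jumps by 726, from 1578 to 2304, so replies 1579 through 2303 are absent. If the 0.1 s send interval held throughout, that would correspond to roughly 72 seconds without a reply, though the output alone does not confirm the interval.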
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
Robert N. M. Watson wrote: On 13 Oct 2009, at 14:33, Ivan Voras wrote: If (1) is highly variable during I/O, it's almost certainly a property of the VM technology you're using, and there's nought to be done about it in the guest OS. Here's an example of a ping session with 0.1s resolution during a few seconds-stall in ssh: 64 bytes from 161.53.72.188: icmp_seq=1576 ttl=64 time=0.383 ms 64 bytes from 161.53.72.188: icmp_seq=1577 ttl=64 time=0.405 ms 64 bytes from 161.53.72.188: icmp_seq=1578 ttl=64 time=0.360 ms 64 bytes from 161.53.72.188: icmp_seq=2304 ttl=64 time=4.194 ms 64 bytes from 161.53.72.188: icmp_seq=2305 ttl=64 time=0.454 ms 64 bytes from 161.53.72.188: icmp_seq=2306 ttl=64 time=0.376 ms note huge packet loss. It looks like it's VM fault or something like it. It sounds like the VM is failing to execute the guest during certain types of I/O. A bit of scheduler tracing in the host OS probably wouldn't go amiss to confirm that the VM really is suspending the guest It's VMWare ESXi underneath, which is *Officially Not Linux* though some ducks may disagree - anyway, I suspect tracing the host in this way is next to impossible without some kind of diamondium-level contract. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
On Tue, 13 Oct 2009, Ivan Voras wrote:
Robert N. M. Watson wrote:
On 13 Oct 2009, at 14:33, Ivan Voras wrote:

If (1) is highly variable during I/O, it's almost certainly a property of the VM technology you're using, and there's nought to be done about it in the guest OS.

Here's an example of a ping session with 0.1s resolution during a few seconds-stall in ssh:

64 bytes from 161.53.72.188: icmp_seq=1576 ttl=64 time=0.383 ms
64 bytes from 161.53.72.188: icmp_seq=1577 ttl=64 time=0.405 ms
64 bytes from 161.53.72.188: icmp_seq=1578 ttl=64 time=0.360 ms
64 bytes from 161.53.72.188: icmp_seq=2304 ttl=64 time=4.194 ms
64 bytes from 161.53.72.188: icmp_seq=2305 ttl=64 time=0.454 ms
64 bytes from 161.53.72.188: icmp_seq=2306 ttl=64 time=0.376 ms

note huge packet loss. It looks like it's VM fault or something like it.

It sounds like the VM is failing to execute the guest during certain types of I/O. A bit of scheduler tracing in the host OS probably wouldn't go amiss to confirm that the VM really is suspending the guest

It's VMWare ESXi underneath, which is *Officially Not Linux* though some ducks may disagree - anyway, I suspect tracing the host in this way is next to impossible without some kind of diamondium-level contract.

What information do you need? I have a platinum VMWare contract.

What version of ESXi?

LER

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
2009/10/13 Larry Rosenman l...@lerctr.org:
>> It's VMWare ESXi underneath, which is *Officially Not Linux* though some ducks may disagree - anyway, I suspect tracing the host in this way is next to impossible without some kind of diamondium-level contract.
>
> What information do you need? I have a platinum VMWare contract.
>
> What version of ESXi?

Hi,

It is ESXi 3.5 - but if the problem is really in ESXi I presume anyone could reproduce it. My setup is nothing special - Xeon 5405, 8 GB RAM, SATA drives on ICH9.

As for what data is needed, it depends on what you can get - from this discussion thread it looks like it would be enough to verify that disk IO doesn't leave VM processes waiting (i.e. that disk IO doesn't interfere with CPU-bound or idle virtual machines).

Though now that I think of it - doesn't the Linux ATA driver poll IO in some funky way, expecting to get lower latency that way?
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
On Tue, 13 Oct 2009, Ivan Voras wrote:
> 2009/10/13 Larry Rosenman l...@lerctr.org:
>> What information do you need? I have a platinum VMWare contract.
>>
>> What version of ESXi?
>
> Hi,
>
> It is ESXi 3.5 - but if the problem is really in ESXi I presume anyone could reproduce it. My setup is nothing special - Xeon 5405, 8 GB RAM, SATA drives on ICH9.
>
> As for what data is needed, it depends on what you can get - from this discussion thread it looks like it would be enough to verify that disk IO doesn't leave VM processes waiting (i.e. that disk IO doesn't interfere with CPU-bound or idle virtual machines).
>
> Though now when I think of it - doesn't Linux ATA driver poll IO in some funky way, expecting to get lower latency that way?

Have you looked at the information available via the performance tab(s) in the client pointing at the ESXi server?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
I'm copying this over from the freebsd-performance list, as I'm looking for a few more opinions - not on the problems *I* am having, but rather to check whether the problem is universal or not, and if not, find a possible common factor. In other words: I want to hear about your experiences, *good or bad*!

Here's the original thread (not from the beginning, though):
http://lists.freebsd.org/pipermail/freebsd-performance/2009-October/003843.html

Long story short, my version: when the disk is stressed hard enough, console IO becomes COMPLETELY unbearable. 10+ seconds to switch between windows in screen(1), running (or even typing) simple commands, etc. This happens both via SSH and the serial console.

How to reproduce/test:

1) time file /etc/* > /dev/null
   a few times, or something similar that uses the disk; write down a common/average/median/whatever time.
2) cat /dev/zero > /uncompressed_fs/filename
   # please make *sure* it's uncompressed, since ZFS with lzjb/gzip enabled will squish this into a kilobyte-sized file, thus creating virtually *no* IO.
3) When cat has been running say 10 seconds, re-time command #1 and do some interactive stuff - run commands, edit files, etc.

I couldn't actually reproduce the *completely* horrific increase in latency I posted about below just now (I did update my sources and rebuild, but I'm pretty sure the delta between ~Sep 29 and Oct 6 had no major IO changes in 8-STABLE), and the "time file /etc/*" test only jumped by about 3x (compared to 20-60x+ previously), but it's still bad enough: commands such as ls and w take 2-3 seconds to run, as opposed to 0.005s for ls without the added IO...

On Linux, the increase in latency is closer to 4%. A bit better than, oh, 400 times. ;)

Oh, and again: this post is not a complaint; this is a post asking for your experiences. I know I'm not alone in having these issues - I just want to know if there are a lot of people that *don't* too, and what could cause them. I can't possibly switch to FreeBSD in production with this behaviour - and I've been looking forward to doing so for quite a while now.

Regards,
Thomas

PS. I'll leave my post to the original discussion below. (I don't usually top post, but I don't consider this a reply, more of a new post with an addition below.)

On Oct 5, 2009, at 10:45 AM, Thomas Backman wrote:

> Hey everyone,
> I'm having serious trouble with the same thing, and just found this thread while looking for the correct place to post. Looks like I found it. (I wrote most of the post before finding the thread, so some of it will seem a bit odd.)
>
> I run 8.0-RC1/amd64 with ZFS on an A64 3200+ with 2GB RAM and an old 80GB 7200rpm disk.
>
> My problem is that I get completely unacceptable latency on console IO (both via SSH and serial console) when the system is performing disk IO. The worst case I've noticed yet was when I tried copying a core dump from a lzjb compressed ZFS file system to a gzip-9 compressed one, to compare the file size/compression ratio. screen(1) took at LEAST ten seconds - probably a bit more - I'm not exaggerating here - to switch to another window, and an ls in an empty directory also took about 5-10 seconds.
>
> Doing some silly CPU load with two instances of "yes > /dev/null" (on a single core, remember) doesn't change anything, the system remains very responsive. "cat /dev/zero > /uncompressed_fs/..." however produces the extreme slowdown. (On a gzip-1 FS it doesn't, since the file ends up extremely small - a kilobyte or so - even after a while, thus performing minimal IO).
>
> I'm thinking about switching to FreeBSD on my beefier production system (dual-core amd64, 4GB RAM, 4x1TB disks, compared to this one, single-core, 2GB RAM, 80GB disk), but unless I feel assured this won't happen there as well, I'm not so sure anymore. I can do any kind of heavy IO/compilation/whatever on that box, currently running Linux, and it's always unnoticeable. In this case it's impossible *not* to notice that your key input is lagging behind 5-10 seconds... I thought multiple times that the box must have panicked.
>
> I do realize that the hardware isn't the best, especially the disks, but this is far worse than it should be!
>
> Here's some of the testing done in this thread (or at least something like it):
>
> [r...@chaos ~]# time file /etc/* > /dev/null
> real    0m1.725s
> user    0m0.993s
> sys     0m0.021s
> [r...@chaos ~]# time file /etc/* > /dev/null
> real    0m1.008s
> user    0m0.990s
> sys     0m0.015s
> [r...@chaos ~]# time file /etc/* > /dev/null
> real    0m1.008s
> user    0m0.967s
> sys     0m0.038s
> [r...@chaos ~]# time file /etc/* > /dev/null
> real    0m1.015s
> user    0m0.998s
> sys     0m0.008s
>
> So, pretty much exactly 1 second every time once the cache is warmed up. Now, let's try it 10 seconds after starting heavy disk writing...
>
> [r...@chaos ~]# cat /dev/zero > /DELETE_ME
> (wait for 10 seconds)
> [r...@chaos ~]# time file /etc/* > /dev/null
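The reproduction steps in this post (time a disk-touching command, start a heavy writer, time it again) can be wrapped in a small helper so that results from different machines are easier to compare. This is a rough Python sketch under stated assumptions, not something posted in the thread: `time_command` and `slowdown` are hypothetical names, and the median is just one of the "common/average/median/whatever" choices mentioned above.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded, like the "> /dev/null" in the original test.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


def slowdown(baseline, loaded):
    """Ratio of the median loaded run time to the median baseline run time."""
    return statistics.median(loaded) / statistics.median(baseline)


# Example session (not executed here):
#   cmd = ["sh", "-c", "file /etc/* > /dev/null"]
#   baseline = time_command(cmd)
#   ... start "cat /dev/zero > /uncompressed_fs/filename" in another
#   ... terminal and wait about 10 seconds ...
#   loaded = time_command(cmd)
#   print(f"slowdown: {slowdown(baseline, loaded):.1f}x")
```

The writer is deliberately left as a manual step, so the script never fills a disk on its own. A result near 1.0 would correspond to the "barely noticeable" case described for Linux above, while 3 or more would match the FreeBSD numbers reported in this post.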
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
We're not running 8 yet, but we do have a 7.x box which is under fairly high IO load doing mrtg graphs and which shows similar behaviour. When typing a command over ssh it will freeze for many seconds. I even went so far as to write a little C app which just prints out the time to the screen, and even that sees the big delay.

It's always been like that and I've never managed to get to the bottom of it; there's something in the IO / disk subsystem which can totally lock up the system under high IO load.

Regards
Steve

----- Original Message -----
From: Thomas Backman seren...@exscape.org
To: freebsd-stable freebsd-stable@freebsd.org
Sent: Monday, October 12, 2009 8:48 PM
Subject: Extreme console latency during disk IO (8.0-RC1,previous releases also affected according to others)
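A probe in the spirit of the "little C app which just prints out the time" mentioned in this message could look roughly like the following. It is a Python analogue written for illustration only, not Steve's program: the function name `probe_stalls`, the 0.1 s interval and the 0.5 s threshold are all assumptions. It sleeps for a fixed interval and records every wake-up that arrives noticeably late.

```python
import time


def probe_stalls(interval=0.1, iterations=50, threshold=0.5):
    """Sleep `interval` seconds per iteration and report late wake-ups.

    Returns a list of (wall_clock_timestamp, lateness_seconds) tuples, one
    for every wake-up that overshot the requested interval by more than
    `threshold` seconds.
    """
    stalls = []
    for _ in range(iterations):
        start = time.monotonic()
        time.sleep(interval)
        # How much longer than requested did the sleep actually take?
        lateness = time.monotonic() - start - interval
        if lateness > threshold:
            stalls.append((time.time(), lateness))
    return stalls


if __name__ == "__main__":
    for stamp, late in probe_stalls():
        clock = time.strftime("%H:%M:%S", time.localtime(stamp))
        print(f"{clock} woke up {late:.2f}s late")
```

Because the probe does no disk IO of its own, late wake-ups while a heavy writer is running would suggest that the whole process (or the whole guest, in the VM cases discussed earlier) is not being scheduled, rather than that a particular command is waiting on the disk. Printing to a stalled terminal can itself block, so the results are collected first and printed afterwards.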
Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others)
On Mon, Oct 12, 2009 at 09:48:42PM +0200, Thomas Backman wrote:
> I'm copying this over from the freebsd-performance list, as I'm looking for a few more opinions - not on the problems *I* am having, but rather to check whether the problem is universal or not, and if not, find a possible common factor. In other words: I want to hear about your experiences, *good or bad*!
>
> Here's the original thread (not from the beginning, though):
> http://lists.freebsd.org/pipermail/freebsd-performance/2009-October/003843.html
>
> Long story short, my version: when the disk is stressed hard enough, console IO becomes COMPLETELY unbearable. 10+ seconds to switch between windows in screen(1), running (or even typing) simple commands, etc. This happens both via SSH and the serial console.

hi,
this issue (not specific to FreeBSD, and not new -- it has been like this forever) is discussed in some detail here

    http://www.bsdcan.org/2009/schedule/events/122.en.html

The following code (a bit outdated) can help

    http://lists.freebsd.org/pipermail/freebsd-stable/2009-March/048704.html

cheers
luigi