Jason D. McCormick <jasonmc@...> writes:
>
> Hello all,
>
> Has anyone experienced issues with Red Hat EL 5.6 using kernels 2.6.18-238, 2.6.18-238.1.1 and
> 2.6.18-238.5.1 booting in an ESX 3.5 virtual environment? We are running into a condition where VMs
> hang during the initial kernel boot process. I'm unable to correlate these hangs with any particular
> ESX-level event; the VMs are running on different ESX hosts and even different clusters. All of the issues
> began with the upgrade to EL 5.6 and kernel 2.6.18-238.1.1.el5 and persist in 2.6.18-238.5.1.el5 (we
> skipped -238.el5). This has affected more than 20 hosts so far, of all different configurations,
> but always EL 5.6 VMs only. AS4 is not affected and we don't have any EL6 VMs yet. The failure is exactly
> the same every time. During the initial kernel start, it gets as far as:
>
> PCI: Setting latency timer of device 0000:00:01.0 to 64
> NET: Registered protocol family 2
> IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
> TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
> TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
> TCP: Hash tables configured (established 131072 bind 65536)
> TCP reno registered
> Simple Boot Flag at 0x36 set to 0x80
>
> The next line on all VMs that boot successfully is:
>
> Using TSC for driving interrupts
>
> However, VMs that hang during boot never reach the "Using TSC..." line. This leads me to believe that
> the problem is related to the OS electing to use TSC as the clocksource, and that this is somehow an
> unstable combination with ESX 3.5 and EL 5.6 VMs. However, the issue is sporadic and I can't reproduce it
> on demand; I only know that when an EL 5.6 VM fails to boot, it always fails in the same place in the same
> way. I've considered moving back to the clocksource=acpi_pm divider=10 kernel flags that were recommended
> for EL 5.3 and earlier, but I'm hesitant to do that since TSC is clearly a better-performing timekeeper.
>
> On physical hosts, even ones that use TSC, I never see a "Using TSC for driving interrupts" kernel message,
> so the behavior is subtly different, but I can't find anything in Google about this kernel message or event.
>
> Has anyone encountered this? Is anyone able to shed light on the inner workings of TSC that might lead me
> to a solution for this (or at least let me file an informed Bugzilla report)?
>
> Thanks.
>
> --
> Jason McCormick
> Unix Team Lead, Systems Group, IT
> Software Engineering Institute, Carnegie Mellon Univ.
> E: jasonmc@...
>
Jason,

I am experiencing exactly the same symptoms using ESXi 4.0 and Red Hat EL 5.6. Did you find a solution?

Thanks in advance for your response.

Dave Klotzbach

_______________________________________________
rhelv5-list mailing list
https://www.redhat.com/mailman/listinfo/rhelv5-list
