Jason D. McCormick <jasonmc@...> writes:

> 
> Hello all,
> 
> Has anyone experienced issues with Red Hat EL 5.6 using kernels 2.6.18-238,
> 2.6.18-238.1.1 and 2.6.18-238.5.1 booting in an ESX 3.5 virtual environment?
> We are running into a condition where VMs are hanging during the initial
> kernel boot process.  I'm unable to correlate these hangs to any particular
> ESX-level event; the VMs are running on different ESX hosts and even
> different clusters.  All of the issues began with the upgrade to EL 5.6 and
> kernel 2.6.18-238.1.1.el5 and persist in 2.6.18-238.5.1.el5 (we skipped
> -238.el5).  This has affected more than 20 hosts of all different
> configurations at this point, but only EL 5.6 VMs.  AS4 is not affected and
> we don't have any EL6 VMs yet.  The symptom is exactly the same every time.
> During the initial kernel start, it gets as far as:
> 
>   PCI: Setting latency timer of device 0000:00:01.0 to 64
>   NET: Registered protocol family 2
>   IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
>   TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
>   TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
>   TCP: Hash tables configured (established 131072 bind 65536)
>   TCP reno registered
>   Simple Boot Flag at 0x36 set to 0x80
> 
> The next line on all VMs that boot successfully is:
> 
>   Using TSC for driving interrupts
> 
> However, VMs that hang during boot never reach the "Using TSC..." line.
> This leads me to believe that the problem is related to the OS electing to
> use TSC as the clocksource, and that this is somehow an unstable combination
> with ESX 3.5 and EL 5.6 VMs.  However, the issue is sporadic and I can't
> make it occur on demand; all I know is that when EL 5.6 VMs fail to boot,
> they all fail in the same place in the same way.  I've considered moving
> back to the clocksource=acpi_pm divider=10 kernel flags that were
> recommended for EL 5.3 and earlier, but I'm hesitant to do that since TSC is
> clearly a better-performing timekeeper.
> 
> On physical hosts, even ones that use TSC, I never see a "Using TSC for
> driving interrupts" kernel message, so the behavior is subtly different, but
> I can't find anything on Google about this kernel message or event.
> 
> Has anyone encountered this?  Is anyone able to shed light on the inner
> workings of TSC that might lead me to a solution (or at least let me file
> an intelligent Bugzilla report)?
> 
> Thanks.
> 
> --
> Jason McCormick
> Unix Team Lead, Systems Group, IT
> Software Engineering Institute, Carnegie Mellon Univ.
> E: jasonmc@...
> 


Jason,
I am experiencing exactly the same symptoms using ESXi 4.0 and Red Hat EL 5.6.
Did you find a solution? Thanks in advance for your response.

Dave Klotzbach
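
For anyone comparing affected and unaffected VMs, a small script along these
lines might help record which timekeeping flags each guest was booted with.
It is only a rough sketch, not a fix: the helper names (parse_cmdline,
read_sysfs, clock_report) are made up for illustration, it assumes Python
3.6+ (which EL 5 does not ship by default), and the sysfs clocksource files
may not exist on every kernel build, in which case those fields come back
as None.

```python
from pathlib import Path

# sysfs directory used by kernels that expose clocksource selection.
# It may be absent on some kernels; read_sysfs() tolerates that.
SYSFS_CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0")


def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_sysfs(name):
    """Return the contents of one clocksource sysfs file, or None if unreadable."""
    try:
        return (SYSFS_CLOCKSOURCE / name).read_text().strip()
    except OSError:
        return None


def clock_report(cmdline):
    """Summarize requested timekeeping flags next to what the kernel reports."""
    params = parse_cmdline(cmdline)
    return {
        "requested_clocksource": params.get("clocksource"),
        "divider": params.get("divider"),
        "current_clocksource": read_sysfs("current_clocksource"),
        "available_clocksource": read_sysfs("available_clocksource"),
    }


if __name__ == "__main__":
    try:
        boot_cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        boot_cmdline = ""  # not on Linux, or /proc is unavailable
    for field, value in clock_report(boot_cmdline).items():
        print(f"{field}: {value}")
```

Running it on a guest that boots cleanly and on one that has hung before
(after it does come up) would at least show whether the two were started
with different clocksource or divider settings.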




_______________________________________________
rhelv5-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/rhelv5-list