On 7/10/06, Jürgen Keil <[EMAIL PROTECTED]> wrote:
I've filed a new bug for this problem:

6446729 "cpu 1 failed to start" when TSC counters are not in sync
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6446729
By the way, I've raised this bug's priority to P2.
To fix the problem, I've moved the tsc clock synchronization code for the slave cpus in mp_startup() up a few lines, so that the gethrtime() call [via cmn_err() -> gethrestime() -> pc_gethrestime()] runs only after the tsc clock delta has been initialized (see the bug report). With this change the ASUS N4L-VM mainboard no longer hangs during mp cpu startup.
This is a good fix, and it solves the problem with early gethrtime() calls made as a result of cmn_err() calls in mp_startup(). But I do think that slave TSC clock synchronization could be moved even higher on the list of things done by mp_startup(). I'm worried that if, for example, somebody were to change cpuid_pass1() to call cmn_err() or something else that could call gethrtime(), then we'd run into this exact problem again.

I'm wondering whether setting the procset bitmap and calling tsc_sync_slave() should be the very first thing mp_startup() does after the splx(ipltospl(LOCK_LEVEL)) call. I think MTRR sync can safely be done after TSC sync. I'm not so sure about the syscall handlers, though.

I'll take a closer look at this and will update the bug report.

- Andrei
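To illustrate the ordering concern, here is a toy model in plain C. It is not Solaris source: every name (toy_cpu_t, toy_tsc_sync(), toy_gethrtime(), toy_startup()) is made up for this sketch, and the "delta" is just an offset to the master's counter. It only shows that a time read taken before the slave's delta is set disagrees with the master, while the same read after the sync agrees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; all names here are hypothetical, not kernel interfaces. */
typedef struct toy_cpu {
    uint64_t raw_tsc;    /* this CPU's own counter value */
    int64_t  tsc_delta;  /* offset to the master's counter, 0 until synced */
} toy_cpu_t;

/* Stand-in for the slave-side TSC sync: record the offset to the master. */
static void toy_tsc_sync(toy_cpu_t *cpu, uint64_t master_tsc)
{
    cpu->tsc_delta = (int64_t)(master_tsc - cpu->raw_tsc);
}

/* Stand-in for gethrtime(): raw counter corrected by the per-CPU delta. */
static uint64_t toy_gethrtime(const toy_cpu_t *cpu)
{
    return cpu->raw_tsc + (uint64_t)cpu->tsc_delta;
}

/*
 * Model of a startup path that reads the time once (as an early
 * cmn_err() would). Returns true if that reading matched the master.
 */
static bool toy_startup(toy_cpu_t *cpu, uint64_t master_tsc, bool sync_first)
{
    uint64_t stamp;

    if (sync_first)
        toy_tsc_sync(cpu, master_tsc);
    stamp = toy_gethrtime(cpu);      /* the "early" time read */
    if (!sync_first)
        toy_tsc_sync(cpu, master_tsc);
    return stamp == master_tsc;
}

/* Small usage check: sync-first agrees with the master, sync-late does not. */
static void toy_demo(void)
{
    toy_cpu_t a = { .raw_tsc = 1000, .tsc_delta = 0 };
    toy_cpu_t b = { .raw_tsc = 1000, .tsc_delta = 0 };

    assert(toy_startup(&a, 5000, true));
    assert(!toy_startup(&b, 5000, false));
}
```

The point of doing the sync as the very first step is that no later change to the startup path can slip a time read in ahead of it.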
