Greetings to smartos-discuss,

Here is a forward of an email that I sent to [email protected] on 9 October 2014, when I was not subscribed to the list.

I don't see my message in the archives, so I have subscribed to the list and am trying again to get this message onto the list.

Regards,

Steve

* * *

Steve Petrie, P.Eng.
ITS-ETO Consortium
Oakville, Ontario, Canada
(905) 847-3253
[email protected]

----- Original Message -----
From: Steve Petrie, P.Eng.
To: [email protected]
Cc: Elastic Hosts - Support
Sent: Thursday, October 09, 2014 2:50 PM
Subject: SmartOS setup & boot Errors -- Testing SmartOS On An Elastic Hosts VM

Greetings To smartos-discuss,

I am working to get SmartOS operational under a VM provisioned on the Elastic Hosts (EH) cloud infrastructure:

www.elastichosts.com

Presently, on a trial EH VM, a SmartOS setup / SmartOS boot, running from the ISO CD image on an EH cloud disk, is giving four messages that seem particularly of note:

a.. trap: unknown trap type 8 in user mode
b.. Notice: MPO disabled because memory is disabled
c.. WARNING: KVM no hardware support
d.. WARNING rtls0: failure resetting PHY

I would appreciate very much any comments smart-os list members would care to make.

The KVM warning is not a big issue for my particular application.

EH support advise me that the "trap" message could be of particular concern. They also advise trying to disable ACPI.

* * * * * *

At the http://wiki.smartos.org/display/DOC/Home web page, a Google custom search for the string:

a.. smartos setup message "trap: unknown trap type 8 in user mode" ?

yields a single result, being the illumos IRC log of [August 18 - 2011], a copy of which is attached to this email, in edited form:

a.. <SmartOS -- Elastic Hosts VM - _Unknown trap type 8 in user mode_ - boot message - edited search results - 20141409.txt>

* * * * * *

As a help to smart-os list participants, here are snippets (emphasis added) selected from the attached edited IRC log:

a.. [01:51:17] <grey> I'm getting "Unknown trap type 8 in user mode" right after it starts trying to load the Kernel
...
[01:57:25] <richlowe> I've seen the unknown usermode trap before
b.. [02:06:34] <richlowe> could be ACPI, I suppose.
[02:06:48] <grey> It's possible I didn't give nexenta enough time, in hindsight, that message may not have been fatal?
c.. [02:06:49] <lewellyn> grey: does OI boot?
...
[02:07:11] <richlowe> grey: it is.
[02:07:13] <grey> Nexenta booted up this time
[02:07:20] <richlowe> drat.
d.. [02:08:00] <richlowe> last I saw it, we took the "bogus interrupt" early in boot, took it to be a trapped, and tried to core a process that didn't exist.
...
[02:09:30] <richlowe> lewellyn: if it involved counter underflow, that was different.
...
[02:09:55] <lewellyn> i don't remember details. that's why i have computers. but in any case, it was something i didn't want to even get near.
[02:09:59] <richlowe> if you booted with kmdb loaded, and you can get there, ::interrupts might be useful.
e.. [02:16:13] <richlowe> got it.
...
[02:16:20] <richlowe> this is the !@#$% intel_nhmex pain.
...
[02:16:25] <richlowe> grey: can you modify your boot image?
[02:16:33] <grey> can I? you tell me ;)
[02:16:45] <richlowe> intel_nhmex (and intel_nhm, actually) do actual things in _init, presumably mistaking them for _attach
f.. [02:17:00] <richlowe> when they do this on certain CPUs, it traps and the whole world blows up
[02:17:08] <richlowe> if you're looking, you just get a warning from ACPI
...
[02:17:40] <lewellyn> i get messages at boot regarding acpi and then things go wonky later
[02:17:43] <richlowe> I went through hell on this older athlon64 chasing that down.
[02:18:33] <richlowe> it turns out that this used to crash Xen too, and rather than fixing it, they just made the module not load under i86xpv
g.. [02:19:03] <richlowe> grey: if you can delete /kernel/**/intel_nhm* you may come up.
[02:19:11] <richlowe> I don't think -Bdisable-intel_nhmex works
h.. [02:19:43] <grey> richlowe: ok, I'll take a crack at that (I don't suppose you know off the top of your head the easiest way to modify an ISO image in that fashion?, I'll google around for it)
[02:20:06] <richlowe> I think on the iso, you'd have to unpack it, delete the modules, and then rebuild its boot_archive.
i.. [02:22:49] <grey> richlowe: that's the kernel/**/intel_nhm* under platform/i86pc/ ?
[02:23:15] <richlowe> under /kernel/drv, I think the other is the cpu mod
[02:24:46] <grey> this is on the nexenta distro? I don't see any intel_nhm* files anywhere in the entire tree
...
[02:25:43] <grey> http://pastebin.com/Q9Nzb30D There's the directory structure
[02:26:28] <grey> oh, is it inside the usr.img.zlib?
[02:27:11] <richlowe> If that's OI media, I have no idea where it'd be on the install media
[02:27:33] <grey> that's from nexenta-core-platform_3.0.1-b134_x86.iso
j.. [02:33:21] <richlowe> sort of interesting would be, boot with -kvd ( hit 'e' in grub, add -kvd to the kernel$ line), and hit 'b' to boot. it'll stop saying [0]>. Type "::bp pcplusmp`apic_error_intr -c $C" then :c
[02:34:07] <richlowe> hopefully it'll hit the breakpoint and dump the stack, rather than exploding.
...
[02:34:20] <richlowe> if it doesn't, I think you found a new way to get a random interrupt.
[02:34:42] <richlowe> if it happened to log APIC errors, even when it worked, that'd help to place the blame on nhmex again, too.

* * * * * *

Would smart-os list members have any knowledge of whether there was a resolution of this "trap" issue?

In the interest of getting SmartOS to work on the EH cloud platform, if this requires building a special distribution that simply omits some SmartOS functionality whose absence is not fatal, I would be happy to undertake that.

I'm also happy to make any diagnostic efforts that smart-os list participants might recommend.

Please be aware that, although I am a software engineer with 30 years' experience, I have little Unix and less Linux experience. I do have C++ experience and some hardware-level comprehension. And a keen desire to enter the SmartOS world. So I would need advice from smart-os list members and from EH support.

* * * * * *

My motivation to climb a SmartOS learning curve is that I really like the idea of hosting in production, under SmartOS, a new website I have developed. Please see the attached PDF rendition of the home page for the new website (not yet online anywhere):

a.. <eto_home - 20140812.pdf>

(Unfortunately, the PDF does not provide animation of the horizontal graphic of the expressway section. If smartos-discuss list members would like to see the animation, just let me know, and I will supply the necessary files.)

* * * * * *

Hopefully Clicking "Send" Now :)

Steve

* * *

Steve Petrie, P.Eng.
ITS-ETO Consortium
Oakville, Ontario, Canada
(905) 847-3253
[email protected]
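
For anyone wanting to check a captured boot console for the four messages listed in the email above, here is a minimal Python sketch. It assumes the console output has already been saved as plain text; the regular expressions, the short labels and the function name `find_boot_messages` are illustrative assumptions, not anything defined by SmartOS or illumos.

```python
import re

# The four boot messages quoted in the email, as loose regular expressions.
# These patterns and labels are illustrative assumptions, not an official
# SmartOS or illumos message catalogue.
BOOT_MESSAGES = {
    "trap": re.compile(r"unknown trap type 8 in user mode", re.IGNORECASE),
    "mpo": re.compile(r"MPO disabled", re.IGNORECASE),
    "kvm": re.compile(r"KVM.*no hardware support", re.IGNORECASE),
    "rtls0": re.compile(r"rtls0.*failure resetting PHY", re.IGNORECASE),
}


def find_boot_messages(console_text):
    """Return (line_number, label, line) for each line matching a known message."""
    hits = []
    for number, line in enumerate(console_text.splitlines(), start=1):
        for label, pattern in BOOT_MESSAGES.items():
            if pattern.search(line):
                hits.append((number, label, line.strip()))
    return hits


if __name__ == "__main__":
    # Sample text built from the messages quoted in the email.
    sample = "\n".join([
        "trap: unknown trap type 8 in user mode",
        "Notice: MPO disabled because memory is disabled",
        "WARNING: KVM no hardware support",
        "WARNING rtls0: failure resetting PHY",
    ])
    for number, label, line in find_boot_messages(sample):
        print(f"{number}: [{label}] {line}")
```

The matching is case-insensitive because the IRC log quotes the trap message with a capital "U" while the console listing above uses lower case.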
eto_home - 20140812.pdf
Description: Adobe PDF document
SmartOS -- Elastic Hosts VM - _Unknown trap type 8 in user mode_ - boot message - edited search results - 20141409.txt

9 October 2014

This is an edited version of file:

==> <SmartOS -- Elastic Hosts VM - _Unknown trap type 8 in user mode_ - boot message - raw search results - 20141409.txt>

being a portion of the target page of the single result link from a Google custom search, invoked from web page:

==> http://wiki.smartos.org/display/DOC/Home

with search parameter <smartos setup message "trap: unknown trap type 8 in user mode" ?>.

* * * * * *

The search results page was titled "illumos IRC log of [August 18 - 2011]". The results are provided below, edited for brevity only, by deletion of some lines, with ellipses substituted, between the opening "@@@ ..." and closing "... @@@" lines.

* * *

@@@ illumos IRC log of [August 18 - 2011 ...
[01:51:17] <grey> I'm getting "Unknown trap type 8 in user mode" right after it starts trying to load the Kernel
...
[01:57:25] <richlowe> I've seen the unknown usermode trap before
...
[01:58:43] <Triskelios> ...I've seen an FP exception before
[01:59:25] <grey> I'm going to try smartos, not sure if they made any changes to the kernel/boot options
[01:59:29] <richlowe> not FP, we take an interrupt we don't expect, and land in an IDT that wasn't setup
[01:59:50] <richlowe> I remember bitching about it, don't remember what I broke to cause it.
[02:00:09] *** McBofh has quit IRC
[02:01:22] *** McBofh has joined #illumos
[02:03:10] <grey> wat.
[02:03:27] <grey> "System detected 256 cpus, but only 1 cpu(s) were enabled during boot."
[02:04:07] <Triskelios> vbox?
[02:04:46] <alanc> it's so annoying when you have 255 imaginary cpus going to waste
[02:05:32] <grey> Triskelios: qemu+kvm
[02:05:36] <grey> well, it booted anyways ..
[02:05:56] <grey> I thought it was stalled, I'm used to much more feedback from a linux bootup
[02:06:12] <lewellyn> grey: you can make it more verbose...
[02:06:15] <richlowe> grey: wait, which booted and which didn't?
[02:06:19] <lewellyn> but all that tends to do is confuse people.
[02:06:26] <grey> SmartOS booted, Nexenta didn't
[02:06:34] <richlowe> could be ACPI, I suppose.
[02:06:48] <grey> It's possible I didn't give nexenta enough time, in hindsight, that message may not have been fatal?
[02:06:49] <lewellyn> grey: does OI boot?
...
[02:07:11] <richlowe> grey: it is.
[02:07:13] <grey> Nexenta booted up this time
[02:07:20] <richlowe> drat.
...
[02:07:36] <grey> lewellyn: I don't have an OI iso around to test ..
[02:08:00] <richlowe> last I saw it, we took the "bogus interrupt" early in boot, took it to be a trapped, and tried to core a process that didn't exist.
...
[02:09:06] <grey> lewellyn: I'll grab a copy and give it a shot, it'll take a little while to download
[02:09:11] <lewellyn> i might have your summary around somewhere.
[02:09:30] <richlowe> lewellyn: if it involved counter underflow, that was different.
[02:09:35] <grey> richlowe: is there anything I can do to provide more useful debugging info if I see that bug again?
[02:09:45] <richlowe> I don't know.
[02:09:55] <lewellyn> i don't remember details. that's why i have computers. but in any case, it was something i didn't want to even get near.
[02:09:59] <richlowe> if you booted with kmdb loaded, and you can get there, ::interrupts might be useful.
[02:11:49] <grey> Ah there we go
...
[02:12:07] <grey> the installer starts, then fails with that error, I missed it before because I didn't start up VNC fast enough
...
[02:16:13] <richlowe> got it.
...
[02:16:20] <richlowe> this is the !@#$% intel_nhmex pain.
...
[02:16:25] <richlowe> grey: can you modify your boot image?
[02:16:33] <grey> can I? you tell me ;)
[02:16:45] <richlowe> intel_nhmex (and intel_nhm, actually) do actual things in _init, presumably mistaking them for _attach
[02:16:45] <grey> I'm just booting the iso's of the main distributions
[02:17:00] <richlowe> when they do this on certain CPUs, it traps and the whole world blows up
[02:17:08] <richlowe> if you're looking, you just get a warning from ACPI
...
[02:17:40] <lewellyn> i get messages at boot regarding acpi and then things go wonky later
[02:17:43] <richlowe> I went through hell on this older athlon64 chasing that down.
[02:18:33] <richlowe> it turns out that this used to crash Xen too, and rather than fixing it, they just made the module not load under i86xpv
...
[02:19:03] <richlowe> grey: if you can delete /kernel/**/intel_nhm* you may come up.
[02:19:11] <richlowe> I don't think -Bdisable-intel_nhmex works
[02:19:39] <lewellyn> richlowe: i may have more datapoints on that for you tomorrow :)
[02:19:43] <grey> richlowe: ok, I'll take a crack at that (I don't suppose you know off the top of your head the easiest way to modify an ISO image in that fashion?, I'll google around for it)
[02:20:06] <richlowe> I think on the iso, you'd have to unpack it, delete the modules, and then rebuild its boot_archive.
[02:20:20] <richlowe> ryancnelson: don't suppose you feel like experimenting? :)
[02:20:40] <grey> ok, I should be able to do that
[02:22:49] <grey> richlowe: that's the kernel/**/intel_nhm* under platform/i86pc/ ?
[02:23:15] <richlowe> under /kernel/drv, I think the other is the cpu mod
[02:24:46] <grey> this is on the nexenta distro? I don't see any intel_nhm* files anywhere in the entire tree
...
[02:25:43] <grey> http://pastebin.com/Q9Nzb30D There's the directory structure
[02:26:28] <grey> oh, is it inside the usr.img.zlib?
[02:27:11] <richlowe> If that's OI media, I have no idea where it'd be on the install media
[02:27:33] <grey> that's from nexenta-core-platform_3.0.1-b134_x86.iso
...
[02:28:22] <grey> the OI media is downloading now, smartOS booted at least
...
[02:33:21] <richlowe> sort of interesting would be, boot with -kvd ( hit 'e' in grub, add -kvd to the kernel$ line), and hit 'b' to boot. it'll stop saying [0]>. Type "::bp pcplusmp`apic_error_intr -c $C" then :c
[02:34:07] <richlowe> hopefully it'll hit the breakpoint and dump the stack, rather than exploding.
...
[02:34:20] <richlowe> if it doesn't, I think you found a new way to get a random interrupt.
[02:34:42] <richlowe> if it happened to log APIC errors, even when it worked, that'd help to place the blame on nhmex again, too.
... @@@

* * *

*** End of Document ***
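
In the log above, richlowe suggests deleting /kernel/**/intel_nhm* from an unpacked image, and grey reports not finding any such files in the tree. The first step, locating the files, might be sketched in Python as below. This is only a sketch under stated assumptions: the image is assumed to be already unpacked into an ordinary directory, the function name `find_nhm_modules` and the `kernel/drv/amd64` layout in the demo are hypothetical, and nothing is deleted. It cannot see inside packed archives such as the boot_archive or usr.img.zlib mentioned in the log, so an empty result does not prove the modules are absent.

```python
import tempfile
from pathlib import Path


def find_nhm_modules(unpacked_root):
    """List files named intel_nhm* anywhere under an unpacked image tree.

    Paths are returned relative to unpacked_root, sorted, using forward
    slashes. Nothing is modified or deleted.
    """
    root = Path(unpacked_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("intel_nhm*")
        if path.is_file()
    )


if __name__ == "__main__":
    # Build a throwaway tree so the sketch runs without a real ISO.
    # The kernel/drv/amd64 layout is a hypothetical example only.
    with tempfile.TemporaryDirectory() as tmp:
        drv = Path(tmp, "kernel", "drv", "amd64")
        drv.mkdir(parents=True)
        (drv / "intel_nhmex").touch()
        (drv / "some_other_driver").touch()
        print(find_nhm_modules(tmp))
```

Listing the files first, rather than removing them, leaves the decision about deleting the modules and rebuilding the boot_archive to whoever is modifying the image.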
