Hi Robert,

> I would suggest checking FMA for any ereports or faults. What exactly is
> it and how is it misbehaving?

I just took a look at FMA (`fmadm faulty -v`) and there was no output - I’m not 
sure it’ll get a chance to catch anything though; the box runs fine for around 
3-5 days, then just reboots.
No dump is created and nothing gets logged - just several `rsyslogd: -- MARK --` 
messages at the normal intervals, then the `SunOS` boot banner. I went as far 
as increasing the dump volume to match system RAM, but I don’t actually think a 
panic is getting triggered.
Nothing shows up in the BMC logs either. I have never happened to be connected 
to the SOL at the time of a reboot, so I've never actually witnessed 
it - I only find out because the sad trombone starts to harass me.

It’s bizarre, because the box “feels” fine and the zones act and perform as 
you’d expect, right up until the second it reboots.

> Well, we leverage the X2APIC; however, when there's an explicit BIOS
> option, it seems to be triggering a bug depending on how the BIOS does
> things, so I think it's often left at its default disabled option.

Pretty sure it’s disabled-by-default on this board, but I can enable it and see 
if it will still boot, I guess.

> We're not actively leveraging SR-IOV, but I don't recall if we've
> explicitly disabled it as an option.

Also disabled by default, so I’ll leave that switched off!

Adam

> On 23 Sep 2016, at 22:31, Robert Mustacchi <[email protected]> wrote:
> 
> On 9/23/16 14:29 , Adam Richmond-Gordon wrote:
>> Hi Robert,
>> 
>> Many thanks! That’s exactly what I’ve done, and the box is still having 
>> problems, so I am going to raise an RMA.
> 
> I would suggest checking FMA for any ereports or faults. What exactly is
> it and how is it misbehaving?
> 
>> I also disable features like X2APIC, SR-IOV - does that sound about right?
> 
> Well, we leverage the X2APIC; however, when there's an explicit BIOS
> option, it seems to be triggering a bug depending on how the BIOS does
> things, so I think it's often left at its default disabled option.
> 
> We're not actively leveraging SR-IOV, but I don't recall if we've
> explicitly disabled it as an option.
> 
> Robert
> 
>>> On 23 Sep 2016, at 22:23, Robert Mustacchi <[email protected]> wrote:
>>> 
>>> On 9/23/16 14:20 , Adam Richmond-Gordon wrote:
>>>> Hello all,
>>>> 
>>>> I (seem to) remember some time ago that the general advice was to disable 
>>>> power management where possible. Is this still correct, or have I been 
>>>> imagining this all along?
>>>> 
>>>> I recently took delivery of a box with an E5 v4 CPU, and SmartOS would 
>>>> panic on boot due to expecting the `mwait` CPU feature. I have managed to 
>>>> get the box to boot by allowing C0-C2 states, but it is somewhat unstable.
>>>> Although I do think this is quite likely to be a hardware issue, which CPU 
>>>> power management features can SmartOS actually take advantage of, and what 
>>>> is recommended?
>>>> 
>>>> I plan to open an RMA for the box next week, but it’s probably worth 
>>>> making sure that my running configuration is actually valid, first.
>>>> 
>>>> If memory serves me correctly, the CPU and board firmware can support a 
>>>> variety of C-, T- and P-states, amongst other features that either aren’t 
>>>> selectable or don’t exist on other hardware I have.
>>> 
>>> In general, our recommendation is to disable the deep C-states. So we
>>> leave T- and P-states enabled, but not C3 and C6, etc.
>>> 
>>> That help?
>>> Robert
>>> 
>> 
>> 
> 
> 



-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
RSS Feed: https://www.listbox.com/member/archive/rss/184463/25769125-55cfbc00
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=25769125&id_secret=25769125-7688e9fb
Powered by Listbox: http://www.listbox.com
