> On Oct 20, 2015, at 11:49 PM, Robert Mustacchi <r...@joyent.com> wrote:
> 
> On 10/20/15 20:08 , Rob Seastrom wrote:
>> 
>> Last night I made a trip to the datacenter, to update the SmartOS thumb 
>> drive on our lab/test machine, an HP DL160G6.  What was supposed to be a 
>> quick easy reboot turned into an hour of head scratching.
>> 
>> Long story short:  after upgrading from 20150625T055522Z to 20151015T063628Z 
>> the machine appeared to not come back online.  Further examination revealed 
>> that the host had in fact booted all the way, but the NICs weren't found.  
>> But ifconfig plumb was able to provision them.  So the driver is there, but 
>> it didn't set things up properly on the way up.  A full power-off reboot 
>> (init 5 with a wait before ipmi power-on) yielded the same results.
>> 
>> This was made all the more remarkable by the fact that Saturday night 
>> another DL160G6 at home got the exact same upgrade and it went just fine.
>> 
>> The only difference of note in the configurations between these machines is 
>> that the one at the datacenter has two NICs active ("admin" and "vlan", 
>> which are untagged/admin-only and tagged/vms respectively), while the one at 
>> home runs everything through the "admin" port.
>> 
>> The NICs are nothing special.  Intel 82576 / "NC362i Integrated Dual port 
>> Gigabit Server Adapter", built in to the motherboard.  They seem to be 
>> identical right down to the device and vendor IDs.
>> 
>> f4-ce-46-b0-39-7a was happy with the upgrade.  f4-ce-46-bc-29-92 was not:
>> 
>> https://us-east.manta.joyent.com/res3066/public/dmesg-out-f4-ce-46-b0-39-7a.txt
>> https://us-east.manta.joyent.com/res3066/public/prtconf-out-f4-ce-46-b0-39-7a.txt
>> https://us-east.manta.joyent.com/res3066/public/sysinfo-out-f4-ce-46-b0-39-7a.txt
>> 
>> https://us-east.manta.joyent.com/res3066/public/dmesg-out-f4-ce-46-bc-29-92.txt
>> https://us-east.manta.joyent.com/res3066/public/prtconf-out-f4-ce-46-bc-29-92.txt
>> https://us-east.manta.joyent.com/res3066/public/sysinfo-out-f4-ce-46-bc-29-92.txt
>> 
>> https://us-east.manta.joyent.com/res3066/public/sick-f4-ce-46-bc-29-92.txt 
>> 
>> Anyone got an idea what might have gone wrong or what other data I ought to 
>> provide?
> 
> Hi,
> 
> Thanks for the log from the sick node. There's a bunch of suspicious
> output in the xtrace output for the network/physical service log which
> is definitely where things have gone south. It was really helpful to
> have that. Would it be possible for you to share what the /usbkey/config
> looks like for that node?

Sure.

https://us-east.manta.joyent.com/res3066/public/usbkey-config-f4-ce-46-bc-29-92.txt

> The suspicious thing in the log is seeing all of those 3D values showing
> up there, but that may be an artifact of something else.

Pilot error on my side.  Sorry about that; a colleague on the console cut and 
pasted the logs into an email message, and I uploaded that message still 
quoted-printable encoded rather than pasting it back out into a plain file.

https://us-east.manta.joyent.com/res3066/public/sick-f4-ce-46-bc-29-92-plaintext.txt

should be the same thing but with less fail.
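
For anyone who trips over the same thing: the stray 3D values are what 
quoted-printable encoding leaves behind, since a literal "=" is written as 
"=3D" and long lines get a trailing "=" soft break.  A minimal sketch of 
undoing that with Python's standard quopri module, assuming the upload really 
was quoted-printable.  The decode_qp helper name and the sample line are made 
up for illustration and are not taken from the actual log.

```python
import quopri


def decode_qp(raw: bytes) -> str:
    """Decode quoted-printable bytes back into plain text."""
    # quopri.decodestring turns "=3D" back into "=" and joins
    # lines that were split with a trailing "=" soft break.
    return quopri.decodestring(raw).decode("utf-8", errors="replace")


# Hypothetical sample line, not from the real service log.
sample = b"+ admin_nic=3Df4:ce:46:bc:29:92\n"
print(decode_qp(sample), end="")
```

Running the whole pasted message through something like this should give 
roughly the same result as the plaintext file above.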

> In addition,
> could you confirm what dladm show-phys -m looks like when the node is sick?

I'll interleave this trip into my day today and follow up.

> Thanks,
> Robert

Thanks!

-r




-------------------------------------------
smartos-discuss