> On Oct 20, 2015, at 11:49 PM, Robert Mustacchi <r...@joyent.com> wrote:
>
> On 10/20/15 20:08 , Rob Seastrom wrote:
>>
>> Last night I made a trip to the datacenter, to update the SmartOS thumb
>> drive on our lab/test machine, an HP DL160G6. What was supposed to be a
>> quick easy reboot turned into an hour of head scratching.
>>
>> Long story short: after upgrading from 20150625T055522Z to 20151015T063628Z
>> the machine appeared to not come back online. Further examination revealed
>> that the host had in fact booted all the way, but the NICs weren't found.
>> But ifconfig plumb was able to provision them. So the driver is there, but
>> it didn't set things up properly on the way up. A full power-off reboot
>> (init 5 with a wait before ipmi power-on) yielded the same results.
>>
>> This was made all the more remarkable by the fact that Saturday night
>> another DL160G6 at home got the exact same upgrade and it went just fine.
>>
>> The only difference of note in the configurations between these machines is
>> that the one at the datacenter has two nics active ("admin" and "vlan",
>> which are untagged/admin-only and tagged/vms respectively), while the one at
>> home runs everything through the "admin" port.
>>
>> The NICs are nothing special. Intel 82576 / "NC362i Integrated Dual port
>> Gigabit Server Adapter", built in to the motherboard. They seem to be
>> identical right down to the device and vendor IDs.
>>
>> f4-ce-46-b0-39-7a was happy with the upgrade. f4-ce-46-bc-29-92 was not:
>>
>> https://us-east.manta.joyent.com/res3066/public/dmesg-out-f4-ce-46-b0-39-7a.txt
>> https://us-east.manta.joyent.com/res3066/public/prtconf-out-f4-ce-46-b0-39-7a.txt
>> https://us-east.manta.joyent.com/res3066/public/sysinfo-out-f4-ce-46-b0-39-7a.txt
>>
>> https://us-east.manta.joyent.com/res3066/public/dmesg-out-f4-ce-46-bc-29-92.txt
>> https://us-east.manta.joyent.com/res3066/public/prtconf-out-f4-ce-46-bc-29-92.txt
>> https://us-east.manta.joyent.com/res3066/public/sysinfo-out-f4-ce-46-bc-29-92.txt
>>
>> https://us-east.manta.joyent.com/res3066/public/sick-f4-ce-46-bc-29-92.txt
>>
>> Anyone got an idea what might have gone wrong or what other data I ought to
>> provide?
>
> Hi,
>
> Thanks for the log from the sick node. There's a bunch of suspicious
> output in the xtrace output for the network/physical service log which
> is definitely where things have gone south. It was really helpful to
> have that. Would it be possible for you to share what the /usbkey/config
> looks like for that node?

Sure.

https://us-east.manta.joyent.com/res3066/public/usbkey-config-f4-ce-46-bc-29-92.txt

> The suspicious thing in the log is seeing all of those 3D values showing
> up there, but that may be an artifact of something else.

Pilot error on my side. Sorry about that; the logs were cut and pasted into
an email message by a colleague on the console and I uploaded it as
text/quoted-plain rather than cutting and pasting it back out into a plain
file.

https://us-east.manta.joyent.com/res3066/public/sick-f4-ce-46-bc-29-92-plaintext.txt

should be the same thing but with less fail.

> In addition,
> could you confirm what dladm show-phys -m looks like when the node is sick?

I'll interleave this trip into my day today and follow up.

> Thanks,
> Robert

Thanks!

-r
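
For reference, the stray "3D" values discussed in this thread are what you would expect from quoted-printable encoding, where a literal "=" is written as "=3D" and long lines are wrapped with a trailing "=". The following is a minimal sketch, assuming that is indeed what happened to the uploaded log, of decoding such text back to plain text with Python's standard quopri module. The helper name and the sample strings are made up for illustration and are not taken from the actual log.

```python
import quopri


def decode_quoted_printable(text: str) -> str:
    """Decode quoted-printable text (e.g. '=3D' -> '=') back to plain text."""
    raw = quopri.decodestring(text.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical sample line, invented for illustration only.
    sample = "+ key=3Dvalue"
    print(decode_quoted_printable(sample))
```

Running a whole saved log through the same helper (read the file, decode, write it back out) should yield roughly what the plaintext re-upload contains, provided the encoding assumption holds.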