On 10/20/15 20:08, Rob Seastrom wrote:
>
> Last night I made a trip to the datacenter, to update the SmartOS thumb drive
> on our lab/test machine, an HP DL160G6. What was supposed to be a quick easy
> reboot turned into an hour of head scratching.
>
> Long story short: after upgrading from 20150625T055522Z to 20151015T063628Z
> the machine appeared to not come back online. Further examination revealed
> that the host had in fact booted all the way, but the NICs weren't found.
> But ifconfig plumb was able to provision them. So the driver is there, but
> it didn't set things up properly on the way up. A full power-off reboot
> (init 5 with a wait before ipmi power-on) yielded the same results.
>
> This was made all the more remarkable by the fact that Saturday night another
> DL160G6 at home got the exact same upgrade and it went just fine.
>
> The only difference of note in the configurations between these machines is
> that the one at the datacenter has two nics active ("admin" and "vlan", which
> are untagged/admin-only and tagged/vms respectively), while the one at home
> runs everything through the "admin" port.
>
> The NICs are nothing special. Intel 82576 / "NC362i Integrated Dual port
> Gigabit Server Adapter", built in to the motherboard. They seem to be
> identical right down to the device and vendor IDs.
>
> f4-ce-46-b0-39-7a was happy with the upgrade. f4-ce-46-bc-29-92 was not:
>
> https://us-east.manta.joyent.com/res3066/public/dmesg-out-f4-ce-46-b0-39-7a.txt
> https://us-east.manta.joyent.com/res3066/public/prtconf-out-f4-ce-46-b0-39-7a.txt
> https://us-east.manta.joyent.com/res3066/public/sysinfo-out-f4-ce-46-b0-39-7a.txt
>
> https://us-east.manta.joyent.com/res3066/public/dmesg-out-f4-ce-46-bc-29-92.txt
> https://us-east.manta.joyent.com/res3066/public/prtconf-out-f4-ce-46-bc-29-92.txt
> https://us-east.manta.joyent.com/res3066/public/sysinfo-out-f4-ce-46-bc-29-92.txt
>
> https://us-east.manta.joyent.com/res3066/public/sick-f4-ce-46-bc-29-92.txt
>
> Anyone got an idea what might have gone wrong or what other data I ought to
> provide?

Hi,

Thanks for the log from the sick node. There's a bunch of suspicious output
in the xtrace output for the network/physical service log, which is
definitely where things have gone south. It was really helpful to have that.

Would it be possible for you to share what the /usbkey/config looks like for
that node? The suspicious thing in the log is seeing all of those 3D values
showing up there, but that may be an artifact of something else. In addition,
could you confirm what dladm show-phys -m looks like when the node is sick?

Thanks,
Robert