Re: IP auto-config with DHCP on sparc64 possibly broken
On 02/18/2018 10:40 PM, Frank Scheiner wrote:
>> , it's just that
>> the tool you are using is apparently producing unaligned accesses.
>
> What you emphasized is from the AlphaServer DS25, do you assume that
> unaligned accesses lead to bus errors on sparc64?

To be honest, you wrote so much text that I didn't notice you were talking about an Alpha machine here.

If you're talking about a command crashing, it would be enough to just show the reproducer command line and then someone can have a look with GDB to find the place where the unaligned access happens.

So, please try to shorten messages in such cases a bit so that we don't have to search for the important information in a wall of text.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Re: IP auto-config with DHCP on sparc64 possibly broken
On 02/18/2018 10:48 PM, James Clarke wrote:
> Having said that if there are unaligned accesses reported on alpha then those
> are likely to also happen on sparc64 and be the critters we're looking for.

Why do these only trap to software on alpha and not on sparc64? Is that due to a compiler option?

Cheers,
Frank
Re: IP auto-config with DHCP on sparc64 possibly broken
On 18 Feb 2018, at 21:34, John Paul Adrian Glaubitz wrote:
> On 02/18/2018 10:10 PM, John Paul Adrian Glaubitz wrote:
>> See my emphasis above. It's not that the network stack is broken, it's just that
>> the tool you are using is apparently producing unaligned accesses.
>>
>> Did you try running systemd-networkd?
>
> As a quick shot into the dark, I have triggered a binNMU of src:klibc, maybe
> it was broken by binutils which recently suffered from issues on sparc64.

Which will (or should) be rejected by mini-dak, as my 2.0.4-11+sparc64.1 is in unreleased (#885852 hasn't been fixed yet or even acknowledged by anyone). I'd be surprised if it's the binutils bug, but I guess you could do a rebuild locally and upload a .2 to unreleased.

Having said that if there are unaligned accesses reported on alpha then those are likely to also happen on sparc64 and be the critters we're looking for.

James
Re: IP auto-config with DHCP on sparc64 possibly broken
On 02/18/2018 10:10 PM, John Paul Adrian Glaubitz wrote:
>> IP-Config: no response after 3 s[ 7.950191] ipconfig(196): unaligned trap at 000120003868: 00011ffcf3af 28 2
>> [ 8.058589] ipconfig(196): unaligned trap at 000120003868: 00011ffcf3af 28 2
> ^^
>> (...)
>> Any idea what could be wrong with `ipconfig` or how I can further debug this?
>
> See my emphasis above. It's not that the network stack is broken

Yeah, I should have been more specific in the subject, it's `ipconfig` or some dependency of it that looks broken.

> , it's just that
> the tool you are using is apparently producing unaligned accesses.

What you emphasized is from the AlphaServer DS25, do you assume that unaligned accesses lead to bus errors on sparc64?

> Did you try running systemd-networkd?

No, but I also don't know how to include this in a Debian initramfs.

Cheers,
Frank
Re: IP auto-config with DHCP on sparc64 possibly broken
On 02/18/2018 10:10 PM, John Paul Adrian Glaubitz wrote:
> See my emphasis above. It's not that the network stack is broken, it's just that
> the tool you are using is apparently producing unaligned accesses.
>
> Did you try running systemd-networkd?

As a quick shot into the dark, I have triggered a binNMU of src:klibc, maybe it was broken by binutils which recently suffered from issues on sparc64.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Re: IP auto-config with DHCP on sparc64 possibly broken
On 02/18/2018 09:55 PM, Frank Scheiner wrote:
> I'm currently working on creating a useful configuration for network boot
> with GRUB2 on sparc64. But I'm experiencing problems during IP
> auto-configuration.
> This is done by using the `ip=[...]` kernel command line option and the
> `ipconfig` tool ([1]) included in the initramfs which evaluates this option.
> (...)
> Hence I currently do all needed configuration on sparc64 "manually" by
> providing all required addresses in the `ip=[...]` option which at least
> works but of course duplicates configuration. Other architectures (ppc64,
> alpha, hppa) don't have this problem AFAICS.
> (...)
> IP-Config: no response after 3 s[ 7.950191] ipconfig(196): unaligned trap at 000120003868: 00011ffcf3af 28 2
> [ 8.058589] ipconfig(196): unaligned trap at 000120003868: 00011ffcf3af 28 2
^^
> (...)
> Any idea what could be wrong with `ipconfig` or how I can further debug this?

See my emphasis above. It's not that the network stack is broken, it's just that the tool you are using is apparently producing unaligned accesses.

Did you try running systemd-networkd?

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
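Editorial aside: the `unaligned trap` lines quoted above already carry enough information to start looking for the offending access. The following is only a minimal sketch. It assumes the four fields after the colon are, in order, the program counter, the faulting address, the instruction opcode (hex) and the register number, and that opcode `28` is a 4-byte load; both assumptions should be checked against the Alpha kernel source and architecture manual. The names `parse_unaligned_trap` and `ACCESS_SIZE` are made up for this illustration.

```python
import re

# Assumed field order after "unaligned trap at": pc, va, opcode, reg.
TRAP_RE = re.compile(
    r"(?P<comm>[^\s(]+)\((?P<pid>\d+)\): unaligned trap at "
    r"(?P<pc>[0-9a-f]+): (?P<va>[0-9a-f]+) (?P<opcode>[0-9a-f]+) (?P<reg>\d+)"
)

# Access width in bytes for a few Alpha load/store opcodes
# (assumed, incomplete table for illustration only).
ACCESS_SIZE = {0x0C: 2, 0x0D: 2, 0x28: 4, 0x2C: 4, 0x29: 8, 0x2D: 8}


def parse_unaligned_trap(line):
    """Return the decoded fields of one trap line, or None if it does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    va = int(m["va"], 16)
    opcode = int(m["opcode"], 16)
    size = ACCESS_SIZE.get(opcode)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "pc": int(m["pc"], 16),
        "va": va,
        "opcode": opcode,
        "reg": int(m["reg"]),
        "size": size,
        # How far the address is from the previous naturally aligned boundary.
        "misaligned_by": va % size if size else None,
    }


line = "[ 7.950191] ipconfig(196): unaligned trap at 000120003868: 00011ffcf3af 28 2"
info = parse_unaligned_trap(line)
print(f"{info['comm']}: pc={info['pc']:#x} va={info['va']:#x} "
      f"size={info['size']} off={info['misaligned_by']}")
```

Under these assumptions the `pc` value is the address one would look up with GDB or `addr2line` against an unstripped `ipconfig` binary, and a non-zero offset confirms that the address is not naturally aligned for the access width.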
IP auto-config with DHCP on sparc64 possibly broken
Hi,

I'm currently working on creating a useful configuration for network boot with GRUB2 on sparc64. But I'm experiencing problems during IP auto-configuration. This is done by using the `ip=[...]` kernel command line option and the `ipconfig` tool ([1]) included in the initramfs which evaluates this option.

[1]: https://git.kernel.org/pub/scm/libs/klibc/klibc.git/tree/usr/kinit/ipconfig/README.ipconfig

Whenever I try to use DHCP I see bus errors during operation:

```
[...]
Begin: Running /scripts/init-premount .[ 36.957993] RPC: Registered named UNIX socket transport module.
.. do[ 37.033724] RPC: Registered udp transport module.
[ 37.094913] RPC: Registered tcp transport module.
Beg[ 37.155124] RPC: Registered tcp NFSv4.1 backchannel transport module.
in: Mounting root file system ...
Begin: Running /scr[ 37.286986] FS-Cache: Netfs 'nfs' registered for caching
ipts/nfs-top ... done.
Begin: Running /scripts/nfs-premount ... done.
IP-Config: enp1s1f1 hardware address 08:00:20:12:34:56 mtu 1500 DHCP
IP-Config: no response after 2 secs - giving up
IP-Config: enp1s1f1 hardware address 08:00:20:12:34:56 mtu 1500 DHCP
Bus error
[...]
IP-Config: enp1s1f1 hardware address 08:00:20:12:34:56 mtu 1500 DHCP
Bus error
IP-Config: enp1s1f1 hardware address 08:00:20:12:34:56[ 41.042475] enp1s1f1: Link is up using mtu[ 41.042483] internal 150[ 41.092568] transceiver at 0 DH[ 41.123683] 100Mb/s, Full Duplex.
CP
Bus error
[...
IP-Config: enp1s1f1 hardware address 08:00:20:12:34:56 mtu 1500 DHCP
Bus error
/init: .: line 230: can't open '/run/net-enp1s1f1.conf'
[ 41.534502] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0200
[...]
```

These are from an Ultra 10, but I also saw the same on a T5220 and a T5240.

The same happens when running `ipconfig` manually from the initramfs:

```
[...]
No init found. Try passing init= bootarg.

BusyBox v1.27.2 (Debian 1:1.27.2-2) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs) ipconfig :enp1s1f1:dhcp
IP-Config: enp1s1f1 hardware address 08:00:20:12:34:56 mtu 1500 DHCP
Bus error
```

Hence I currently do all needed configuration on sparc64 "manually" by providing all required addresses in the `ip=[...]` option, which at least works but of course duplicates configuration. Other architectures (ppc64, alpha, hppa) don't have this problem AFAICS.

E.g. on the DS25:

```
[...]
Begin: Running /scripts/init-p[3.334959] tg3 0002:02:05.0: firmware: direct-loading firmware tigon/tg3_tso.bin
remount ... done.
Begin: Mounting root file system ...
Begin: Running /scripts/nfs-top ... done.
Begin: Runnin[3.439451] IPv6: ADDRCONF(NETDEV_UP): enP2p2s5: link is not ready
g /scripts/nfs-premount ... done.
IP-Config: enP2p2s5 hardware address 00:16:35:12:34:56 mtu 1500 DHCP
[3.803708] random: crng init done
IP-Config: no response after 2 secs - giving up
IP-Config: enP2p2s5 hardware address 00:16:35:12:34:56 mtu 1500 DHCP
[7.153316] tg3 0002:02:05.0 enP2p2s5: Link is up at 1000 Mbps, full duplex
[7.154293] tg3 0002:02:05.0 enP2p2s5: Flow control is on for TX and on for RX
[7.155269] IPv6: ADDRCONF(NETDEV_CHANGE): enP2p2s5: link becomes ready
IP-Config: no response after 3 s[7.950191] ipconfig(196): unaligned trap at 000120003868: 00011ffcf3af 28 2
[8.058589] ipconfig(196): unaligned trap at 000120003868: 00011ffcf3af 28 2
IP-Config: enP2p2s5 hardware address 00:16:35:12:34:56 mtu 1500 DHCP
IP-Config: enP2p2s5 guessed broadcast address 172.16.255.255
IP-Config: enP2p2s5 complete (dhcp from 172.16.0.1):
 address: 172.16.2.89 broadcast: 172.16.255.255 netmask: 255.255.0.0
 gateway: 172.16.0.1 dns0 : 172.16.0.1 dns1 : 0.0.0.0
 host : ds25
 domain : domain.tld
 rootserver: 172.16.0.9 rootpath: /path/to/ds25
 filename : /AC100259
done.
[...]
```

...everything works as expected.

As there were some problems with klibc on sparc64 at the end of 2017, I assume that there might still be something wrong there which breaks IP auto-configuration with DHCP on sparc64.
I only tried on a limited selection of machines, but as there were years between the introduction of Ultra 10 and T5240 and both use different NICs but show the same issues it's likely that this affects more or all sparc64 machines.

You can reproduce this by reconfiguring and rebuilding your initramfs (add `BOOT=nfs` to `/etc/initramfs-tools/initramfs.conf` and rebuild) and adding `ip=::dhcp` to your kernel command line. The latter should work regardless of using SILO or GRUB2 and also when the kernel is loaded from disk. Before reboot, make sure you still have your original initramfs and a boot loader configuration which boots with the original initramfs.

A running DHCP server is required. I use the ISC DHCP server and a host configuration similar to the following:

```
host ultra-10 {
  hardware ethernet 08:00:20:12:34:56;
  fixed-address ultra-10.domain.