There's something odd going on here, and I believe it's ultimately a problem with the Soekris BIOS console emulation via the serial port. I have a workaround that involves modifying pxelinux, and I'm documenting my findings here in case they're useful to someone else, and/or in case someone familiar with Soekris BIOS internals wants to investigate the underlying issue further.
[WARNING: technical discussion involving PXE bootstrap loading, serial ports, handshaking and/or BIOS follows. Casual readers please disregard this message]

The overall problem is that the pxelinux bootstrap freezes while sending its initial progress messages to the console, and stays frozen until a real serial cable and PC are plugged into the serial port.

I am using a recent Soekris net4501, with "comBIOS ver. 1.28 20050527" as shipped from the factory. I have set up a pxeboot installer environment for it on a separate PC, using pxelinux.0 from syslinux-3.52.tar.bz2, downloaded from http://www.kernel.org/pub/linux/utils/boot/syslinux/

The config file, /tftpboot/pxelinux.cfg/default, contains the following:

    serial 0 19200 0x303
    console 0
    label linux
      kernel openwrt-x86-2.6-vmlinuz
      append initrd=openwrt-x86-2.6-rootfs.cpio.gz init=/etc/preinit console=tty0 console=ttyS0,19200n8 reboot=bios

Now, everything works fine as long as a serial cable is connected to a PC (running minicom): the kernel and ramdisk are fetched and run. However, if no serial cable is connected, booting hangs while pxelinux is writing to the serial port. I see a few flashes on the Net LED, but no kernel is downloaded. If I reconnect the serial cable, I see it continue as follows:

    Copyright (C
    ) 1994-2007 H.
    Peter Anvin
    UNDI data segme
    nt at: 0009B7
    ...

Whereas if a serial cable is connected from power-up, the startup message looks like this:

    PXELINUX 3.52 2
    007-09-25 Copy
    right (C) 1994-
    ... etc

So it seems that if no serial cable is present, the device hangs after sending about 25 characters, and continues from that point when the serial cable is connected.

The first thing I did was to change pxelinux.cfg/default to

    serial 0 19200 0x000

which is the documented way to turn off handshaking. This didn't fix it.

Next, thinking this was going to be a simple case of misconfigured handshaking, I built a little DB9 loopback connector as follows:

    RTS 7 --.
    CTS 8 --'

    DTR 4 --.
    DCD 1 --+
    DSR 6 --'

Surprisingly, pxeboot still stopped at the same place, and removing this loopback and plugging in a 'real' serial connection (to a PC running minicom) still got it going again. Measuring the voltages, I saw that DTR was high (+5 V) but RTS was low (-5 V). So I rewired the DB9 as

    DTR 4 --.
    CTS 8 --+
    DCD 1 --+
    DSR 6 --'

but still no joy.

The only other thing I could think of was that the signal on RD was wrong, and indeed if I loop TD (pin 3) back to RD (pin 2), it works. If I connect RD to SG (pin 5), it doesn't. If I connect RD to DTR (pin 4), it doesn't either.

This is bizarre. It seems highly unlikely to me that the default level of RD is floating the wrong way, so that the port sees a 'break' all the time, but that's the only explanation I can think of. It's also odd that pxelinux would be affected by this while the BIOS bootstrap output is not (nor is Linux, although Linux talks directly to the serial hardware). Put that aside for now.

Next, I had a bit of a think about the boot sequence. The initial PXE messages (copyright etc.) cannot be written directly to the serial port by pxelinux itself, because the serial port configuration lives in pxelinux.cfg/default, and the hang occurs before that file has even been read from the TFTP server. So pxelinux must be talking to the BIOS API or to a PXE API. The question now is: where does the problem lie?

Next I looked at the source code for pxelinux, which starts at pxelinux.asm in the tarball whose URL was given earlier. It writes the messages like this:

    mov si,syslinux_banner
    call writestr
    mov si,copyright_str
    call writestr

In turn, writestr is 'cwritestr' from writestr.inc, which calls writechr to do its job. writechr comes from rawcon.inc, and this in turn uses int 10h BIOS calls to write characters. So this would seem to point the finger at the Soekris BIOS.

I haven't tried updating to BIOS 1.32, because:

  a. the changelogs between 1.28 and 1.32 don't mention this issue;
  b. the idea behind this pxe boot is to install units from the factory with minimal manual intervention (just plug in a CF card and switch on). If units are shipped with 1.28, then I need a simple process that works with 1.28; upgrading the BIOS adds extra complexity.

(Aside: I can't just plug a serial cable into each unit and be done with it, because I need to be able to install multiple net4501s in parallel. The install fills most of a 4GB flash card, so it takes rather a long time.)

My next thought was to rebuild pxelinux.0 so that it doesn't send anything to the "console" via the Soekris BIOS. This turned out to be easy:

--- syslinux-3.52/rawcon.inc.orig	2007-11-15 10:51:42.000000000 +0000
+++ syslinux-3.52/rawcon.inc	2007-11-15 10:51:53.000000000 +0000
@@ -18,7 +18,7 @@
 		call write_serial	; write to serial port if needed
 		pushfd
 		test byte [DisplayCon],01h	; Write to screen?
-		jz .nothing
+		jmp .nothing
 
 		pushad
 		mov bh,[BIOS_page]

Then rebuild:

    # apt-get install nasm
    # make pxelinux.0

This has solved the problem for me. Everything boots just fine, with or without a serial cable, and it actually saves several seconds, because the UNDI debug messages are no longer being written slowly to the console.

Strangely, if I connect a serial cable, I still see:

    PXELINUX 3.52 0x46f99a24 Copyright (C) 1994-2007 H. Peter Anvin
    Loading openwrt-x86-2.6-vmlinuz...................
    Loading openwrt-x86-2.6-rootfs.cpio.gz.......................ready.

I was expecting to see the "Loading ..." lines, since pxelinux.cfg tells it to use the serial port, but I was not expecting to see the PXELINUX banner and copyright message.

Anyway, I have a workaround, so I'm happy. Someone with access to the BIOS documentation and source may wish to investigate the underlying problem further, with a view to improving the BIOS's API compatibility.

Regards,
Brian.
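To make the effect of the one-line rawcon.inc change concrete, here is a toy Python model of writechr's control flow as I read it from the diff context above. It is a sketch, not syslinux code: ConsoleModel and its fields are hypothetical names, write_serial is reduced to "append if a serial port has been configured", cwritestr is simplified to one writechr call per character, and the int 10h path is reduced to "append to bios_out".

```python
from dataclasses import dataclass


@dataclass
class ConsoleModel:
    """Toy model of writechr's control flow in syslinux 3.52 rawcon.inc.

    Hypothetical: this is not syslinux code. The fields stand in for the
    DisplayCon byte, for whether a SERIAL directive has been parsed yet,
    and for where each character ends up.
    """

    display_con: int = 0x01           # bit 0 set: "Write to screen?" -> yes
    serial_configured: bool = False   # True once pxelinux.cfg "serial ..." is read
    patched: bool = False             # True models "jz .nothing" -> "jmp .nothing"
    serial_out: str = ""              # characters handed to write_serial
    bios_out: str = ""                # characters handed to int 10h

    def writechr(self, ch: str) -> None:
        # call write_serial ; write to serial port if needed
        if self.serial_configured:
            self.serial_out += ch
        # test byte [DisplayCon],01h ; Write to screen?
        wants_screen = (self.display_con & 0x01) != 0
        # Stock code: jz .nothing skips int 10h only when bit 0 is clear.
        # Patched code: jmp .nothing skips int 10h unconditionally.
        if self.patched or not wants_screen:
            return
        self.bios_out += ch           # stands in for the int 10h teletype call

    def writestr(self, text: str) -> None:
        # Simplified cwritestr: one writechr call per character.
        for ch in text:
            self.writechr(ch)


banner = "PXELINUX 3.52 2007-09-25 "      # banner text as seen on the console

stock = ConsoleModel()
patched = ConsoleModel(patched=True)
stock.writestr(banner)
patched.writestr(banner)
print(len(stock.bios_out), len(patched.bios_out))   # 25 0

# Later, once the SERIAL directive from pxelinux.cfg/default has been parsed:
patched.serial_configured = True
patched.writestr("Loading openwrt-x86-2.6-vmlinuz")
print(patched.serial_out)          # Loading openwrt-x86-2.6-vmlinuz
print(len(patched.bios_out))       # 0
```

In this model the patch only removes the int 10h path. Because write_serial is called before the DisplayCon test, the serial output is untouched, which fits with the "Loading ..." lines still reaching the serial port; it does not explain why the banner appears there too.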
----------------------------------------------------------------------------

Aside: while I was looking at this, I thought I'd check out another oddity of pxelinux: it writes in a tiny window of 15-character lines, as you can see from the pasted output above. In font.inc, it tries to calculate the screen size (rows and columns), if necessary doing some arithmetic on pixels and font sizes. But my suspicion is that the issue arises here:

    vidrows_ok:
            mov [VidRows],al
            mov ah,0fh
            int 10h                 ; Read video state
            dec ah                  ; Store count-1 (same as rows)
            mov [VidCols],ah
            popa
            ret

If int 10h with ah=0fh were a no-op, AH would still hold 0Fh after the call, so we'd end up with 0Eh in VidCols; since VidCols stores the column count minus one, that would give 15-column output. Documentation on which BIOS API calls the Soekris implements could clear this up, but I've been unable to find any. However, this problem is only a minor inconvenience.

Another issue is why the text is written so slowly. This again is only a minor inconvenience. It's a pain when grub is displaying its full menu, but adjusting grub's menu.lst so that it contains

    terminal --timeout=0 --dumb serial
    hiddenmenu

works around this problem. This is also the solution to the often-reported problem that the Soekris won't boot into Linux with no serial cable connected. Maybe the BIOS issue I outline above could be the cause of this too.

----------------------------------------------------------------------------
_______________________________________________
Soekris-tech mailing list
http://lists.soekris.com/mailman/listinfo/soekris-tech
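Regarding the VidCols arithmetic in the aside above, it can be replayed in a few lines of Python. This is a sketch under two assumptions: that VidCols holds the column count minus one (as the "Store count-1" comment says), and that the Soekris BIOS returns from int 10h/AH=0Fh without touching AX, which is only the hypothesis being tested, not something confirmed against Soekris documentation. The function names are hypothetical, and 80 columns for a conforming BIOS is just an example value.

```python
def int10h_get_video_mode(ax: int, bios_implements_it: bool, columns: int = 80) -> int:
    """Toy stand-in for INT 10h, AH=0Fh ("get current video mode").

    A conforming BIOS returns the number of character columns in AH (and the
    mode in AL). The hypothesis being modelled is a BIOS that treats the call
    as a no-op and hands AX back unchanged.
    """
    if bios_implements_it:
        # AL is left as-is here; only AH matters for VidCols.
        return ((columns & 0xFF) << 8) | (ax & 0x00FF)
    return ax


def vidcols_from_vidrows_ok(bios_implements_it: bool) -> int:
    """Replays the tail of vidrows_ok and returns the byte stored in VidCols."""
    ax = 0x0F << 8                                       # mov ah,0fh (AL irrelevant here)
    ax = int10h_get_video_mode(ax, bios_implements_it)   # int 10h
    ah = (ax >> 8) & 0xFF
    ah = (ah - 1) & 0xFF                                 # dec ah ; store count-1
    return ah                                            # mov [VidCols],ah


for implemented in (True, False):
    vidcols = vidcols_from_vidrows_ok(implemented)
    print(implemented, hex(vidcols), vidcols + 1)
# True 0x4f 80
# False 0xe 15

# The first wrapped line of the pasted boot output is indeed 15 characters wide:
print(len("PXELINUX 3.52 2"))   # 15
```

The no-op case lands exactly on the 15-character wrap seen in the pasted output, which is consistent with the suspicion above but does not prove it; checking what the comBIOS actually returns in AH would settle it.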
