There's something odd going on here, and I believe it's ultimately a problem
with the Soekris BIOS console emulation via serial port. I have a workaround
by modifying pxelinux, and I'm documenting my findings here in case it's
useful to someone else, and/or someone familiar with Soekris BIOS internals
wants to investigate the underlying issue further.

[WARNING: technical discussion involving PXE bootstrap loading, serial
ports, handshaking and/or BIOS follows. Casual readers please disregard this
message]

The overall problem is that the pxelinux bootstrap freezes while sending its
initial progress messages to the console, and stays frozen until a real
serial cable and PC are plugged into the serial port.

I am using a recent Soekris net4501, with "comBIOS ver. 1.28 20050527" as
shipped from the factory. I have set up a pxeboot installer environment for
it on a separate PC, using pxelinux.0 from syslinux-3.52.tar.bz2, downloaded
from http://www.kernel.org/pub/linux/utils/boot/syslinux/

The config file, /tftpboot/pxelinux.cfg/default, contains the following:

serial 0 19200 0x303
console 0
label linux
  kernel openwrt-x86-2.6-vmlinuz
  append initrd=openwrt-x86-2.6-rootfs.cpio.gz init=/etc/preinit console=tty0 console=ttyS0,19200n8 reboot=bios

Now, everything works fine as long as there's a serial cable connected to a
PC (running minicom). The kernel and ramdisk are fetched and run.

However, if no serial cable is connected, booting hangs while pxelinux tries
to write to the serial port. I see a few flashes on the Net LED, but no
kernel is downloaded. If I reconnect the serial cable, then I see it
continue as follows:

   Copyright (C
) 1994-2007 H.
Peter Anvin
UNDI data segme
nt at:   0009B7
...

Whereas if a serial cable is connected from power up, the startup message
looks like this:

PXELINUX 3.52 2
007-09-25  Copy
right (C) 1994-
... etc

So it seems that if no serial cable is present, the device hangs after
sending ~25 characters, and it continues after this point when the serial
cable is connected.

The first thing I did was to change pxelinux.cfg/default to

serial 0 19200 0x000

which is the documented way to turn off handshaking. This didn't fix it.
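For anyone decoding these values: a minimal sketch that expands the
flow-control argument of the "serial" directive into its individual flags.
The bit table is taken from the SERIAL section of syslinux.doc as I
understand it, so check it against the copy in your own tarball. The point
it illustrates is that neither 0x303 nor 0x000 sets any of the "wait for
..." bits, so pxelinux shouldn't be blocking output on a handshake line
either way.

```python
# Flow-control bits for the pxelinux "serial" directive, per the SERIAL
# section of syslinux.doc (verify against your own copy).
FLOW_BITS = {
    0x001: "assert DTR",
    0x002: "assert RTS",
    0x010: "wait for CTS assertion",
    0x020: "wait for DSR assertion",
    0x040: "wait for RI assertion",
    0x080: "wait for DCD assertion",
    0x100: "ignore input unless CTS asserted",
    0x200: "ignore input unless DSR asserted",
    0x400: "ignore input unless RI asserted",
    0x800: "ignore input unless DCD asserted",
}


def decode_flow(value: int) -> list[str]:
    """Return the meaning of each flag set in a 'serial' flow-control value."""
    reserved = value & ~sum(FLOW_BITS)  # bits not listed in the table
    if reserved:
        raise ValueError(f"reserved bits set: {reserved:#05x}")
    return [name for bit, name in sorted(FLOW_BITS.items()) if value & bit]


for value in (0x303, 0x000):
    print(f"{value:#05x}:", decode_flow(value) or ["no flow control"])
```

0x303 comes out as "assert DTR, assert RTS, ignore input unless CTS/DSR
asserted", which only filters input; it never holds up output.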

Next, I thought this was going to be a simple case of misconfigured
handshaking, so I built a little DB9 loopback connector as follows:

   RTS 7 --.
   CTS 8 --'

   DTR 4 --.
   DCD 1 --+
   DSR 6 --'

Surprisingly, I found that the PXE boot still stopped at the same place, and
removing this loopback and plugging in a 'real' serial connection (to a PC
running minicom) still let it continue.

Measuring the voltages, I saw that DTR was high (+5 V) but RTS was low
(-5 V). So I reconfigured the DB9 as

   DTR 4 --.
   CTS 8 --+
   DCD 1 --+
   DSR 6 --'

but still no joy.
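To double-check the reasoning, here is a small model of what the two
loopback plugs present to the handshake inputs, using the voltages I
measured. It assumes the usual RS-232 control-line convention (above +3 V
means ON) and treats any input that isn't jumpered as deasserted; the
names PLUG_1, PLUG_2 and inputs_seen are just for illustration.

```python
# Hypothetical model of the two DB9 loopback plugs drawn above.
# Each output pin drives whatever inputs are jumpered to it.

MEASURED = {"DTR": +5.0, "RTS": -5.0}  # volts, as measured on the unit

PLUG_1 = {"RTS": ["CTS"], "DTR": ["DCD", "DSR"]}  # first plug
PLUG_2 = {"DTR": ["CTS", "DCD", "DSR"]}           # second plug


def inputs_seen(plug: dict[str, list[str]], outputs: dict[str, float]) -> dict[str, bool]:
    """Which handshake inputs read as asserted (ON), given the driven outputs."""
    seen = {name: False for name in ("CTS", "DSR", "DCD")}  # unjumpered = OFF
    for out_pin, jumpered in plug.items():
        for in_pin in jumpered:
            # RS-232 control lines count as ON above +3 V
            seen[in_pin] = outputs[out_pin] > 3.0
    return seen


print("plug 1:", inputs_seen(PLUG_1, MEASURED))
print("plug 2:", inputs_seen(PLUG_2, MEASURED))
```

With the first plug CTS stays off because RTS is low; with the second plug
CTS, DSR and DCD are all on, and it still hung, so the handshake inputs
don't look like the culprit.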

So the only other thing I could think of was that the signal on RD is wrong,
and indeed if I loop back TD(3) to RD(2), then it works.

If I connect RD to SG(5), then it doesn't. If I connect RD to DTR(4), then
it doesn't.

This is bizarre. It seems highly unlikely to me that RD idles at the wrong
level, so that the port sees a 'break' all the time, but that's the only
explanation I can think of. It's also odd that pxelinux would be affected
by this but the BIOS bootstrap output is not (nor is Linux, although Linux
talks directly to the serial hardware).

Put that aside for now. Next, I had a bit of a think about the boot
sequence. It's clear that the initial PXE debug messages (copyright etc.)
can't be going directly to the serial port, since the serial port config is
in pxelinux.cfg/default, and that hasn't been loaded yet. pxelinux must be
talking to the BIOS API or to a PXE API, and the hang occurs before
pxelinux.cfg/default has even been read from the TFTP server.

The question is now, where does the problem lie?

Next I looked at the source code for pxelinux, which starts at pxelinux.asm
in the tarball whose URL was given earlier. It writes the messages like
this:

                mov si,syslinux_banner
                call writestr

                mov si,copyright_str
                call writestr

In turn, writestr is 'cwritestr' from writestr.inc, and this calls writechr
to do its job. writechr comes from rawcon.inc, and this in turn uses int 10h
BIOS calls to write characters. So, this would seem to point the finger at
the Soekris BIOS.

I haven't tried updating to BIOS 1.32, because:
a. the changelogs between 1.28 and 1.32 don't mention this issue
b. the idea behind this pxe boot is to install units from the factory with
   minimum manual intervention (just plug in a CF card and switch on).
   If units are shipped with 1.28 then I need a simple process which works
   with 1.28; upgrading the BIOS adds extra complexity.

(Aside: I can't just plug a serial cable into the unit and be done with it,
because I need to be able to install multiple net4501s in parallel. The
install fills most of a 4GB flash card, so it takes rather a long time.)

My next thought was to rebuild pxelinux.0 so that it doesn't send anything
to the "console" via the Soekris BIOS. This turned out to be easy:

--- syslinux-3.52/rawcon.inc.orig       2007-11-15 10:51:42.000000000 +0000
+++ syslinux-3.52/rawcon.inc    2007-11-15 10:51:53.000000000 +0000
@@ -18,7 +18,7 @@
                call write_serial       ; write to serial port if needed
                pushfd
                test byte [DisplayCon],01h      ; Write to screen?
-               jz .nothing
+               jmp .nothing

                pushad
                mov bh,[BIOS_page]

# apt-get install nasm
# make pxelinux.0

This has solved the problem for me. Everything boots just fine, with or
without a serial cable, and it actually saves several seconds because the
UNDI debug messages are no longer written out slowly.
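To make the effect of the one-line patch explicit, here's a simplified,
hypothetical model of where writechr in rawcon.inc sends each character,
written only from the lines quoted in the patch. Two assumptions, labelled
in the code too: write_serial only writes once a "serial" directive has
been processed (its comment says "if needed"), and DisplayCon bit 0 is set
at startup. route_char is an illustrative name, not a syslinux function.

```python
# Hypothetical model of writechr's output routing (syslinux 3.52 rawcon.inc),
# based only on the lines quoted in the patch above.

def route_char(serial_configured: bool, display_con: int, patched: bool) -> list[str]:
    sinks = []
    # "call write_serial ; write to serial port if needed"
    # Assumption: "if needed" means a serial directive has been processed.
    if serial_configured:
        sinks.append("serial port (write_serial)")
    # Original: "test byte [DisplayCon],01h / jz .nothing" skips the screen
    # only when bit 0 is clear. Patched: "jmp .nothing" always skips it.
    if not patched and display_con & 0x01:
        sinks.append("screen via int 10h (BIOS console)")
    return sinks


# Early banner, before pxelinux.cfg is read: assume serial not yet set up
# and DisplayCon bit 0 set.
for patched in (False, True):
    print("patched" if patched else "original",
          route_char(serial_configured=False, display_con=0x01, patched=patched))
```

In this model the patched build never reaches int 10h, which is the only
path that hung, while anything sent through write_serial after the config
is parsed is unaffected. If I read syslinux.doc right, the "console 0" line
in my config is meant to switch the video console off, but it can only take
effect once the config file has been read, which is after the hang.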

Strangely, if I connect a serial cable, I still see:

PXELINUX 3.52 0x46f99a24  Copyright (C) 1994-2007 H. Peter Anvin
Loading openwrt-x86-2.6-vmlinuz...................
Loading openwrt-x86-2.6-rootfs.cpio.gz.......................ready.

I was expecting to see the "Loading ..." lines, since pxelinux.cfg tells it
to use the serial port, but I was not expecting to see the PXELINUX banner
and copyright message.

Anyway, I have a workaround so I'm happy. Someone with access to BIOS
documentation and source may wish to investigate the underlying problem
further, with a view to improving the BIOS's API compatibility.

Regards,

Brian.

----------------------------------------------------------------------------
Aside: while I was looking at this, I thought I'd check out another oddity
of pxelinux: it writes in a tiny window of 15-character lines, as you can
see from the pasted output above.

Now, I see that in font.inc, it tries to calculate the screen size (rows and
cols), if necessary doing some arithmetic on pixels and font sizes. But my
suspicion is that the issue arises here:

vidrows_ok:     mov [VidRows],al
                mov ah,0fh
                int 10h                         ; Read video state
                dec ah                          ; Store count-1 (same as rows)
                mov [VidCols],ah
                popa
                ret

If int 10h with AH=0Fh were a no-op, AH would still hold 0Fh on return, so
after the dec we'd end up with 0Eh in VidCols. Since that is stored as
count-1, I think it would give 15-column output.
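A quick sketch to check that theory against the banner text as it appears
in the output above. Assumptions: the Soekris BIOS returns from int 10h
AH=0Fh with AH untouched, and writechr wraps to a new line once the cursor
column passes VidCols (that is how I read rawcon.inc, but the wrap loop
here is a simplification, not the real code).

```python
# Model of the vidrows_ok arithmetic plus a simplified cursor-wrap loop.

def vidcols_after_int10(ah_on_return: int) -> int:
    """'dec ah' then 'mov [VidCols],ah' (8-bit register, so wrap at 0xFF)."""
    return (ah_on_return - 1) & 0xFF


def wrap(text: str, vidcols: int) -> list[str]:
    """Start a new line once the column index would exceed VidCols."""
    lines, current = [], ""
    for ch in text:
        current += ch
        if len(current) > vidcols:  # columns 0..vidcols are now full
            lines.append(current)
            current = ""
    if current:
        lines.append(current)
    return lines


vidcols = vidcols_after_int10(0x0F)  # assumed no-op call leaves AH = 0Fh
banner = "PXELINUX 3.52 2007-09-25  Copyright (C) 1994-2007 H. Peter Anvin"
print(hex(vidcols))
for line in wrap(banner, vidcols):
    print(line)
```

The first three lines come out as "PXELINUX 3.52 2", "007-09-25  Copy" and
"right (C) 1994-", exactly as pasted earlier. A BIOS that returned AH=80
(80 columns) would give VidCols=79 and no wrapping on this banner.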

Documentation of which BIOS API calls the Soekris implements could clear
this up, but I've been unable to find any. In any case, this problem is
only a minor inconvenience.

Another issue is why the text is written so slowly. This again is only a
minor inconvenience. It's a pain when grub is displaying its full menu, but
adjusting grub's menu.lst so that it has

terminal --timeout=0 --dumb serial
hiddenmenu

works around this problem.

The same grub change is also the solution to the often-reported problem
that the Soekris won't boot into Linux with no serial cable connected.
Maybe the BIOS issue I outlined above is the cause of this too.
----------------------------------------------------------------------------
_______________________________________________
Soekris-tech mailing list
http://lists.soekris.com/mailman/listinfo/soekris-tech
