Re: A tale of bootstrap ld.so debugging

2011-11-09 Thread Ludovic Courtès
Hi Roland,

Roland McGrath rol...@hack.frob.com writes:

> I think what we used to say was that you should try to do debugging with
> sub-hurds where the console port is faked with io ports (I think).

The thing is that I was debugging a system cross-built from scratch, so
sub-hurds were not an option.

> We definitely don't want to clutter up ld.so with stuff that is only
> for debugging early boot.  I think it would be better to have
> serverboot supply a trivial io server on the fd ports it hands to the
> initial processes, that translates to console device_write calls in a
> very simple way.

Here the ‘ld.so /hurd/exec’ command line comes from GRUB and is spawned
by Mach directly (serverboot has been deprecated for years and is no
longer in the repository).

Adding a term server or similar in the early boot process seems tricky
and would probably require tweaking the boot process in diskfs, exec,
& co. in ugly ways, AIUI.

Thanks,
Ludo’.



Re: A tale of bootstrap ld.so debugging

2011-11-08 Thread Roland McGrath
I think what we used to say was that you should try to do debugging with
sub-hurds where the console port is faked with io ports (I think).  We
definitely don't want to clutter up ld.so with stuff that is only for
debugging early boot.  I think it would be better to have serverboot supply
a trivial io server on the fd ports it hands to the initial processes, that
translates to console device_write calls in a very simple way.  I may be
misremembering some details of the boot sequence.



A tale of bootstrap ld.so debugging

2011-11-07 Thread Ludovic Courtès
Hello!

While cross-building a GNU/Hurd QEMU image, I stumbled upon a series of
bugs, the last one of which led to a one-liner adding libgcc_s.so to the
image.

Lack of libgcc_s.so prevented the initial /hurd/exec from running.  The kernel
debugger showed a backtrace in ld.so hinting at an undefined symbol or
missing shared library error:

  $ addr2line -pfa -e \
      /nix/store/xbrr8dwqf2ngipa08l7qsvb8hy1y6cx3-glibc-20110623-i586-pc-gnu/lib/ld.so.1 \
      0x3c82 0x12da7 0x14e64 0x12191 0x1d97 0x12c6d 0x1764c
  0x3c82: _dl_start at ??:0
  0x00012da7: _dl_sysdep_start at ??:0
  0x00014e64: _hurd_startup at ??:0
  0x00012191: go.10655 at ??:0
  0x1d97: dl_main at ??:0
  0x00012c6d: _exit at ??:0
  0x0001764c: syscall_task_terminate at ??:0

However, ld.so remained mute, as no messages appeared on the console.

It turns out that ld.so is written to speak to the outside world using
io_write RPCs to a term, but no such thing is available at boot time.
Furthermore, at this point ld.so cannot open the Mach console because it
is not granted a send right to the device master port.

So the trick was to:

  1. hack Mach to grant a send right to the device master port to all
     the boot tasks (ld.so included), unconditionally;

  2. change ld.so’s _dl_sysdep_start to open the Mach console;

  3. change libc’s __libc_writev (this is what _dl_printf uses) to use
 raw __device_write_inband calls instead of io_write.
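
To make step 3 a bit more concrete, here is a minimal, portable sketch of the
newline handling only.  It is not the patch itself: console_sink_t is a
hypothetical callback standing in for a raw console write such as
__device_write_inband, and membuf_sink is an illustrative in-memory sink so the
code can run outside of Mach.  Unlike a version that assumes the last byte is
\n, this one converts every \n it finds and copes with buffers that do not end
in a newline.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a raw console write (e.g. something built on
   __device_write_inband).  Returns the number of bytes accepted.  */
typedef size_t (*console_sink_t) (void *cookie, const char *data, size_t len);

/* Write BUF to SINK, turning each '\n' into "\r\n".  Returns the number of
   bytes of BUF consumed, which is what a write-like caller expects.  */
static size_t
console_write_crlf (console_sink_t sink, void *cookie,
                    const char *buf, size_t nbytes)
{
  size_t start = 0;

  for (size_t i = 0; i < nbytes; i++)
    if (buf[i] == '\n')
      {
        /* Flush the text before the newline, then emit CR LF.  */
        if (i > start)
          sink (cookie, buf + start, i - start);
        sink (cookie, "\r\n", 2);
        start = i + 1;
      }

  /* Flush whatever follows the last newline, if anything.  */
  if (start < nbytes)
    sink (cookie, buf + start, nbytes - start);

  return nbytes;
}

/* Illustrative sink that appends to a fixed in-memory buffer.  */
struct membuf
{
  char data[128];
  size_t len;
};

static size_t
membuf_sink (void *cookie, const char *data, size_t len)
{
  struct membuf *m = cookie;

  /* Silently truncate once the buffer is full.  */
  if (len > sizeof m->data - m->len)
    len = sizeof m->data - m->len;
  memcpy (m->data + m->len, data, len);
  m->len += len;
  return len;
}
```

In a real ld.so the sink would presumably wrap the device write on the console
port, and the return value of each call would need checking; both are left out
here to keep the sketch short.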

The patches below illustrate that in a very crude way.  ;-)  Actually I
realized that the Mach patch could probably be avoided by just adding
${device-master-port} on the GRUB command-line for ld.so/exec.

Perhaps ld.so could support an additional --device-master-port argument
for that purpose.  But then __libc_writev & co. would also need to be
duplicated to support writing to the Mach console.
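
For illustration, parsing such an argument could look roughly like the sketch
below.  Everything here is an assumption: the option does not exist, the
--device-master-port=N spelling is made up, and ordinary libc functions
(strtoul, strncmp) are used for readability even though ld.so would have to
rely on its own internal helpers this early.  The only hard fact used is that
MACH_PORT_NULL is 0.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option name; it is only an idea, not an existing flag.  */
#define DEVICE_MASTER_OPT "--device-master-port="

/* Scan ARGV for the option and return the port name it carries, or 0
   (the value of MACH_PORT_NULL) when it is absent or malformed.  */
static unsigned long
parse_device_master_port (int argc, char **argv)
{
  const size_t optlen = strlen (DEVICE_MASTER_OPT);

  for (int i = 1; i < argc; i++)
    if (strncmp (argv[i], DEVICE_MASTER_OPT, optlen) == 0)
      {
        const char *value = argv[i] + optlen;
        char *end;

        errno = 0;
        unsigned long port = strtoul (value, &end, 10);

        /* Reject empty values, trailing garbage, and overflow.  */
        if (errno != 0 || end == value || *end != '\0')
          return 0;
        return port;
      }

  return 0;
}
```

The GRUB line would then pass something like
--device-master-port=${device-master-port} to ld.so, and get_console could use
the parsed value instead of a hard-coded port name.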

Thanks,
Ludo’.

diff --git a/sysdeps/mach/hurd/dl-sysdep.c b/sysdeps/mach/hurd/dl-sysdep.c
index 12c39cd..30b1802 100644
--- a/sysdeps/mach/hurd/dl-sysdep.c
+++ b/sysdeps/mach/hurd/dl-sysdep.c
@@ -44,6 +44,8 @@
 #include <dl-machine.h>
 #include <dl-procinfo.h>
 
+#include <device/device.h>
+
 extern void __mach_init (void);
 
 extern int _dl_argc;
@@ -116,6 +118,29 @@ static void fmh(void) {
 /* XXX loser kludge for vm_map kernel bug */
 #endif
 
+/* Return a port to the Mach console.  */
+static mach_port_t
+get_console (void)
+{
+  mach_port_t device_master, console;
+#if 0
+  error_t err = __get_privileged_ports (0, &device_master);
+
+  if (err)
+    return MACH_PORT_NULL;
+#else
+  error_t err = 0;
+  device_master = 2;
+#endif
+
+  err = __device_open (device_master, D_WRITE | D_READ, "console", &console);
+  if (err)
+    return MACH_PORT_NULL;
+
+  return console;
+}
+
+static mach_port_t console = MACH_PORT_NULL;
 
 ElfW(Addr)
 _dl_sysdep_start (void **start_argptr,
@@ -256,6 +281,25 @@ unfmh();			/* XXX */
   /* Set up so we can do RPCs.  */
   __mach_init ();
 
+  /* Open the Mach console so that any message can actually be seen.  This is
+     particularly useful at boot time, when started by the bootstrap file
+     system.  */
+  console = get_console ();
+  if (console != MACH_PORT_NULL)
+    {
+      int written;
+      __device_write_inband (console, 0, 0, "hello, world!\r\n", 15, &written);
+
+      /* _hurd_intern_fd (console, O_WRONLY, 0); */
+      /* _hurd_intern_fd (console, O_WRONLY, 0); */
+      /* _hurd_intern_fd (console, O_WRONLY, 0); */
+
+      struct iovec out = { "hello stdout!\n", 14 };
+      struct iovec err = { "hello stderr!\n", 14 };
+      __writev (STDOUT_FILENO, &out, 1);
+      __writev (STDERR_FILENO, &err, 1);
+    }
+
   /* Initialize frequently used global variable.  */
   GLRO(dl_pagesize) = __getpagesize ();
 
@@ -393,11 +437,24 @@ __libc_write (int fd, const void *buf, size_t nbytes)
   error_t err;
   mach_msg_type_number_t nwrote;
 
+#if 0
   assert (fd < _hurd_init_dtablesize);
 
   err = __io_write (_hurd_init_dtable[fd], buf, nbytes, -1, &nwrote);
   if (err)
     return __hurd_fail (err);
+#else
+  if (fd == STDOUT_FILENO || fd == STDERR_FILENO)
+    {
+      /* Assume the last byte is \n and convert it to \r\n.  */
+      int n;
+      __device_write_inband (console, 0, 0, buf, nbytes - 1, &n);
+      nwrote = n + 2;
+      __device_write_inband (console, 0, 0, "\r\n", 2, &n);
+    }
+  else
+    nwrote = 0;
+#endif
 
   return nwrote;
 }
@@ -413,6 +470,25 @@ __writev (int fd, const struct iovec *iov, int niov)
       return -1;
     }
 
+  if (fd == STDOUT_FILENO || fd == STDERR_FILENO)
+    {
+      /* Assume the last byte is \n and convert it to \r\n.  */
+      int i;
+      ssize_t total = 0;
+
+      for (i = 0; i < niov; i++)
+	{
+	  int n;
+	  __device_write_inband (console, 0, 0,
+				 iov[i].iov_base, iov[i].iov_len - 1,
+				 &n);
+	  total += n + 2;
+	  __device_write_inband (console, 0, 0, "\r\n", 2, &n);
+	}
+
+      return total;
+    }
+
   int i;
   size_t total = 0;
   for (i = 0; i < niov; ++i)

And the Mach patch:

diff --git a/kern/bootstrap.c b/kern/bootstrap.c
index