Re: Fwd: Hurd shutdown problems

2016-08-15 Thread Brent W. Baccala
Aloha -

I've updated to the latest Debian kernel package, which includes Samuel's
patch.  (thank you)

This fixes the symbol table problem, but my VM still locks up after a
failed swapoff.

I do, however, get symbolic names displayed correctly from the kernel
debugger at that point.

Obviously, there is another bug, and I will continue to hunt for it.

agape
brent


Re: Fwd: Hurd shutdown problems

2016-08-12 Thread Richard Braun
On Fri, Aug 12, 2016 at 09:07:48PM +0200, Samuel Thibault wrote:
> > I'm curious: what makes it definitely wrong on a PC ?
> 
> A PC has BIOS stuff between A and 10.

Right, misread a 0 again.

-- 
Richard Braun



Re: Fwd: Hurd shutdown problems

2016-08-12 Thread Samuel Thibault
Richard Braun, on Fri 12 Aug 2016 21:06:48 +0200, wrote:
> On Fri, Aug 12, 2016 at 07:53:08PM +0200, Samuel Thibault wrote:
> > That's what I'm talking about, and that's the second part of the printfs
> > above, and they are wrong: 1-100 is definitely wrong on a PC,
> > and it includes the debugging symbols.
> 
> I'm curious: what makes it definitely wrong on a PC ?

A PC has BIOS stuff between A and 10.

Samuel



Re: Fwd: Hurd shutdown problems

2016-08-12 Thread Richard Braun
On Fri, Aug 12, 2016 at 07:53:08PM +0200, Samuel Thibault wrote:
> That's what I'm talking about, and that's the second part of the printfs
> above, and they are wrong: 1-100 is definitely wrong on a PC,
> and it includes the debugging symbols.

I'm curious: what makes it definitely wrong on a PC ?

> Yes, but only the heap. The load of segments not containing the heap is
> full:
> 
> vm_page_load 1-100 1-100
> vm_page_load 7a00-7ffe 7a00-7ffe

That's indeed a problem, and one that I don't see in X15...

I guess I didn't have the case where a segment didn't clip with the
heap until now, in which case, instead of being completely loaded
as available, it should be loaded as reserved, and only later made
available.
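
The behaviour described above can be sketched as an interval intersection. This is only an illustration, not the biosmem code: `struct range` and `avail_in_segment` are hypothetical names, and whether vm_page_load accepts an empty available range is not something this thread confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open physical range [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Return the part of the heap that lies inside the segment.
 * If the heap does not overlap the segment at all, return an empty
 * range, so that the whole segment would be loaded as reserved
 * instead of being handed out as available.
 */
struct range avail_in_segment(struct range seg, struct range heap)
{
    struct range r;

    assert(seg.start < seg.end);

    r.start = (heap.start > seg.start) ? heap.start : seg.start;
    r.end = (heap.end < seg.end) ? heap.end : seg.end;

    if (r.start >= r.end) {
        /* No overlap: nothing in this segment is available yet. */
        r.start = seg.start;
        r.end = seg.start;
    }

    return r;
}
```

With this shape, a segment that does not contain the heap yields a zero-length available range, and its pages would only become usable later through something like vm_page_manage.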

-- 
Richard Braun



Re: Fwd: Hurd shutdown problems

2016-08-12 Thread Samuel Thibault
Richard Braun, on Fri 12 Aug 2016 19:57:10 +0200, wrote:
> On Fri, Aug 12, 2016 at 05:17:26PM +0200, Samuel Thibault wrote:
> > biosmem: heap: 114f000-7a00
> > 
> > and objdump shows:
> > 
> > LOAD off0x1000 vaddr 0x8100 paddr 0x0100 align 2**12
> >  filesz 0x00114700 memsz 0x00114700 flags r-x
> > LOAD off0x00116000 vaddr 0x81115000 paddr 0x01115000 align 2**12
> >  filesz 0xd151 memsz 0x00039384 flags rw-
> > 
> > It seems biosmem's heap doesn't even exclude the kernel data?!
> 
> Could be a mistake with regard to the linker script.

No, I was just misreading the figures. 0x01000000 + 0x00114700 is
0x01114700, and 0x01115000 plus 0x00039384 is 0x0114e384, and I confused
0x00114700 with 0x0114e384.
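
For anyone wanting to re-check these sums, here is a minimal C sketch. It assumes the first segment's paddr is 0x01000000 (the value implied by the 0x01114700 result) and a 4096-byte page; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u /* assumed i386 page size */

/* End address (exclusive) of a loaded ELF segment: paddr + memsz. */
uint32_t segment_end(uint32_t paddr, uint32_t memsz)
{
    assert(paddr <= UINT32_MAX - memsz); /* no 32-bit wraparound */
    return paddr + memsz;
}

/* Round an address up to the next page boundary. */
uint32_t round_page_up(uint32_t addr)
{
    return (addr + PAGE_BYTES - 1) & ~(PAGE_BYTES - 1);
}
```

segment_end(0x01115000, 0x00039384) gives 0x0114e384, and rounding that up to a page gives 0x0114f000, which lines up with the 114f000 heap start printed by biosmem.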

Samuel



Re: Fwd: Hurd shutdown problems

2016-08-12 Thread Richard Braun
On Fri, Aug 12, 2016 at 07:57:10PM +0200, Richard Braun wrote:
> On Fri, Aug 12, 2016 at 05:17:26PM +0200, Samuel Thibault wrote:
> > biosmem: heap: 114f000-7a00
> > 
> > and objdump shows:
> > 
> > LOAD off0x1000 vaddr 0x8100 paddr 0x0100 align 2**12
> >  filesz 0x00114700 memsz 0x00114700 flags r-x
> > LOAD off0x00116000 vaddr 0x81115000 paddr 0x01115000 align 2**12
> >  filesz 0xd151 memsz 0x00039384 flags rw-
> > 
> > It seems biosmem's heap doesn't even exclude the kernel data?!
> 
> Could be a mistake with regard to the linker script.

Misread it too :).

-- 
Richard Braun



Re: Fwd: Hurd shutdown problems

2016-08-12 Thread Richard Braun
On Fri, Aug 12, 2016 at 05:17:26PM +0200, Samuel Thibault wrote:
> biosmem: heap: 114f000-7a00
> 
> and objdump shows:
> 
> LOAD off0x1000 vaddr 0x8100 paddr 0x0100 align 2**12
>  filesz 0x00114700 memsz 0x00114700 flags r-x
> LOAD off0x00116000 vaddr 0x81115000 paddr 0x01115000 align 2**12
>  filesz 0xd151 memsz 0x00039384 flags rw-
> 
> It seems biosmem's heap doesn't even exclude the kernel data?!

Could be a mistake with regard to the linker script.

-- 
Richard Braun



Re: Fwd: Hurd shutdown problems

2016-08-12 Thread Samuel Thibault
Richard Braun, on Fri 12 Aug 2016 19:50:51 +0200, wrote:
> On Fri, Aug 12, 2016 at 07:08:07PM +0200, Samuel Thibault wrote:
> > More precisely though, adding debugging to vm_page_load:
> > 
> > vm_page_load 1-100 1-100
> > vm_page_load 100-7a00 114f000-79c41000
> > vm_page_load 7a00-7ffe 7a00-7ffe
> > 
> > I.e. it properly skips the kernel (100-114f000), but nothing else.
> > 
> > What is supposed to exclude everything else? (modules, VGA BIOS, etc.)
> 
> Look at the vm_page_load calls and you'll see there is a range of
> available pages inside each loaded region.

That's what I'm talking about, and that's the second part of the printfs
above, and they are wrong: 1-100 is definitely wrong on a PC,
and it includes the debugging symbols.

> 3/ When enabling the virtual memory system, biosmem_setup is called.
> It loads each segment into the vm_page module, but is careful to clip
> the biosmem heap from them.

But nothing else.

> When loading a segment, the biosmem heap
> part is passed as [avail_start, avail_end] to vm_page_load.

> To properly answer your question, step 1/ is what looks at the boot
> data (biosmem_find_boot_data) so that the resulting heap boundaries
> completely exclude any.

Yes, but only the heap. The load of segments not containing the heap is
full:

vm_page_load 1-100 1-100
vm_page_load 7a00-7ffe 7a00-7ffe

Samuel



Re: Fwd: Hurd shutdown problems

2016-08-12 Thread Richard Braun
On Fri, Aug 12, 2016 at 07:08:07PM +0200, Samuel Thibault wrote:
> More precisely though, adding debugging to vm_page_load:
> 
> vm_page_load 1-100 1-100
> vm_page_load 100-7a00 114f000-79c41000
> vm_page_load 7a00-7ffe 7a00-7ffe
> 
> I.e. it properly skips the kernel (100-114f000), but nothing else.
> 
> What is supposed to exclude everything else? (modules, VGA BIOS, etc.)

Look at the vm_page_load calls and you'll see there is a range of
available pages inside each loaded region.

Here is how biosmem operates:

1/ biosmem_bootstrap is called. It sets the early allocator heap by
looking at each segment (dma, directmap and highmem if present).
Each segment and its associated heap are stored in biosmem_segments.
The biosmem heap is the heap with the most available pages that can
be directly mapped by the kernel once paging is enabled. Other heaps
are of no interest after this.

2/ When using the early allocator (biosmem_bootalloc), a chunk of
contiguous pages is removed from the heap and given to the caller.

3/ When enabling the virtual memory system, biosmem_setup is called.
It loads each segment into the vm_page module, but is careful to clip
the biosmem heap from them. When loading a segment, the biosmem heap
part is passed as [avail_start, avail_end] to vm_page_load.

4/ Once the VM system is enabled, memory that wasn't part of the
heap is normally reserved, and can be made available by calling
vm_page_manage.

To properly answer your question, step 1/ is what looks at the boot
data (biosmem_find_boot_data) so that the resulting heap boundaries
completely exclude any of it.
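
Step 3/ can be sketched roughly as follows. This is a simplified illustration, not the actual biosmem_load_segment: `struct range` and `clip_avail_to_segment` are hypothetical names, and the second check (on the end of the range) is assumed to mirror the first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open physical range [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Clip the heap range to a segment. Any bound of the heap that falls
 * outside the segment is replaced by the matching segment bound.
 * Consequence: a segment that does not contain the heap at all ends
 * up with its whole extent reported as available.
 */
struct range clip_avail_to_segment(struct range seg, struct range avail)
{
    assert(seg.start < seg.end);

    if (avail.start < seg.start || avail.start >= seg.end)
        avail.start = seg.start;

    if (avail.end <= seg.start || avail.end > seg.end)
        avail.end = seg.end;

    return avail;
}
```

Calling it with a segment that lies entirely below the heap returns the full segment, which is the case where nothing but the heap boundaries protects boot data.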

-- 
Richard Braun



Re: Fwd: Hurd shutdown problems

2016-08-12 Thread Samuel Thibault
Samuel Thibault, on Fri 12 Aug 2016 19:08:07 +0200, wrote:
> What is supposed to exclude everything else? (modules, VGA BIOS, etc.)

I'm tempted to apply the attached patch at least to the Debian package
to brown-tape-fix the issue.

What it does is:

- Make biosmem_load_segment look for the biggest area available inside
  the segment, instead of assuming that either the segment contains the
  biosmem heap and only the available part of the heap should be used,
  or it doesn't contain the heap, and thus the whole segment should be
  used.

- Make biosmem_find_boot_data skip the biosmem heap, as well as the
  holes in the biosmem map.
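
The first point amounts to finding the largest gap between excluded areas. A standalone sketch of that idea, separate from the patch: `struct range` and `largest_free_area` are hypothetical names, and the excluded areas are assumed to be sorted by start address and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open physical range [start, end). */
struct range {
    uint32_t start;
    uint32_t end;
};

/*
 * Return the largest area of seg that is not covered by any of the
 * n excluded ranges. excl must be sorted by start and non-overlapping.
 * Returns an empty range at seg.start if nothing is free.
 */
struct range largest_free_area(struct range seg,
                               const struct range *excl, size_t n)
{
    struct range best = { seg.start, seg.start };
    uint32_t cur = seg.start; /* start of the current candidate gap */

    assert(seg.start < seg.end);

    for (size_t i = 0; i < n && cur < seg.end; i++) {
        if (excl[i].end <= cur)
            continue; /* entirely before the cursor */
        if (excl[i].start >= seg.end)
            break; /* entirely beyond the segment */

        /* Gap between the cursor and this excluded area, if any. */
        if (excl[i].start > cur
            && excl[i].start - cur > best.end - best.start) {
            best.start = cur;
            best.end = excl[i].start;
        }

        cur = excl[i].end;
    }

    /* Trailing gap after the last excluded area. */
    if (cur < seg.end && seg.end - cur > best.end - best.start) {
        best.start = cur;
        best.end = seg.end;
    }

    return best;
}
```

The result would still need page alignment before being handed to vm_page_load, which the patch does with round_page and trunc_page.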

Samuel
diff --git a/i386/i386at/biosmem.c b/i386/i386at/biosmem.c
index a7a440e..2ca1f61 100644
--- a/i386/i386at/biosmem.c
+++ b/i386/i386at/biosmem.c
@@ -440,9 +440,34 @@ biosmem_find_boot_data(const struct multiboot_raw_info 
*mbi, uint32_t min,
 struct elf_shdr *shdr;
 uint32_t i, start, end = end;
 unsigned long tmp;
+const struct biosmem_map_entry *entry;
 
 start = max;
 
+/* Exclude unmapped areas */
+i = 0;
+entry = biosmem_map;
+while (entry < biosmem_map + biosmem_map_size)
+{
+/* Exclude memory before this entry */
+if (i < entry->base_addr)
+biosmem_find_boot_data_update(min, &start, &end, i, entry->base_addr);
+if (entry->type == BIOSMEM_TYPE_AVAILABLE)
+/* Do not exclude this area */
+i = entry->base_addr + entry->length;
+else
+/* Exclude this area too */
+i = entry->base_addr;
+entry++;
+}
+/* Exclude last entry and anything else beyond */
+if (i < max)
+biosmem_find_boot_data_update(min, &start, &end, i, max);
+
+if (biosmem_heap_cur)
+/* Heap is in use */
+biosmem_find_boot_data_update(min, &start, &end, biosmem_heap_cur, biosmem_heap_end);
+
 biosmem_find_boot_data_update(min, &start, &end, _kvtophys(&_start),
   _kvtophys(&_end));
 
@@ -738,6 +763,8 @@ biosmem_load_segment(struct biosmem_segment *seg, uint64_t 
max_phys_end,
  phys_addr_t avail_start, phys_addr_t avail_end)
 {
 unsigned int seg_index;
+phys_addr_t start, end, max_start, max_end;
+uint32_t next;
 
 seg_index = seg - biosmem_segments;
 
@@ -753,6 +780,34 @@ biosmem_load_segment(struct biosmem_segment *seg, uint64_t 
max_phys_end,
 phys_end = max_phys_end;
 }
 
+#ifndef MACH_HYP
+max_start = phys_start;
+max_end = phys_start;
+next = phys_start;
+
+do {
+   extern struct multiboot_info boot_info;
+
+start = next;
+end = biosmem_find_boot_data((struct multiboot_raw_info *) &boot_info, start, phys_end, &next);
+
+if (end == 0) {
+end = phys_end;
+next = 0;
+}
+
+if ((end - start) > (max_end - max_start)) {
+max_start = start;
+max_end = end;
+}
+} while (next != 0);
+
+max_start = round_page(max_start);
+max_end = trunc_page(max_end);
+
+seg->avail_start = max_start;
+seg->avail_end = max_end;
+#else
 if ((avail_start < phys_start) || (avail_start >= phys_end))
 avail_start = phys_start;
 
@@ -761,7 +816,9 @@ biosmem_load_segment(struct biosmem_segment *seg, uint64_t 
max_phys_end,
 
 seg->avail_start = avail_start;
 seg->avail_end = avail_end;
-vm_page_load(seg_index, phys_start, phys_end, avail_start, avail_end);
+#endif
+
+vm_page_load(seg_index, phys_start, phys_end, seg->avail_start, 
seg->avail_end);
 }
 
 void __init


Re: Fwd: Hurd shutdown problems

2016-08-12 Thread Samuel Thibault
Hello,

Samuel Thibault, on Fri 12 Aug 2016 17:27:42 +0200, wrote:
> Samuel Thibault, on Fri 12 Aug 2016 17:17:26 +0200, wrote:
> > biosmem: heap: 114f000-7a00
> > 
> > and objdump shows:
> > 
> > LOAD off0x1000 vaddr 0x8100 paddr 0x0100 align 2**12
> >  filesz 0x00114700 memsz 0x00114700 flags r-x
> > LOAD off0x00116000 vaddr 0x81115000 paddr 0x01115000 align 2**12
> >  filesz 0xd151 memsz 0x00039384 flags rw-
> > 
> > It seems biosmem's heap doesn't even exclude the kernel data?!
> 
> Ah, sorry, it does, I misread it. But it doesn't exclude anything else,
> apparently.

Actually it does since it's only 114f000-7a00. 
I guess the biosmem heap was only meant for biosmem_bootalloc
allocations.

More precisely though, adding debugging to vm_page_load:

vm_page_load 1-100 1-100
vm_page_load 100-7a00 114f000-79c41000
vm_page_load 7a00-7ffe 7a00-7ffe

I.e. it properly skips the kernel (100-114f000), but nothing else.

What is supposed to exclude everything else? (modules, VGA BIOS, etc.)

Samuel



Re: Fwd: Hurd shutdown problems

2016-08-12 Thread Samuel Thibault
Samuel Thibault, on Fri 12 Aug 2016 17:17:26 +0200, wrote:
> biosmem: heap: 114f000-7a00
> 
> and objdump shows:
> 
> LOAD off0x1000 vaddr 0x8100 paddr 0x0100 align 2**12
>  filesz 0x00114700 memsz 0x00114700 flags r-x
> LOAD off0x00116000 vaddr 0x81115000 paddr 0x01115000 align 2**12
>  filesz 0xd151 memsz 0x00039384 flags rw-
> 
> It seems biosmem's heap doesn't even exclude the kernel data?!

Ah, sorry, it does, I misread it. But it doesn't exclude anything else,
apparently.

Samuel



Re: Fwd: Hurd shutdown problems

2016-08-12 Thread Samuel Thibault
Brent W. Baccala, on Thu 11 Aug 2016 15:29:27 -1000, wrote:
> On Wed, Aug 10, 2016 at 4:33 AM, Richard Braun <rbr...@sceen.net> wrote:
> 
> On Wed, Aug 10, 2016 at 04:26:35PM +0200, Richard Braun wrote:
> > the boot loader (see MULTIBOOT_FLAGS in boothdr.S), and at
> > some point, late during the boot process, module data are freed
> > using (see free_bootstrap_pages in bootstrap.c). This might
> 
> Using vm_page_manage().
> 
> The symbol table is far enough away from the module data that I don't think
> it's getting freed at that point.

Ok, but:

> But it does seem to be freed.  Please check my calculations.
> 
> Here's the location of the symbol table in virtual memory.
> 
> (gdb) print self->start
> $15 = (Elf32_Sym *) 0x804fb5ec

That's very far. i386/intel/pmap.c's pmap_bootstrap uses the etext
symbol so as to know what to make read-only, notably. We'd probably want
to make the symbol table read-only too.

Looking at the output of the biosmem code on my box:

biosmem: physical memory map:
biosmem: 00:09fc00, available
biosmem: 09fc00:0a, reserved
biosmem: 0f:10, reserved
biosmem: 10:007ffe, available
biosmem: 007ffe:008000, reserved
biosmem: 00feffc000:00ff00, reserved
biosmem: 00fffc:01, reserved
biosmem: heap: 114f000-7a00

and objdump shows:

LOAD off0x1000 vaddr 0x8100 paddr 0x0100 align 2**12
 filesz 0x00114700 memsz 0x00114700 flags r-x
LOAD off0x00116000 vaddr 0x81115000 paddr 0x01115000 align 2**12
 filesz 0xd151 memsz 0x00039384 flags rw-

It seems biosmem's heap doesn't even exclude the kernel data?!

Samuel



Re: Fwd: Hurd shutdown problems

2016-08-11 Thread Brent W. Baccala
On Wed, Aug 10, 2016 at 4:33 AM, Richard Braun wrote:

> On Wed, Aug 10, 2016 at 04:26:35PM +0200, Richard Braun wrote:
> > the boot loader (see MULTIBOOT_FLAGS in boothdr.S), and at
> > some point, late during the boot process, module data are freed
> > using (see free_bootstrap_pages in bootstrap.c). This might
>
> Using vm_page_manage().
>
> --
> Richard Braun
>

The symbol table is far enough away from the module data that I don't think
it's getting freed at that point.

But it does seem to be freed.  Please check my calculations.

Here's the location of the symbol table in virtual memory.

(gdb) print self->start
$15 = (Elf32_Sym *) 0x804fb5ec

Here's its location in physical memory.

(gdb) print *symtab
$23 = {sh_name = 1, sh_type = 2, sh_flags = 0, sh_addr = 5223916, sh_offset
= 5367452, sh_size = 70736, sh_link = 16,
  sh_info = 1663, sh_addralign = 4, sh_entsize = 16}

(gdb) printf "%x\n", 5223916
4fb5ec

Now, with the system fully booted, I find this address's page:

(gdb) print (5223916 - vm_page_segs[0].start)/4096
$44 = 1259

...and now start looking at the page table entries:

(gdb) print vm_page_segs[0].pages[1259].type
$52 = 0
(gdb) print vm_page_segs[0].pages[1260].type
$53 = 0
(gdb) print vm_page_segs[0].pages[1261].type
$54 = 0
(gdb) print vm_page_segs[0].pages[1262].type
$55 = 0

0 is VM_PT_FREE.  It should be VM_PT_RESERVED (1), right?
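
The calculation above can be re-checked with a small C sketch. Two values here are assumptions rather than facts from this thread: the 0x80000000 kernel virtual offset (taken as vaddr minus paddr from the objdump LOAD lines) and a segment start of 0x10000, chosen only because it reproduces index 1259; the real vm_page_segs[0].start is not shown. The helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u
#define KERNEL_VIRT_OFFSET 0x80000000u /* assumed: vaddr - paddr */

/* Translate a kernel virtual address to a physical address. */
uint32_t kv_to_phys(uint32_t vaddr)
{
    assert(vaddr >= KERNEL_VIRT_OFFSET);
    return vaddr - KERNEL_VIRT_OFFSET;
}

/* Index of the page holding paddr within a segment starting at seg_start. */
uint32_t page_index(uint32_t paddr, uint32_t seg_start)
{
    assert(paddr >= seg_start);
    return (paddr - seg_start) / PAGE_BYTES;
}
```

Under those assumptions, kv_to_phys(0x804fb5ec) is 5223916 (0x4fb5ec), the table starts in page 1259, and with sh_size = 70736 its last byte falls in page 1276, so the whole run of pages from 1259 to 1276 would need to be checked.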

agape
brent


Re: Fwd: Hurd shutdown problems

2016-08-10 Thread Richard Braun
On Wed, Aug 10, 2016 at 04:26:35PM +0200, Richard Braun wrote:
> the boot loader (see MULTIBOOT_FLAGS in boothdr.S), and at
> some point, late during the boot process, module data are freed
> using (see free_bootstrap_pages in bootstrap.c). This might

Using vm_page_manage().

-- 
Richard Braun



Re: Fwd: Hurd shutdown problems

2016-08-09 Thread Brent W. Baccala
On Mon, Aug 8, 2016 at 9:32 PM, Justus Winter wrote:

> Hello,
>
> "Brent W. Baccala"  writes:
>
> > I don't have to swapoff to have "symptoms".  The kernel debugger normally
> > shows symbolic names, i.e:
> >
> > Stopped  at  machine_idle+0xe:   leave
> > machine_idle(0,81a2c630,3806f64,0,9b448b38)+0xe
> > idle_thread_continue(9fcbdde0,81028b50,9c0c7fe4,0,9c3d5548)+0x2a
> >
> > Once I've got enough swap in use, though, it stops doing this.  Now I see:
> >
> > Stopped   at  0x81be: leave
> > 0x81be(0,0,9fcc5990,0,9fb90b30)
> > 0x810293fa(9fcbdde0,81028b50,99526fe4,0,9c3d5548)
>
> Uh :( that is not good.  That sounds like a swap-related corruption in
> the kernel.
>
> > When I see a kernel page fault, it's always in strcmp()
>
> strcmp is used in the elf symbol lookup code, so that might explain the
> fault.
>
>
GDB on the kernel shows a seemingly corrupted ELF symbol table when
elf_db_search_symbol() is called.

Here's what the symbol table looks like when the system boots:

(gdb) print self->start
$3 = (Elf32_Sym *) 0x804fb5ec
(gdb) print self->start[0]
$4 = {st_name = 0, st_value = 0, st_size = 0, st_info = 0 '\000', st_other
= 0 '\000', st_shndx = 0}
(gdb) print self->start[1]
$5 = {st_name = 0, st_value = 2164260864, st_size = 0, st_info = 3 '\003',
st_other = 0 '\000', st_shndx = 1}
(gdb) print self->start[2]
$6 = {st_name = 0, st_value = 2165125376, st_size = 0, st_info = 3 '\003',
st_other = 0 '\000', st_shndx = 2}
(gdb) print self->start[3]
$7 = {st_name = 0, st_value = 2165262992, st_size = 0, st_info = 3 '\003',
st_other = 0 '\000', st_shndx = 3}
(gdb) print self->start[4]
$8 = {st_name = 0, st_value = 2165395456, st_size = 0, st_info = 3 '\003',
st_other = 0 '\000', st_shndx = 4}
(gdb) print self->start[5]
$9 = {st_name = 0, st_value = 2165452800, st_size = 0, st_info = 3 '\003',
st_other = 0 '\000', st_shndx = 5}
(gdb) print self->start[6]
$10 = {st_name = 0, st_value = 0, st_size = 0, st_info = 3 '\003', st_other
= 0 '\000', st_shndx = 6}

After I run a certain compile (just make, g++, ld), here's what it looks
like:

(gdb) print self->start
$15 = (Elf32_Sym *) 0x804fb5ec
(gdb) print self->start[0]
$16 = {st_name = 22, st_value = 0, st_size = 0, st_info = 13 '\r', st_other
= 26 '\032', st_shndx = 0}
(gdb) print self->start[1]
$17 = {st_name = 0, st_value = 562210328, st_size = 562101944, st_info = 0
'\000', st_other = 0 '\000', st_shndx = 0}
(gdb) print self->start[2]
$18 = {st_name = 0, st_value = 0, st_size = 0, st_info = 0 '\000', st_other
= 0 '\000', st_shndx = 0}
(gdb) print self->start[3]
$19 = {st_name = 0, st_value = 0, st_size = 0, st_info = 3 '\003', st_other
= 0 '\000', st_shndx = 0}
(gdb) print self->start[4]
$20 = {st_name = 23, st_value = 0, st_size = 0, st_info = 13 '\r', st_other
= 26 '\032', st_shndx = 0}
(gdb) print self->start[5]
$22 = {st_name = 0, st_value = 562210352, st_size = 562210400, st_info = 0
'\000', st_other = 0 '\000', st_shndx = 0}
(gdb) print self->start[6]
$23 = {st_name = 0, st_value = 0, st_size = 0, st_info = 0 '\000', st_other
= 0 '\000', st_shndx = 0}
(gdb) print self->start[7]
$24 = {st_name = 0, st_value = 0, st_size = 0, st_info = 3 '\003', st_other
= 0 '\000', st_shndx = 0}
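
One quick way to tell the two dumps apart programmatically: the ELF specification reserves symbol table index 0 as the undefined symbol, with every field zero. A minimal sketch of that check follows, using Elf32_Sym from <elf.h> as shipped with glibc; the function name is hypothetical and this is not part of ddb.

```c
#include <elf.h>
#include <stdbool.h>

/*
 * Return true if entry 0 of the symbol table still looks like the
 * reserved null symbol (all fields zero, section index SHN_UNDEF).
 * A false result means the table has been overwritten.
 */
bool elf32_null_symbol_ok(const Elf32_Sym *symtab)
{
    const Elf32_Sym *s = &symtab[0];

    return s->st_name == 0
        && s->st_value == 0
        && s->st_size == 0
        && s->st_info == 0
        && s->st_other == 0
        && s->st_shndx == SHN_UNDEF;
}
```

The boot-time dump passes this check; the later one (st_name = 22, st_info = 13, st_other = 26) does not.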

Both GDB traces are with the kernel halted near the beginning of
elf_db_search_symbol(), called from the kernel debugger:

(gdb) where
#0  elf_db_search_symbol (stab=0x81127b00 , off=2164261054,
strategy=2, diffp=0x81124ea0 )
at ../ddb/db_elf.c:159
#1  0x810132e7 in db_search_in_task_symbol (val=2164261054, strategy=2,
offp=0x81124f10 , task=0x0)
at ../ddb/db_sym.c:354
#2  0x8101342a in db_search_task_symbol (val=2164261054, strategy=2,
offp=0x81124f10 , task=0x0)
at ../ddb/db_sym.c:315
#3  0x810135dd in db_task_printsym (off=2164261054, strategy=2, task=0x0)
at ../ddb/db_sym.c:458
#4  0x8100f377 in db_print_loc_and_inst (loc=2164261054, task=0x0) at
../ddb/db_examine.c:328
#5  0x8104fe9d in db_task_trap (type=-1, code=0, user_space=0) at
../ddb/db_trap.c:92
#6  0x81045d61 in kdb_kentry (int_regs=0x81124fe8 ) at
../i386/i386/db_interface.c:392
#7  0x810082ac in kdb_from_iret () at ../i386/i386/locore.S:864
#8  0x942dff6c in ?? ()
#9  0x81146610 in default_pset ()
#10 0x in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Any chance the symbol table could have been swapped out?  Any idea how to
debug it?

> I'm just learning Hurd.  Any ideas?
>
> Keep at it, the Hurd is an interesting system to learn from.  But you
> might want to start with a simpler problem.
>
>
I wouldn't mind a simpler problem, but I want to get my system cleanly
booting and shutting down!

I hate this kind of "recursion", but hopefully the result will be a better
system.

agape
brent


Re: Fwd: Hurd shutdown problems

2016-08-09 Thread Justus Winter
Hello,

"Brent W. Baccala"  writes:

> Further progress trying to track this down:
>
> I don't have to shutdown the system to have problems.  "swapoff /dev/hd0s5"
> is enough to cause problems, once enough swap is in use.  After a failed
> swapoff, I have an extra 98 storeio processes running!

:(

So we are seeing different problems.  I sometimes see the shutdown hang
way before swapoff is called.

Nevertheless, I have finished my little utility that you can use to make
a shell survive the shutdown process:

http://darnassus.sceen.net/~teythoon/bless

> I don't have to swapoff to have "symptoms".  The kernel debugger normally
> shows symbolic names, i.e:
>
> Stopped  at  machine_idle+0xe:   leave
> machine_idle(0,81a2c630,3806f64,0,9b448b38)+0xe
> idle_thread_continue(9fcbdde0,81028b50,9c0c7fe4,0,9c3d5548)+0x2a
>
> Once I've got enough swap in use, though, it stops doing this.  Now I see:
>
> Stopped   at  0x81be: leave
> 0x81be(0,0,9fcc5990,0,9fb90b30)
> 0x810293fa(9fcbdde0,81028b50,99526fe4,0,9c3d5548)

Uh :( that is not good.  That sounds like a swap-related corruption in
the kernel.

> When I see a kernel page fault, it's always in strcmp()

strcmp is used in the elf symbol lookup code, so that might explain the
fault.
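
To illustrate why a trashed symbol table would fault there: a name lookup follows each entry's st_name as an offset into the string table and hands the result to strcmp. The sketch below is not the ddb code; `lookup_symbol` and its parameters are hypothetical, and it adds a bounds check that skips entries whose offset lies outside the string table. It assumes the string table ends with a NUL byte, as ELF string tables do.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Find a symbol by name. Entries whose st_name points outside the
 * string table are treated as corrupt and skipped, so strcmp is
 * never called on a wild pointer.
 */
const Elf32_Sym *lookup_symbol(const Elf32_Sym *symtab, size_t nsyms,
                               const char *strtab, size_t strtab_size,
                               const char *name)
{
    for (size_t i = 0; i < nsyms; i++) {
        Elf32_Word off = symtab[i].st_name;

        if (off >= strtab_size)
            continue; /* corrupt entry: offset outside the string table */

        if (strcmp(&strtab[off], name) == 0)
            return &symtab[i];
    }

    return NULL;
}
```

Without the bounds check, a garbage st_name makes strcmp read from an arbitrary address, which would fit a page fault inside strcmp.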

> I can't task_terminate the auth server, as this typically does nothing once
> I've started having symptoms, but I can kill the auth server from the
> command line (just "kill 7") and that triggers a reboot that leaves the
> disk in a clean state.

Well, once the symbol lookup mechanism is fried, you likely cannot
!task_terminate anything anymore, since this relies on that mechanism.

> I'm just learning Hurd.  Any ideas?

Keep at it, the Hurd is an interesting system to learn from.  But you
might want to start with a simpler problem.


Cheers,
Justus




Fwd: Hurd shutdown problems

2016-08-08 Thread Brent W. Baccala
Further progress trying to track this down:

I don't have to shutdown the system to have problems.  "swapoff /dev/hd0s5"
is enough to cause problems, once enough swap is in use.  After a failed
swapoff, I have an extra 98 storeio processes running!

I don't have to swapoff to have "symptoms".  The kernel debugger normally
shows symbolic names, i.e:

Stopped  at  machine_idle+0xe:   leave
machine_idle(0,81a2c630,3806f64,0,9b448b38)+0xe
idle_thread_continue(9fcbdde0,81028b50,9c0c7fe4,0,9c3d5548)+0x2a

Once I've got enough swap in use, though, it stops doing this.  Now I see:

Stopped   at  0x81be: leave
0x81be(0,0,9fcc5990,0,9fb90b30)
0x810293fa(9fcbdde0,81028b50,99526fe4,0,9c3d5548)

When I see a kernel page fault, it's always in strcmp()

It doesn't matter if an ssh session is open or not (Riccardo Mottola's
suggestion).

I can't task_terminate the auth server, as this typically does nothing once
I've started having symptoms, but I can kill the auth server from the
command line (just "kill 7") and that triggers a reboot that leaves the
disk in a clean state.

I'm just learning Hurd.  Any ideas?

agape
brent