Re: kernel panic in gsync_wait
Hi, On Fri, Nov 04, 2016 at 12:14:27PM -1000, Brent W. Baccala wrote: > How do we want to handle fixed bugs, like this gsync problem? Should we > open and close a bug report, just so it's documented in the bug database? > I can open the bug, of course, but I don't have permission to close it... Nah, the tracker is not supposed to be a complete history of all issues ever... It's just to keep track of known open issues. -antrik-
Re: kernel panic in gsync_wait
On Fri, Nov 4, 2016 at 12:00 AM, Samuel Thibaultwrote: > Brent W. Baccala, on Thu 03 Nov 2016 15:51:04 -1000, wrote: > > I see... so there must be fallback code (option 2 on my list); I just > haven't > > found it. > > > > Where is KERN_FAILURE handled in user space? > > It's not. Gsync_wait just returns, and the while loop just tries to take > the lock again. > Of course; it's designed to replace a spinlock! Very clever. Clever can cause problems, though. We need to document this. I've booted a new kernel and the weird ext2fs data corruption problems have gone away. I'm currently working on a test case where I create a small ramdisk and fill it to exhaustion. 'dd' doesn't cleanly error out, though; it hangs. I'm working through the error handling code in libpager (and friends) and have already found one problem, but there are others. I guess I should file it as a bug. How do we want to handle fixed bugs, like this gsync problem? Should we open and close a bug report, just so it's documented in the bug database? I can open the bug, of course, but I don't have permission to close it... agape brent
Re: kernel panic in gsync_wait
Brent W. Baccala, on Thu 03 Nov 2016 15:51:04 -1000, wrote: > I see... so there must be fallback code (option 2 on my list); I just haven't > found it. > > Where is KERN_FAILURE handled in user space? It's not. Gsync_wait just returns, and the while loop just tries to take the lock again. Samuel
Re: kernel panic in gsync_wait
On Thu, Nov 3, 2016 at 2:18 PM, Samuel Thibaultwrote: > Brent W. Baccala, on Thu 03 Nov 2016 14:12:41 -1000, wrote: > > I suspect that this ultimately affects just about every program on a > > Hurd system. > > Sure, it's used internally by glibc. But see the commit I made to > gnumach: that makes gnumach return an error. glibc thus doesn't actually > wait, just spinlock, i.e. behave correctly, just not so efficiently. > I see... so there must be fallback code (option 2 on my list); I just haven't found it. Where is KERN_FAILURE handled in user space? agape brent
Re: kernel panic in gsync_wait
Brent W. Baccala, on Thu 03 Nov 2016 14:12:41 -1000, wrote: > I suspect that this ultimately affects just about every program on a > Hurd system. Sure, it's used internally by glibc. But see the commit I made to gnumach: that makes gnumach return an error. glibc thus doesn't actually wait, just spinlock, i.e. behave correctly, just not so efficiently. Samuel
Re: kernel panic in gsync_wait
Hi - I've been trying to figure what to do with the gsync code. It causes undefined behavior and occasional kernel panics when rpctrace is used on gsync_wait and gsync_wake calls. I suspect that this ultimately affects just about every program on a Hurd system. Even if a program doesn't call locking primitives directly, they're used by the C library, right? On Sun, Oct 30, 2016 at 10:16 PM, Kalle Olavi Niemitalowrote: > "Brent W. Baccala" writes: > > > Even if I'm right about the nature of this bug, I don't understand > gnumach > > well enough to know how a task should access another task's memory. > > vm_copy apparently supports such access; code from there could be > reused. But if rpctrace uses gsync_wait on the address space of > another task and the page has been paged out, then the call could > end up blocking for the pager, and I don't think you want that. > I studied vm_copy and came up with some code, but gsync_wake needs to actually modify the memory location, so setting up a copy doesn't work. Plus, vm_copy makes a mapping that is visible to user space and leaves the user space code responsible for deallocating it. As for blocking for the pager, I think the code does that already. As innocuous as this line of code looks: *(unsigned int *)addr = val; it can trigger a page fault and block the RPC, right? Here's what I think gsync_wait and gsync_wake need to do if their task argument isn't the current task: 1. lookup the address in the other task's pagemap, flag the map entry 'shared', and insert a copy of it into the kernel pagemap, 2. access the memory location, triggering a page fault which actually creates the physical mapping, since Mach doesn't create physical maps until required, then 3a. either remove the pagemap entry from the kernel map when we're done, or 3b. leave it for future use. If we choose option (3a), then every wrapped call to gsync_{wait,wake} will trigger a page fault. If we choose option (3b), then step 1 needs to be modified to search the kernel pagemap to see if it already has a copy of the map entry in question. I have to think some more to figure how that mapping ultimately gets removed, and of course there's a memory exhaustion issue if we add lots of new entries to the kernel map on a 32-bit system. Another possibility would be to reject the RPC back to user space, and put the old locking code back into glibc as a fallback. All this brings us back to... On Sun, Oct 30, 2016 at 10:16 PM, Kalle Olavi Niemitalo wrote: > > Could gsync_wait be removed from gnumach.defs and replaced with > only a trap that does not take a task_t parameter and cannot be > intercepted by rpctrace? > I don't think we have anything else quite like gsync - kernel code that directly accesses user space memory in a different task from the one that trapped. Our choices seem to be: 1. add a lot of new complexity to the gsync kernel routines 2. add fallback code to glibc 3. add a new system call as Kalle proposed Comments? agape brent
Re: kernel panic in gsync_wait
"Brent W. Baccala"writes: > My new and improved rpctrace is generating kernel panics when run on > ext2fs. This happens when rpctrace calls gsync_wait, with ext2fs as the > 'task' argument. Could gsync_wait be removed from gnumach.defs and replaced with only a trap that does not take a task_t parameter and cannot be intercepted by rpctrace? (If the gsync code is not deleted altogether because of the license conflict.) > Even if I'm right about the nature of this bug, I don't understand gnumach > well enough to know how a task should access another task's memory. vm_copy apparently supports such access; code from there could be reused. But if rpctrace uses gsync_wait on the address space of another task and the page has been paged out, then the call could end up blocking for the pager, and I don't think you want that.
kernel panic in gsync_wait
Hi - My new and improved rpctrace is generating kernel panics when run on ext2fs. This happens when rpctrace calls gsync_wait, with ext2fs as the 'task' argument. Could you guys look at gnumach's kern/gsync.c, at line 212? It looks to me like that code tacitly assumes that the 'addr' it's accessing is in the memory space of the task calling the RPC, instead of the task passed in as the first argument. Even if I'm right about the nature of this bug, I don't understand gnumach well enough to know how a task should access another task's memory. agape brent