Re: [uml-devel] 2.6.25 uml kernel crashes when it calls down() on a semaphore with zero counter
On Mon, Jul 14, 2008 at 10:46 PM, Jeff Dike <[EMAIL PROTECTED]> wrote:

> On Mon, Jul 14, 2008 at 05:06:49PM +0800, Jiaying Zhang wrote:
> > The 2.6.24 kernels are OK, but I have seen this problem with all of the
> > 2.6.25 kernels I have tried. There have been a lot of changes between
> > 2.6.24 kernels and 2.6.25 kernels. I am not sure which one may lead
> > to this problem.
>
> So bisect it.

The problem seems to be related to the removal of fastcall in the 2.6.25
kernels. It first appears with commit
82f74e7159749cc511ebf5954a7b9ea6ad634949:
x86: unify include/asm-x86/linkage_[32|64].h.
After that, several commits related to __down_interruptible were checked in,
but they did not solve the crash I saw. In particular, I thought the
d50efc6c40620b2e11648cac64ebf4a824e40382 "x86: fix UML and -regparm=3" commit
would solve the problem, because it adds the asmregparm macro, which is the
same as fastcall, and uses the macro for the __down_failed_interruptible
declaration. Unfortunately, I tried that version of the git code and saw the
same problem.

> > Looks like the problem happens when __down_interruptible is called.
> > I checked the semaphore passed to __down_interruptible under gdb
> > and found it was corrupted:
> > (gdb) f 18
> > #18 __down_interruptible (sem=0x9f68d08) at include/linux/list.h:50
> > 50 prev->next = new;
> > (gdb) p sem
> > $15 = (struct semaphore *) 0x9f68d08
> > (gdb) p *sem
> > $16 = {count = {counter = -268435295}, sleepers = 4, wait = {lock =
> > {raw_lock = {}}, task_list = {
> > next = 0x9f68d5c, prev = 0x18124}}}
> >
> > But the semaphore looks correct before calling down_interruptible:
>
> What's the problem with debugging this, then? You step through the
> code starting when the semaphore is good and see exactly when it gets
> corrupted.
>

Yes.
It looks like the corruption happens when __down_failed_interruptible()
calls __down_interruptible(), and it has something to do with the x86 gcc
attribute changes in 2.6.25.

Jiaying

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
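The gdb dump quoted in this message shows a wait-queue list head whose next and prev pointers no longer agree with each other. As a rough debugging aid, and only as a sketch, a consistency check along the following lines can flag that kind of damage before the list code touches it. The names struct list_node, struct sem_sketch, sem_sketch_init() and sem_sketch_list_ok() are hypothetical stand-ins written for this illustration, not the kernel's own types or helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's doubly linked list head. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

/* Hypothetical, simplified semaphore: counter, sleepers, wait list. */
struct sem_sketch {
	int count;
	int sleepers;
	struct list_node task_list;
};

/* Initialise the semaphore with an empty (self-linked) wait list. */
static void sem_sketch_init(struct sem_sketch *sem, int count)
{
	assert(sem != NULL);
	sem->count = count;
	sem->sleepers = 0;
	sem->task_list.next = &sem->task_list;
	sem->task_list.prev = &sem->task_list;
}

/*
 * Return true when both neighbours of the list head point back at it.
 * This only follows one pointer in each direction, so a wild pointer
 * that is not mapped at all could still fault here.
 */
static bool sem_sketch_list_ok(const struct sem_sketch *sem)
{
	const struct list_node *head = &sem->task_list;

	if (head->next == NULL || head->prev == NULL)
		return false;
	return head->next->prev == head && head->prev->next == head;
}
```

A freshly initialised semaphore passes the check. Once task_list.prev is redirected to a node that does not link back to the head, the check fails, which is the kind of inconsistency the dump suggests.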
Re: [uml-devel] Build Errors
On Tue, Jul 15, 2008 at 11:32 AM, Jeff Dike <[EMAIL PROTECTED]> wrote:
> On Fri, Jul 11, 2008 at 02:53:09AM -0500, Stoyan Gaydarov wrote:
>> If I did these builds wrong then do let me know what I can
>> do to fix them and re-run them so that I can provide some useful
>> information in the future.
>
> There's not a lot of point to tarring up 17 logs when there are only
> four distinct failures. Two of them were due to you not having pcap
> or VDE installed on the host.
>
> The next most common one looks like SMP getting turned on somehow.
>
> There's one with some undefined symbols at the very end, which I have
> never seen before.

I am glad I could help with this, even though the rest of the logs were
pretty much useless. Any suggestions on how to do this correctly would be
greatly appreciated.

>
>        Jeff
>
> --
> Work email - jdike at linux dot intel dot com
>
Re: [uml-devel] 2.6.25 uml kernel crashes when it calls down() on a semaphore with zero counter
The patch below solves the 2.6.25 uml crash problem for me. It looks like
the problem should go away in the 2.6.26 kernel, because down_interruptible
has been implemented in C since 2.6.26. However, I got a kernel panic while
booting the 2.6.26 kernel :(.

--- linux-2.6.25.4/lib/semaphore-sleepers.c 2008-05-15 23:00:12.0 +0800
+++ linux-2.6.25.4-new/lib/semaphore-sleepers.c 2008-07-17 12:20:47.0 +0800
@@ -48,12 +48,12 @@
  * we cannot lose wakeup events.
  */
 
-void __up(struct semaphore *sem)
+asmregparm void __up(struct semaphore *sem)
 {
 	wake_up(&sem->wait);
 }
 
-void __sched __down(struct semaphore *sem)
+asmregparm void __sched __down(struct semaphore *sem)
 {
 	struct task_struct *tsk = current;
 	DECLARE_WAITQUEUE(wait, tsk);
@@ -90,7 +90,7 @@ void __sched __down(struct semaphore *se
 	tsk->state = TASK_RUNNING;
 }
 
-int __sched __down_interruptible(struct semaphore *sem)
+asmregparm int __sched __down_interruptible(struct semaphore *sem)
 {
 	int retval = 0;
 	struct task_struct *tsk = current;
@@ -153,7 +153,7 @@ int __sched __down_interruptible(struct
  * single "cmpxchg" without failure cases,
  * but then it wouldn't work on a 386.
  */
-int __down_trylock(struct semaphore *sem)
+asmregparm int __down_trylock(struct semaphore *sem)
 {
 	int sleepers;
 	unsigned long flags;

Jiaying

On Wed, Jul 16, 2008 at 5:52 PM, Jiaying Zhang <[EMAIL PROTECTED]> wrote:

> On Mon, Jul 14, 2008 at 10:46 PM, Jeff Dike <[EMAIL PROTECTED]> wrote:
>
>> On Mon, Jul 14, 2008 at 05:06:49PM +0800, Jiaying Zhang wrote:
>> > The 2.6.24 kernels are OK, but I have seen this problem with all of the
>> > 2.6.25 kernels I have tried. There have been a lot of changes between
>> > 2.6.24 kernels and 2.6.25 kernels. I am not sure which one may lead
>> > to this problem.
>>
>> So bisect it.
>
> The problem seems to be related to the removal of fastcall in the 2.6.25
> kernels. It first appears with commit
> 82f74e7159749cc511ebf5954a7b9ea6ad634949:
> x86: unify include/asm-x86/linkage_[32|64].h.
> After that, several commits related to __down_interruptible were checked in,
> but they did not solve the crash I saw. In particular, I thought the
> d50efc6c40620b2e11648cac64ebf4a824e40382 "x86: fix UML and -regparm=3" commit
> would solve the problem, because it adds the asmregparm macro, which is the
> same as fastcall, and uses the macro for the __down_failed_interruptible
> declaration. Unfortunately, I tried that version of the git code and saw the
> same problem.
>
>> > Looks like the problem happens when __down_interruptible is called.
>> > I checked the semaphore passed to __down_interruptible under gdb
>> > and found it was corrupted:
>> > (gdb) f 18
>> > #18 __down_interruptible (sem=0x9f68d08) at include/linux/list.h:50
>> > 50 prev->next = new;
>> > (gdb) p sem
>> > $15 = (struct semaphore *) 0x9f68d08
>> > (gdb) p *sem
>> > $16 = {count = {counter = -268435295}, sleepers = 4, wait = {lock =
>> > {raw_lock = {}}, task_list = {
>> > next = 0x9f68d5c, prev = 0x18124}}}
>> >
>> > But the semaphore looks correct before calling down_interruptible:
>>
>> What's the problem with debugging this, then? You step through the
>> code starting when the semaphore is good and see exactly when it gets
>> corrupted.
>>
>
> Yes. It looks like the corruption happens when __down_failed_interruptible()
> calls __down_interruptible(), and it has something to do with the x86 gcc
> attribute changes in 2.6.25.
>
> Jiaying
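For readers unfamiliar with the annotation used in the patch, here is a minimal user-space sketch of the idea. It assumes that asmregparm corresponds to gcc's regparm(3) attribute on 32-bit x86, consistent with the thread's description of it as the same as fastcall. The macro asmregparm_sketch, the struct sem_count_sketch, and both functions are hypothetical names for illustration, and the slow path is a much simplified trylock-style stand-in, not the kernel's code. The point is only that the C definition of a function reached from assembly has to carry the same calling-convention attribute the caller assumes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumption: on 32-bit x86 the annotation expands to regparm(3), so the
 * first arguments arrive in registers. Elsewhere it expands to nothing.
 */
#if defined(__i386__)
#define asmregparm_sketch __attribute__((regparm(3)))
#else
#define asmregparm_sketch
#endif

/* Hypothetical, simplified semaphore with only a counter. */
struct sem_count_sketch {
	int count;
};

/* Declaration and definition carry the same annotation. */
asmregparm_sketch int slow_path_sketch(struct sem_count_sketch *sem);

/*
 * Simplified trylock-style slow path: undo the fast path's decrement
 * and report failure (1) to the caller.
 */
asmregparm_sketch int slow_path_sketch(struct sem_count_sketch *sem)
{
	assert(sem != NULL);
	sem->count += 1;
	return 1;
}

/* Fast path: decrement, and fall into the slow path on contention. */
static int try_down_sketch(struct sem_count_sketch *sem)
{
	if (--sem->count < 0)
		return slow_path_sketch(sem);
	return 0;
}
```

With a counter of 1, the first call to try_down_sketch() succeeds and leaves the counter at 0. A second call takes the slow path, which restores the counter to 0 and reports failure. If only the declaration seen by the caller carried the attribute and the definition did not, caller and callee would disagree about where the sem argument lives, which matches the kind of mismatch the patch addresses.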