Re: [uml-devel] 2.6.25 uml kernel crashes when it calls down() on a semaphore with zero counter

2008-07-16 Thread Jiaying Zhang
On Mon, Jul 14, 2008 at 10:46 PM, Jeff Dike <[EMAIL PROTECTED]> wrote:

> On Mon, Jul 14, 2008 at 05:06:49PM +0800, Jiaying Zhang wrote:
> > The 2.6.24 kernels are OK, but I have seen this problem with all of the
> > 2.6.25 kernels I have tried. There have been a lot of changes between
> > 2.6.24 kernels and 2.6.25 kernels. I am not sure which one may lead
> > to this problem.
>
> So bisect it.


The problem seems to be related to the removal of fastcall in the 2.6.25
kernels. It first appears with commit
82f74e7159749cc511ebf5954a7b9ea6ad634949 (x86: unify
include/asm-x86/linkage_[32|64].h).
Several later commits touched __down_interruptible, but none of them
fixed the crash I see.
In particular, I expected commit d50efc6c40620b2e11648cac64ebf4a824e40382
(x86: fix UML and -regparm=3) to fix it, because it adds the asmregparm
macro, which is the same as fastcall, and uses it in the
__down_failed_interruptible declaration. Unfortunately, I tried that
version of the git tree and saw the same crash.
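
For readers unfamiliar with the attribute, here is a minimal user-space sketch of what a macro like asmregparm does. It is not the kernel code: the toy_* names are hypothetical, and the macro body is an assumption that it expands to gcc's regparm(3) attribute on 32-bit x86 (on other targets the sketch makes it a no-op so that it still compiles). The point is only that the prototype seen by the caller and the definition must carry the same attribute.

```c
#include <assert.h>

/* Assumption: stand-in for the kernel macro. regparm(3) asks gcc to pass
 * the first three integer arguments in registers; it only exists on
 * 32-bit x86, so it is defined away elsewhere. */
#if defined(__i386__)
#define asmregparm __attribute__((regparm(3)))
#else
#define asmregparm
#endif

/* Hypothetical cut-down semaphore, not the kernel's struct semaphore. */
struct toy_sem {
	int count;
	int sleepers;
};

/* Prototype as a caller (for example an assembly stub) would assume it. */
asmregparm int toy_down_slowpath(struct toy_sem *sem);

/* The definition carries the same attribute, so caller and callee agree
 * on where the sem argument lives. */
asmregparm int toy_down_slowpath(struct toy_sem *sem)
{
	sem->sleepers++;
	return sem->count < 0 ? -1 : 0;
}

/* Small self-check: the callee must see the semaphore that was passed. */
int toy_selfcheck(void)
{
	struct toy_sem sem = { .count = -1, .sleepers = 0 };
	int ret = toy_down_slowpath(&sem);

	assert(sem.sleepers == 1);
	return ret;
}
```

If the attribute were present only on the prototype used by an assembly caller and missing from the C definition, the two sides would disagree about where the argument is passed.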


> > Looks like the problem happens when __down_interruptible is called.
> > I checked the semaphore passed to __down_interruptible under gdb
> > and found it was corrupted:
> > (gdb) f 18
> > #18 __down_interruptible (sem=0x9f68d08) at include/linux/list.h:50
> > 50  prev->next = new;
> > (gdb) p sem
> > $15 = (struct semaphore *) 0x9f68d08
> > (gdb) p *sem
> > $16 = {count = {counter = -268435295}, sleepers = 4, wait = {lock =
> > {raw_lock = {}}, task_list = {
> >   next = 0x9f68d5c, prev = 0x18124}}}
> >
> > But the semaphore looks correct before calling down_interruptible:
>
> What's the problem with debugging this, then?  You step through the
> code starting when the semaphore is good and see exactly when it gets
> corrupted.
>

Yes. It looks like the corruption happens when __down_failed_interruptible()
calls __down_interruptible(), and it has something to do with the x86 gcc
attribute changes in 2.6.25.

Jiaying
-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/___
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


Re: [uml-devel] Build Errors

2008-07-16 Thread Stoyan Gaydarov
On Tue, Jul 15, 2008 at 11:32 AM, Jeff Dike <[EMAIL PROTECTED]> wrote:
> On Fri, Jul 11, 2008 at 02:53:09AM -0500, Stoyan Gaydarov wrote:
>> If I did these builds wrong then do let me know what I can
>> do to fix them and re-run them so that I can provide some useful
>> information in the future.
>
> There's not a lot of point to tarring up 17 logs when there are only
> four distinct failures.  Two of them were due to you not having pcap
> or VDE installed on the host.
>
> The next most common one looks like SMP getting turned on somehow.
>
> There's one with some undefined symbols at the very end, which I have
> never seen before.

I am glad I could help with this, even though the rest of the logs were
pretty much useless. Any suggestions on how to do this correctly would
be greatly appreciated.
>
>Jeff
>
> --
> Work email - jdike at linux dot intel dot com
>



Re: [uml-devel] 2.6.25 uml kernel crashes when it calls down() on a semaphore with zero counter

2008-07-16 Thread Jiaying Zhang
The patch below fixes the 2.6.25 UML crash for me. The problem should go
away in 2.6.26, because down_interruptible is implemented in C as of
2.6.26. But I got a kernel panic while booting the 2.6.26 kernel :(.

--- linux-2.6.25.4/lib/semaphore-sleepers.c 2008-05-15 23:00:12.0 +0800
+++ linux-2.6.25.4-new/lib/semaphore-sleepers.c 2008-07-17 12:20:47.0 +0800
@@ -48,12 +48,12 @@
  *we cannot lose wakeup events.
  */

-void __up(struct semaphore *sem)
+asmregparm void __up(struct semaphore *sem)
 {
 	wake_up(&sem->wait);
 }

-void __sched __down(struct semaphore *sem)
+asmregparm void __sched __down(struct semaphore *sem)
 {
 	struct task_struct *tsk = current;
 	DECLARE_WAITQUEUE(wait, tsk);
@@ -90,7 +90,7 @@ void __sched __down(struct semaphore *se
 	tsk->state = TASK_RUNNING;
 }

-int __sched __down_interruptible(struct semaphore *sem)
+asmregparm int __sched __down_interruptible(struct semaphore *sem)
 {
 	int retval = 0;
 	struct task_struct *tsk = current;
@@ -153,7 +153,7 @@ int __sched __down_interruptible(struct
  * single "cmpxchg" without failure cases,
  * but then it wouldn't work on a 386.
  */
-int __down_trylock(struct semaphore *sem)
+asmregparm int __down_trylock(struct semaphore *sem)
 {
 	int sleepers;
 	unsigned long flags;

Jiaying
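
As a rough illustration of why marking the C definitions matters, here is a toy model of a calling-convention mismatch. It is a sketch under stated assumptions, not a trace of the real crash: the "register" and "stack slot" are plain struct fields, all toy_* names are hypothetical, and the model assumes the assembly caller passes sem in a register while an unannotated callee looks for it on the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cut-down semaphore, not the kernel's struct semaphore. */
struct toy_sem {
	int count;
	int sleepers;
};

/* Toy view of the two places a first argument can live on 32-bit x86. */
struct toy_call_frame {
	uintptr_t eax;    /* first argument under a regparm(3) convention */
	uintptr_t stack0; /* first argument under the stack convention */
};

/* Caller modelled on an assembly stub: it puts sem in "eax" only and
 * leaves whatever stale value happens to be in the stack slot. */
static struct toy_call_frame toy_stub_call(struct toy_sem *sem,
					   uintptr_t stale)
{
	struct toy_call_frame f = { .eax = (uintptr_t)sem, .stack0 = stale };
	return f;
}

/* Callee compiled with the register convention reads "eax". */
uintptr_t toy_callee_regparm(const struct toy_call_frame *f)
{
	return f->eax;
}

/* Callee compiled without the attribute reads the stack slot. */
uintptr_t toy_callee_stack(const struct toy_call_frame *f)
{
	return f->stack0;
}

/* Returns 1 if the callee ends up with the pointer the caller passed. */
int toy_conventions_agree(struct toy_sem *sem, uintptr_t stale,
			  int callee_uses_regparm)
{
	struct toy_call_frame f = toy_stub_call(sem, stale);
	uintptr_t seen = callee_uses_regparm ? toy_callee_regparm(&f)
					     : toy_callee_stack(&f);

	return seen == (uintptr_t)sem;
}
```

In this model the callee only sees the caller's semaphore when both sides use the same convention; otherwise it operates on whatever stale value sits in the other location.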
