RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2009-01-04 Thread Zhang, Xiantao
Hi Isaku, all,
The attached patch should fix this odd issue. Upstream, we have also seen
other odd issues: for example, dom0 cannot boot on some platforms, and
dom0 may behave differently with different initrds. After debugging, I found
that they are caused by an incorrect setup of the pirq_needs_eoi page. Two
main problems turned up during debugging:
1. The two related hypercalls are not enabled correctly, so dom0 and the
hypervisor do not agree on which pirqs need an EOI.
2. The page is not actually linked into the bss section, although it must
be, so the kernel treats it as general-purpose memory and uses it for many
other purposes, which finally leads to various issues.
Thanks,
Xiantao
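
The first point above, that dom0 and the hypervisor must agree on which pirqs need an EOI, can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_register_eoi_page`, `toy_hv_mark_needs_eoi`, `toy_guest_needs_eoi`, `struct toy_eoi_page` and the value of `TOY_NR_PIRQS` are hypothetical names invented for illustration and are not the Xen or Linux interfaces. In the real system the registration would go through a hypercall rather than a function call.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical pirq count, chosen only for this model. */
#define TOY_NR_PIRQS      256u
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_BITMAP_LONGS  ((TOY_NR_PIRQS + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* One bit per pirq: set means "the guest must send an EOI for this pirq". */
struct toy_eoi_page {
    unsigned long bits[TOY_BITMAP_LONGS];
};

/* The toy hypervisor's view: which page it believes the guest is reading. */
static struct toy_eoi_page *toy_registered_page;

/* Toy stand-in for the registration hypercall. */
void toy_register_eoi_page(struct toy_eoi_page *page)
{
    toy_registered_page = page;
}

/* Hypervisor side: record that a pirq needs an EOI in the registered page. */
void toy_hv_mark_needs_eoi(unsigned int pirq)
{
    if (toy_registered_page != NULL && pirq < TOY_NR_PIRQS)
        toy_registered_page->bits[pirq / TOY_BITS_PER_LONG] |=
            1UL << (pirq % TOY_BITS_PER_LONG);
}

/* Guest side: consult the page the guest believes is shared. */
int toy_guest_needs_eoi(const struct toy_eoi_page *page, unsigned int pirq)
{
    if (pirq >= TOY_NR_PIRQS)
        return 0;
    return (int)((page->bits[pirq / TOY_BITS_PER_LONG] >>
                  (pirq % TOY_BITS_PER_LONG)) & 1UL);
}
```

If the page the hypervisor writes is not the page the guest reads, the guest never sees the bit and skips the EOI. Once both sides use the same page, `toy_guest_needs_eoi` reports the pirq as expected.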






fix_pirq_eoi_page.patch
Description: fix_pirq_eoi_page.patch
___
Xen-ia64-devel mailing list
Xen-ia64-devel@lists.xensource.com
http://lists.xensource.com/xen-ia64-devel

Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2009-01-04 Thread Isaku Yamahata
Hi. Good catch. Some comments.
I attached two patches to fix, could you try them?

- bss.page_aligned.
  Where is the section used?
  grep didn't tell me. Surely x86 uses .bss.page_aligned in
  linux/arch/[i386, x86_64]/kernel/head[-xen].S,
  but no files under linux/arch/ia64/ use it.

- ia64_fast_eoi.
  I suppose ia64_fast_eoi is used as an optimization in place of
  PHYSDEVOP_eoi. I'm not sure how much improvement it provides, though.
  In any case, the ia64_fast_eoi hypercall implementation should also be
  updated, which I overlooked when I added PHYSDEVOP_pirq_eoi_gmfn support.

thanks,


RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2009-01-04 Thread Zhang, Xiantao
Isaku Yamahata wrote:
 Hi. Good catch. Some comments.
 I attached two patches to fix, could you try them?
 
 - bss.page_aligned.
   Where is the section used?
   grep didn't tell me. Surely x86 uses .bss.page_aligned in
   linux/arch/[i386, x86_64]/kernel/head[-xen].S,
   but no files under linux/arch/ia64/ use it.

You may need to check drivers/xen/core/evtchn.c; the code is as follows :-)
Xiantao

static int pirq_eoi_does_unmask;
static DECLARE_BITMAP(pirq_needs_eoi, ALIGN(NR_PIRQS, PAGE_SIZE * 8))
__attribute__ ((__section__(".bss.page_aligned"),
__aligned__(PAGE_SIZE)));
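
The declaration above rounds the number of bits up to a whole page of bits and aligns the array to a page boundary, so the bitmap occupies whole pages and shares them with nothing else. A user-space sketch of the same idea follows. The constants `DEMO_PAGE_SIZE` and `DEMO_NR_PIRQS` and all helper names are assumptions chosen for illustration. The section placement is deliberately left out because it depends on the kernel linker script.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the kernel takes these from its headers. */
#define DEMO_PAGE_SIZE 4096UL
#define DEMO_NR_PIRQS  256UL

#define DEMO_BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_ROUND_UP(x, a)   ((((x) + (a) - 1) / (a)) * (a))
#define DEMO_BITS_TO_LONGS(n) (((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/*
 * DEMO_NR_PIRQS bits, rounded up to a full page worth of bits, stored in an
 * array that starts on a page boundary (GCC/Clang attribute syntax).
 */
static unsigned long demo_pirq_bitmap[
    DEMO_BITS_TO_LONGS(DEMO_ROUND_UP(DEMO_NR_PIRQS, DEMO_PAGE_SIZE * 8))]
    __attribute__((aligned(DEMO_PAGE_SIZE)));

/* Size of the bitmap in bytes. */
size_t demo_bitmap_bytes(void)
{
    return sizeof(demo_pirq_bitmap);
}

/* Nonzero if the bitmap starts on a page boundary and fills whole pages. */
int demo_bitmap_owns_whole_pages(void)
{
    uintptr_t start = (uintptr_t)demo_pirq_bitmap;

    return start % DEMO_PAGE_SIZE == 0 &&
           sizeof(demo_pirq_bitmap) % DEMO_PAGE_SIZE == 0;
}
```

With these assumed constants, 256 bits are rounded up to 32768 bits, which is exactly one 4096-byte page.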



 - ia64_fast_eoi.
   I suppose ia64_fast_eoi is used for optimization instead of
   PHYSDEVOP_eoi. I'm not sure how much improvement it provides,
   though. Anyway ia64_fast_eoi hypercall implementation should also
   be updated which I overlooked when I added PHYSDEVOP_pirq_eoi_gmfn
 support. 
 
 thanks,
 

Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2009-01-04 Thread Isaku Yamahata
On Mon, Jan 05, 2009 at 12:29:55PM +0800, Zhang, Xiantao wrote:
 Isaku Yamahata wrote:
  Hi. Good catch. Some comments.
  I attached two patches to fix, could you try them?
  
  - bss.page_aligned.
    Where is the section used?
    grep didn't tell me. Surely x86 uses .bss.page_aligned in
    linux/arch/[i386, x86_64]/kernel/head[-xen].S,
    but no files under linux/arch/ia64/ use it.
 
 You may need to check drivers/xen/core/evtchn.c; the code is as follows :-)
 Xiantao
 
 static int pirq_eoi_does_unmask;
 static DECLARE_BITMAP(pirq_needs_eoi, ALIGN(NR_PIRQS, PAGE_SIZE * 8))
 __attribute__ ((__section__(".bss.page_aligned"),
 __aligned__(PAGE_SIZE)));
 

Ah, that line was deleted by changeset 760:0d10be086a78.

-- 
yamahata



RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2009-01-04 Thread Zhang, Xiantao

Isaku Yamahata wrote:
 On Mon, Jan 05, 2009 at 12:29:55PM +0800, Zhang, Xiantao wrote:
 Isaku Yamahata wrote:
 Hi. Good catch. Some comments.
 I attached two patches to fix, could you try them?
 
 - bss.page_aligned.
   Where is the section used?
   grep didn't tell me. Surely x86 uses .bss.page_aligned in
   linux/arch/[i386, x86_64]/kernel/head[-xen].S,
   but no files under linux/arch/ia64/ use it.
 
 You may need to check drivers/xen/core/evtchn.c; the code is as
 follows :-)
 Xiantao
 
 static int pirq_eoi_does_unmask;
 static DECLARE_BITMAP(pirq_needs_eoi, ALIGN(NR_PIRQS, PAGE_SIZE * 8))
 __attribute__ ((__section__(".bss.page_aligned"),
 __aligned__(PAGE_SIZE)));
 
 
 Ah, that line was deleted by changeset 760:0d10be086a78.

Oh, I hadn't noticed that check-in because my codebase was old. It caused many
odd issues for us. Okay, removing it is fine too. :)
Adopting the fast EOI path is also okay with me. Please check them in.
Xiantao




Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2009-01-04 Thread Isaku Yamahata
On Mon, Jan 05, 2009 at 01:06:23PM +0800, Zhang, Xiantao wrote:
 Oh, I hadn't noticed that check-in because my codebase was old. It caused many
 odd issues for us. Okay, removing it is fine too. :)
 Adopting the fast EOI path is also okay with me. Please check them in.

Applied, thanks.
-- 
yamahata



RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2008-12-18 Thread You, Yongkang
 I tried 2048M (and other values), but I wasn't able to reproduce it.
 Hmm, does it reproduce with dom0_mem=2048M on all boxes which you
 tested? 

Isaku/All,

This issue is really hard to locate. I now suspect that it is related to the
build process, because the issue also disappears when the build method
changes.

1. It does not happen on all machines, but it reproduces reliably on our
nightly test machine with the same binary.
2. When the system crashes, dom0_mem is set to 2048M. With other memory
sizes, the issue disappears too.
3. It seems to have appeared between dom0 changesets 743 and 753, since an
old Dom0 kernel build plus new Xen works, and the old nightly tests do not
show the issue.
4. While regression testing between 743 and 753, I found that different
build methods can give crashing and non-crashing results.

In our default build process, because stubdomain is not supported on IA64, we
removed install-stubdom and dist-stubdom from the install: and dist: lines in
the main Makefile. It has been that way for more than 2 months. The actual
build command is make -j3 xyz_file. The crashing issue is related to this
build process.

When I do regression testing, I sometimes leave the Makefile unchanged but
still use make -j3. Then the crash is gone.

I am not sure whether my suspicion is right, as it still needs more testing.
Compiling Dom0 is not as easy as compiling Xen; it is time-consuming. I will
try to do more, but maybe not so quickly, as there are many other things to do
at the same time. If the default build is okay, do you think it is worth
investigating further?

Any suggestions would be much appreciated.

Best Regards,
Yongkang You

On Tuesday, December 16, 2008 10:22 AM, Isaku Yamahata wrote:

 On Tue, Dec 09, 2008 at 05:56:25PM +0800, You, Yongkang wrote:
 On Monday, December 08, 2008 2:10 PM, Isaku Yamahata wrote:
 
 On Mon, Dec 08, 2008 at 01:52:38PM +0800, Zhang, Jingke wrote:
 Isaku Yamahata wrote:
 On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
 Hi Isaku,
 We re-get the detail information from serial port, please see
 below. Two comments add:
 
 Thank you.
 
 
 1. We can be sure the Cset#18832 works well on the same
 tiger4 machine. But we did not do regression test between 18832
 and this 18860. 
 2. It is strange that on another Tiger4 box, dom0 will NOT
 crash. Do you have any idea from the serial log? Thanks!
 
 I haven't hit this crash. And Kuwamura-san's test seems that
 he haven't hit it either. Kuwamura-san, is it correct?
 Hmm... it seems to depend on hw configuration?
 I'm inclined to suspect masking/unmasking interruption race.
 event channel issues? But that's just only my very vague guess.
 
 The difference between 18832 and 18860 means the merging
 xen-unstable into xen-ia64-unstable. Looking the log, I suspect
 linux-2.6.18-xen instead of xen.
 Could you provide the linux c/s which corresponds to 18832 and
 18860?
 
 
 Hi Isaku,
 Yes, some of our machines do not crash. I am afraid there may
 be some potential issue. By testing 18832, we use linux#742.
 While 18860 uses linux#753. Thanks!
 
 Thank you. Taking rough look at them those change sets doesn't seem
 culprit. I agree with you that this may indicate some potential
 bugs... 
 
 Hi All,
 
 This bug is stably reproduced, if providing dom0_mem=2048M in
 append option. And if setting dom0_mem to 1024M or 4096M, the
 crashing doesn't happen.  
 
 We tried #18869 Xen + #742 Dom0, system is okay. So the problem
 might be in Linux tree between #742~#753 
 
 I tried 2048M (and other value), but I wasn't reproduce it.
 Hmm, does it reproduce with dom0_mem=2048M on all boxes which you
 tested? 
 
 thanks,



RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2008-12-09 Thread You, Yongkang
On Monday, December 08, 2008 2:10 PM, Isaku Yamahata wrote:

 On Mon, Dec 08, 2008 at 01:52:38PM +0800, Zhang, Jingke wrote:
 Isaku Yamahata wrote:
 On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
 Hi Isaku,
 We re-get the detail information from serial port, please see
 below. Two comments add:
 
 Thank you.
 
 
 1. We can be sure the Cset#18832 works well on the same tiger4
 machine. But we did not do regression test between 18832 and this
 18860. 
 2. It is strange that on another Tiger4 box, dom0 will NOT
 crash. Do you have any idea from the serial log? Thanks!
 
 I haven't hit this crash. And Kuwamura-san's test seems that
 he haven't hit it either. Kuwamura-san, is it correct?
 Hmm... it seems to depend on hw configuration?
 I'm inclined to suspect masking/unmasking interruption race.
 event channel issues? But that's just only my very vague guess.
 
 The difference between 18832 and 18860 means the merging
 xen-unstable into xen-ia64-unstable. Looking the log, I suspect
 linux-2.6.18-xen instead of xen. 
 Could you provide the linux c/s which corresponds to 18832 and
 18860? 
 
 
 Hi Isaku,
 Yes, some of our machines do not crash. I am afraid there may be
 some potential issue. By testing 18832, we use linux#742. While
 18860 uses linux#753. Thanks! 
 
 Thank you. Taking rough look at them those change sets doesn't
 seem culprit.
 I agree with you that this may indicate some potential bugs...

Hi All,

This bug reproduces reliably when dom0_mem=2048M is given in the append
option. With dom0_mem set to 1024M or 4096M, the crash does not happen.

We tried Xen #18869 + Dom0 #742, and the system is okay. So the problem might
be in the Linux tree between #742 and #753.

Best Regards,
Yongkang You
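
Narrowing a regression to one changeset in a known-good/known-bad range such as #742 to #753 is usually done by bisection. The sketch below is illustrative only: `first_bad_changeset` and `toy_crashes_from_750` are hypothetical names, and the threshold 750 is an arbitrary made-up value, not a finding from this thread. It also assumes that every changeset after the first bad one crashes and that the test gives the same answer each time it is run.

```c
#include <assert.h>

/* Callback that reports whether a given changeset crashes (nonzero = crash). */
typedef int (*crash_test_fn)(int changeset);

/*
 * Binary search for the first crashing changeset in (good, bad].
 * Preconditions: 'good' does not crash, 'bad' does, and crashing is
 * monotonic across the range.
 */
int first_bad_changeset(int good, int bad, crash_test_fn crashes)
{
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        if (crashes(mid))
            bad = mid;   /* first bad changeset is at or before mid */
        else
            good = mid;  /* first bad changeset is after mid */
    }
    return bad;
}

/*
 * Stand-in for "build dom0 at this changeset, boot with dom0_mem=2048M,
 * and watch for the crash". The threshold is an arbitrary assumption.
 */
int toy_crashes_from_750(int changeset)
{
    return changeset >= 750;
}
```

For an 11-changeset range this needs at most four build-and-boot cycles. If the result also depends on how the kernel is built, the monotonic assumption may not hold and each step would need repeated runs.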


RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2008-12-07 Thread Zhang, Jingke
Hi Isaku,
We captured the detailed information from the serial port again; please see
below. Two additional comments:
1. We are sure that Cset#18832 works well on the same Tiger4 machine, but
we did not run regression tests between 18832 and 18860.
2. Strangely, on another Tiger4 box dom0 does NOT crash. Do you
have any idea from the serial log? Thanks!

ACPI: bus type pci registered
(XEN) mm.c:769:d0 vcpu 0 iip 0xa0010054b360: bad mpa d 0 0x807ff14f08 (= 0x100ab4000)
Unable to handle kernel NULL pointer dereference (address 0008)
swapper[1]: Oops 8804682956800 [1]
Modules linked in:

Pid: 1, CPU 0, comm:  swapper
psr : 1010085a2010 ifs : 850d ip  : [a00100136340]    Not tainted
ip is at cache_alloc_refill+0x300/0x540
unat:  pfs : 450d rsc : 0007
rnat: 1010085a6010 bsps: a0010054b360 pr  : a581
ldrs:  ccv :  fpsr: 0009804c8a70433f
csd :  ssd : 
b0  : a00100136280 b6  : a001005659d0 b7  : a0010054f2a0
f6  : 1003e f7  : 1003e
f8  : 1003e0040 f9  : 0
f10 : 0 f11 : 0
r1  : a001011448b0 r2  :  r3  : e0807ff14f30
r8  : 00f0 r9  : 001b r10 : e0807ff14f20
r11 : 2094 r12 : e0007f5cfd90 r13 : e0007f5c8000
r14 :  r15 : e0007ff1a114 r16 : 00100100
r17 : e0807ff14f24 r18 : e0807ff14f20 r19 : e0807ff14f08
r20 : 0040 r21 : e0007ff14f00 r22 : 
r23 : 00200200 r24 : e0007ff14f48 r25 : e19e6480
r26 : e0007ff18080 r27 :  r28 : e19e6670
r29 : e19e6660 r30 : 003b r31 : e19e6488

Call Trace:
 [a0010001db20] show_stack+0x40/0xa0
sp=e0007f5cf940 bsp=e0007f5c9500
 [a0010001e780] show_regs+0x840/0x880
sp=e0007f5cfb10 bsp=e0007f5c94a8
 [a00100043460] die+0x1c0/0x380
sp=e0007f5cfb10 bsp=e0007f5c9460
 [a0010006dae0] ia64_do_page_fault+0x880/0x9a0
sp=e0007f5cfb30 bsp=e0007f5c9410
 [a001000702c0] xen_leave_kernel+0x0/0x3e0
sp=e0007f5cfbc0 bsp=e0007f5c9410
 [a00100136340] cache_alloc_refill+0x300/0x540
sp=e0007f5cfd90 bsp=e0007f5c93a0
 [a001001367c0] __kmalloc+0x240/0x360
sp=e0007f5cfd90 bsp=e0007f5c9368
 [a00100106af0] __kzalloc+0x30/0x80
sp=e0007f5cfd90 bsp=e0007f5c9340
 [a00100551f90] acpi_ds_build_internal_package_obj+0x190/0x340
sp=e0007f5cfd90 bsp=e0007f5c92f0
 [a0010054e290] acpi_ds_eval_data_object_operands+0x1b0/0x260
sp=e0007f5cfd90 bsp=e0007f5c92b8
 [a00100550070] acpi_ds_exec_end_op+0x730/0xac0
sp=e0007f5cfda0 bsp=e0007f5c9270
 [a00100578700] acpi_ps_parse_loop+0x1040/0x1940
sp=e0007f5cfda0 bsp=e0007f5c9210
 [a00100576b40] acpi_ps_parse_aml+0x100/0x560
sp=e0007f5cfdb0 bsp=e0007f5c91b8
 [a0010054ee10] acpi_ds_execute_arguments+0x1f0/0x260
sp=e0007f5cfdb0 bsp=e0007f5c9160
 [a0010054f0c0] acpi_ds_get_package_arguments+0xc0/0xe0
sp=e0007f5cfdb0 bsp=e0007f5c9140
 [a00100574710] acpi_ns_init_one_object+0x2b0/0x380
sp=e0007f5cfdb0 bsp=e0007f5c90f0
 [a001005753c0] acpi_ns_walk_namespace+0x280/0x2e0
sp=e0007f5cfdb0 bsp=e0007f5c9080
 [a00100570b50] acpi_walk_namespace+0x90/0xc0
sp=e0007f5cfdb0 bsp=e0007f5c9030
 [a00100574400] acpi_ns_initialize_objects+0x60/0xc0
sp=e0007f5cfdb0 bsp=e0007f5c9018
 [a001005884e0] acpi_initialize_objects+0x60/0x100
sp=e0007f5cfdd0 bsp=e0007f5c8ff0
 [a00100c802f0] acpi_init+0xb0/0x460
sp=e0007f5cfdd0 bsp=e0007f5c8fd0
 [a00100011900] init+0x320/0x840
sp=e0007f5cfe00 bsp=e0007f5c8f98
 [a0010001bfd0] kernel_thread_helper+0x30/0x60
sp=e0007f5cfe30 bsp=e0007f5c8f70
 [a001000110e0] start_kernel_thread+0x20/0x40
sp=e0007f5cfe30 bsp=e0007f5c8f70
 <0>Kernel panic - not syncing: Attempted to kill init!
 (XEN) Domain 0 crashed: rebooting machine in 5 seconds.



Isaku Yamahata 

Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2008-12-07 Thread Isaku Yamahata
On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
 Hi Isaku,
 We re-get the detail information from serial port, please see below. Two 
 comments add:

Thank you.


 1. We can be sure the Cset#18832 works well on the same tiger4 machine. 
 But we did not do regression test between 18832 and this 18860.
 2. It is strange that on another Tiger4 box, dom0 will NOT crash. Do you 
 have any idea from the serial log? Thanks!

I haven't hit this crash, and it seems that Kuwamura-san's tests
haven't hit it either. Kuwamura-san, is that correct?
Hmm... does it depend on the hw configuration?
I'm inclined to suspect a race in interrupt masking/unmasking.
Event channel issues? But that's only a very vague guess.

The difference between 18832 and 18860 is the merge of xen-unstable
into xen-ia64-unstable.
Looking at the log, I suspect linux-2.6.18-xen rather than xen.
Could you provide the linux c/s which correspond to 18832 and 18860?

thanks,


RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2008-12-07 Thread Zhang, Jingke
Isaku Yamahata wrote:
 On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
 Hi Isaku,
 We re-get the detail information from serial port, please see
 below. Two comments add: 
 
 Thank you.
 
 
 1. We can be sure the Cset#18832 works well on the same tiger4
 machine. But we did not do regression test between 18832 and this
 18860.  
 2. It is strange that on another Tiger4 box, dom0 will NOT
 crash. Do you have any idea from the serial log? Thanks! 
 
 I haven't hit this crash. And Kuwamura-san's test seems that
 he haven't hit it either. Kuwamura-san, is it correct?
 Hmm... it seems to depend on hw configuration?
 I'm inclined to suspect masking/unmasking interruption race.
 event channel issues? But that's just only my very vague guess.
 
 The difference between 18832 and 18860 means the merging xen-unstable
 into xen-ia64-unstable.
 Looking the log, I suspect linux-2.6.18-xen instead of xen.
 Could you provide the linux c/s which corresponds to 18832 and 18860?


Hi Isaku,
Yes, some of our machines do not crash. I am afraid there may be some
potential issue.
For testing 18832, we used linux#742, while 18860 uses linux#753. Thanks!


 
 thanks,
 
 ACPI: bus type pci registered
 (XEN) mm.c:769:d0 vcpu 0 iip 0xa0010054b360: bad mpa d 0 0x807ff14f08 (= 0x 100ab4000)
 Unable to handle kernel NULL pointer dereference (address 0008)
 swapper[1]: Oops 8804682956800 [1]
 Modules linked in:
 
 Pid: 1, CPU 0, comm:  swapper
 psr : 1010085a2010 ifs : 850d ip  : [a00100136340]    Not tainted
 ip is at cache_alloc_refill+0x300/0x540
 unat:  pfs : 450d rsc : 0007
 rnat: 1010085a6010 bsps: a0010054b360 pr  : a581
 ldrs:  ccv :  fpsr: 0009804c8a70433f
 csd :  ssd : 
 b0  : a00100136280 b6  : a001005659d0 b7  : a0010054f2a0
 f6  : 1003e f7  : 1003e
 f8  : 1003e0040 f9  : 0
 f10 : 0 f11 : 0
 r1  : a001011448b0 r2  :  r3  : e0807ff14f30
 r8  : 00f0 r9  : 001b r10 : e0807ff14f20
 r11 : 2094 r12 : e0007f5cfd90 r13 : e0007f5c8000
 r14 :  r15 : e0007ff1a114 r16 : 00100100
 r17 : e0807ff14f24 r18 : e0807ff14f20 r19 : e0807ff14f08
 r20 : 0040 r21 : e0007ff14f00 r22 : 
 r23 : 00200200 r24 : e0007ff14f48 r25 : e19e6480
 r26 : e0007ff18080 r27 :  r28 : e19e6670
 r29 : e19e6660 r30 : 003b r31 : e19e6488
 
 Call Trace:
  [a0010001db20] show_stack+0x40/0xa0
 sp=e0007f5cf940 bsp=e0007f5c9500
  [a0010001e780] show_regs+0x840/0x880
 sp=e0007f5cfb10 bsp=e0007f5c94a8
  [a00100043460] die+0x1c0/0x380
 sp=e0007f5cfb10 bsp=e0007f5c9460
  [a0010006dae0] ia64_do_page_fault+0x880/0x9a0
 sp=e0007f5cfb30 bsp=e0007f5c9410
  [a001000702c0] xen_leave_kernel+0x0/0x3e0
 sp=e0007f5cfbc0 bsp=e0007f5c9410
  [a00100136340] cache_alloc_refill+0x300/0x540
 sp=e0007f5cfd90 bsp=e0007f5c93a0
  [a001001367c0] __kmalloc+0x240/0x360
 sp=e0007f5cfd90 bsp=e0007f5c9368
  [a00100106af0] __kzalloc+0x30/0x80
 sp=e0007f5cfd90 bsp=e0007f5c9340
  [a00100551f90] acpi_ds_build_internal_package_obj+0x190/0x340
 sp=e0007f5cfd90 bsp=e0007f5c92f0
  [a0010054e290] acpi_ds_eval_data_object_operands+0x1b0/0x260
 sp=e0007f5cfd90 bsp=e0007f5c92b8
  [a00100550070] acpi_ds_exec_end_op+0x730/0xac0
 sp=e0007f5cfda0 bsp=e0007f5c9270
  [a00100578700] acpi_ps_parse_loop+0x1040/0x1940
 sp=e0007f5cfda0 bsp=e0007f5c9210
  [a00100576b40] acpi_ps_parse_aml+0x100/0x560
 sp=e0007f5cfdb0 bsp=e0007f5c91b8
  [a0010054ee10] acpi_ds_execute_arguments+0x1f0/0x260
 sp=e0007f5cfdb0 bsp=e0007f5c9160
  [a0010054f0c0] acpi_ds_get_package_arguments+0xc0/0xe0
 sp=e0007f5cfdb0 bsp=e0007f5c9140
  [a00100574710] acpi_ns_init_one_object+0x2b0/0x380
 sp=e0007f5cfdb0 bsp=e0007f5c90f0
  [a001005753c0] acpi_ns_walk_namespace+0x280/0x2e0
 sp=e0007f5cfdb0 bsp=e0007f5c9080

Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2008-12-07 Thread Isaku Yamahata
On Mon, Dec 08, 2008 at 01:52:38PM +0800, Zhang, Jingke wrote:
 Isaku Yamahata wrote:
  On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
  Hi Isaku,
  We re-collected the detailed information from the serial port; please see
  below. Two comments to add: 
  
  Thank you.
  
  
  1. We are sure that Cset#18832 works well on the same Tiger4
  machine, but we did not do regression testing between 18832 and
  18860.
  2. It is strange that on another Tiger4 box, dom0 does NOT
  crash. Do you have any idea from the serial log? Thanks! 
  
  I haven't hit this crash, and from Kuwamura-san's tests it seems that
  he hasn't hit it either. Kuwamura-san, is that correct?
  Hmm... does it depend on the hardware configuration?
  I'm inclined to suspect an interrupt masking/unmasking race.
  Event channel issues? But that's only a very vague guess.
  
  The difference between 18832 and 18860 is the merge of xen-unstable
  into xen-ia64-unstable.
  Looking at the log, I suspect linux-2.6.18-xen rather than xen.
  Could you provide the linux c/s that correspond to 18832 and 18860?
 
 
 Hi Isaku,
 Yes, some of our machines do not crash. I am afraid there may be a 
 latent issue.
 For testing 18832 we used linux#742, while 18860 uses linux#753. Thanks!

Thank you. From a rough look at them, those changesets don't
seem to be the culprit.
I agree with you that this may indicate some latent bugs...

-- 
yamahata
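
Since the window reported above is linux#742 (works) to linux#753 (crashes), one way to narrow it down would be a bisection over that range. The following is only a minimal sketch and was not posted in this thread: `first_bad` and `is_bad` are hypothetical names, and it assumes the revision numbers in the range are linearly ordered and that the crash reproduces reliably on the affected Tiger4 box with one fixed build method. Neither assumption is confirmed here.

```python
from typing import Callable


def first_bad(good: int, bad: int, is_bad: Callable[[int], bool]) -> int:
    """Return the first revision in (good, bad] for which is_bad() is True.

    Assumes 'good' is known to work, 'bad' is known to crash, and every
    revision after the first bad one also crashes (monotonic behaviour).
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # crash already present at mid: look earlier
        else:
            good = mid   # mid still works: look later
    return bad


if __name__ == "__main__":
    # Purely illustrative: pretend the crash first appears at revision 749.
    # In practice is_bad() would build and boot dom0 at that revision.
    HYPOTHETICAL_CULPRIT = 749
    print(first_bad(742, 753, lambda cs: cs >= HYPOTHETICAL_CULPRIT))
```

With eleven revisions in the window this needs at most four boot tests, but the result is only meaningful if each test runs on a box where the crash actually reproduces.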

___
Xen-ia64-devel mailing list
Xen-ia64-devel@lists.xensource.com
http://lists.xensource.com/xen-ia64-devel


Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2008-12-07 Thread KUWAMURA Shin'ya

Hi,

On [EMAIL PROTECTED],
Isaku Yamahata wrote:


On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
 1. We are sure that Cset#18832 works well on the same Tiger4
 machine, but we did not do regression testing between 18832 and
 18860.
 2. It is strange that on another Tiger4 box, dom0 does NOT
 crash. Do you have any idea from the serial log? Thanks!

I haven't hit this crash, and from Kuwamura-san's tests it seems that
he hasn't hit it either. Kuwamura-san, is that correct?


Yes, that's correct. I have not found this crash.
However, I did not test 18860. I tested 18832 and 18869, and dom0
worked well on both csets.

Best regards,
--
 KUWAMURA Shin'ya



Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2008-12-04 Thread Isaku Yamahata
Is only the call stack available?
Are the panic messages and/or the register dump not available?

On Fri, Dec 05, 2008 at 11:45:46AM +0800, Zhang, Jingke wrote:
 Hi all,
 We found that the latest Cset#18860 makes dom0 crash. The last stable Cset we 
 have tested is 18832. Thanks!
 
 Build info:
 
 xen: 18860
 linux: 753
 remote-ioemu: b4d410a1c28fcd1ea528d94eb8b94b79286c25ed
 
 Call trace:
 =
  [a0010001db20] show_stack+0x40/0xa0
 sp=e0007f5cf940 bsp=e0007f5c9500
  [a0010001e780] show_regs+0x840/0x880
 sp=e0007f5cfb10 bsp=e0007f5c94a8
  [a00100043460] die+0x1c0/0x380
 sp=e0007f5cfb10 bsp=e0007f5c9460
  [a0010006dae0] ia64_do_page_fault+0x880/0x9a0
 sp=e0007f5cfb30 bsp=e0007f5c9410
  [a001000702c0] xen_leave_kernel+0x0/0x3e0
 sp=e0007f5cfbc0 bsp=e0007f5c9410
  [a00100136340] cache_alloc_refill+0x300/0x540
 sp=e0007f5cfd90 bsp=e0007f5c93a0
  [a001001367c0] __kmalloc+0x240/0x360
 sp=e0007f5cfd90 bsp=e0007f5c9368
  [a00100106af0] __kzalloc+0x30/0x80
 sp=e0007f5cfd90 bsp=e0007f5c9340
  [a00100551f90] acpi_ds_build_internal_package_obj+0x190/0x340
 sp=e0007f5cfd90 bsp=e0007f5c92f0
  [a0010054e290] acpi_ds_eval_data_object_operands+0x1b0/0x260
 sp=e0007f5cfd90 bsp=e0007f5c92b8
  [a00100550070] acpi_ds_exec_end_op+0x730/0xac0
 sp=e0007f5cfda0 bsp=e0007f5c9270
  [a00100578700] acpi_ps_parse_loop+0x1040/0x1940
 sp=e0007f5cfda0 bsp=e0007f5c9210
  [a00100576b40] acpi_ps_parse_aml+0x100/0x560
 sp=e0007f5cfdb0 bsp=e0007f5c91b8
  [a0010054ee10] acpi_ds_execute_arguments+0x1f0/0x260
 sp=e0007f5cfdb0 bsp=e0007f5c9160
  [a0010054f0c0] acpi_ds_get_package_arguments+0xc0/0xe0
 sp=e0007f5cfdb0 bsp=e0007f5c9140
  [a00100574710] acpi_ns_init_one_object+0x2b0/0x380
 sp=e0007f5cfdb0 bsp=e0007f5c90f0
  [a001005753c0] acpi_ns_walk_namespace+0x280/0x2e0
 sp=e0007f5cfdb0 bsp=e0007f5c9080
  [a00100570b50] acpi_walk_namespace+0x90/0xc0
 sp=e0007f5cfdb0 bsp=e0007f5c9030
  [a00100574400] acpi_ns_initialize_objects+0x60/0xc0
 sp=e0007f5cfdb0 bsp=e0007f5c9018
  [a001005884e0] acpi_initialize_objects+0x60/0x100
 sp=e0007f5cfdd0 bsp=e0007f5c8ff0
  [a00100c802f0] acpi_init+0xb0/0x460
 sp=e0007f5cfdd0 bsp=e0007f5c8fd0
  [a00100011900] init+0x320/0x840
 sp=e0007f5cfe00 bsp=e0007f5c8f98
  [a0010001bfd0] kernel_thread_helper+0x30/0x60
 sp=e0007f5cfe30 bsp=e0007f5c8f70
  [a001000110e0] start_kernel_thread+0x20/0x40
 sp=e0007f5cfe30 bsp=e0007f5c8f70
  <0>Kernel panic - not syncing: Attempted to kill init!
  (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
 
 
 Thanks,
 Zhang Jingke
 
 
 

-- 
yamahata
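
To compare the call trace quoted above with traces from other boxes, it can help to turn the two-line frame records into structured data. The following is a rough sketch under assumptions: `Frame` and `parse_trace` are hypothetical names, and the regular expressions only cover the frame layout visible in this thread (an address in brackets, `symbol+0xoff/0xsize`, then a line with `sp=` and `bsp=`).

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    addr: str     # instruction address as printed (hex, no 0x prefix)
    symbol: str   # function name
    offset: int   # offset into the function
    size: int     # function size
    sp: str       # memory stack pointer as printed
    bsp: str      # register backing store pointer as printed


FRAME_RE = re.compile(r"\[([0-9a-f]+)\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
REGS_RE = re.compile(r"sp=([0-9a-f]+)\s+bsp=([0-9a-f]+)")


def parse_trace(text: str) -> List[Frame]:
    """Pair each '[addr] sym+off/size' line with the following sp/bsp line."""
    frames: List[Frame] = []
    pending = None
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            pending = m
            continue
        r = REGS_RE.search(line)
        if r and pending:
            frames.append(Frame(pending.group(1), pending.group(2),
                                int(pending.group(3), 16),
                                int(pending.group(4), 16),
                                r.group(1), r.group(2)))
            pending = None
    return frames


# Two frames copied from the trace in this thread.
SAMPLE = """\
  [a00100136340] cache_alloc_refill+0x300/0x540
 sp=e0007f5cfd90 bsp=e0007f5c93a0
  [a001001367c0] __kmalloc+0x240/0x360
 sp=e0007f5cfd90 bsp=e0007f5c9368
"""

if __name__ == "__main__":
    for frame in parse_trace(SAMPLE):
        print(frame.symbol, hex(frame.offset), frame.sp, frame.bsp)
```

Frames whose sp/bsp line is missing (for example, a trace cut off mid-frame) are skipped rather than guessed.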
