RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
Hi Isaku, all,

The attached patch should fix this weird issue. Upstream, we have also seen some other odd problems: for example, dom0 cannot boot on some platforms, and dom0 may behave differently with different initrds. After debugging, I found that they appear to be caused by an incorrect setup of the pirq_needs_eoi page. Two main problems turned up during debugging:

1. The two related hypercalls are not enabled in the correct way, so dom0 and the hypervisor do not agree on which pirqs need EOI.
2. The page is not actually linked into the bss section, although it must be, so the kernel treats it as memory it can use for its caches and reuses it in many ways, which finally leads to various issues.

Thanks,
Xiantao

You, Yongkang wrote:
> This issue is really very hard to locate. I now suspect that it is related to the build process, because the issue also goes away when the build method is changed.

Attachment: fix_pirq_eoi_page.patch
Description: fix_pirq_eoi_page.patch

___
Xen-ia64-devel mailing list
Xen-ia64-devel@lists.xensource.com
http://lists.xensource.com/xen-ia64-devel
Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
Hi. Good catch. Some comments follow. I have attached two patches as a fix; could you try them?

- bss.page_aligned. Where is the section used? grep didn't tell me. x86 certainly uses .bss.page_aligned in linux/arch/[i386, x86_64]/kernel/head[-xen].S, but no file under linux/arch/ia64/ uses it.
- ia64_fast_eoi. I suppose ia64_fast_eoi is used as an optimization in place of PHYSDEVOP_eoi, though I'm not sure how much improvement it provides. In any case, the ia64_fast_eoi hypercall implementation should also be updated, which I overlooked when I added PHYSDEVOP_pirq_eoi_gmfn support.

thanks,

On Sun, Jan 04, 2009 at 06:05:07PM +0800, Zhang, Xiantao wrote:
> The attached patch should fix this weird issue. [...] After debugging, I found that they appear to be caused by an incorrect setup of the pirq_needs_eoi page. Two main problems turned up during debugging:
> 1. The two related hypercalls are not enabled in the correct way, so dom0 and the hypervisor do not agree on which pirqs need EOI.
> 2. The page is not actually linked into the bss section, although it must be, so the kernel treats it as memory it can use for its caches and reuses it in many ways, which finally leads to various issues.
RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
Isaku Yamahata wrote:
> - bss.page_aligned. Where is the section used? grep didn't tell me. x86 certainly uses .bss.page_aligned in linux/arch/[i386, x86_64]/kernel/head[-xen].S, but no file under linux/arch/ia64/ uses it.

You may need to check drivers/xen/core/evtchn.c; the code is as follows :-)
Xiantao

    static int pirq_eoi_does_unmask;
    static DECLARE_BITMAP(pirq_needs_eoi, ALIGN(NR_PIRQS, PAGE_SIZE * 8))
        __attribute__ ((__section__(".bss.page_aligned"), __aligned__(PAGE_SIZE)));

> - ia64_fast_eoi. I suppose ia64_fast_eoi is used as an optimization in place of PHYSDEVOP_eoi, though I'm not sure how much improvement it provides. In any case, the ia64_fast_eoi hypercall implementation should also be updated, which I overlooked when I added PHYSDEVOP_pirq_eoi_gmfn support.
Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
On Mon, Jan 05, 2009 at 12:29:55PM +0800, Zhang, Xiantao wrote:
> You may need to check drivers/xen/core/evtchn.c; the code is as follows :-)
>
>     static int pirq_eoi_does_unmask;
>     static DECLARE_BITMAP(pirq_needs_eoi, ALIGN(NR_PIRQS, PAGE_SIZE * 8))
>         __attribute__ ((__section__(".bss.page_aligned"), __aligned__(PAGE_SIZE)));

Ah, that line was deleted by changeset 760:0d10be086a78.

--
yamahata
RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
Isaku Yamahata wrote:
> Ah, that line was deleted by changeset 760:0d10be086a78.

Oh, I hadn't noticed that check-in because my codebase was old. It caused us many odd issues. Okay, it is also good to remove it. :)
Adopting the fast EOI path is okay with me. Please check them in.

Xiantao
Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
On Mon, Jan 05, 2009 at 01:06:23PM +0800, Zhang, Xiantao wrote:
> Oh, I hadn't noticed that check-in because my codebase was old. It caused us many odd issues. Okay, it is also good to remove it. :)
> Adopting the fast EOI path is okay with me. Please check them in.

Applied, thanks.

--
yamahata
RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
On Tuesday, December 16, 2008 10:22 AM, Isaku Yamahata wrote:
> I tried 2048M (and other values), but I couldn't reproduce it.
> Hmm, does it reproduce with dom0_mem=2048M on all the boxes you tested?

Isaku/All,

This issue is really very hard to locate. I now suspect that it is related to the build process, because the issue also goes away when the build method is changed.

1. It doesn't happen on all machines, but it can be reproduced reliably on our nightly test machine with the same binary.
2. When the system crashes, dom0_mem is set to 2048M. With other memory sizes, the issue disappears too.
3. It seems to have appeared between dom0 changesets 743~753, as it works if we use an old Dom0 kernel build + new Xen, and the old nightly testing doesn't have the issue.
4. When I tried regression testing between 743~753, I found that different build methods can give crashing and non-crashing results. In our default build process, because stubdomain is not supported on IA64, we removed install-stubdom and dist-stubdom from the install: and dist: lines in the main Makefile. It has been that way for more than 2 months. The actual compile command is make -j3 xyz_file. The crash is related to the build process: during regression testing, I sometimes didn't change the Makefile but still used make -j3, and then the crash was gone.

I am not sure whether my suspicion is plausible, as it still needs more experiments. Compiling Dom0 is not as easy as compiling Xen; it is costly. I will try to do more, but maybe not very quickly, as many other things need to be done at the same time. If the default compilation is okay, do you think it is worth investigating further? Any suggestion will be much appreciated.

Best Regards,
Yongkang You
RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
On Monday, December 08, 2008 2:10 PM, Isaku Yamahata wrote:
> Thank you. From a rough look at them, those changesets don't seem to be the culprit.
> I agree with you that this may indicate some potential bugs...

Hi All,

This bug reproduces reliably if dom0_mem=2048M is given in the append option. If dom0_mem is set to 1024M or 4096M, the crash doesn't happen.

We tried #18869 Xen + #742 Dom0, and the system is okay. So the problem might be in the Linux tree between #742~#753.

Best Regards,
Yongkang You
RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
Hi Isaku,

We re-captured the detailed information from the serial port; please see below. Two comments to add:

1. We can be sure that Cset#18832 works well on the same tiger4 machine. But we did not do a regression test between 18832 and this 18860.
2. It is strange that on another Tiger4 box, dom0 does NOT crash.

Do you have any idea from the serial log? Thanks!

ACPI: bus type pci registered
(XEN) mm.c:769:d0 vcpu 0 iip 0xa0010054b360: bad mpa d 0 0x807ff14f08 (= 0x 100ab4000)
Unable to handle kernel NULL pointer dereference (address 0008)
swapper[1]: Oops 8804682956800 [1]
Modules linked in:

Pid: 1, CPU 0, comm: swapper
psr : 1010085a2010 ifs : 850d ip : [a00100136340] Not tainted
ip is at cache_alloc_refill+0x300/0x540
unat: pfs : 450d rsc : 0007
rnat: 1010085a6010 bsps: a0010054b360 pr : a581
ldrs: ccv : fpsr: 0009804c8a70433f
csd : ssd :
b0 : a00100136280 b6 : a001005659d0 b7 : a0010054f2a0
f6 : 1003e f7 : 1003e
f8 : 1003e0040 f9 : 0
f10 : 0 f11 : 0
r1 : a001011448b0 r2 : r3 : e0807ff14f30
r8 : 00f0 r9 : 001b r10 : e0807ff14f20
r11 : 2094 r12 : e0007f5cfd90 r13 : e0007f5c8000
r14 : r15 : e0007ff1a114 r16 : 00100100
r17 : e0807ff14f24 r18 : e0807ff14f20 r19 : e0807ff14f08
r20 : 0040 r21 : e0007ff14f00 r22 :
r23 : 00200200 r24 : e0007ff14f48 r25 : e19e6480
r26 : e0007ff18080 r27 : r28 : e19e6670
r29 : e19e6660 r30 : 003b r31 : e19e6488

Call Trace:
 [a0010001db20] show_stack+0x40/0xa0 sp=e0007f5cf940 bsp=e0007f5c9500
 [a0010001e780] show_regs+0x840/0x880 sp=e0007f5cfb10 bsp=e0007f5c94a8
 [a00100043460] die+0x1c0/0x380 sp=e0007f5cfb10 bsp=e0007f5c9460
 [a0010006dae0] ia64_do_page_fault+0x880/0x9a0 sp=e0007f5cfb30 bsp=e0007f5c9410
 [a001000702c0] xen_leave_kernel+0x0/0x3e0 sp=e0007f5cfbc0 bsp=e0007f5c9410
 [a00100136340] cache_alloc_refill+0x300/0x540 sp=e0007f5cfd90 bsp=e0007f5c93a0
 [a001001367c0] __kmalloc+0x240/0x360 sp=e0007f5cfd90 bsp=e0007f5c9368
 [a00100106af0] __kzalloc+0x30/0x80 sp=e0007f5cfd90 bsp=e0007f5c9340
 [a00100551f90] acpi_ds_build_internal_package_obj+0x190/0x340 sp=e0007f5cfd90 bsp=e0007f5c92f0
 [a0010054e290] acpi_ds_eval_data_object_operands+0x1b0/0x260 sp=e0007f5cfd90 bsp=e0007f5c92b8
 [a00100550070] acpi_ds_exec_end_op+0x730/0xac0 sp=e0007f5cfda0 bsp=e0007f5c9270
 [a00100578700] acpi_ps_parse_loop+0x1040/0x1940 sp=e0007f5cfda0 bsp=e0007f5c9210
 [a00100576b40] acpi_ps_parse_aml+0x100/0x560 sp=e0007f5cfdb0 bsp=e0007f5c91b8
 [a0010054ee10] acpi_ds_execute_arguments+0x1f0/0x260 sp=e0007f5cfdb0 bsp=e0007f5c9160
 [a0010054f0c0] acpi_ds_get_package_arguments+0xc0/0xe0 sp=e0007f5cfdb0 bsp=e0007f5c9140
 [a00100574710] acpi_ns_init_one_object+0x2b0/0x380 sp=e0007f5cfdb0 bsp=e0007f5c90f0
 [a001005753c0] acpi_ns_walk_namespace+0x280/0x2e0 sp=e0007f5cfdb0 bsp=e0007f5c9080
 [a00100570b50] acpi_walk_namespace+0x90/0xc0 sp=e0007f5cfdb0 bsp=e0007f5c9030
 [a00100574400] acpi_ns_initialize_objects+0x60/0xc0 sp=e0007f5cfdb0 bsp=e0007f5c9018
 [a001005884e0] acpi_initialize_objects+0x60/0x100 sp=e0007f5cfdd0 bsp=e0007f5c8ff0
 [a00100c802f0] acpi_init+0xb0/0x460 sp=e0007f5cfdd0 bsp=e0007f5c8fd0
 [a00100011900] init+0x320/0x840 sp=e0007f5cfe00 bsp=e0007f5c8f98
 [a0010001bfd0] kernel_thread_helper+0x30/0x60 sp=e0007f5cfe30 bsp=e0007f5c8f70
 [a001000110e0] start_kernel_thread+0x20/0x40 sp=e0007f5cfe30 bsp=e0007f5c8f70
0Kernel panic - not syncing: Attempted to kill init!
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
> Hi Isaku,
> We re-captured the detailed information from the serial port; please see below. Two comments to add:

Thank you.

> 1. We can be sure that Cset#18832 works well on the same tiger4 machine. But we did not do a regression test between 18832 and this 18860.
> 2. It is strange that on another Tiger4 box, dom0 does NOT crash.
>
> Do you have any idea from the serial log? Thanks!

I haven't hit this crash, and from Kuwamura-san's test it seems that he hasn't hit it either. Kuwamura-san, is that correct?

Hmm... does it depend on the hardware configuration? I'm inclined to suspect a race in interrupt masking/unmasking. Event channel issues? But that's only a very vague guess.

The difference between 18832 and 18860 is the merge of xen-unstable into xen-ia64-unstable. Looking at the log, I suspect linux-2.6.18-xen rather than xen. Could you provide the linux c/s which correspond to 18832 and 18860?

thanks,
RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
Isaku Yamahata wrote:
> The difference between 18832 and 18860 is the merge of xen-unstable into xen-ia64-unstable. Looking at the log, I suspect linux-2.6.18-xen rather than xen. Could you provide the linux c/s which correspond to 18832 and 18860?

Hi Isaku,

Yes, some of our machines do not crash. I am afraid there may be some potential issue. For testing 18832 we used linux#742, while 18860 uses linux#753. Thanks!
Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
On Mon, Dec 08, 2008 at 01:52:38PM +0800, Zhang, Jingke wrote:
> Isaku Yamahata wrote:
>> On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
>>> Hi Isaku,
>>> We collected the detailed information from the serial port again; please see below. Two additional comments:
>>
>> Thank you.
>>
>>> 1. We can be sure that Cset#18832 works well on the same tiger4 machine. But we did not do regression testing between 18832 and this 18860.
>>> 2. It is strange that on another Tiger4 box, dom0 does NOT crash.
>>>
>>> Do you have any idea from the serial log? Thanks!
>>
>> I haven't hit this crash. And it seems Kuwamura-san's tests haven't hit it either. Kuwamura-san, is that correct?
>>
>> Hmm... it seems to depend on the hw configuration? I'm inclined to suspect an interrupt masking/unmasking race. Event channel issues? But that's only my very vague guess.
>>
>> The difference between 18832 and 18860 is the merge of xen-unstable into xen-ia64-unstable. Looking at the log, I suspect linux-2.6.18-xen rather than xen. Could you provide the linux c/s which correspond to 18832 and 18860?
>
> Hi Isaku,
> Yes, some of our machines do not crash. I am afraid there may be some potential issue. For testing 18832 we used linux#742, while 18860 uses linux#753. Thanks!

Thank you. Taking a rough look at them, those changesets don't seem to be the culprit. I agree with you that this may indicate some potential bugs...

--
yamahata

___
Xen-ia64-devel mailing list
Xen-ia64-devel@lists.xensource.com
http://lists.xensource.com/xen-ia64-devel
Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
Hi,

On [EMAIL PROTECTED], Isaku Yamahata wrote:
> On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
>> 1. We can be sure that Cset#18832 works well on the same tiger4 machine. But we did not do regression testing between 18832 and this 18860.
>> 2. It is strange that on another Tiger4 box, dom0 does NOT crash.
>>
>> Do you have any idea from the serial log? Thanks!
>
> I haven't hit this crash. And it seems Kuwamura-san's tests haven't hit it either. Kuwamura-san, is that correct?

Yes, that's correct. I have not seen this crash. However, I did not test 18860. I tested 18832 and 18869, and dom0 worked well on both csets.

Best regards,
--
KUWAMURA Shin'ya
Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash
Only call stack? Panic messages and/or register dump aren't available?

On Fri, Dec 05, 2008 at 11:45:46AM +0800, Zhang, Jingke wrote:
> Hi all,
> We found that the latest Cset#18860 makes dom0 crash. The last stable Cset we have tested is 18832. Thanks!
>
> Build info:
> xen: 18860
> linux: 753
> remote-ioemu: b4d410a1c28fcd1ea528d94eb8b94b79286c25ed
>
> Call trace:
> [a0010001db20] show_stack+0x40/0xa0
>     sp=e0007f5cf940 bsp=e0007f5c9500
> [a0010001e780] show_regs+0x840/0x880
>     sp=e0007f5cfb10 bsp=e0007f5c94a8
> [a00100043460] die+0x1c0/0x380
>     sp=e0007f5cfb10 bsp=e0007f5c9460
> [a0010006dae0] ia64_do_page_fault+0x880/0x9a0
>     sp=e0007f5cfb30 bsp=e0007f5c9410
> [a001000702c0] xen_leave_kernel+0x0/0x3e0
>     sp=e0007f5cfbc0 bsp=e0007f5c9410
> [a00100136340] cache_alloc_refill+0x300/0x540
>     sp=e0007f5cfd90 bsp=e0007f5c93a0
> [a001001367c0] __kmalloc+0x240/0x360
>     sp=e0007f5cfd90 bsp=e0007f5c9368
> [a00100106af0] __kzalloc+0x30/0x80
>     sp=e0007f5cfd90 bsp=e0007f5c9340
> [a00100551f90] acpi_ds_build_internal_package_obj+0x190/0x340
>     sp=e0007f5cfd90 bsp=e0007f5c92f0
> [a0010054e290] acpi_ds_eval_data_object_operands+0x1b0/0x260
>     sp=e0007f5cfd90 bsp=e0007f5c92b8
> [a00100550070] acpi_ds_exec_end_op+0x730/0xac0
>     sp=e0007f5cfda0 bsp=e0007f5c9270
> [a00100578700] acpi_ps_parse_loop+0x1040/0x1940
>     sp=e0007f5cfda0 bsp=e0007f5c9210
> [a00100576b40] acpi_ps_parse_aml+0x100/0x560
>     sp=e0007f5cfdb0 bsp=e0007f5c91b8
> [a0010054ee10] acpi_ds_execute_arguments+0x1f0/0x260
>     sp=e0007f5cfdb0 bsp=e0007f5c9160
> [a0010054f0c0] acpi_ds_get_package_arguments+0xc0/0xe0
>     sp=e0007f5cfdb0 bsp=e0007f5c9140
> [a00100574710] acpi_ns_init_one_object+0x2b0/0x380
>     sp=e0007f5cfdb0 bsp=e0007f5c90f0
> [a001005753c0] acpi_ns_walk_namespace+0x280/0x2e0
>     sp=e0007f5cfdb0 bsp=e0007f5c9080
> [a00100570b50] acpi_walk_namespace+0x90/0xc0
>     sp=e0007f5cfdb0 bsp=e0007f5c9030
> [a00100574400] acpi_ns_initialize_objects+0x60/0xc0
>     sp=e0007f5cfdb0 bsp=e0007f5c9018
> [a001005884e0] acpi_initialize_objects+0x60/0x100
>     sp=e0007f5cfdd0 bsp=e0007f5c8ff0
> [a00100c802f0] acpi_init+0xb0/0x460
>     sp=e0007f5cfdd0 bsp=e0007f5c8fd0
> [a00100011900] init+0x320/0x840
>     sp=e0007f5cfe00 bsp=e0007f5c8f98
> [a0010001bfd0] kernel_thread_helper+0x30/0x60
>     sp=e0007f5cfe30 bsp=e0007f5c8f70
> [a001000110e0] start_kernel_thread+0x20/0x40
>     sp=e0007f5cfe30 bsp=e0007f5c8f70
> <0>Kernel panic - not syncing: Attempted to kill init!
> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>
> Thanks,
> Zhang Jingke

--
yamahata