Hi, Isaku & All
    The attached patch should fix the weird issue.  Upstream, we have also
found some other weird issues: for example, we can't boot dom0 on some
platforms, and dom0 may behave differently with different initrds.  After
debugging, I found they should be caused by incorrect setup of the
pirq_needs_eoi page.  Two main issues were found during debugging:
1.  The two related hypercalls are not enabled in the correct way, so dom0 and
the hypervisor do not agree on which pirqs need EOI.
2.  The page is not actually linked into the bss section, even though this is
a must, so the kernel treats it as memory cache and uses it for other
purposes, which finally leads to various issues.
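
To illustrate point 2, here is a minimal user-space sketch in C. It is not the
attached patch, only an illustration of the idea: a dedicated, page-aligned,
zero-filled page used as a per-pirq bitmap. It assumes a 4096-byte page and the
GCC/Clang aligned attribute, and every name in it (SKETCH_PAGE_SIZE,
sketch_pirq_eoi_page, sketch_mark_needs_eoi, sketch_pirq_needs_eoi) is
hypothetical. In the real setup the hypervisor would set the bits; here
sketch_mark_needs_eoi() stands in for that.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4096-byte pages; the real value depends on the kernel config. */
#define SKETCH_PAGE_SIZE     4096UL
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NR_BITS       (SKETCH_PAGE_SIZE * CHAR_BIT)

/*
 * One whole page, page aligned, with no initializer, so it is zero-filled
 * static storage (typically placed in .bss by the toolchain) and nothing
 * else shares the page.
 */
static unsigned long
sketch_pirq_eoi_page[SKETCH_PAGE_SIZE / sizeof(unsigned long)]
    __attribute__((aligned(SKETCH_PAGE_SIZE)));

/* Returns 1 if the bitmap starts on a page boundary. */
int sketch_page_is_aligned(void)
{
    return ((uintptr_t)sketch_pirq_eoi_page % SKETCH_PAGE_SIZE) == 0;
}

/* Stand-in for the hypervisor side: flag a pirq as needing EOI. */
void sketch_mark_needs_eoi(unsigned int pirq)
{
    if (pirq < SKETCH_NR_BITS)
        sketch_pirq_eoi_page[pirq / SKETCH_BITS_PER_WORD] |=
            1UL << (pirq % SKETCH_BITS_PER_WORD);
}

/* Guest side: returns 1 if the pirq is flagged, 0 otherwise. */
int sketch_pirq_needs_eoi(unsigned int pirq)
{
    if (pirq >= SKETCH_NR_BITS)
        return 0; /* out of range: treat as "no EOI needed" */
    return (int)((sketch_pirq_eoi_page[pirq / SKETCH_BITS_PER_WORD] >>
                  (pirq % SKETCH_BITS_PER_WORD)) & 1UL);
}
```

If both sides read and write the same page, they agree on which pirqs need EOI.
If the page can be reused for other data, the bits read back are meaningless.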

You, Yongkang wrote:
>> I tried 2048M (and other values), but I wasn't able to reproduce it.
>> Hmm, does it reproduce with "dom0_mem=2048M" on all boxes which you
>> tested?
> Isaku/All,
> This issue is really very hard to locate. I now suspect it is
> related to the build process, because the issue also disappears when
> the build method is changed.
> 1. It doesn't happen on all machines, but it can be reliably
>    reproduced on our nightly test machine with the same binary.
> 2. When the system crashes, dom0_mem is set to 2048M. With other
>    memory sizes, the issue disappears too.
> 3. It seems to have been introduced between dom0 changesets 743~753,
>    as it works if we use the old Dom0 kernel build + new Xen, and the
>    old nightly testing doesn't have the issue.
> 4. When I tried regression testing between 743~753, I found that
>    different build methods can produce crashing and non-crashing
>    results.
> In our default build process, since stubdomain is not supported on
> IA64, we removed install-stubdom and dist-stubdom from the "install:"
> and "dist:" lines in the main Makefile. That change was made more than
> 2 months ago. The actual compile command is "make -j3 >xyz_file". The
> crashing issue is related to the build process.
> When doing regression testing, I sometimes did not change the Makefile
> but still used "make -j3", and then the crash was gone.
> I am not sure whether my suspicion is right, as it still needs more
> testing. Compiling Dom0 is not as easy as compiling Xen; it is costly.
> I will try to do more, but maybe not so quickly, as many other things
> need to be done at the same time. If the default compilation is okay,
> do you think it is worth investigating further?
> Any suggestion will be much appreciated.
> Best Regards,
> Yongkang You
> On Tuesday, December 16, 2008 10:22 AM, "Isaku Yamahata" wrote:
>> On Tue, Dec 09, 2008 at 05:56:25PM +0800, You, Yongkang wrote:
>>> On Monday, December 08, 2008 2:10 PM, "Isaku Yamahata" wrote:
>>>> On Mon, Dec 08, 2008 at 01:52:38PM +0800, Zhang, Jingke wrote:
>>>>> Isaku Yamahata wrote:
>>>>>> On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
>>>>>>> Hi Isaku,
>>>>>>>     We collected the detailed information from the serial port
>>>>>>> again; please see below. Two additional comments:
>>>>>> Thank you.
>>>>>>>     1. We can be sure the Cset#18832 works well on the same
>>>>>>> tiger4 machine. But we did not do regression test between 18832
>>>>>>> and this 18860. 
>>>>>>>     2. It is strange that on another Tiger4 box, dom0 will NOT
>>>>>>> crash. Do you have any idea from the serial log? Thanks!
>>>>>> I haven't hit this crash. And Kuwamura-san's testing doesn't seem
>>>>>> to have hit it either. Kuwamura-san, is that correct?
>>>>>> Hmm... it seems to depend on hw configuration?
>>>>>> I'm inclined to suspect masking/unmasking interruption race.
>>>>>> event channel issues? But that's just only my very vague guess.
>>>>>> The difference between 18832 and 18860 is the merge of
>>>>>> xen-unstable into xen-ia64-unstable. Looking at the log, I suspect
>>>>>> linux-2.6.18-xen instead of xen.
>>>>>> Could you provide the linux c/s which corresponds to 18832 and
>>>>>> 18860?
>>>>> Hi Isaku,
>>>>>     Yes, some of our machines do not crash. I am afraid there may
>>>>>     be some potential issue. For testing 18832, we used linux#742,
>>>>> while 18860 uses linux#753. Thanks!
>>>> Thank you. Taking a rough look at them, those changesets don't seem
>>>> to be the culprit. I agree with you that this may indicate some
>>>> potential bugs...
>>> Hi All,
>>> This bug is reliably reproduced if "dom0_mem=2048M" is given in the
>>> append option. If dom0_mem is set to 1024M or 4096M, the crash
>>> doesn't happen.
>>> We tried #18869 Xen + #742 Dom0, and the system is okay. So the
>>> problem might be in the Linux tree between #742~#753.
>> I tried 2048M (and other values), but I wasn't able to reproduce it.
>> Hmm, does it reproduce with "dom0_mem=2048M" on all boxes which you
>> tested? 
>> thanks,
> _______________________________________________
> Xen-ia64-devel mailing list
> Xen-ia64-devel@lists.xensource.com
> http://lists.xensource.com/xen-ia64-devel

Attachment: fix_pirq_eoi_page.patch
