Re: [Xen-ia64-devel] PATCH: slightly improve stability

2006-06-23 Thread Isaku Yamahata

Is there any reason why Anthony's patch was dropped?
I think this patch is also needed.

I got the following message. I guess the cause is as follows,
though this happens very rarely.

linux-2.6-xen-sparse/arch/ia64/xen/xenentry.S
Here psr.i and psr.ic are off.
rse_clear_invalid:
...
(pRecurse) br.call.dptk.few b0=rse_clear_invalid
;;
mov loc8=0      // <= iip = 0xa001000687c0; please notice ifs = 8000
mov loc9=0

1. Right before mov loc8=0, the vcpu is switched to another cpu.
2. While the vcpu is waiting for a cpu, the tlb entry which backs the rse
   stack is purged.
3. The vcpu gets a cpu again, and a tlb miss fault occurs with isr.ir = 1.
4. xen ia64_do_page_fault() calls handle_lazy_cover(), which sets
   cr.ifs = 0.
5. xen returns cpu execution to the guest.
6. mov loc8 = 0 is executed with cfm = 0.
   An Illegal Operation fault is raised.
7. priv_handle_op() is called, but it fails to emulate the instruction
   because mov loc8 = 0 isn't a privileged op.
8. ia64_handle_privop() calls panic_domain().
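
As an illustration of steps 4 to 6, here is a minimal C sketch of why
mov loc8 = 0 faults once the frame marker is zero. It relies only on the
architectural rule that r32 and above are accessible solely inside the
current frame, whose size is the sof field in bits 6:0 of cfm. The helper
names, the frame size of 16, and the mapping of loc8 to r42 (two input
registers in front of the locals) are assumptions for illustration, not
values taken from the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CFM.sof (size of frame) occupies bits 6:0 of the current frame marker. */
static unsigned cfm_sof(uint64_t cfm)
{
    return (unsigned)(cfm & 0x7f);
}

/*
 * r0-r31 are static and always accessible. r32 and above are stacked and
 * are accessible only if they fall inside the current frame; touching one
 * outside the frame raises an Illegal Operation fault.
 */
static bool reg_accessible(uint64_t cfm, unsigned reg)
{
    assert(reg < 128);              /* IA-64 has 128 general registers */
    if (reg < 32)
        return true;
    return (reg - 32) < cfm_sof(cfm);
}

/* Hypothetical register number for loc8, assuming in0/in1 are r32/r33. */
enum { HYPOTHETICAL_LOC8 = 42 };
```

With a hypothetical 16-register frame, reg_accessible(16, HYPOTHETICAL_LOC8)
holds; after the frame marker has been zeroed, reg_accessible(0,
HYPOTHETICAL_LOC8) does not, which corresponds to the fault in step 6.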


Thanks.

(XEN) priv_emulate: priv_handle_op fails, isr=0x0
(XEN) $ PANIC in domain 0 (k6=0xf41c8000): psr.ic off, delivering 
fault=5400,ipsr=101208026030,iip=a001000687c0,ifa=20144f60,isr=0,PSCB.iip=20144f60
(XEN) 
(XEN) Call Trace:
(XEN)  [f409e030] show_stack+0x80/0xa0
(XEN) sp=f41cfb80 bsp=f41c8e48
(XEN)  [f407d780] panic_domain+0xf0/0x1d0
(XEN) sp=f41cfd50 bsp=f41c8de0
(XEN)  [f40707b0] check_bad_nested_interruption+0x110/0x120
(XEN) sp=f41cfe00 bsp=f41c8db0
(XEN)  [f4070a20] reflect_interruption+0x260/0x460
(XEN) sp=f41cfe00 bsp=f41c8d60
(XEN)  [f409cba0] ia64_leave_kernel+0x0/0x310
(XEN) sp=f41cfe00 bsp=f41c8d60
(XEN)  [a001000687c0] ???
(XEN) sp=f41d bsp=f41c8d60
(XEN) d 0xf7ffb208 domid 0
(XEN) vcpu 0xf41c8000 vcpu 3
(XEN) 
(XEN) CPU 3
(XEN) psr : 101208026030 ifs : 8000 ip  : [a001000687c0]
(XEN) ip is at ???
(XEN) unat:  pfs : 8710 rsc : 00580008
(XEN) rnat:  bsps: eb328fe8 pr  : 0559a7a9
(XEN) ldrs: 0060 ccv :  fpsr: 0009804c0270033f
(XEN) csd :  ssd : 
(XEN) b0  : a001000687c0 b6  : 20144f60 b7  : a0010640
(XEN) f6  : 1003e f7  : 0
(XEN) f8  : 100198ff97fe0 f9  : 1003eff05
(XEN) f10 : 1003e00b0 f11 : 1001192d7b6702eedd629
(XEN) r1  : 2021c278 r2  : c309 r3  : 6fc5e7e0
(XEN) r8  : 2003eff0 r9  : 0001 r10 : 
(XEN) r11 : c593 r12 : 6fc5e7e0 r13 : 2048cac0
(XEN) r14 : 20144f60 r15 : 20217320 r16 : eb328fc8
(XEN) r17 : 02b0 r18 : 0058 r19 : 0058
(XEN) r20 : 0009804c8a70033f r21 : 20109c70 r22 : 
(XEN) r23 : 6fff7fffc128 r24 :  r25 : 
(XEN) r26 : c48b r27 : 000f r28 : 20144f60
(XEN) r29 : 001308126030 r30 : 8002 r31 : 0559a361
(XEN) domain_crash_sync called from xenmisc.c:194
(XEN) Domain 0 (vcpu#3) crashed on cpu#3:
(XEN) d 0xf7ffb208 domid 0
(XEN) vcpu 0xf41c8000 vcpu 3


On Fri, Apr 28, 2006 at 11:18:45AM +0800, Xu, Anthony wrote:
 Hi Tristan,
 Could you please check whether this patch addresses the RSE issue?
 
 Yes, Intel QA team is doing the test in the meantime.
 
 
 Thanks,
 -Anthony 
 

Re: [Xen-ia64-devel] PATCH: slightly improve stability

2006-06-23 Thread Alex Williamson
On Fri, 2006-06-23 at 18:19 +0900, Isaku Yamahata wrote:
 Is there any reason why the Anthony's patch was dropped?
 I think this patch is also needed.

   I don't recall specifically, but I would guess it was because there
were several test patches tagged onto this thread and while trying to
parse out the important parts, I thought the minstate.h changes
superseded these.  I can add in the rest as well.  I've seen the same
panic on rare occasion.  Thanks,

Alex

-- 
Alex Williamson HP Open Source & Linux Org.


___
Xen-ia64-devel mailing list
Xen-ia64-devel@lists.xensource.com
http://lists.xensource.com/xen-ia64-devel


Re: [Xen-ia64-devel] PATCH: slightly improve stability

2006-06-23 Thread Alex Williamson
On Fri, 2006-06-23 at 18:19 +0900, Isaku Yamahata wrote:
 Is there any reason why the Anthony's patch was dropped?
 I think this patch is also needed.

   I went ahead and applied this.  Thanks,

Alex

-- 
Alex Williamson HP Open Source & Linux Org.




RE: [Xen-ia64-devel] PATCH: slightly improve stability

2006-05-08 Thread Alex Williamson
On Sun, 2006-04-30 at 13:43 +0800, Xu, Anthony wrote:

 --- a/linux-2.6-xen-sparse/arch/ia64/xen/xenminstate.h  Thu Apr 27 02:55:42 2006
 +++ b/linux-2.6-xen-sparse/arch/ia64/xen/xenminstate.h  Sat Apr 29 13:14:58 2006
 @@ -155,6 +155,8 @@
   ;;                                                      \
   ld4 r30=[r8];                                           \
   ;;                                                      \
 + /* set XSI_INCOMPL_REGFR 0 */                           \
 + st4 [r8]=r0;                                            \
   cmp.eq  p6,p7=r30,r0;                                   \
   ;; /* not sure if this stop bit is necessary */         \
  (p6) adds r8=XSI_PRECOVER_IFS-XSI_INCOMPL_REGFR,r8;

   Applied.

-- 
Alex Williamson HP Linux & Open Source Lab




RE: [Xen-ia64-devel] PATCH: slightly improve stability

2006-04-30 Thread Magenheimer, Dan (HP Labs Fort Collins)
Excellent!  I agree you have found a very difficult bug!
I am now up to 167 linux compiles with no segfaults!
Congratulations Anthony!

One minor suggestion:  I think the newly added store can
be in the same cycle as the previous load (no stop bit
needed).  I didn't look at the bundling... perhaps it
doesn't matter.

Dan


RE: [Xen-ia64-devel] PATCH: slightly improve stability

2006-04-29 Thread Xu, Anthony
Hi Dan,

Yes, we also got a segmentation fault in 1 run out of 30.

Could you please try this new patch?

Thanks,
-Anthony 

-Original Message-
From: Magenheimer, Dan (HP Labs Fort Collins)
[mailto:[EMAIL PROTECTED]
Sent: April 28, 2006 22:49
To: Xu, Anthony; Tristan Gingold; xen-ia64-devel@lists.xensource.com;
Williamson, Alex (Linux Kernel Dev)
Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability

Hi Anthony --

I tried your patch overnight and still got a segmentation
fault in 1 run out of 50.  I didn't try Tristan's patch yet,
so will try both at the same time next... perhaps there
are two different problems that show up as the segmentation
fault.

Dan

 -Original Message-
 From: Xu, Anthony [mailto:[EMAIL PROTECTED]
 Sent: Thursday, April 27, 2006 9:19 PM
 To: Xu, Anthony; Tristan Gingold;
 xen-ia64-devel@lists.xensource.com; Magenheimer, Dan (HP Labs
 Fort Collins); Williamson, Alex (Linux Kernel Dev)
 Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability

 Hi Tristan,
 Could you please check whether this patch addresses the RSE issue?

 Yes, Intel QA team is doing the test in the meantime.


 Thanks,
 -Anthony

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Xu, Anthony
 Sent: April 28, 2006 9:48
 To: Tristan Gingold; xen-ia64-devel@lists.xensource.com; Magenheimer, Dan (HP
 Labs Fort Collins); Alex Williamson
 Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
 
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Tristan Gingold
 Sent: April 27, 2006 23:14
 To: xen-ia64-devel@lists.xensource.com; Magenheimer, Dan (HP Labs Fort
 Collins); Alex Williamson
 Subject: [Xen-ia64-devel] PATCH: slightly improve stability
 
 Hi,
 
 as reported earlier, this patch seems to improve stability: crashes are at
 least more coherent and maybe less frequent.
 
 RSE handling seems to have a bug: crashes are now due to either a bad value
 in a stacked register or the use of an invalid stacked register (although
 cfm seems correct in gdb!)
 
 I'm looking at this too.
 Yes, there is a bug in handle_lazy_cover.
 
 void ia64_do_page_fault (unsigned long address, unsigned long isr,
                          struct pt_regs *regs, unsigned long itir)
 {
         unsigned long iip = regs->cr_iip, iha;
         // FIXME should validate address here
         unsigned long pteval;
         unsigned long is_data = !((isr >> IA64_ISR_X_BIT) & 1UL);
         IA64FAULT fault;
 
         if ((isr & IA64_ISR_IR) && handle_lazy_cover(current, isr, regs))
                 return;
 
 This code sequence is intended to handle the following scenario.
 
 1. The Guest executes br.ret; this may cause a mandatory RSE load, and this
    load may cause a TLB miss.
 2. The VMM gets control, but it can't handle this TLB miss itself, so it
    injects the TLB miss into the Guest TLB miss handler. When the VMM
    executes rfi to jump to the Guest TLB miss handler, the TLB miss
    happens again.
 3. At this time, interrupt_collection_enabled is 0, so handle_lazy_cover
    executes cover on behalf of the Guest and returns to the Guest TLB miss
    handler again; this time there is no TLB miss.
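
The decision described in step 3 can be sketched as a small C model. This
is not the Xen source: the struct and function names are hypothetical, and
the idea that the pre-cover ifs value is saved alongside the incomplete-frame
flag is an assumption drawn from the XSI_PRECOVER_IFS and XSI_INCOMPL_REGFR
names used elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vcpu state shared with the guest. */
struct guest_state {
    bool     collection_enabled;   /* virtual psr.ic */
    bool     incomplete_regframe;  /* the INCOMPL flag */
    uint64_t precover_ifs;         /* ifs as it was before the lazy cover */
};

/* Hypothetical stand-in for the saved interruption state. */
struct trap_frame {
    uint64_t cr_ifs;
};

/*
 * Model of a lazy cover: only when the guest has interruption collection
 * off, remember the old ifs, mark the frame as incomplete, and clear
 * cr.ifs so that the guest resumes with an empty frame.
 * Returns true if the fault was absorbed this way.
 */
static bool lazy_cover_model(struct guest_state *g, struct trap_frame *regs)
{
    assert(g && regs);
    if (g->collection_enabled)
        return false;              /* normal path: reflect the fault */
    g->precover_ifs = regs->cr_ifs;
    g->incomplete_regframe = true;
    regs->cr_ifs = 0;
    return true;
}
```

The model makes no distinction between a fault taken while entering the
guest TLB miss handler and one taken anywhere else with collection off; it
treats both the same way.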
 
 
 The following code sequence is in the ia64_leave_kernel path, with psr.ic
 and psr.i off. When br.ret.dptk.many b0 is executed, there may be a
 mandatory load, and thus there may be a TLB miss. According to the above
 description, handle_lazy_cover executes cover on behalf of the Guest and
 returns to the Guest, which is not correct in this scenario.
 
 I didn't find an easy way to fix this bug.
 
 
         mov loc6=0
         mov loc7=0
 (pRecurse) br.call.dptk.few b0=rse_clear_invalid
         ;;
         mov loc8=0
         mov loc9=0
         cmp.ne pReturn,p0=r0,in1 // if recursion count != 0, we need to do a br.ret
         mov loc10=0
         mov loc11=0
 (pReturn) br.ret.dptk.many b0
 #endif /* !CONFIG_ITANIUM */
 #   undef pRecurse
 #   undef pReturn
         ;;
         alloc r17=ar.pfs,0,0,0,0 // drop current register frame
         ;;
         loadrs
 
 Thanks,
 Anthony
 
 
 
 Tested by running many Linux kernel compilations in an SMP domU (> 100).
 
 Tristan.
 



RSE_incomplete_cfm.patch
Description: RSE_incomplete_cfm.patch

RE: [Xen-ia64-devel] PATCH: slightly improve stability

2006-04-29 Thread Magenheimer, Dan (HP Labs Fort Collins)
Hi Anthony --

With both Tristan's stability patch and your earlier patch,
I have completed 103 linux compiles now with no segfaults
yet.   Did you see your segfault with Tristan's patch
included?

I'll continue running over the weekend with the bits I
have but if I see a segfault I will add in the additional
store in Xen entry (minstate.h) from your newer patch.

Dan


RE: [Xen-ia64-devel] PATCH: slightly improve stability

2006-04-29 Thread Magenheimer, Dan (HP Labs Fort Collins)
Argh!  After 103 successful linux compiles, two of the
next 10 had a segfault.  Restarting again with Anthony's
updated patch (plus Tristan's stability patch)... 


RE: [Xen-ia64-devel] PATCH: slightly improve stability

2006-04-29 Thread Xu, Anthony
With this new patch (so far without Tristan's stability patch), we can
successfully finish 50 Linux compiles.
We'll continue testing.

Thanks,
-Anthony 

RE: [Xen-ia64-devel] PATCH: slightly improve stability

2006-04-29 Thread Xu, Anthony
From: Magenheimer, Dan 
Sent: April 29, 2006 21:58
To: Xu, Anthony; Tristan Gingold; xen-ia64-devel@lists.xensource.com;
Williamson, Alex (Linux Kernel Dev)
Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability

Hi Anthony --

With both Tristan's stability patch and your earlier patch,
I have completed 103 linux compiles now with no segfaults
yet.   Did you see your segfault with Tristan's patch
included?

I'll continue running over the weekend with the bits I
have but if I see a segfault I will add in the additional
store in Xen entry (minstate.h) from your newer patch.


--- a/linux-2.6-xen-sparse/arch/ia64/xen/xenminstate.h  Thu Apr 27 02:55:42 2006
+++ b/linux-2.6-xen-sparse/arch/ia64/xen/xenminstate.h  Sat Apr 29 13:14:58 2006
@@ -155,6 +155,8 @@
        ;;                                                      \
        ld4 r30=[r8];                                           \
        ;;                                                      \
+       /* set XSI_INCOMPL_REGFR 0 */                           \
+       st4 [r8]=r0;                                            \
        cmp.eq  p6,p7=r30,r0;                                   \
        ;; /* not sure if this stop bit is necessary */         \
 (p6)   adds r8=XSI_PRECOVER_IFS-XSI_INCOMPL_REGFR,r8;

The additional store is necessary.

In theory, after the Guest executes cover, the incomplete frame becomes a
complete frame. So the Guest should set INCOMPL to 0 just after cover, or at
least before guest psr.ic and psr.i are turned on.

Previously, INCOMPL was set to 0 only when the Guest executed rfi. The window
between cover and rfi causes trouble in the scenario below.

1. Application A makes a system call.

2. On entry to the OS break handler, INCOMPL is 0. Because this is a system
   call, the Linux kernel doesn't execute cover.

3. Before returning to Application A, a schedule happens and Application B
   begins to run.

4. A TLB miss happens in the context of B, and this may set INCOMPL to 1.
   Before returning to B (that is, rfi has not been executed and INCOMPL is
   still 1), a schedule happens again. A resumes running with INCOMPL set
   to 1 (which is now incorrect).

5. As mentioned before, this is a system call, so cover is executed in the
   ia64_leave_kernel path. Because INCOMPL is 1, this cover is not actually
   executed, although it should be.

6. Thus application A's frame is destroyed, and the issue appears.
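
The scenario above can be replayed with a small C model of the flag. This is
a simplified sketch, not the kernel or Xen code: all names are hypothetical,
the flag is modelled as one boolean per vcpu that both applications share,
and the effect of the added st4 is modelled as clearing the flag whenever the
guest enters an interruption handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vcpu state: one INCOMPL flag shared by every task. */
struct vcpu_model {
    bool incompl;
};

/* The hypervisor performed a lazy cover for the guest. */
static void lazy_cover(struct vcpu_model *v)
{
    assert(v);
    v->incompl = true;
}

/* Guest interruption entry; with the patch, the flag is cleared here. */
static void interruption_entry(struct vcpu_model *v, bool clear_on_entry)
{
    assert(v);
    if (clear_on_entry)
        v->incompl = false;
}

/* In the leave-kernel path, cover is really executed only if INCOMPL is 0. */
static bool exit_path_runs_cover(const struct vcpu_model *v)
{
    assert(v);
    return !v->incompl;
}

/* Replays steps 1-5; returns whether A's cover runs on syscall exit. */
static bool syscall_exit_covers(bool clear_on_entry)
{
    struct vcpu_model v = { false };

    /* Steps 1-2: A enters through the break handler with INCOMPL == 0. */
    interruption_entry(&v, clear_on_entry);
    /* Step 3: schedule to B (nothing touches the flag). */
    /* Step 4: TLB miss in B's context, lazy cover, handler entry ... */
    lazy_cover(&v);
    interruption_entry(&v, clear_on_entry);
    /* ... then schedule back to A before B ever executes rfi. */
    /* Step 5: A leaves the kernel. */
    return exit_path_runs_cover(&v);
}
```

Under these assumptions, syscall_exit_covers(false) reports that the cover
is skipped, which is the failure in step 6, while syscall_exit_covers(true)
reports that it runs.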


I did catch this scenario.

Thanks,
Anthony


Dan

 -Original Message-
 From: Xu, Anthony [mailto:[EMAIL PROTECTED]
 Sent: Saturday, April 29, 2006 12:03 AM
 To: Magenheimer, Dan (HP Labs Fort Collins); Tristan Gingold;
 xen-ia64-devel@lists.xensource.com; Williamson, Alex (Linux
 Kernel Dev)
 Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability

 Hi Dan,

 Yes, we also got a segmentation fault in 1 run out of 30.

 Could you please try this new patch?

 Thanks,
 -Anthony

RE: [Xen-ia64-devel] PATCH: slightly improve stability

2006-04-28 Thread Magenheimer, Dan (HP Labs Fort Collins)
Hi Anthony --

I tried your patch overnight and still got a segmentation
fault in 1 run out of 50.  I didn't try Tristan's patch yet,
so will try both at the same time next... perhaps there
are two different problems that show up as the segmentation
fault.

Dan 


___
Xen-ia64-devel mailing list
Xen-ia64-devel@lists.xensource.com
http://lists.xensource.com/xen-ia64-devel


RE: [Xen-ia64-devel] PATCH: slightly improve stability

2006-04-27 Thread Xu, Anthony
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tristan
Gingold
Sent: April 27, 2006 23:14
To: xen-ia64-devel@lists.xensource.com; Magenheimer, Dan (HP Labs Fort
Collins); Alex Williamson
Subject: [Xen-ia64-devel] PATCH: slightly improve stability

Hi,

as reported earlier, this patch seems to improve stability: crashes are at
least more coherent and maybe less frequent.

RSE handling seems to have a bug: crashes are now due to either a bad value in
a stacked register or a use of an invalid stacked register (although cfm
seems correct in gdb!)

I'm looking at this too.
Yes, there is a bug in handle_lazy_cover.

void ia64_do_page_fault (unsigned long address, unsigned long isr,
                         struct pt_regs *regs, unsigned long itir)
{
	unsigned long iip = regs->cr_iip, iha;
	// FIXME should validate address here
	unsigned long pteval;
	unsigned long is_data = !((isr >> IA64_ISR_X_BIT) & 1UL);
	IA64FAULT fault;

	if ((isr & IA64_ISR_IR) && handle_lazy_cover(current, isr, regs))
		return;

This code sequence is intended to handle the following scenario.

1. The guest executes br.ret. This may cause a mandatory RSE load, and that
   load may cause a TLB miss.
2. The VMM gets control, but it can't handle this TLB miss itself, so it
   injects the TLB miss into the guest's TLB miss handler. When the VMM
   executes rfi to jump to the guest TLB miss handler, the same TLB miss
   happens again.
3. At this time, interrupt_collection_enabled is 0, so handle_lazy_cover
   executes cover on behalf of the guest and returns to the guest TLB miss
   handler again. This time there is no TLB miss.
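
The decision described in the steps above can be modelled in a few lines of
C. This is a hedged sketch, not the Xen source: guest_model, regs_model,
lazy_cover_model and page_fault_handled_by_cover are hypothetical names, the
ISR bit positions are assumed to match the ia64 kregs.h definitions and
should be checked against the headers, and saving the pre-cover ifs plus
setting an "incomplete frame" flag is an assumption about what "executes
cover on behalf of Guest" involves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ISR bit positions; verify against the ia64 headers. */
#define ISR_X_BIT   32   /* set when the fault came from an instruction fetch */
#define ISR_IR_BIT  38   /* set when the fault hit an incomplete register frame */
#define ISR_IR      (UINT64_C(1) << ISR_IR_BIT)

/* Hypothetical per-guest state kept by the hypervisor. */
struct guest_model {
    bool     psr_ic;         /* guest's virtual interruption collection bit */
    bool     incompl_regfr;  /* cover was done on the guest's behalf        */
    uint64_t precover_ifs;   /* ifs value saved before the cover            */
};

/* Hypothetical slice of the saved interruption registers. */
struct regs_model {
    uint64_t cr_ifs;
};

/* Same shape as the is_data computation in the excerpt above. */
bool isr_is_data(uint64_t isr)
{
    return !((isr >> ISR_X_BIT) & 1u);
}

/* Models the lazy cover: only taken while the guest has psr.ic off. */
bool lazy_cover_model(struct guest_model *g, struct regs_model *r)
{
    assert(g != 0 && r != 0);
    if (g->psr_ic)
        return false;            /* normal case: reflect the miss to the guest */
    g->precover_ifs = r->cr_ifs; /* remember the frame that was covered        */
    g->incompl_regfr = true;
    r->cr_ifs = 0;               /* resume with an invalid ifs                 */
    return true;                 /* retry the faulting instruction             */
}

/* Models the early return at the top of the page fault handler. */
bool page_fault_handled_by_cover(uint64_t isr, struct guest_model *g,
                                 struct regs_model *r)
{
    return (isr & ISR_IR) && lazy_cover_model(g, r);
}
```

Note that the model has no way to tell why psr.ic is off, so any RSE fault
taken with psr.ic off is treated as if the guest were entering a fault
handler.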

   
The following code sequence is in the ia64_leave_kernel path, with psr.ic and
psr.i off. When br.ret.dptk.many b0 is executed, there may be a mandatory
load, and thus a TLB miss. According to the description above,
handle_lazy_cover then executes cover on behalf of the guest and returns to
the guest, which is not correct in this scenario.

I didn't find an easy way to fix this bug. 


	mov loc6=0
	mov loc7=0
(pRecurse) br.call.dptk.few b0=rse_clear_invalid
	;;
	mov loc8=0
	mov loc9=0
	cmp.ne pReturn,p0=r0,in1	// if recursion count != 0, we need to do a br.ret
	mov loc10=0
	mov loc11=0
(pReturn) br.ret.dptk.many b0
#endif /* !CONFIG_ITANIUM */
#	undef pRecurse
#	undef pReturn
	;;
	alloc r17=ar.pfs,0,0,0,0	// drop current register frame
	;;
	loadrs
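
Why a wrongly applied cover is fatal at this point can be illustrated with
the frame-size check that applies to stacked registers. A minimal sketch,
assuming only the architectural layout in which the low 7 bits of cfm hold
the size of frame (sof) and stacked registers start at r32. gr_accessible is
a hypothetical helper, and r42 and the 16-register frame are example values,
not taken from the code above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CFM_SOF_MASK      UINT64_C(0x7f)  /* size of frame: cfm bits 0..6 */
#define FIRST_STACKED_REG 32u             /* r32 is the first stacked register */

/* Returns true if general register `regnum` lies inside the frame
 * described by `cfm`. Static registers r0-r31 are always inside. */
bool gr_accessible(uint64_t cfm, unsigned regnum)
{
    assert(regnum < 128);                 /* ia64 has 128 general registers */
    if (regnum < FIRST_STACKED_REG)
        return true;
    return (regnum - FIRST_STACKED_REG) < (cfm & CFM_SOF_MASK);
}
```

With a 16-register frame, gr_accessible(16, 42) holds. Once the frame has
been covered and cfm is 0, gr_accessible(0, 42) fails, which would match the
"use of an invalid stacked register" symptom Tristan reports.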

Thanks,
Anthony



Tested by doing many Linux kernel compilations in an SMP domU (> 100).

Tristan.



RE: [Xen-ia64-devel] PATCH: slightly improve stability

2006-04-27 Thread Xu, Anthony
Hi Tristan,
Could you please check whether this patch addresses the RSE issue?

Yes, the Intel QA team is running the test in the meantime.


Thanks,
-Anthony 


rse.patch
Description: rse.patch