Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-20 Thread Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 On 19 Nov 2014 at 20:32, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
 
  On Wed, 19 Nov 2014, Julien Grall wrote:
   On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
That's right, the maintenance interrupt handler is not called, but it
doesn't do anything so we are fine. The important thing is that an
interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
  
   It would be worth writing this down somewhere, just in case someone
   decides to add code to the maintenance interrupt handler later.
 
  Yes, I could add a comment in the handler
 
 Maybe it wouldn't take a lot of effort to fix it? I am just worried that we
 may be hiding some issue: a spurious interrupt is typically not what is
 expected.

My guess is that by clearing UIE before reading GICC_IAR, we clear the
maintenance interrupt too; as a consequence, the following read of
GICC_IAR returns 1023 (nothing to be read). It is a bit as if the
maintenance interrupt were a level interrupt and we had just disabled it.

So I think that if we cleared UIE after reading GICC_IAR, GICC_IAR would
return the correct value.

However, with the current structure of the code, the first thing that we
do upon entering the hypervisor is clear the LRs, and given what happened
on your platform I think it is a good idea to do that with UIE disabled.

This is why I would rather read spurious interrupts but read/write LRs
with UIE disabled than read maintenance interrupts but risk
strange behaviours on some platforms.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
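
The hypothesis in the message above can be pictured with a small toy model. This is only a sketch of the guess as stated, not Xen code and not a description of real GIC hardware: the struct and function names are hypothetical, and the only values taken from the thread are the spurious ID 1023 and gic_maintenance_irq=25 from the boot log. It assumes the maintenance line behaves like a level signal that stays asserted only while UIE is set and the underflow condition holds.

```c
#include <stdbool.h>
#include <stdint.h>

#define GIC_SPURIOUS_IRQ 1023u  /* GICv2 spurious interrupt ID */
#define MAINT_IRQ        25u    /* gic_maintenance_irq reported in the boot log */

/* Hypothetical toy state: not a real register layout. */
struct toy_gic {
    bool uie;        /* models GICH_HCR.UIE */
    bool underflow;  /* models "none, or only one, LR is valid" */
};

/* Level-style line: asserted only while enable bit and condition both hold. */
bool toy_maint_line(const struct toy_gic *g)
{
    return g->uie && g->underflow;
}

/* Models a GICC_IAR read: the pending ID, or 1023 when nothing is asserted. */
uint32_t toy_read_iar(const struct toy_gic *g)
{
    return toy_maint_line(g) ? MAINT_IRQ : GIC_SPURIOUS_IRQ;
}
```

In this model, clearing uie before calling toy_read_iar yields 1023, while reading first and clearing afterwards yields 25, which is the ordering trade-off being discussed.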

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-20 Thread Julien Grall

On 11/20/2014 10:28 AM, Stefano Stabellini wrote:
 However, with the current structure of the code, the first thing that we
 do upon entering the hypervisor is clear the LRs, and given what happened
 on your platform I think it is a good idea to do that with UIE disabled.

Agreed. UIE should be disabled to avoid another maintenance interrupt as
soon as we EOI the IRQ.

 This is why I would rather read spurious interrupts but read/write LRs
 with UIE disabled than read maintenance interrupts but risk
 strange behaviours on some platforms.

Reading the GIC-v2 documentation, the spurious interrupt should
happen on any platform every time UIE is disabled while we receive a
maintenance interrupt:

The read returns a spurious interrupt ID of 1023 if any of the
following apply:

- no pending interrupt on the CPU interface has sufficient priority for
the interface to signal it to the processor

-- 
Julien Grall
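
For reference, a minimal sketch of how a GICC_IAR value can be checked for the spurious ID quoted above. It assumes the GICv2 layout in which the interrupt ID occupies bits [9:0] of GICC_IAR; the helper names are hypothetical and are not Xen functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define GICC_IAR_INTID_MASK 0x3ffu  /* bits [9:0]: interrupt ID */
#define GIC_SPURIOUS_IRQ    1023u   /* "nothing to acknowledge" */

/* Extract the interrupt ID, ignoring the CPUID field in the upper bits. */
uint32_t iar_to_irq(uint32_t iar)
{
    return iar & GICC_IAR_INTID_MASK;
}

/* True when the acknowledge read returned the spurious interrupt ID. */
bool iar_is_spurious(uint32_t iar)
{
    return iar_to_irq(iar) == GIC_SPURIOUS_IRQ;
}
```

A handler would typically stop its acknowledge loop as soon as iar_is_spurious() returns true.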


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-20 Thread Andrii Tseglytskyi

I think I'll debug this a bit later; unfortunately I don't have time
for it right now. But I do want to get rid of the spurious interrupt here.

BTW, Stefano, are you going to post the patch that we created
yesterday? Will Ian accept it?

Regards,
Andrii


-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-20 Thread Andrii Tseglytskyi

OK, I see. Thanks a lot.

On Thu, Nov 20, 2014 at 6:15 PM, Stefano Stabellini 
stefano.stabell...@eu.citrix.com wrote:

 Already posted:

 http://marc.info/?l=xen-devel&m=141648092100568

 Ian hasn't provided any feedback yet.

-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi

Hi Stefano,

Thank you for your support.

You are right: with the latest change you proposed I get continuous
prints during the platform hang:

(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0

Looks like the issue needs deeper debugging.

Regards,
Andrii

On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 Hello Andrii,
 we are getting closer :-)

 It would help if you post the output with GIC_DEBUG defined but without
 the other change that fixes the issue.

 I think the problem is probably due to software irqs.
 You are getting too many

 gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending

 messages. That means you are losing virtual SGIs (guest VCPU to guest
 VCPU). It would be best to investigate why, especially if you get many
 more of the same messages without the MAINTENANCE_IRQ change I
 suggested.

 This patch might also help in understanding the problem more:


 diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
 index b7516c0..5eaeca2 100644
 --- a/xen/arch/arm/gic.c
 +++ b/xen/arch/arm/gic.c
 @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
      {
          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
 -        if ( i >= nr_lrs ) return;
 +        if ( i >= nr_lrs )
 +        {
 +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
 +                   p->irq, v->domain->domain_id, v->vcpu_id);
 +            continue;
 +        }
 
          spin_lock_irqsave(&gic.lock, flags);
          gic_set_lr(i, p, GICH_LR_PENDING);
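
The "LRs full" check in the patch above relies on a bitmap search returning nr_lrs when no bit is free. The following is a standalone sketch of that idea, assuming a 32-bit mask with one bit per list register; first_zero_bit and lr_alloc are hypothetical stand-ins written for illustration, not Xen's find_first_zero_bit or any Xen allocator.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest clear bit among the low nr_bits bits of mask,
 * or nr_bits when all of them are set. */
unsigned int first_zero_bit(uint32_t mask, unsigned int nr_bits)
{
    assert(nr_bits <= 32);
    for (unsigned int i = 0; i < nr_bits; i++)
        if (!(mask & (UINT32_C(1) << i)))
            return i;
    return nr_bits;
}

/* Claim a free list register in *lr_mask.
 * Returns its index, or -1 when every LR is already in use. */
int lr_alloc(uint32_t *lr_mask, unsigned int nr_lrs)
{
    unsigned int i = first_zero_bit(*lr_mask, nr_lrs);

    if (i >= nr_lrs)
        return -1;          /* the "LRs full" case logged by the patch */
    *lr_mask |= UINT32_C(1) << i;
    return (int)i;
}
```

With a mask of 0xf and four LRs, the search returns 4, which is the "full" condition the debug print reports.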




 On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

 No hangs with this change.
 Complete log is the following:

 U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
 DRA752 ES1.0
 ethaddr not set. Validating first E-fuse MAC
 cpsw
 - UART enabled -
 - CPU  booting -
 - Xen starting in Hyp mode -
 - Zero BSS -
 - Setting up control registers -
 - Turning on paging -
 - Ready -
 (XEN) Checking for initrd in /chosen
 (XEN) RAM: 8000 - 9fff
 (XEN) RAM: a000 - bfff
 (XEN) RAM: c000 - dfff
 (XEN)
 (XEN) MODULE[1]: c200 - c20069aa
 (XEN) MODULE[2]: c000 - c200
 (XEN) MODULE[3]:  - 
 (XEN) MODULE[4]: c300 - c301
 (XEN)  RESVD[0]: ba30 - bfd0
 (XEN)  RESVD[1]: 9580 - 9590
 (XEN)  RESVD[2]: 98a0 - 98b0
 (XEN)  RESVD[3]: 95f0 - 98a0
 (XEN)  RESVD[4]: 9590 - 95f0
 (XEN)
 (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
 dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
 (XEN) Placing Xen at 0xdfe0-0xe000
 (XEN) Xen heap: d200-de00 (49152 pages)
 (XEN) Dom heap: 344064 pages
 (XEN) Domain heap initialised
 (XEN) Looking for UART console serial0
  Xen 4.5-unstable
 (XEN) Xen version 4.5-unstable (atseglytskyi@)
 (arm-linux-gnueabihf-gcc (crosstool-NG
 linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
 20130328 (prerelease)) debu4
 (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
 (XEN) Processor: 412fc0f2: ARM Limited, variant: 0x2, part 0xc0f, rev 0x2
 (XEN) 32-bit Execution:
 (XEN)   Processor Features: 1131:00011011
 (XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
 (XEN) Extensions: GenericTimer Security
 (XEN)   Debug Features: 02010555
 (XEN)   Auxiliary Features: 
 (XEN)   Memory Model Features: 10201105 2000 0124 02102211
 (XEN)  ISA Features: 02101110 13112111 21232041 2131 10011142 
 (XEN) Platform: TI DRA7
 (XEN) /psci method must be smc, but is: hvc
 (XEN) Set AuxCoreBoot1 to dfe0004c (0020004c)
 (XEN) Set AuxCoreBoot0 to 0x20
 (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
 (XEN) Using generic timer at 6144 KHz
 (XEN) GIC initialization:
 (XEN) gic_dist_addr=48211000
 (XEN) gic_cpu_addr=48212000
 (XEN) gic_hyp_addr=48214000
 (XEN) gic_vcpu_addr=48216000
 (XEN) gic_maintenance_irq=25
 (XEN) GIC: 192 lines, 2 cpus, secure (IID 043b).
 (XEN) Using scheduler: SMP Credit Scheduler (credit)
 (XEN) I/O virtualisation disabled
 (XEN) Allocated console ring of 16 KiB.
 (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
 (XEN) Bringing up CPU1
 - CPU

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,
 
 Thank you for your support.
 
 You are right: with the latest change you proposed I get continuous
 prints during the platform hang:
 
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 
 Looks like the issue needs deeper debugging.

Cool! You could simply print which IRQs are in the LRs when they are
full; for example, you could call gic_dump_info. That would tell us what
is taking up all the LR space we have.

How many LRs are available on omap5 anyway?

I doubt you have so much interrupt traffic to actually fill all the LRs,
so I am thinking that a few LRs might not be cleared properly (that
should happen on hypervisor entry, gic_update_one_lr should take care of
it).



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi

On Wed, Nov 19, 2014 at 1:12 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

 Thank you for your support.

 You are right: with the latest change you proposed I get continuous
 prints during the platform hang:

 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0

 Looks like the issue needs deeper debugging.

 Cool! You could simply print which IRQs are in the LRs when they are
 full; for example, you could call gic_dump_info. That would tell us what
 is taking up all the LR space we have.

 How many LRs are available on omap5 anyway?

:) Already done this:


(XEN) gic.c:725:d0v0 LRs full, not injecting irq=27 nr_lrs 4 i 4 into d0v0
(XEN) GICH_LRs (vcpu 0) mask=f
(XEN)HW_LR[0]=1a1f
(XEN)HW_LR[1]=9a00e439
(XEN)HW_LR[2]=1a02
(XEN)HW_LR[3]=9a015856
(XEN) Inflight irq=31 lr=0
(XEN) Inflight irq=57 lr=1
(XEN) Inflight irq=2 lr=2
(XEN) Inflight irq=86 lr=3
(XEN) Inflight irq=27 lr=255
(XEN) Pending irq=27
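
The raw HW_LR values in the dump above can be cross-checked against the "Inflight irq" lines by decoding the register fields. This sketch assumes the GICv2 GICH_LRn layout (VirtualID in bits [9:0], PhysicalID in bits [19:10], Priority in bits [27:23], State in bits [29:28], HW in bit 31); the struct and function names are hypothetical. Note that bits [19:10] only carry a physical interrupt ID when the HW bit is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one GICv2 list register (hypothetical helper type). */
struct lr_fields {
    bool     hw;        /* bit 31: backed by a hardware interrupt */
    unsigned state;     /* bits [29:28]: 0 invalid, 1 pending, 2 active, 3 both */
    unsigned priority;  /* bits [27:23] */
    unsigned phys_id;   /* bits [19:10], meaningful when hw is set */
    unsigned virt_id;   /* bits [9:0] */
};

struct lr_fields lr_decode(uint32_t lr)
{
    struct lr_fields f;

    f.hw       = (lr >> 31) & 0x1u;
    f.state    = (lr >> 28) & 0x3u;
    f.priority = (lr >> 23) & 0x1fu;
    f.phys_id  = (lr >> 10) & 0x3ffu;
    f.virt_id  = lr & 0x3ffu;
    return f;
}
```

Decoding 9a00e439 gives a pending hardware interrupt with virtual ID 57, and 9a015856 gives virtual ID 86, matching "Inflight irq=57 lr=1" and "Inflight irq=86 lr=3".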



 I doubt you have so much interrupt traffic to actually fill all the LRs,
 so I am thinking that a few LRs might not be cleared properly (that
 should happen on hypervisor entry, gic_update_one_lr should take care of
 it).

This actually explains why it happens during domU start: SGI
traffic might be very heavy at that time.




Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Ian Campbell

On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
 So it looks like there is not actually anything wrong, it is just that you
 have too many inflight irqs? It shouldn't cause problems because in that
 case GICH_HCR_UIE should be set and you should get a maintenance
 interrupt when LRs become available (actually when none, or only one,
 of the List register entries is marked as a valid interrupt).
 
 Maybe GICH_HCR_UIE is the one that doesn't work properly.

How much testing did this aspect get when the no-maint-irq series
originally went in? Did you manage to find a workload which filled all
the LRs or try artificially limiting the number of LRs somehow in order
to provoke it?

I ask because my intuition is that this won't happen very much, meaning
those code paths may not be as well tested...



  It might be
 worth checking that you are receiving maintenance interrupts:
 
 
 diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
 index b7516c0..b3eaa44 100644
 --- a/xen/arch/arm/gic.c
 +++ b/xen/arch/arm/gic.c
 @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
       * on return to guest that is going to clear the old LRs and inject
       * new interrupts.
       */
 +
 +    gdprintk(XENLOG_DEBUG, "maintenance interrupt\n");
  }
  
  void gic_dump_info(struct vcpu *v)
 
  
 You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE, you
 should still be receiving maintenance interrupts when one or more LRs
 become available.
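
To make the difference between the two enable bits concrete, here is a rough sketch of the conditions they gate, as I read the GICv2 specification: the underflow condition holds when none, or only one, of the list registers is valid, and the no-pending condition holds when no list register is in the pending state. The enum and function names are hypothetical, and treating only state "pending" (not "pending and active") as pending is an assumption of this sketch.

```c
#include <stdbool.h>

/* GICH_LR state field values (bits [29:28] in GICv2). */
enum lr_state {
    LR_INVALID        = 0,
    LR_PENDING        = 1,
    LR_ACTIVE         = 2,
    LR_PENDING_ACTIVE = 3
};

/* Condition gated by GICH_HCR.UIE: at most one LR holds a valid interrupt. */
bool underflow_condition(const enum lr_state *lrs, unsigned int nr_lrs)
{
    unsigned int valid = 0;

    for (unsigned int i = 0; i < nr_lrs; i++)
        if (lrs[i] != LR_INVALID)
            valid++;
    return valid <= 1;
}

/* Condition gated by GICH_HCR.NPIE: no LR is in the pending state.
 * Assumption: LR_PENDING_ACTIVE is not counted as pending here. */
bool no_pending_condition(const enum lr_state *lrs, unsigned int nr_lrs)
{
    for (unsigned int i = 0; i < nr_lrs; i++)
        if (lrs[i] == LR_PENDING)
            return false;
    return true;
}
```

With four LRs all active, the no-pending condition already holds while the underflow condition does not, so the two bits would raise maintenance interrupts at different points.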
 
 
  

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi

Hi Stefano,

    if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
  -    GICH[GICH_HCR] |= GICH_HCR_UIE;
  +    GICH[GICH_HCR] |= GICH_HCR_NPIE;
    else
  -    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  +    GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
 
    }

 Yes, exactly

I tried; the hang still occurs with this change.

Regards,
Andrii
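
The set/clear pattern in the hunk above can be sketched as a pure function on a register value, which makes the read-modify-write explicit. This assumes the GICv2 GICH_HCR layout with En in bit 0 and UIE in bit 1; hcr_update_uie and its parameters are hypothetical names, standing in for the list_empty() and lr_all_full() checks in the original code.

```c
#include <stdbool.h>
#include <stdint.h>

#define GICH_HCR_EN  (UINT32_C(1) << 0)  /* virtual CPU interface enable */
#define GICH_HCR_UIE (UINT32_C(1) << 1)  /* underflow interrupt enable */

/* Return the new GICH_HCR value: UIE is set only while interrupts are
 * queued and no list register is free; all other bits are preserved. */
uint32_t hcr_update_uie(uint32_t hcr, bool irqs_queued, bool lrs_full)
{
    if (irqs_queued && lrs_full)
        hcr |= GICH_HCR_UIE;
    else
        hcr &= ~GICH_HCR_UIE;   /* "&=" keeps EN and the other bits intact */
    return hcr;
}
```

Using "&= ~bit" rather than a plain assignment matters here, because the clear path must leave GICH_HCR_EN set.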




-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi

Hi Julien,

On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall julien.gr...@linaro.org wrote:
 On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
 On Wed, 19 Nov 2014, Ian Campbell wrote:
 On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
 So it looks like there is not actually anything wrong, it is just that you
 have too many inflight irqs? It shouldn't cause problems because in that
 case GICH_HCR_UIE should be set and you should get a maintenance
 interrupt when LRs become available (actually when none, or only one,
 of the List register entries is marked as a valid interrupt).

 Maybe GICH_HCR_UIE is the one that doesn't work properly.

 How much testing did this aspect get when the no-maint-irq series
 originally went in? Did you manage to find a workload which filled all
 the LRs or try artificially limiting the number of LRs somehow in order
 to provoke it?

 I ask because my intuition is that this won't happen very much, meaning
 those code paths may not be as well tested...

 I did test it by artificially limiting the number of LRs to 1.
 However there have been many iterations of that series and I didn't run
 this test at every iteration.

 Am I the only one to think this may not be related to this bug? All the LRs
 are full with IRQs of the same priority, so that is a valid state.

 gic_restore_pending_irqs is called every time we return to the
 guest, so the cause could be anything else.

 It would be interesting to see why we are trapping all the time in Xen.


I can run any test if you have a specific scenario in mind.


 Regards,

 --
 Julien Grall



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Julien Grall

On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote:
 Hi Julien,
 
 
 I can run any test if you have a specific scenario in mind.

I have no specific scenario in mind :/.

It looks like I'm able to reproduce it on my ARM board by restricting
the number of LRs to 1.

I will investigate.

Regards,

-- 
Julien Grall


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Julien Grall

On 11/19/2014 01:30 PM, Andrii Tseglytskyi wrote:
 On Wed, Nov 19, 2014 at 3:26 PM, Julien Grall julien.gr...@linaro.org wrote:
 On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote:
 Hi Julien,

 On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall julien.gr...@linaro.org 
 wrote:
 On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
 On Wed, 19 Nov 2014, Ian Campbell wrote:
 On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
 So it looks like there is not actually anything wrong, is just that you
 have too much inflight irqs? It should cause problems because in that
 case GICH_HCR_UIE should be set and you should get a maintenance
 interrupt when LRs become available (actually when none, or only one,
 of the List register entries is marked as a valid interrupt).

 Maybe GICH_HCR_UIE is the one that doesn't work properly.

 How much testing did this aspect get when the no-maint-irq series
 originally went in? Did you manage to find a workload which filled all
 the LRs or try artificially limiting the number of LRs somehow in order
 to provoke it?

 I ask because my intuition is that this won't happen very much, meaning
 those code paths may not be as well tested...

 I did test it by artificially limiting the number of LRs to 1.
 However there have been many iterations of that series and I didn't run
 this test at every iteration.

 am I the only to think this may not be related to this bug? All the LRs
 are full with IRQ of the same priority. So it's valid.

 As gic_restore_pending_irqs is called every time that we return to the
 guest. It could be anything else.

 It would be interesting to see why we are trapping all the time in Xen.


 I may perform any test if you have some specific scenario.

 I have no specific scenario in my mind :/.

 It looks like I'm able to reproduce it on my ARM board by restricting
 the number of LRs to 1.

 
 Do you mean that you got a hang with current xen/master branch ?

Yes but I forgot to update another part of the code.

With the patch below to restrict the number of LRs I'm still able to boot.
And don't see any maintenance interrupt.

Stefano, is it valid?

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index faad1ff..c1c0f7ff 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -327,6 +327,7 @@ static void __cpuinit gicv2_hyp_init(void)
     vtr = readl_gich(GICH_VTR);
     nr_lrs  = (vtr & GICH_V2_VTR_NRLRGS) + 1;
     gicv2_info.nr_lrs = nr_lrs;
+    gicv2_info.nr_lrs = 1;
 
     writel_gich(GICH_MISR_EOI, GICH_MISR);
 }
@@ -488,6 +489,16 @@ static void gicv2_write_lr(int lr, const struct gic_lr *lr_reg)
 
 static void gicv2_hcr_status(uint32_t flag, bool_t status)
 {
+    uint32_t lr = readl_gich(GICH_LR + 0);
+
+    if ( status )
+        lr |= GICH_V2_LR_MAINTENANCE_IRQ;
+    else
+        lr &= ~GICH_V2_LR_MAINTENANCE_IRQ;
+
+    writel_gich(lr, GICH_LR + 0);
+
+#if 0
     uint32_t hcr = readl_gich(GICH_HCR);
 
     if ( status )
@@ -496,6 +507,7 @@ static void gicv2_hcr_status(uint32_t flag, bool_t status)
         hcr &= (~flag);
 
     writel_gich(hcr, GICH_HCR);
+#endif
 }
 
 static unsigned int gicv2_read_vmcr_priority(void)
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..c726d7a 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -599,6 +599,7 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
      * on return to guest that is going to clear the old LRs and inject
      * new interrupts.
      */
+    gdprintk(XENLOG_DEBUG, "\n");
 }
 
 void gic_dump_info(struct vcpu *v)
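
For reference, a minimal sketch of how the LR count in the first hunk is derived, assuming the GICv2 layout in which the ListRegs field of GICH_VTR occupies bits [5:0] and holds the number of list registers minus one. The helper names below are hypothetical and are not part of Xen; the second helper builds the "all LRs in use" mask that a check such as lr_all_full() compares against.

```c
#include <stdint.h>

/* Assumption: GICv2 GICH_VTR.ListRegs is bits [5:0], encoded as count - 1. */
#define VTR_LISTREGS_MASK 0x3fu

/* Number of list registers implemented, as read from a GICH_VTR value. */
unsigned int nr_lrs_from_vtr(uint32_t vtr)
{
    return (vtr & VTR_LISTREGS_MASK) + 1;
}

/* Bitmask with one bit set per list register (all LRs in use). */
uint64_t full_lr_mask(unsigned int nr_lrs)
{
    /* Shifting a 64-bit value by 64 is undefined, so handle it separately. */
    return (nr_lrs >= 64) ? ~0ull : ((1ull << nr_lrs) - 1);
}
```

With four list registers the mask is 0xf, which matches the mask=f line that gic_dump_info prints when every LR is occupied; forcing the count to 1 reduces it to 0x1.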


-- 
Julien Grall


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,
 
if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
   -GICH[GICH_HCR] |= GICH_HCR_UIE;
   +GICH[GICH_HCR] |= GICH_HCR_NPIE;
else
   -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
   +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
  
}
 
  Yes, exactly
 
 I tried, hang still occurs with this change

We need to figure out why during the hang you still have all the LRs
busy even if you are getting maintenance interrupts that should cause
them to be cleared.

Could you please call gic_dump_info(current) from maintenance_interrupt,
and post the output during the hang? Remove the other gic_dump_info to
avoid confusion, we want to understand what is the status of the LRs
after clearing them upon receiving a maintenance interrupt at busy times.


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,
 
 On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
 if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
-GICH[GICH_HCR] |= GICH_HCR_UIE;
+GICH[GICH_HCR] |= GICH_HCR_NPIE;
 else
-GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
   
 }
  
   Yes, exactly
 
  I tried, hang still occurs with this change
 
  We need to figure out why during the hang you still have all the LRs
  busy even if you are getting maintenance interrupts that should cause
  them to be cleared.
 
 
 I see that I have free LRs during maintenance interrupt
 
 (XEN) gic.c:871:d0v0 maintenance interrupt
 (XEN) GICH_LRs (vcpu 0) mask=0
 (XEN)HW_LR[0]=9a015856
 (XEN)HW_LR[1]=0
 (XEN)HW_LR[2]=0
 (XEN)HW_LR[3]=0
 (XEN) Inflight irq=86 lr=0
 (XEN) Inflight irq=2 lr=255
 (XEN) Pending irq=2
 
 But I see that after I got hang - maintenance interrupts are generated
 continuously. Platform continues printing the same log till reboot.

Exactly the same log? As in the one above you just pasted?
That is very very suspicious.

I am thinking that we are not handling GICH_HCR_UIE correctly and
something we do in Xen, maybe writing to an LR register, might trigger a
new maintenance interrupt immediately causing an infinite loop.
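
A minimal sketch of the underflow condition discussed here, assuming the GICv2 list register layout in which bits [29:28] hold the state and a state of 0 means the entry is invalid. The helper names are hypothetical and this is not Xen code; it only illustrates why a dump with a single valid LR would still satisfy the condition for as long as UIE stays set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GICv2 GICH_LR state field, bits [29:28]; 0 = invalid entry. */
#define LR_STATE_SHIFT 28
#define LR_STATE_MASK  0x3u

/* Count the list registers that currently hold a valid interrupt. */
unsigned int count_valid_lrs(const uint32_t *lrs, size_t nr_lrs)
{
    unsigned int valid = 0;

    for (size_t i = 0; i < nr_lrs; i++)
        if ((lrs[i] >> LR_STATE_SHIFT) & LR_STATE_MASK)
            valid++;

    return valid;
}

/* Underflow condition: none, or only one, of the LRs is marked valid. */
bool underflow_condition(const uint32_t *lrs, size_t nr_lrs)
{
    return count_valid_lrs(lrs, nr_lrs) <= 1;
}
```

Applied to the values pasted above (HW_LR[0]=9a015856, the other three 0), exactly one entry is valid, so the condition holds.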

Could you please try this patch? It disables GICH_HCR_UIE immediately on
hypervisor entry.


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 4d2a92d..6ae8dc4 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;
 
+    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -821,12 +823,8 @@ void gic_inject(void)
 
     gic_restore_pending_irqs(current);
 
-
     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
         GICH[GICH_HCR] |= GICH_HCR_UIE;
-    else
-        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
-
 }
 
 static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
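
The control flow this patch aims for can be modelled in isolation. The sketch below is not Xen code: struct gic_model and the model_* functions are hypothetical stand-ins, and it assumes UIE is bit 1 of GICH_HCR as in GICv2. It only shows the ordering, with UIE dropped on every hypervisor entry and re-armed on return to the guest only when interrupts are still queued and no LR is free.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: GICv2 places the underflow interrupt enable at GICH_HCR bit 1. */
#define GICH_HCR_UIE (1u << 1)

/* Hypothetical stand-in for the per-CPU state the patch touches. */
struct gic_model {
    uint32_t hcr;          /* models GICH[GICH_HCR] */
    uint64_t lr_mask;      /* one bit per list register in use */
    unsigned int nr_lrs;   /* number of list registers, assumed < 64 */
    unsigned int pending;  /* interrupts queued but not yet in an LR */
};

/* True when every list register is occupied. */
bool lr_all_full(const struct gic_model *g)
{
    return g->lr_mask == ((1ull << g->nr_lrs) - 1);
}

/* Hypervisor entry: drop UIE before any LR is read or written. */
void model_clear_lrs(struct gic_model *g)
{
    g->hcr &= ~GICH_HCR_UIE;
}

/* Return to guest: request UIE only if work is queued and no LR is free. */
void model_inject(struct gic_model *g)
{
    if (g->pending != 0 && lr_all_full(g))
        g->hcr |= GICH_HCR_UIE;
}
```

Because the bit is cleared unconditionally on entry, the else branch removed from gic_inject() is no longer needed in this model: UIE can only be set between model_inject() and the next model_clear_lrs().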


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi

On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

 On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
 if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
-GICH[GICH_HCR] |= GICH_HCR_UIE;
+GICH[GICH_HCR] |= GICH_HCR_NPIE;
 else
-GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
   
 }
  
   Yes, exactly
 
  I tried, hang still occurs with this change
 
  We need to figure out why during the hang you still have all the LRs
  busy even if you are getting maintenance interrupts that should cause
  them to be cleared.
 

 I see that I have free LRs during maintenance interrupt

 (XEN) gic.c:871:d0v0 maintenance interrupt
 (XEN) GICH_LRs (vcpu 0) mask=0
 (XEN)HW_LR[0]=9a015856
 (XEN)HW_LR[1]=0
 (XEN)HW_LR[2]=0
 (XEN)HW_LR[3]=0
 (XEN) Inflight irq=86 lr=0
 (XEN) Inflight irq=2 lr=255
 (XEN) Pending irq=2

 But I see that after I got hang - maintenance interrupts are generated
 continuously. Platform continues printing the same log till reboot.

 Exactly the same log? As in the one above you just pasted?
 That is very very suspicious.

Yes exactly the same log. And looks like it means that LRs are flushed
correctly.


 I am thinking that we are not handling GICH_HCR_UIE correctly and
 something we do in Xen, maybe writing to an LR register, might trigger a
 new maintenance interrupt immediately causing an infinite loop.


Yes, this is what I'm thinking about. Taking into account all the
collected debug info, it looks like once the LRs are overloaded with SGIs,
a maintenance interrupt occurs.
Then it is not handled properly and occurs again and again, so the
platform hangs inside its handler.

 Could you please try this patch? It disable GICH_HCR_UIE immediately on
 hypervisor entry.


Now trying.


 diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
 index 4d2a92d..6ae8dc4 100644
 --- a/xen/arch/arm/gic.c
 +++ b/xen/arch/arm/gic.c
 @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
  if ( is_idle_vcpu(v) )
  return;

 +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 +
  spin_lock_irqsave(&v->arch.vgic.lock, flags);

  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
 @@ -821,12 +823,8 @@ void gic_inject(void)

  gic_restore_pending_irqs(current);

 -
  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
  GICH[GICH_HCR] |= GICH_HCR_UIE;
 -else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 -
  }

  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi 
 sgi)



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
 andrii.tseglyts...@globallogic.com wrote:
  On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
  On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   Hi Stefano,
  
  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 -GICH[GICH_HCR] |= GICH_HCR_UIE;
 +GICH[GICH_HCR] |= GICH_HCR_NPIE;
  else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;

  }
   
Yes, exactly
  
   I tried, hang still occurs with this change
  
   We need to figure out why during the hang you still have all the LRs
   busy even if you are getting maintenance interrupts that should cause
   them to be cleared.
  
 
  I see that I have free LRs during maintenance interrupt
 
  (XEN) gic.c:871:d0v0 maintenance interrupt
  (XEN) GICH_LRs (vcpu 0) mask=0
  (XEN)HW_LR[0]=9a015856
  (XEN)HW_LR[1]=0
  (XEN)HW_LR[2]=0
  (XEN)HW_LR[3]=0
  (XEN) Inflight irq=86 lr=0
  (XEN) Inflight irq=2 lr=255
  (XEN) Pending irq=2
 
  But I see that after I got hang - maintenance interrupts are generated
  continuously. Platform continues printing the same log till reboot.
 
  Exactly the same log? As in the one above you just pasted?
  That is very very suspicious.
 
  Yes exactly the same log. And looks like it means that LRs are flushed
  correctly.
 
 
  I am thinking that we are not handling GICH_HCR_UIE correctly and
  something we do in Xen, maybe writing to an LR register, might trigger a
  new maintenance interrupt immediately causing an infinite loop.
 
 
  Yes, this is what I'm thinking about. Taking in account all collected
  debug info it looks like once LRs are overloaded with SGIs -
  maintenance interrupt occurs.
  And then it is not handled properly, and occurs again and again - so
  platform hangs inside its handler.
 
  Could you please try this patch? It disable GICH_HCR_UIE immediately on
  hypervisor entry.
 
 
  Now trying.
 
 
  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index 4d2a92d..6ae8dc4 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
   if ( is_idle_vcpu(v) )
   return;
 
  +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  +
   spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
   while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
  @@ -821,12 +823,8 @@ void gic_inject(void)
 
   gic_restore_pending_irqs(current);
 
  -
   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
   GICH[GICH_HCR] |= GICH_HCR_UIE;
  -else
  -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  -
   }
 
   static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi 
  sgi)
 
 
 Heh - I don't see hangs with this patch :) But also I see that
 maintenance interrupt doesn't occur (and no hang as result)
 Stefano - is this expected?

No maintenance interrupts at all? That's strange. You should be
receiving them when LRs are full and you still have interrupts pending
to be added to them.

You could add another printk here to see if you should be receiving
them:

     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
+    {
+        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
         GICH[GICH_HCR] |= GICH_HCR_UIE;
-    else
-        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
-
+    }
 }


 
 

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi

On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
 andrii.tseglyts...@globallogic.com wrote:
  On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  Hi Stefano,
 
  On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   Hi Stefano,
  
  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 -GICH[GICH_HCR] |= GICH_HCR_UIE;
 +GICH[GICH_HCR] |= GICH_HCR_NPIE;
  else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;

  }
   
Yes, exactly
  
   I tried, hang still occurs with this change
  
   We need to figure out why during the hang you still have all the LRs
   busy even if you are getting maintenance interrupts that should cause
   them to be cleared.
  
 
  I see that I have free LRs during maintenance interrupt
 
  (XEN) gic.c:871:d0v0 maintenance interrupt
  (XEN) GICH_LRs (vcpu 0) mask=0
  (XEN)HW_LR[0]=9a015856
  (XEN)HW_LR[1]=0
  (XEN)HW_LR[2]=0
  (XEN)HW_LR[3]=0
  (XEN) Inflight irq=86 lr=0
  (XEN) Inflight irq=2 lr=255
  (XEN) Pending irq=2
 
  But I see that after I got hang - maintenance interrupts are generated
  continuously. Platform continues printing the same log till reboot.
 
  Exactly the same log? As in the one above you just pasted?
  That is very very suspicious.
 
  Yes exactly the same log. And looks like it means that LRs are flushed
  correctly.
 
 
  I am thinking that we are not handling GICH_HCR_UIE correctly and
  something we do in Xen, maybe writing to an LR register, might trigger a
  new maintenance interrupt immediately causing an infinite loop.
 
 
  Yes, this is what I'm thinking about. Taking in account all collected
  debug info it looks like once LRs are overloaded with SGIs -
  maintenance interrupt occurs.
  And then it is not handled properly, and occurs again and again - so
  platform hangs inside its handler.
 
  Could you please try this patch? It disable GICH_HCR_UIE immediately on
  hypervisor entry.
 
 
  Now trying.
 
 
  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index 4d2a92d..6ae8dc4 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
   if ( is_idle_vcpu(v) )
   return;
 
  +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  +
   spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
   while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
  @@ -821,12 +823,8 @@ void gic_inject(void)
 
   gic_restore_pending_irqs(current);
 
  -
   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
   GICH[GICH_HCR] |= GICH_HCR_UIE;
  -else
  -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  -
   }
 
   static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum 
  gic_sgi sgi)
 

 Heh - I don't see hangs with this patch :) But also I see that
 maintenance interrupt doesn't occur (and no hang as result)
 Stefano - is this expected?

 No maintenance interrupts at all? That's strange. You should be
 receiving them when LRs are full and you still have interrupts pending
 to be added to them.

 You could add another printk here to see if you should be receiving
 them:

  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 +{
 +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
  GICH[GICH_HCR] |= GICH_HCR_UIE;
 -else
 -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 -
 +}
  }


Requested properly:

(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt

But the maintenance interrupt does not occur.



 
 




-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi

Gic dump during interrupt requesting:

(XEN) GICH_LRs (vcpu 0) mask=f
(XEN)HW_LR[0]=3a1f
(XEN)HW_LR[1]=9a015856
(XEN)HW_LR[2]=1a1b
(XEN)HW_LR[3]=9a00e439
(XEN) Inflight irq=31 lr=0
(XEN) Inflight irq=86 lr=1
(XEN) Inflight irq=27 lr=2
(XEN) Inflight irq=57 lr=3
(XEN) Inflight irq=2 lr=255
(XEN) Pending irq=2
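
To read raw values like these, a minimal decoding sketch follows. It assumes the GICv2 GICH_LR layout (VirtualID in bits [9:0], PhysicalID in bits [19:10] when the HW bit is set, priority in bits [27:23], state in bits [29:28], HW in bit 31). The struct and function names are hypothetical and not taken from Xen.

```c
#include <stdint.h>

/* Fields of a GICv2 list register (GICH_LR), under the layout assumed above. */
struct lr_fields {
    unsigned int virtual_id;   /* bits [9:0] */
    unsigned int physical_id;  /* bits [19:10], meaningful when hw == 1 */
    unsigned int priority;     /* bits [27:23] */
    unsigned int state;        /* bits [29:28]: 1 pending, 2 active, 3 both */
    unsigned int hw;           /* bit 31: backed by a physical interrupt */
};

/* Split a raw list register value into its fields. */
struct lr_fields decode_lr(uint32_t lr)
{
    struct lr_fields f;

    f.virtual_id  = lr & 0x3ffu;
    f.physical_id = (lr >> 10) & 0x3ffu;
    f.priority    = (lr >> 23) & 0x1fu;
    f.state       = (lr >> 28) & 0x3u;
    f.hw          = (lr >> 31) & 0x1u;

    return f;
}
```

Under that layout, 9a015856 decodes to a pending hardware interrupt with virtual and physical ID 86, and 9a00e439 to a pending hardware interrupt with ID 57, which agrees with the "Inflight irq" lines in the dump.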

-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi

BTW, shouldn't the GICH_LR_MAINTENANCE_IRQ flag be set once the
maintenance interrupt has been requested?

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
  andrii.tseglyts...@globallogic.com wrote:
   On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
   stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   Hi Stefano,
  
   On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
   stefano.stabell...@eu.citrix.com wrote:
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
Hi Stefano,
   
   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
  -GICH[GICH_HCR] |= GICH_HCR_UIE;
  +GICH[GICH_HCR] |= GICH_HCR_NPIE;
   else
  -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
 
   }

 Yes, exactly
   
I tried, hang still occurs with this change
   
We need to figure out why during the hang you still have all the LRs
busy even if you are getting maintenance interrupts that should cause
them to be cleared.
   
  
   I see that I have free LRs during maintenance interrupt
  
   (XEN) gic.c:871:d0v0 maintenance interrupt
   (XEN) GICH_LRs (vcpu 0) mask=0
   (XEN)HW_LR[0]=9a015856
   (XEN)HW_LR[1]=0
   (XEN)HW_LR[2]=0
   (XEN)HW_LR[3]=0
   (XEN) Inflight irq=86 lr=0
   (XEN) Inflight irq=2 lr=255
   (XEN) Pending irq=2
  
   But I see that after I got hang - maintenance interrupts are generated
   continuously. Platform continues printing the same log till reboot.
  
   Exactly the same log? As in the one above you just pasted?
   That is very very suspicious.
  
   Yes exactly the same log. And looks like it means that LRs are flushed
   correctly.
  
  
   I am thinking that we are not handling GICH_HCR_UIE correctly and
   something we do in Xen, maybe writing to an LR register, might trigger a
   new maintenance interrupt immediately causing an infinite loop.
  
  
   Yes, this is what I'm thinking about. Taking in account all collected
   debug info it looks like once LRs are overloaded with SGIs -
   maintenance interrupt occurs.
   And then it is not handled properly, and occurs again and again - so
   platform hangs inside its handler.
  
   Could you please try this patch? It disable GICH_HCR_UIE immediately on
   hypervisor entry.
  
  
   Now trying.
  
  
   diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
   index 4d2a92d..6ae8dc4 100644
   --- a/xen/arch/arm/gic.c
   +++ b/xen/arch/arm/gic.c
   @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
if ( is_idle_vcpu(v) )
return;
  
   +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
   +
spin_lock_irqsave(&v->arch.vgic.lock, flags);
  
while ((i = find_next_bit((const unsigned long *) 
   &this_cpu(lr_mask),
   @@ -821,12 +823,8 @@ void gic_inject(void)
  
gic_restore_pending_irqs(current);
  
   -
if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
GICH[GICH_HCR] |= GICH_HCR_UIE;
   -else
   -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
   -
}
  
static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum 
   gic_sgi sgi)
  
 
  Heh - I don't see hangs with this patch :) But also I see that
  maintenance interrupt doesn't occur (and no hang as result)
  Stefano - is this expected?
 
  No maintenance interrupts at all? That's strange. You should be
  receiving them when LRs are full and you still have interrupts pending
  to be added to them.
 
  You could add another printk here to see if you should be receiving
  them:
 
   if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
  +{
  +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
   GICH[GICH_HCR] |= GICH_HCR_UIE;
  -else
  -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
  -
  +}
   }
 
 
 Requested properly:
 
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 (XEN) gic.c:756:d0v0 requesting maintenance interrupt
 
 But does not occur

OK, let's see what's going on then by printing the irq number of the
maintenance interrupt:

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 4d2a92d..fed3167 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -55,6 +55,7 @@ static struct {
 static DEFINE_PER_CPU(uint64_t, lr_mask);
 
 static uint8_t nr_lrs;
+static bool uie_on;
 #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
 
 /* The GIC mapping of CPU interfaces does not necessarily match the
@@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
 {
 int i = 0;
 unsigned long flags;
+unsigned

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini

I think that's OK: it looks like on your board, for some reason, setting
UIE produces irq 1023 (spurious interrupt) instead of your normal
maintenance interrupt.

But everything should work anyway without issues.

This is the same patch as before but on top of the latest xen-unstable
tree. Please confirm if it works.

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..df140b9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;
 
+    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -527,8 +529,6 @@ void gic_inject(void)
 
     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
         gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
-    else
-        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
 }
 
 static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
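
The policy this patch implements can be modelled outside Xen in a few lines: UIE is always dropped on hypervisor entry, and it is only requested again on the way back to the guest when interrupts are still queued and every list register is occupied. The sketch below is an illustration under assumptions, not the Xen code: `gich_hcr`, `lr_mask`, `nr_pending` and `NR_LRS` are hypothetical stand-ins for the hardware register and per-CPU state, and the UIE bit position follows the GICv2 GICH_HCR layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Underflow Interrupt Enable, bit 1 of GICH_HCR on GICv2. */
#define GICH_HCR_UIE (1u << 1)
/* Hypothetical number of list registers (the log in this thread shows 4). */
#define NR_LRS 4u

static uint32_t gich_hcr;        /* stand-in for the GICH_HCR register   */
static uint64_t lr_mask;         /* one bit per list register in use     */
static unsigned int nr_pending;  /* stand-in for the lr_pending queue    */

/* True when every list register is occupied. */
static bool lr_all_full(void)
{
    return lr_mask == ((1ull << NR_LRS) - 1);
}

/* Hypervisor entry: drop UIE before any list register is touched. */
static void clear_lrs_on_entry(void)
{
    gich_hcr &= ~GICH_HCR_UIE;
    /* ... list registers would be read back and recycled here ... */
}

/* Return to guest: ask for a maintenance interrupt only when needed. */
static void inject_on_exit(void)
{
    if ( nr_pending != 0 && lr_all_full() )
        gich_hcr |= GICH_HCR_UIE;
    /* No else branch: UIE was already cleared on entry. */
}
```

With this split, UIE can never remain set while the hypervisor is reading or writing the list registers, which is the property the patch relies on.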

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 I got this strange log:
 
 (XEN) received maintenance interrupt irq=1023
 
 And platform does not hang due to this:
 +hcr = GICH[GICH_HCR];
 +if ( hcr & GICH_HCR_UIE )
 +{
 +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 +    uie_on = 1;
 +}
 
 On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
  On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
   On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
   andrii.tseglyts...@globallogic.com wrote:
On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
Hi Stefano,
   
On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
   -    GICH[GICH_HCR] |= GICH_HCR_UIE;
   +    GICH[GICH_HCR] |= GICH_HCR_NPIE;
     else
   -    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
   +    GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
  
 }
 
  Yes, exactly

 I tried, hang still occurs with this change

 We need to figure out why during the hang you still have all the LRs
 busy even if you are getting maintenance interrupts that should cause
 them to be cleared.

   
I see that I have free LRs during maintenance interrupt
   
(XEN) gic.c:871:d0v0 maintenance interrupt
(XEN) GICH_LRs (vcpu 0) mask=0
(XEN)HW_LR[0]=9a015856
(XEN)HW_LR[1]=0
(XEN)HW_LR[2]=0
(XEN)HW_LR[3]=0
(XEN) Inflight irq=86 lr=0
(XEN) Inflight irq=2 lr=255
(XEN) Pending irq=2
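
The HW_LR[0] value in the dump above can be unpacked by hand. The sketch below assumes the GICv2 list register layout (virtual ID in bits 9:0, physical ID in bits 19:10, priority in bits 27:23, state in bits 29:28, HW in bit 31); the struct and function names are hypothetical and not part of Xen.

```c
#include <stdint.h>

/* Decoded view of a GICv2 list register (hypothetical helper type). */
struct lr_fields {
    unsigned int virq;      /* bits 9:0   virtual interrupt ID           */
    unsigned int pirq;      /* bits 19:10 physical ID (valid if hw == 1) */
    unsigned int priority;  /* bits 27:23 5-bit priority field           */
    unsigned int state;     /* bits 29:28 0=invalid 1=pending 2=active   */
    unsigned int hw;        /* bit 31     hardware interrupt flag        */
};

static struct lr_fields decode_lr(uint32_t lr)
{
    struct lr_fields f;

    f.virq     = lr & 0x3ffu;
    f.pirq     = (lr >> 10) & 0x3ffu;
    f.priority = (lr >> 23) & 0x1fu;
    f.state    = (lr >> 28) & 0x3u;
    f.hw       = (lr >> 31) & 0x1u;
    return f;
}
```

For 0x9a015856 this gives virtual irq 86, physical irq 86, priority field 20, state 1 (pending) and HW set, which agrees with the "Inflight irq=86 lr=0" line in the same dump.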
   
But I see that after I got the hang, maintenance interrupts are generated
continuously. Platform continues printing the same log till reboot.
   
Exactly the same log? As in the one above you just pasted?
That is very very suspicious.
   
Yes exactly the same log. And looks like it means that LRs are flushed
correctly.
   
   
I am thinking that we are not handling GICH_HCR_UIE correctly and
something we do in Xen, maybe writing to an LR register, might trigger a
new maintenance interrupt immediately causing an infinite loop.
   
   

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,
 
 On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  I think that's OK: it looks like that on your board for some reasons
  when UIE is set you get irq 1023 (spurious interrupt) instead of your
  normal maintenance interrupt.
 
 OK, but I think this should be investigated too. What do you think ?

I think it is harmless: my guess is that if we clear UIE before reading
GICC_IAR, GICC_IAR returns spurious interrupt instead of maintenance
interrupt. But it doesn't really matter to us.
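
The guess above can be written down as a toy model. This is only an illustration of the hypothesis, not a description of real GIC hardware: the maintenance interrupt is treated as level-like, so once UIE is cleared there is nothing left to acknowledge and the acknowledge read returns the spurious ID 1023. `struct fake_gic`, `fake_read_iar` and the interrupt number 25 used in the usage note are hypothetical; on a real platform the maintenance interrupt number comes from the device tree.

```c
#include <stdbool.h>
#include <stdint.h>

#define GICC_IAR_INTID_MASK 0x3ffu   /* interrupt ID field of GICC_IAR    */
#define GIC_SPURIOUS_INTID  1023u    /* "nothing to acknowledge" ID       */

/* Hypothetical model of the state that drives the maintenance interrupt. */
struct fake_gic {
    bool uie;                /* GICH_HCR.UIE currently set                 */
    bool underflow;          /* underflow condition present in the LRs     */
    unsigned int maint_irq;  /* platform maintenance interrupt number      */
};

/* Model of reading GICC_IAR when no other interrupt is pending. */
static uint32_t fake_read_iar(const struct fake_gic *g)
{
    if ( g->uie && g->underflow )
        return g->maint_irq;
    return GIC_SPURIOUS_INTID;
}

/* True if an acknowledged value carries the spurious interrupt ID. */
static bool iar_is_spurious(uint32_t iar)
{
    return (iar & GICC_IAR_INTID_MASK) == GIC_SPURIOUS_INTID;
}
```

In this model, reading the acknowledge register with `uie` still set returns the maintenance interrupt number (for example 25), while clearing `uie` first makes the same read return 1023, which is what the log in this thread shows.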

 
  But everything should work anyway without issues.
 
  This is the same patch as before but on top of the latest xen-unstable
  tree. Please confirm if it works.
 
  diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
  index 70d10d6..df140b9 100644
  --- a/xen/arch/arm/gic.c
  +++ b/xen/arch/arm/gic.c
  @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
       if ( is_idle_vcpu(v) )
           return;
 
  +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
  +
       spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
       while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
  @@ -527,8 +529,6 @@ void gic_inject(void)
 
       if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
           gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
  -    else
  -        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
   }
 
 
 I confirm - it works fine. Will this be a final fix ?

Yep :-)
Many thanks for your help on this!


 Regards,
 Andrii
 

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi

On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

 On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  I think that's OK: it looks like that on your board for some reasons
  when UIE is set you get irq 1023 (spurious interrupt) instead of your
  normal maintenance interrupt.

 OK, but I think this should be investigated too. What do you think ?

 I think it is harmless: my guess is that if we clear UIE before reading
 GICC_IAR, GICC_IAR returns spurious interrupt instead of maintenance
 interrupt. But it doesn't really matter to us.

OK. I think catching this will be a good exercise for someone )) But
out of scope for this issue.


 

 I confirm - it works fine. Will this be a final fix ?

 Yep :-)
 Many thanks for your help on this!

Thank you Stefano. This issue was really critical for us :)

Regards,
Andrii




Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini

That's right, the maintenance interrupt handler is not called, but it
doesn't do anything so we are fine. The important thing is that an
interrupt is sent and gic_clear_lrs gets called on hypervisor entry.

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 The only ambiguity left - maintenance interrupt handler is not called.
 It was requested for specific IRQ number, retrieved from device tree.
 But when we trigger GICH_HCR_UIE - we got maintenance interrupt for
 spurious number 1023.
 
 Regards,
 Andrii
 

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Julien Grall

On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
 That's right, the maintenance interrupt handler is not called, but it
 doesn't do anything so we are fine. The important thing is that an
 interrupt is sent and gic_clear_lrs gets called on hypervisor entry.

It would be worth to write down this somewhere. Just in case someone
decide to add code in maintenance interrupt later.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini

On Wed, 19 Nov 2014, Julien Grall wrote:
 On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
  That's right, the maintenance interrupt handler is not called, but it
  doesn't do anything so we are fine. The important thing is that an
  interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
 
 It would be worth to write down this somewhere. Just in case someone
 decide to add code in maintenance interrupt later.

Yes, I could add a comment in the handler


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi

On 19 Nov 2014 at 20:32, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:

 On Wed, 19 Nov 2014, Julien Grall wrote:
  On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
   That's right, the maintenance interrupt handler is not called, but it
   doesn't do anything so we are fine. The important thing is that an
   interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
 
  It would be worth to write down this somewhere. Just in case someone
  decide to add code in maintenance interrupt later.

 Yes, I could add a comment in the handler

Maybe it wouldn't take a lot of effort to fix it? I am just worried that
we may be hiding some issue: a spurious interrupt is typically not what
is expected.

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-18 Thread Andrii Tseglytskyi

Hi Stefano,

On Mon, Nov 17, 2014 at 8:02 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Mon, 17 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,

 Thank you for your answer.

 On Mon, Nov 17, 2014 at 6:39 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  Although it is possible that that patch is the cause of your problem,
  unfortunately it is part of a significant rework of the GIC driver in
  Xen and I am afraid that testing with only a portion of that patch
  series might introduce other subtle bugs.  For your reference the series
  starts at commit 6f91502be64a05d0635454d629118b96ae38b50f and ends at
  commit 72eaf29e8d70784aaf066ead79df1295a25ecfd0.
 

 Yes, I tested with and without the whole series.

 And the result is that the series causes the problem?


Yes.


  If 5495a512b63bad868c147198f7f049c2617d468c is really the cause of your
  problem, one idea that comes to mind is that GICH_LR_MAINTENANCE_IRQ
  might not work correctly on your platform. It wouldn't be the first time
  that we see hardware behaving that way, especially if you are using the
  GIC secure registers instead of the non-secure register as GICH_LRn.HW
  can only deactivate non-secure interrupts. This is usually due to a
  configuration error in u-boot.
 
  Could you please try to set PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI for your
  platform?
 

 I tried this. Unfortunately it doesn't help.

 Could you try the following patch on top of
 5495a512b63bad868c147198f7f049c2617d468c ?

 diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
 index 302c031..a286376 100644
 --- a/xen/arch/arm/gic.c
 +++ b/xen/arch/arm/gic.c
 @@ -557,10 +557,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
      BUG_ON(lr < 0);
      BUG_ON(state & ~(GICH_LR_STATE_MASK<<GICH_LR_STATE_SHIFT));

 -    lr_val = state | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
 +    lr_val = state | GICH_LR_MAINTENANCE_IRQ | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
 -    if ( p->desc != NULL )
 -        lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);

      GICH[GICH_LR + lr] = lr_val;

 @@ -622,6 +620,12 @@ out:
      return;
  }

 +static void gic_irq_eoi(void *info)
 +{
 +    int virq = (uintptr_t) info;
 +    GICC[GICC_DIR] = virq;
 +}
 +
  static void gic_update_one_lr(struct vcpu *v, int i)
  {
      struct pending_irq *p;
 @@ -639,7 +643,10 @@ static void gic_update_one_lr(struct vcpu *v, int i)
          irq = (lr >> GICH_LR_VIRTUAL_SHIFT) & GICH_LR_VIRTUAL_MASK;
          p = irq_to_pending(v, irq);
          if ( p->desc != NULL )
 +        {
 +            gic_irq_eoi((void*)(uintptr_t)irq);
              p->desc->status &= ~IRQ_INPROGRESS;
 +        }
          clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
          if ( test_bit(GIC_IRQ_GUEST_PENDING, &p->status) &&
               test_bit(GIC_IRQ_GUEST_ENABLED, &p->status))


It helps! Thanks a lot!
I did about 30 reboots and got no hangs. The only thing still needed is
to rebase these changes on top of the xen/master branch.
As they stand, the changes in the patch apply only on top of
5495a512b63bad868c147198f7f049c2617d468c.
Will you make this change? Is it acceptable for the baseline?

Regards,
Andrii

-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-18 Thread Andrii Tseglytskyi

OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set, and then
everything works fine.
The following 2 patches fix xen/master for my platform.

Stefano, could you please take a look at these changes?

commit 3628a0aa35706a8f532af865ed784536ce514eca
Author: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com
Date:   Tue Nov 18 14:20:42 2014 +0200

xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag

Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
Signed-off-by: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 31fb81a..093ecdb 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
                              << GICH_V2_LR_PRIORITY_SHIFT) |
               ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));

-    if ( p->desc != NULL )
+    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
     {
-        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
-            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
-        else
-            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
-                            << GICH_V2_LR_PHYSICAL_SHIFT);
+        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
+    }
+    else if ( p->desc != NULL )
+    {
+        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
+                       << GICH_V2_LR_PHYSICAL_SHIFT);
     }

     writel_gich(lr_reg, GICH_LR + lr * 4);

commit 110ad1914f04a5e52ec9d49a9aeb7df488f524b1
Author: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com
Date:   Tue Nov 18 12:14:42 2014 +0200

xen/arm: dra7: add PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI

Change-Id: Ic6285d5aea803fb0bfef50ffcc35e20b5bfb7a77
Signed-off-by: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com

diff --git a/xen/arch/arm/platforms/omap5.c b/xen/arch/arm/platforms/omap5.c
index 9d6e504..fb6686f 100644
--- a/xen/arch/arm/platforms/omap5.c
+++ b/xen/arch/arm/platforms/omap5.c
@@ -166,6 +166,11 @@ static const struct dt_device_match dra7_blacklist_dev[] __initconst =
     { /* sentinel */ },
 };

+static uint32_t dra7_quirks(void)
+{
+    return PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI;
+}
+
 PLATFORM_START(omap5, "TI OMAP5")
     .compatible = omap5_dt_compat,
     .init_time = omap5_init_time,
@@ -186,6 +191,7 @@ PLATFORM_START(dra7, "TI DRA7")
     .dom0_gnttab_start = 0x4b00,
     .dom0_gnttab_size = 0x2,
     .blacklist_dev = dra7_blacklist_dev,
+    .quirks = dra7_quirks,
 PLATFORM_END

 /*

On Tue, Nov 18, 2014 at 1:31 PM, Andrii Tseglytskyi
andrii.tseglyts...@globallogic.com wrote:
 Strange - it looks like the baseline code already does the same thing as
 the patch you sent me yesterday. The only thing needed is to set
 PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI.
 But the baseline still has the issue, while the changes on top of
 5495a512b63bad868c147198f7f049c2617d468c work fine.

 Regards,
 Andrii


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-18 Thread Stefano Stabellini

On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
 OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set and
 everything works fine
 The following 2 patches fixes xen/master for my platform.
 
 Stefano, could you please take a look to these changes?
 
 commit 3628a0aa35706a8f532af865ed784536ce514eca
 Author: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com
 Date:   Tue Nov 18 14:20:42 2014 +0200
 
 xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
 
 Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
 Signed-off-by: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com
 
 diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
 index 31fb81a..093ecdb 100644
 --- a/xen/arch/arm/gic-v2.c
 +++ b/xen/arch/arm/gic-v2.c
 @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
                               << GICH_V2_LR_PRIORITY_SHIFT) |
                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
 
 -    if ( p->desc != NULL )
 +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
      {
 -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
 -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
 -        else
 -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
 -                            << GICH_V2_LR_PHYSICAL_SHIFT);
 +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
 +    }
 +    else if ( p->desc != NULL )
 +    {
 +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
 +                       << GICH_V2_LR_PHYSICAL_SHIFT);
      }
 
      writel_gich(lr_reg, GICH_LR + lr * 4);

Actually in case p->desc == NULL (the irq is not a hardware irq, it
could be the virtual timer irq or the evtchn irq), you shouldn't need
the maintenance interrupt, if the bug was really due to GICH_LR_HW not
working correctly on OMAP5. These changes might only be better at
hiding the real issue.

Maybe the problem is exactly the opposite: the new scheme for avoiding
maintenance interrupts doesn't work for software interrupts.
The commit that should make them work correctly after the
no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121.
If you look at the changes to gic_update_one_lr in that commit, you'll
see that it is going to set a software irq as PENDING if it is already
ACTIVE. Maybe that doesn't work correctly on OMAP5.

Could you try this patch on top of
394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
if the problem is specifically with software irqs.


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..d8a17c9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
 /* Maximum cpu interface per GIC */
 #define NR_GIC_CPU_IF 8
 
-#undef GIC_DEBUG
+#define GIC_DEBUG 1
 
 static void gic_update_one_lr(struct vcpu *v, int i);
 
@@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
         ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
     if ( p->desc != NULL )
         lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
+    else
+        lr_val |= GICH_LR_MAINTENANCE_IRQ;
 
     GICH[GICH_LR + lr] = lr_val;
 


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-18 Thread Andrii Tseglytskyi

What if I try on top of current master branch the following code:

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 31fb81a..6764ab7 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -36,6 +36,8 @@
 #include <asm/io.h>
 #include <asm/gic.h>
 
+#define GIC_DEBUG 1
+
 /*
  * LR register definitions are GIC v2 specific.
  * Moved these definitions from header file to here
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index bcaded9..c03d6a6 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
 
 #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_hw_ops->info->nr_lrs) - 1))
 
-#undef GIC_DEBUG
+#define GIC_DEBUG 1
 
 static void gic_update_one_lr(struct vcpu *v, int i);

It is equivalent to what you are proposing: my code contains
PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, and as a result the following line
will be executed inside the gicv2_update_lr() function:
 lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
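
To make the comparison concrete, here is a rough sketch of which flag bits
each variant would put into the list register, depending on whether the irq
has a descriptor (hardware irq) and whether the quirk is set. The function
names are made up for illustration, the physical ID field is left out, and
the bit positions are the GICv2 GICH_LRn ones:

```c
#include <stdbool.h>
#include <stdint.h>

#define LR_MAINTENANCE_IRQ (1u << 19)
#define LR_HW              (1u << 31)

/* Current master: the quirk is only consulted for hardware irqs. */
static uint32_t flags_master(bool has_desc, bool quirk)
{
    if ( has_desc )
        return quirk ? LR_MAINTENANCE_IRQ : LR_HW;
    return 0;
}

/* The dra7 patch: with the quirk set, every irq asks for a maintenance irq. */
static uint32_t flags_dra7_patch(bool has_desc, bool quirk)
{
    if ( quirk )
        return LR_MAINTENANCE_IRQ;
    return has_desc ? LR_HW : 0;
}

/* The suggested test patch: HW for hardware irqs, maintenance irq otherwise. */
static uint32_t flags_test_patch(bool has_desc)
{
    return has_desc ? LR_HW : LR_MAINTENANCE_IRQ;
}
```

Under these assumptions the dra7 patch and the test patch agree for software
irqs (no descriptor), but for hardware irqs the first one sets
LR_MAINTENANCE_IRQ while the second one sets LR_HW.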

regards,
Andrii


-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-18 Thread Andrii Tseglytskyi

OK got it. Give me a few mins

On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
 for non-hardware irqs (desc == NULL) and keep avoiding
 GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.

 Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
 other potential bugs introduced later.


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-18 Thread Andrii Tseglytskyi

Hi Stefano,

No hangs with this change.
Complete log is the following:

U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
DRA752 ES1.0
ethaddr not set. Validating first E-fuse MAC
cpsw
- UART enabled -
- CPU  booting -
- Xen starting in Hyp mode -
- Zero BSS -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) Checking for initrd in /chosen
(XEN) RAM: 8000 - 9fff
(XEN) RAM: a000 - bfff
(XEN) RAM: c000 - dfff
(XEN)
(XEN) MODULE[1]: c200 - c20069aa
(XEN) MODULE[2]: c000 - c200
(XEN) MODULE[3]:  - 
(XEN) MODULE[4]: c300 - c301
(XEN)  RESVD[0]: ba30 - bfd0
(XEN)  RESVD[1]: 9580 - 9590
(XEN)  RESVD[2]: 98a0 - 98b0
(XEN)  RESVD[3]: 95f0 - 98a0
(XEN)  RESVD[4]: 9590 - 95f0
(XEN)
(XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
(XEN) Placing Xen at 0xdfe0-0xe000
(XEN) Xen heap: d200-de00 (49152 pages)
(XEN) Dom heap: 344064 pages
(XEN) Domain heap initialised
(XEN) Looking for UART console serial0
 Xen 4.5-unstable
(XEN) Xen version 4.5-unstable (atseglytskyi@)
(arm-linux-gnueabihf-gcc (crosstool-NG
linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
20130328 (prerelease)) debu4
(XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
(XEN) Processor: 412fc0f2: ARM Limited, variant: 0x2, part 0xc0f, rev 0x2
(XEN) 32-bit Execution:
(XEN)   Processor Features: 1131:00011011
(XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
(XEN) Extensions: GenericTimer Security
(XEN)   Debug Features: 02010555
(XEN)   Auxiliary Features: 
(XEN)   Memory Model Features: 10201105 2000 0124 02102211
(XEN)  ISA Features: 02101110 13112111 21232041 2131 10011142 
(XEN) Platform: TI DRA7
(XEN) /psci method must be smc, but is: hvc
(XEN) Set AuxCoreBoot1 to dfe0004c (0020004c)
(XEN) Set AuxCoreBoot0 to 0x20
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
(XEN) Using generic timer at 6144 KHz
(XEN) GIC initialization:
(XEN) gic_dist_addr=48211000
(XEN) gic_cpu_addr=48212000
(XEN) gic_hyp_addr=48214000
(XEN) gic_vcpu_addr=48216000
(XEN) gic_maintenance_irq=25
(XEN) GIC: 192 lines, 2 cpus, secure (IID 043b).
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) I/O virtualisation disabled
(XEN) Allocated console ring of 16 KiB.
(XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
(XEN) Bringing up CPU1
- CPU 0001 booting -
- Xen starting in Hyp mode -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) CPU 1 booted.
(XEN) Brought up 2 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading kernel from boot module 2
(XEN) Populate P2M 0xc800-0xd000 (1:1 mapping for dom0)
(XEN) Loading zImage from c040 to cfc0-cff50c48
(XEN) Loading dom0 DTB to 0xcfa0-0xcfa05ba8
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input - DOM0 (type 'CTRL-a' three times to switch
input to Xen)
(XEN) Freed 272kB init memory.
(XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
already pending in LR0
(XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
already pending in LR0
[0.00] /cpus/cpu@0 missing clock-frequency property
[0.00] /cpus/cpu@1 missing clock-frequency property
[0.140625] omap-gpmc omap-gpmc: failed to reserve memory
[0.187500] omap_l3_noc ocp.3: couldn't find resource 2
[0.273437] i2c i2c-1: of_i2c: invalid reg on
/ocp/i2c@48072000/camera_ov10635
[0.437500] ldo3: operation not allowed
[0.437500] omapdss HDMI error: can't set the voltage regulator
[0.468750] tfc_s9700 display0: tfc_s9700_probe probe
[0.468750] ov1063x 1-0030: No deserializer node found
[0.468750] ov1063x 1-0030: No serializer node found
[0.468750] ov1063x 1-0030: Failed writing register 0x0103!
[0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
[0.578125] ahci ahci.0.auto: can't get clock
[0.898437] ldc_module_init
[1.304687] Missing dual_emac_res_vlan in DT.
[1.304687] Using 1 as Reserved VLAN for 0 slave
[1.312500] Missing dual_emac_res_vlan in DT.
[1.320312] Using 2 as Reserved VLAN for 1 slave
[1.382812] Freeing init memory: 236K
sh: write error: No such device
Cannot identify '/dev/camera0': 2, No such file or directory
Parsing config from /xen/images/DomUAndroid.cfg
XSM Disabled: seclabel not supported
(XEN) do_physdev_op 16 cmd=13: not implemented yet
libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
dom1 access to irq 53: Function not

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-18 Thread Stefano Stabellini

Hello Andrii,
we are getting closer :-)

It would help if you post the output with GIC_DEBUG defined but without
the other change that fixes the issue.

I think the problem is probably due to software irqs.
You are getting too many

gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending

messages. That means you are losing virtual SGIs (guest VCPU to guest
VCPU). It would be best to investigate why, especially if you get many
more of the same messages without the MAINTENANCE_IRQ change I
suggested.
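
As background for the messages above, a simplified sketch of how a free list
register is picked from the per-cpu lr_mask, and what happens when none is
left. This is not the Xen code: the sketch_ names are hypothetical, and
nr_lrs is assumed to be smaller than 64 so the shifts stay well defined:

```c
#include <stdint.h>

/* Index of the first clear bit below nr_lrs, or nr_lrs if all are set. */
static unsigned int sketch_find_free_lr(uint64_t lr_mask, unsigned int nr_lrs)
{
    unsigned int i;

    for ( i = 0; i < nr_lrs; i++ )
        if ( !(lr_mask & (1ull << i)) )
            return i;
    return nr_lrs;
}

/* Same test as the lr_all_full() macro: every LR below nr_lrs is in use. */
static int sketch_lr_all_full(uint64_t lr_mask, unsigned int nr_lrs)
{
    return lr_mask == ((1ull << nr_lrs) - 1);
}

/*
 * Try to place one irq into a list register.
 * Returns the LR index used, or -1 if the irq has to stay queued
 * (the lr_pending case) until an LR is cleared.
 */
static int sketch_inject(uint64_t *lr_mask, unsigned int nr_lrs)
{
    unsigned int i = sketch_find_free_lr(*lr_mask, nr_lrs);

    if ( i >= nr_lrs )
        return -1;
    *lr_mask |= 1ull << i;
    return (int)i;
}
```

An irq that stays queued only reaches the guest once a list register is
freed again, which is why stale LRs that are never cleared would show up as
interrupts piling up in lr_pending.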

This patch might also help in understanding the problem better:


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..5eaeca2 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
     list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
     {
         i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
-        if ( i >= nr_lrs ) return;
+        if ( i >= nr_lrs )
+        {
+            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
+                     p->irq, v->domain->domain_id, v->vcpu_id);
+            continue;
+        }
 
         spin_lock_irqsave(&gic.lock, flags);
         gic_set_lr(i, p, GICH_LR_PENDING);





Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-17 Thread Andrii Tseglytskyi

Hi,

Issue occurs after the following commit:

commit 5495a512b63bad868c147198f7f049c2617d468c
Author: Stefano Stabellini <stefano.stabell...@eu.citrix.com>
Date:   Tue Jun 10 15:07:12 2014 +0100

xen/arm: support HW interrupts, do not request maintenance_interrupts

If the irq to be injected is an hardware irq (p->desc != NULL), set
GICH_LR_HW. Do not set GICH_LR_MAINTENANCE_IRQ.


I'm going to debug it in depth.
Stefano, maybe you have a feeling for what it could be?

Regards,
Andrii


On Fri, Nov 14, 2014 at 6:40 PM, Andrii Tseglytskyi
andrii.tseglyts...@globallogic.com wrote:
 Hi Julien,

 I would be surprised that the next GIC series impact this code as the
 next driver is only compiled for arm64 (GICv3 doesn't exist on arm32).
 Though, there was some refactoring.

 I meant that code was divided for generic GIC and GICv2 together with
 refactoring. Also in mails I saw that it was initially tested without
 SMP.
 GICv3 has no impacts for sure.


 The interrupt management has also been reworked for Xen 4.5 to avoid
 maintenance interrupt. I would give a look on this part.

 Thanks, this may help.

 Regards,
 Andrii



 Regards,

 --
 Julien Grall



 --

 Andrii Tseglytskyi | Embedded Dev
 GlobalLogic
 www.globallogic.com



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-17 Thread Stefano Stabellini

On Mon, 17 Nov 2014, Andrii Tseglytskyi wrote:
 Hi Stefano,
 
 Thank you for your answer.
 
 On Mon, Nov 17, 2014 at 6:39 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  Although it is possible that that patch is the cause of your problem,
  unfortunately it is part of a significant rework of the GIC driver in
  Xen and I am afraid that testing with only a portion of that patch
  series might introduce other subtle bugs.  For your reference the series
  starts at commit 6f91502be64a05d0635454d629118b96ae38b50f and ends at
  commit 72eaf29e8d70784aaf066ead79df1295a25ecfd0.
 
 
 Yes, I tested with and without the whole series.

And the result is that the series causes the problem?


  If 5495a512b63bad868c147198f7f049c2617d468c is really the cause of your
  problem, one idea that comes to mind is that GICH_LR_MAINTENANCE_IRQ
  might not work correctly on your platform. It wouldn't be the first time
  that we see hardware behaving that way, especially if you are using the
  GIC secure registers instead of the non-secure register as GICH_LRn.HW
  can only deactivate non-secure interrupts. This is usually due to a
  configuration error in u-boot.
 
  Could you please try to set PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI for your
  platform?
 
 
 I tried this. Unfortunately it doesn't help.

Could you try the following patch on top of
5495a512b63bad868c147198f7f049c2617d468c ?

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 302c031..a286376 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -557,10 +557,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
     BUG_ON(lr < 0);
     BUG_ON(state & ~(GICH_LR_STATE_MASK<<GICH_LR_STATE_SHIFT));
 
-    lr_val = state | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
+    lr_val = state | GICH_LR_MAINTENANCE_IRQ | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
         ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
-    if ( p->desc != NULL )
-        lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
 
     GICH[GICH_LR + lr] = lr_val;
 
@@ -622,6 +620,12 @@ out:
     return;
 }
 
+static void gic_irq_eoi(void *info)
+{
+    int virq = (uintptr_t) info;
+    GICC[GICC_DIR] = virq;
+}
+
 static void gic_update_one_lr(struct vcpu *v, int i)
 {
     struct pending_irq *p;
@@ -639,7 +643,10 @@ static void gic_update_one_lr(struct vcpu *v, int i)
     irq = (lr >> GICH_LR_VIRTUAL_SHIFT) & GICH_LR_VIRTUAL_MASK;
     p = irq_to_pending(v, irq);
     if ( p->desc != NULL )
+    {
+        gic_irq_eoi((void*)(uintptr_t)irq);
         p->desc->status &= ~IRQ_INPROGRESS;
+    }
     clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
     if ( test_bit(GIC_IRQ_GUEST_PENDING, &p->status) &&
          test_bit(GIC_IRQ_GUEST_ENABLED, &p->status))
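
The idea behind the patch above can be sketched roughly as follows: without
GICH_LR_HW the hardware no longer deactivates the physical interrupt on
guest EOI, so the hypervisor does it by hand once the list register has gone
back to the invalid state. This is only an illustration under simplifying
assumptions: the sketch_ names are hypothetical, the write to GICC_DIR is
replaced by a variable, and, as in the patch, the number written is the one
decoded from the virtual ID field of the LR:

```c
#include <stdbool.h>
#include <stdint.h>

#define LR_VIRTUAL_MASK   0x3ffu
#define LR_VIRTUAL_SHIFT  0
#define LR_STATE_MASK     0x3u
#define LR_STATE_SHIFT    28

/* Stand-in for the GICC_DIR register, so the sketch runs without hardware. */
#define SKETCH_NO_WRITE 0xffffffffu
static uint32_t sketch_last_dir_write = SKETCH_NO_WRITE;

static void sketch_write_gicc_dir(uint32_t irq)
{
    sketch_last_dir_write = irq;
}

/* Virtual interrupt number stored in a list register value. */
static uint32_t sketch_lr_virq(uint32_t lr)
{
    return (lr >> LR_VIRTUAL_SHIFT) & LR_VIRTUAL_MASK;
}

/* State 0 (invalid): neither pending nor active, the guest is done with it. */
static bool sketch_lr_is_invalid(uint32_t lr)
{
    return ((lr >> LR_STATE_SHIFT) & LR_STATE_MASK) == 0;
}

/*
 * Returns true if the LR can be reused. For hardware irqs the interrupt
 * is deactivated manually first, since the HW bit was not set.
 */
static bool sketch_retire_lr(uint32_t lr, bool is_hw_irq)
{
    if ( !sketch_lr_is_invalid(lr) )
        return false;
    if ( is_hw_irq )
        sketch_write_gicc_dir(sketch_lr_virq(lr));
    return true;
}
```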


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-14 Thread Stefano Stabellini

On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
 On Fri, Nov 14, 2014 at 4:35 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
  Hi,
 
  I observe system freeze on latest xen/master branch.
 
  My setup is:
 
  - Jacinto 6 evm board (OMAP5)
  - Latest Xen 4.5.0-rc2 as hypervisor
  - Linux 3.8 as dom0, running on 2 vcpus
  - Android 4.3 as domU (running on Linux kernel 3.8, 2 vcpus)
  - XSM feature is disabled
  - gcc version 4.7.3 20130328 (prerelease) (crosstool-NG
  linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) as cross
  compiler
 
  Freeze occurs in random moment of time during creation of domU domain.
  Even Xen console may be not available after freeze.
  Can someone suggest - what it can be? Maybe some weak places in new
  code? Maybe new gic, which was reworked a lot or something else?
 
  Thank you in advance for any suggestions.
 
  Is this really 3.8 or 3.18?
 
 We have 3.8 in both dom0 and domU
 
  3.8 is pretty old and doesn't have any of
  the fixes to be able to safely do dma involving guest pages to
  non-coherent devices.
 
 This is a good point. Now we are migrating to 3.12 kernel in dom0. But
 Android will remain on 3.8. Will it help ?
 Maybe you can point me to any tree with proper DMA fixes? Note: if you
 are talking about SWIOTLB - we have your latest one, retrieved from
 git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git
 branch:swiotlb-xen-9.1

The last and most stable series is:

http://marc.info/?l=linux-kernel&m=141579241729749&w=2

But thinking more about this, I doubt that it is a dma problem, because
you would most probably see various kinds of error messages, not a
freeze.


  Where are you storing the guest disk images?
 
 SATA drive, dedicated to dom0, its controller has its own DMA

Are they on file or on lvm volumes?


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-14 Thread Andrii Tseglytskyi

On Fri, Nov 14, 2014 at 5:22 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
 On Fri, Nov 14, 2014 at 4:35 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
  Hi,
 
  I observe system freeze on latest xen/master branch.
 
  My setup is:
 
  - Jacinto 6 evm board (OMAP5)
  - Latest Xen 4.5.0-rc2 as hypervisor
  - Linux 3.8 as dom0, running on 2 vcpus
  - Android 4.3 as domU (running on Linux kernel 3.8, 2 vcpus)
  - XSM feature is disabled
  - gcc version 4.7.3 20130328 (prerelease) (crosstool-NG
  linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) as cross
  compiler
 
  Freeze occurs in random moment of time during creation of domU domain.
  Even Xen console may be not available after freeze.
  Can someone suggest - what it can be? Maybe some weak places in new
  code? Maybe new gic, which was reworked a lot or something else?
 
  Thank you in advance for any suggestions.
 
  Is this really 3.8 or 3.18?

 We have 3.8 in both dom0 and domU

  3.8 is pretty old and doesn't have any of
  the fixes to be able to safely do dma involving guest pages to
  non-coherent devices.

 This is a good point. Now we are migrating to 3.12 kernel in dom0. But
 Android will remain on 3.8. Will it help ?
 Maybe you can point me to any tree with proper DMA fixes? Note: if you
 are talking about SWIOTLB - we have your latest one, retrieved from
 git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git
 branch:swiotlb-xen-9.1

 The last and most stable series is:

 http://marc.info/?l=linux-kernel&m=141579241729749&w=2


Thanks  - I'll try this series anyway.

 But thinking more about this, I doubt that it is a dma problem, because
 you would most probably see various kind of error messages, not a
 freeze.


Agree.


  Where are you storing the guest disk images?

 SATA drive, dedicated to dom0, its controller has its own DMA

 Are they on file or on lvm volumes?

Images are on file.

Also note - the freeze depends on system load. It reproduces more
frequently if I start Android + QNX + all frontend/backend drivers.
Starting Android only, without any additional drivers, works more or less
stably. It looks like the issue is reproduced when domU starts in parallel
with the backend drivers in dom0.
But the same works fine with the old Xen 4.4.

Regards,
Andrii


-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-14 Thread Stefano Stabellini

On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
 On Fri, Nov 14, 2014 at 5:22 PM, Stefano Stabellini
 stefano.stabell...@eu.citrix.com wrote:
  On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
  On Fri, Nov 14, 2014 at 4:35 PM, Stefano Stabellini
  stefano.stabell...@eu.citrix.com wrote:
   On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
   Hi,
  
   I observe system freeze on latest xen/master branch.
  
   My setup is:
  
   - Jacinto 6 evm board (OMAP5)
   - Latest Xen 4.5.0-rc2 as hypervisor
   - Linux 3.8 as dom0, running on 2 vcpus
   - Android 4.3 as domU (running on Linux kernel 3.8, 2 vcpus)
   - XSM feature is disabled
   - gcc version 4.7.3 20130328 (prerelease) (crosstool-NG
   linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) as cross
   compiler
  
   Freeze occurs in random moment of time during creation of domU domain.
   Even Xen console may be not available after freeze.
   Can someone suggest - what it can be? Maybe some weak places in new
   code? Maybe new gic, which was reworked a lot or something else?
  
   Thank you in advance for any suggestions.
  
   Is this really 3.8 or 3.18?
 
  We have 3.8 in both dom0 and domU
 
   3.8 is pretty old and doesn't have any of
   the fixes to be able to safely do dma involving guest pages to
   non-coherent devices.
 
  This is a good point. Now we are migrating to 3.12 kernel in dom0. But
  Android will remain on 3.8. Will it help ?
  Maybe you can point me to any tree with proper DMA fixes? Note: if you
  are talking about SWIOTLB - we have your latest one, retrieved from
  git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git
  branch:swiotlb-xen-9.1
 
  The last and most stable series is:
 
  http://marc.info/?l=linux-kernel&m=141579241729749&w=2
 
 
 Thanks  - I'll try this series anyway.
 
  But thinking more about this, I doubt that it is a dma problem, because
  you would most probably see various kind of error messages, not a
  freeze.
 
 
 Agree.
 
 
   Where are you storing the guest disk images?
 
  SATA drive, dedicated to dom0, its controller has its own DMA
 
  Are they on file or on lvm volumes?
 
 Images are on file.
 
 Also note - the freeze depends on system load. It reproduces more
 frequently if I start Android + QNX + all frontend/backend drivers.
 Starting Android only, without any additional drivers, works more or less
 stably. It looks like the issue is reproduced when domU starts in parallel
 with the backend drivers in dom0.
 But the same works fine with the old Xen 4.4.

In my experience freezes like the one you are describing are due to
interrupt related bugs or deadlocks. Both of them are hard to track
down. If you can reproduce it reliably maybe you could bisect it.


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-14 Thread Andrii Tseglytskyi

Hi Julien,

On Fri, Nov 14, 2014 at 5:49 PM, Julien Grall julien.gr...@linaro.org wrote:
 Hi Andrii,

 On 11/14/2014 03:39 PM, Andrii Tseglytskyi wrote:
 Also note - the freeze depends on system load. It reproduces more
 frequently if I start Android + QNX + all frontend/backend drivers.
 Starting Android only, without any additional drivers, works more or less
 stably. It looks like the issue is reproduced when domU starts in parallel
 with the backend drivers in dom0.
 But the same works fine with the old Xen 4.4.

 To be sure, when you say xen/master is it a vanilla Xen? Or do you
 have patches on top of it?


This is my work tree, I have some local patches, specific to our system:

579f19c (HEAD, dev_xen_4.5_rc2_04) xsm: arm: allow dom0 to use send
call on during event_channel creation -- XSM is currently disabled,
this one has no effect
0f1bd43 kbdif: add raw events passing
b7289a0 pvfb: add release event
bd979de xen/tools: Fix virtual disks helper scripts.
81d2f11 libxl: skip memory finalize if appended DTB found -- we have
device tree attached to domU zImage, we skip initialization of domU
DTB
6c7f2ae libxc: skip constructing DTB during zImage loading -- we have
device tree attached to domU zImage, we skip initialization of domU
DTB
2b4ba6c libxl: add ability to skip constructing DTB -- we have device
tree attached to domU zImage, we skip initialization of domU DTB
bd226aa Revert "tools: arm: remove code to check for a DTB appended to
the kernel" -- we have device tree attached to domU zImage, we skip
initialization of domU DTB
e445c33 arm: decrease size of RAM memory for arm guest  -- We have
memory mapped registers starting from 0x4000, so I moved rambase
to 0x8000
3a00dd2 flask/policy: allow domU to use previously-mapped I/O-memory
-- XSM is currently disabled, this one has no effect
0fd131ac2 fix commit xen/arm: Add support for GICv3 for domU
cacfcc5 (tag: 4.5.0-rc2) Xen 4.5.0-rc2: Update tag for QEMU upstream tree
e6fa63d (xen_baseline/master) pvgrub: ignore NUL
fda1614 xen/arm: Add support for GICv3 for domU


 Also, what are the frontends/backends?

We have some userspace backend drivers - audio, framebuffer, event device, etc

I may send a tarball with all my local patches if needed.

Regards,
Andrii


 Regards,

 --
 Julien Grall



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-14 Thread Andrii Tseglytskyi

Hi Julien,

 I would be surprised that the next GIC series impact this code as the
 next driver is only compiled for arm64 (GICv3 doesn't exist on arm32).
 Though, there was some refactoring.

I meant that the code was split into generic GIC and GICv2 parts as part
of the refactoring. Also, in the mails I saw that it was initially tested
without SMP.
GICv3 has no impact for sure.


 The interrupt management has also been reworked for Xen 4.5 to avoid
 maintenance interrupt. I would give a look on this part.

Thanks, this may help.

Regards,
Andrii



 Regards,

 --
 Julien Grall



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

