Re: next-20130709 DMAR issues

2013-07-16 Thread Valdis . Kletnieks
On Mon, 15 Jul 2013 10:32:17 +0800, "Li, Zhen-Hua" said:
> I have met a bug with the same error message; its cause was that the BIOS
> did not allocate RMRR/DRHD (can't remember which one) for the device.

I think I posted a link to that same bug report.  The problem is that
if the BIOS wasn't allocating it before, it isn't allocating it now,
because it's still the same A11 BIOS that Dell shipped it with.  And now
I'm not sure which is more confusing - that it was OK in -0703 and borked
in -0709, or that between two successive boots of the same -0709 kernel
it cleared itself up.
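
One quick sanity check on the BIOS/RMRR theory is to compare the logged
fault address with the RMRR range from the boot log quoted in this thread.
A minimal sketch, assuming the RMRR base/end and fault address survived
intact in the archived log (some other fields in it look truncated); the
helper name in_rmrr is made up for illustration:

```python
def in_rmrr(fault_addr, rmrr_base, rmrr_end):
    """Return True if fault_addr lies inside the inclusive RMRR range."""
    return rmrr_base <= fault_addr <= rmrr_end


# Values copied from the 18:54 boot log in this thread.
rmrr_base = int("0x00ce761000", 16)   # "dmar: RMRR base: ..."
rmrr_end = int("0x00ce780fff", 16)    # "... end: ..."
fault_addr = int("ce71d000", 16)      # "fault addr ce71d000"

inside = in_rmrr(fault_addr, rmrr_base, rmrr_end)
gap = rmrr_base - fault_addr          # positive if the fault is below the RMRR

print(f"fault inside RMRR: {inside}")
print(f"distance below RMRR base: {gap:#x} bytes")
```

With those values the fault address sits 0x44000 bytes below the start of
the RMRR, i.e. outside it.  That would fit the "no entry for this device to
this region" explanation, but it says nothing about why the fault only shows
up on some boots.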



Re: next-20130709 DMAR issues

2013-07-16 Thread Valdis . Kletnieks
On Fri, 12 Jul 2013 14:14:20 +0200, Ingo Molnar said:
>
> (Cc:-ed a few DMAR people.)

Sorry for the slow reply, missed this in the lkml firehose.

For whatever reason, the damned problem seems to have evaporated:

% egrep -i 'dmar|Linux vers' /var/log/messages-20130714

Jul 11 18:54:15 turing-police kernel: [0.00] Linux version 3.10.0-next-20130709 (val...@turing-police.cc.vt.edu) (gcc version 4.8.1 20130612 (Red Hat 4.8.1-2) (GCC) ) #100 SMP PREEMPT Tue Jul 9 16:01:59 EDT 2013
Jul 11 18:54:15 turing-police kernel: [0.00] ACPI: DMAR cb7fd298 00080 (v01 INTEL  SNB  0001 INTL 0001)
Jul 11 18:54:15 turing-police kernel: [0.021632] dmar: Host address width 36
Jul 11 18:54:15 turing-police kernel: [0.021638] dmar: DRHD base: 0x00fed9 flags: 0x1
Jul 11 18:54:15 turing-police kernel: [0.021669] dmar: IOMMU 0: reg_base_addr fed9 ver 1:0 cap c9008020660262 ecap f0105a
Jul 11 18:54:15 turing-police kernel: [0.021677] dmar: RMRR base: 0x00ce761000 end: 0x00ce780fff
Jul 11 18:54:15 turing-police kernel: [0.021775] dmar: DRHD: handling fault status reg 2
Jul 11 18:54:15 turing-police kernel: [0.021782] dmar: DMAR:[DMA Read] Request device [00:1f.2] fault addr ce71d000
Jul 11 18:54:15 turing-police kernel: [0.021782] DMAR:[fault reason 06] PTE Read access is not set
Jul 11 18:54:15 turing-police kernel: [1.002171] DMAR: No ATSR found
Jul 11 22:12:39 turing-police kernel: [0.00] Linux version 3.10.0-next-20130709 (val...@turing-police.cc.vt.edu) (gcc version 4.8.1 20130612 (Red Hat 4.8.1-2) (GCC) ) #100 SMP PREEMPT Tue Jul 9 16:01:59 EDT 2013
Jul 11 22:12:39 turing-police kernel: [0.00] ACPI: DMAR cb7fd298 00080 (v01 INTEL  SNB  0001 INTL 0001)
Jul 11 22:12:39 turing-police kernel: [0.021376] dmar: Host address width 36
Jul 11 22:12:39 turing-police kernel: [0.021382] dmar: DRHD base: 0x00fed9 flags: 0x1
Jul 11 22:12:39 turing-police kernel: [0.021414] dmar: IOMMU 0: reg_base_addr fed9 ver 1:0 cap c9008020660262 ecap f0105a
Jul 11 22:12:39 turing-police kernel: [0.021422] dmar: RMRR base: 0x00ce761000 end: 0x00ce780fff
Jul 11 22:12:39 turing-police kernel: [1.034107] DMAR: No ATSR found

Damned if I know what changed - same kernel booted at 18:54 hit the issue, but
at 22:12 it had gone into hiding, and I haven't seen it since.  (And the
laptop gets booted twice a day most days - once at arrival at office, and
once when I get home, and it had been doing it consistently at both locations
for several days.)

Definitely well into "things that go bump in the night" category.
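
For anyone wanting to track which boots hit the fault over a longer stretch
of logs, the egrep above can be extended into a small per-boot summary.
This is only a sketch: it assumes each syslog record sits on a single line
(as in /var/log/messages itself, not the wrapped mail copy), treats every
"Linux version" banner as the start of a new boot, and the function name
faults_by_boot is made up for illustration.

```python
import re

BOOT_RE = re.compile(r"Linux version (\S+)")
FAULT_RE = re.compile(r"DMAR:\[fault reason (\d+)\]\s*(.*)")


def faults_by_boot(lines):
    """Group DMAR fault-reason lines under the boot banner they follow."""
    boots = []
    for line in lines:
        m = BOOT_RE.search(line)
        if m:
            # The classic syslog timestamp is the first 15 characters,
            # e.g. "Jul 11 18:54:15".
            boots.append({"when": line[:15], "version": m.group(1), "faults": []})
            continue
        m = FAULT_RE.search(line)
        if m and boots:
            boots[-1]["faults"].append((int(m.group(1)), m.group(2).strip()))
    return boots


# A few records from the two boots quoted above, joined onto single lines.
SAMPLE = [
    "Jul 11 18:54:15 turing-police kernel: [0.00] Linux version 3.10.0-next-20130709 (gcc version 4.8.1)",
    "Jul 11 18:54:15 turing-police kernel: [0.021782] dmar: DMAR:[DMA Read] Request device [00:1f.2] fault addr ce71d000",
    "Jul 11 18:54:15 turing-police kernel: [0.021782] DMAR:[fault reason 06] PTE Read access is not set",
    "Jul 11 18:54:15 turing-police kernel: [1.002171] DMAR: No ATSR found",
    "Jul 11 22:12:39 turing-police kernel: [0.00] Linux version 3.10.0-next-20130709 (gcc version 4.8.1)",
    "Jul 11 22:12:39 turing-police kernel: [1.034107] DMAR: No ATSR found",
]

boots = faults_by_boot(SAMPLE)
for boot in boots:
    status = boot["faults"] if boot["faults"] else "no DMAR faults"
    print(boot["when"], boot["version"], status)
```

On the sample this reports one fault (reason 06) for the 18:54 boot and none
for the 22:12 boot, matching what the egrep output shows by eye.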



Re: next-20130709 DMAR issues

2013-07-14 Thread Li, Zhen-Hua

I have met a bug with the same error message; its cause was that the BIOS
did not allocate RMRR/DRHD (can't remember which one) for the device.

Thanks
ZhenHua

On 07/12/2013 10:31 PM, Joerg Roedel wrote:
> Thanks for the heads-up, Ingo.
>
> On Fri, Jul 12, 2013 at 02:14:20PM +0200, Ingo Molnar wrote:
> > > Jul 10 12:20:19 turing-police kernel: [0.021583] DMAR:[fault reason 06]
> > > PTE Read access is not set
> > >
> > > Now I have 3 extra messages talking about handling a fault status.  lspci
> > > says 00:1f.2 is:
> > >
> > > 00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port
> > > SATA Controller [AHCI mode] (rev 04)
>
> This could be caused by some change in the SATA driver stack. Maybe a
> DMA buffer is used after unmap or something.
>
> > > If this doesn't ring any bells, I'll go do the bisect thing...
>
> Yes, a bisection would help here, thanks for doing this.
>
> Joerg



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: next-20130709 DMAR issues

2013-07-12 Thread Joerg Roedel
Thanks for the heads-up, Ingo.

On Fri, Jul 12, 2013 at 02:14:20PM +0200, Ingo Molnar wrote:
> > Jul 10 12:20:19 turing-police kernel: [0.021583] DMAR:[fault reason 06] 
> > PTE Read access is not set
> > 
> > Now I have 3 extra messages talking about handling a fault status.  lspci 
> > says 00:1f.2 is:
> > 
> > 00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port 
> > SATA Controller [AHCI mode] (rev 04)

This could be caused by some change in the SATA driver stack. Maybe a
DMA buffer is used after unmap or something.


> > If this doesn't ring any bells, I'll go do the bisect thing...

Yes, a bisection would help here, thanks for doing this.


Joerg




Re: next-20130709 DMAR issues

2013-07-12 Thread Ingo Molnar

(Cc:-ed a few DMAR people.)

* Valdis Kletnieks wrote:

> Dell Latitude E6530.
> 
> Seeing a new error message crop up in next-0709 that wasn't there with 0703.
> Particularly odd, last person to touch dmar.c was on 05/20/2013, so no smoking
> guns there...
> 
> egrep -i 'dmar|linux ver' /var/log/messages  gives me:
> 
> Jul  9 21:47:15 turing-police kernel: [0.00] Linux version 3.10.0-next-20130703 (val...@turing-police.cc.vt.edu) (gcc version 4.8.1 20130612 (Red Hat 4.8.1-2) (GCC) ) #99 SMP PREEMPT Wed Jul 3 17:40:09 EDT 2013
> Jul  9 21:47:15 turing-police kernel: [0.00] ACPI: DMAR cb7fd298 00080 (v01 INTEL  SNB  0001 INTL 0001)
> Jul  9 21:47:15 turing-police kernel: [0.021530] dmar: Host address width 36
> Jul  9 21:47:15 turing-police kernel: [0.021536] dmar: DRHD base: 0x00fed9 flags: 0x1
> Jul  9 21:47:15 turing-police kernel: [0.021569] dmar: IOMMU 0: reg_base_addr fed9 ver 1:0 cap c9008020660262 ecap f0105a
> Jul  9 21:47:15 turing-police kernel: [0.021576] dmar: RMRR base: 0x00ce761000 end: 0x00ce780fff
> Jul  9 21:47:15 turing-police kernel: [1.023235] DMAR: No ATSR found
> 
> That's what it usually says.  But now I have:
> 
> Jul 10 12:20:19 turing-police kernel: [0.00] Linux version 3.10.0-next-20130709 (val...@turing-police.cc.vt.edu) (gcc version 4.8.1 20130612 (Red Hat 4.8.1-2) (GCC) ) #100 SMP PREEMPT Tue Jul 9 16:01:59 EDT 2013
> Jul 10 12:20:19 turing-police kernel: [0.00] ACPI: DMAR cb7fd298 00080 (v01 INTEL  SNB  0001 INTL 0001)
> Jul 10 12:20:19 turing-police kernel: [0.021453] dmar: Host address width 36
> Jul 10 12:20:19 turing-police kernel: [0.021456] dmar: DRHD base: 0x00fed9 flags: 0x1
> Jul 10 12:20:19 turing-police kernel: [0.021485] dmar: IOMMU 0: reg_base_addr fed9 ver 1:0 cap c9008020660262 ecap f0105a
> Jul 10 12:20:19 turing-police kernel: [0.021487] dmar: RMRR base: 0x00ce761000 end: 0x00ce780fff
> Jul 10 12:20:19 turing-police kernel: [0.021575] dmar: DRHD: handling fault status reg 2
> Jul 10 12:20:19 turing-police kernel: [0.021583] dmar: DMAR:[DMA Read] Request device [00:1f.2] fault addr ce71d000
> Jul 10 12:20:19 turing-police kernel: [0.021583] DMAR:[fault reason 06] PTE Read access is not set
> Jul 10 12:20:19 turing-police kernel: [1.034643] DMAR: No ATSR found
> 
> Now I have 3 extra messages talking about handling a fault status.  lspci 
> says 00:1f.2 is:
> 
> 00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port 
> SATA Controller [AHCI mode] (rev 04)
> 
> I found similar in a thread here: 
> http://lkml.indiana.edu/hypermail/linux/kernel/1212.1/02319.html
> which ended with Don Dutile saying:
> 
> "DMAR table does not have an entry for this device to this region.
> Once the driver reconfigs/resets the device to stop polling bios-boot
> cmd rings and use (new) OS (dma-mapped) rings, there's a period of time
> during this transition that the hw is babbling away to an area that is no
> longer mapped."
> 
> But I'm not convinced this is the same issue - why did it change between 0703
> and 0709, when I haven't updated the firmware? No relevant .config changes
> between the two, either.
> 
> If this doesn't ring any bells, I'll go do the bisect thing...



