Re: [PATCH v27 1/9] memblock: add memblock_cap_memory_range()

2016-11-16 Thread AKASHI Takahiro
Will,

On Wed, Nov 16, 2016 at 04:30:15PM +, Will Deacon wrote:
> Hi Akashi,
> 
> On Mon, Nov 14, 2016 at 02:55:16PM +0900, AKASHI Takahiro wrote:
> > On Fri, Nov 11, 2016 at 11:19:04AM +0800, Dennis Chen wrote:
> > > On Fri, Nov 11, 2016 at 11:50:50AM +0900, AKASHI Takahiro wrote:
> > > > On Thu, Nov 10, 2016 at 05:27:20PM +, Will Deacon wrote:
> > > > > On Wed, Nov 02, 2016 at 01:51:53PM +0900, AKASHI Takahiro wrote:
> > > > > > +void __init memblock_cap_memory_range(phys_addr_t base, 
> > > > > > phys_addr_t size)
> > > > > > +{
> > > > > > +   int start_rgn, end_rgn;
> > > > > > +   int i, ret;
> > > > > > +
> > > > > > +   if (!size)
> > > > > > +   return;
> > > > > > +
> > > > > > +   ret = memblock_isolate_range(, base, size,
> > > > > > +   _rgn, _rgn);
> > > > > > +   if (ret)
> > > > > > +   return;
> > > > > > +
> > > > > > +   /* remove all the MAP regions */
> > > > > > +   for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
> > > > > > +   if (!memblock_is_nomap([i]))
> > > > > > +   memblock_remove_region(, i);
> > > > > > +
> > > > > > +   for (i = start_rgn - 1; i >= 0; i--)
> > > > > > +   if (!memblock_is_nomap([i]))
> > > > > > +   memblock_remove_region(, i);
> > > > > > +
> > > > > > +   /* truncate the reserved regions */
> > > > > > +   memblock_remove_range(, 0, base);
> > > > > > +   memblock_remove_range(,
> > > > > > +   base + size, (phys_addr_t)ULLONG_MAX);
> > > > > > +}
> > > > > 
> > > > > This duplicates a bunch of the logic in 
> > > > > memblock_mem_limit_remove_map. Can
> > > > > you not implement that in terms of your new, more general, function? 
> > > > > e.g.
> > > > > by passing base == 0, and size == limit?
> > > > 
> > > > Obviously it's possible.
> > > > I actually talked to Dennis before about merging them,
> > > > but he was against my idea.
> > > >
> > > Oops! I thought we have reached agreement in the 
> > > thread:http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/442817.html
> > > So feel free to do that as Will'll do
> > 
> > OK, but I found that the two functions have a bit different semantics
> > in clipping memory range, in particular, when the range [base,base+size)
> > goes across several regions with a gap.
> > (This does not happen in my arm64 kdump, though.)
> > That is, 'limit' in memblock_mem_limit_remove_map() means total size of
> > available memory, while 'size' in memblock_cap_memory_range() indicates
> > the size of _continuous_ memory range.
> 
> I thought limit was just a physical address, and then

No, it's not.

> memblock_mem_limit_remove_map operated on the end of the nearest memblock?

No, but "max_addr" returned by __find_max_addr() is a physical address
and the end address of memory of "limit" size in total.

> You could leave the __find_max_addr call in memblock_mem_limit_remove_map,
> given that I don't think you need/want it for memblock_cap_memory_range.
> 
> > So I added an extra argument, exact, to a common function to specify
> > distinct behaviors. Confusing? Please see the patch below.
> 
> Oh yikes, this certainly wasn't what I had in mind! My observation was
> just that memblock_mem_limit_remove_map(limit) does:
> 
> 
>   1. memblock_isolate_range(limit - limit+ULLONG_MAX)
>   2. memblock_remove_region(all non-nomap regions in the isolated region)
>   3. truncate reserved regions to limit
> 
> and your memblock_cap_memory_range(base, size) does:
> 
>   1. memblock_isolate_range(base - base+size)
>   2, memblock_remove_region(all non-nomap regions above and below the
>  isolated region)
>   3. truncate reserved regions around the isolated region
> 
> so, assuming we can invert the isolation in one of the cases, then they
> could share the same underlying implementation.

Please see my simplified patch below which would explain what I meant.
(Note that the size is calculated by 'max_addr - 0'.)

> I'm probably just missing something here, because the patch you've ended
> up with is far more involved than I anticipated...

I hope that it will meet almost your anticipation.

Thanks,
-Takahiro AKASHI

> 
> Will
===8<===
diff --git a/mm/memblock.c b/mm/memblock.c
index 7608bc3..fea1688 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1514,11 +1514,37 @@ void __init memblock_enforce_memory_limit(phys_addr_t 
limit)
  (phys_addr_t)ULLONG_MAX);
 }
 
+void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
+{
+   int start_rgn, end_rgn;
+   int i, ret;
+
+   if (!size)
+   return;
+
+   ret = memblock_isolate_range(, base, size,
+   _rgn, _rgn);
+   if (ret)
+   return;
+
+   /* remove all the MAP regions */
+   for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
+   if (!memblock_is_nomap([i]))
+   

Re: [PATCH Makedumpfile 00/10] arm64 cleanup and kaslr support

2016-11-16 Thread Pratyush Anand



On Wednesday 16 November 2016 02:10 PM, Atsushi Kumagai wrote:

As you wish. I can send v2 (there is no difference other than conflict
>resolution for rebasing on top of current devel, conflict is minor),
>or you can pick them from
>https://github.com/pratyushanand/makedumpfile.git : arm64_devel.

Sure, I'll pick them (from 60352bd to 3b5faac, right?) up from
your devel branch, thanks.


Yes, Correct.

Thanks

~Pratyush

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] iommu/vt-d: Flush old iotlb for kdump when the device gets context mapped

2016-11-16 Thread Xunlei Pang
On 2016/11/16 at 22:58, Myron Stowe wrote:
> On Wed, Nov 16, 2016 at 2:13 AM, Xunlei Pang  wrote:
>> Ccing David
>> On 2016/11/16 at 17:02, Xunlei Pang wrote:
>>> We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
>>> under kdump, it can be steadily reproduced on several different machines,
>>> the dmesg log is like:
>>> HP HPSA Driver (v 3.4.16-0)
>>> hpsa :02:00.0: using doorbell to reset controller
>>> hpsa :02:00.0: board ready after hard reset.
>>> hpsa :02:00.0: Waiting for controller to respond to no-op
>>> DMAR: Setting identity map for device :02:00.0 [0xe8000 - 0xe8fff]
>>> DMAR: Setting identity map for device :02:00.0 [0xf4000 - 0xf4fff]
>>> DMAR: Setting identity map for device :02:00.0 [0xbdf6e000 - 0xbdf6efff]
>>> DMAR: Setting identity map for device :02:00.0 [0xbdf6f000 - 0xbdf7efff]
>>> DMAR: Setting identity map for device :02:00.0 [0xbdf7f000 - 0xbdf82fff]
>>> DMAR: Setting identity map for device :02:00.0 [0xbdf83000 - 0xbdf84fff]
>>> DMAR: DRHD: handling fault status reg 2
>>> DMAR: [DMA Read] Request device [02:00.0] fault addr f000 [fault reason 
>>> 06] PTE Read access is not set
>>> hpsa :02:00.0: controller message 03:00 timed out
>>> hpsa :02:00.0: no-op failed; re-trying
>>>
>>> After some debugging, we found that the corresponding pte entry value
>>> is correct, and the value of the iommu caching mode is 0, the fault is
>>> probably due to the old iotlb cache of the in-flight DMA.
>>>
>>> Thus need to flush the old iotlb after context mapping is setup for the
>>> device, where the device is supposed to finish reset at its driver probe
>>> stage and no in-flight DMA exists hereafter.
>>>
>>> With this patch, all our problematic machines can survive the kdump tests.
>>>
>>> CC: Myron Stowe 
>>> CC: Don Brace 
>>> CC: Baoquan He 
>>> CC: Dave Young 
>>> Tested-by: Joseph Szczypek 
>>> Signed-off-by: Xunlei Pang 
>>> ---
>>>  drivers/iommu/intel-iommu.c | 11 +--
>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>> index 3965e73..eb79288 100644
>>> --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -2067,9 +2067,16 @@ static int domain_context_mapping_one(struct 
>>> dmar_domain *domain,
>>>* It's a non-present to present mapping. If hardware doesn't cache
>>>* non-present entry we only need to flush the write-buffer. If the
>>>* _does_ cache non-present entries, then it does so in the special
> If this does get accepted then we should fix the above grammar also -
>   "If the _does_ cache ..." -> "If the hardware _does_ cache ..."

Yes, but this reminds me of something.
As per the comment, the code here only needs to flush context caches for the 
special domain 0 which is
used to tag the non-present/erroneous caches, seems we should flush the old 
domain id of present entries
for kdump according to the analysis, other than the new-allocated domain id. 
Let me ponder more on this.

Regards,
Xunlei

>
>>> -  * domain #0, which we have to flush:
>>> +  * domain #0, which we have to flush.
>>> +  *
>>> +  * For kdump cases, present entries may be cached due to the in-flight
>>> +  * DMA and copied old pgtable, but there is no unmapping behaviour for
>>> +  * them, so we need an explicit iotlb flush for the newly-mapped 
>>> device.
>>> +  * For kdump, at this point, the device is supposed to finish reset at
>>> +  * the driver probe stage, no in-flight DMA will exist, thus we do not
>>> +  * need to worry about that anymore hereafter.
>>>*/
>>> - if (cap_caching_mode(iommu->cap)) {
>>> + if (is_kdump_kernel() || cap_caching_mode(iommu->cap)) {
>>>   iommu->flush.flush_context(iommu, 0,
>>>  (((u16)bus) << 8) | devfn,
>>>  DMA_CCMD_MASK_NOBIT,
>> ___
>> iommu mailing list
>> io...@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v27 1/9] memblock: add memblock_cap_memory_range()

2016-11-16 Thread Will Deacon
Hi Akashi,

On Mon, Nov 14, 2016 at 02:55:16PM +0900, AKASHI Takahiro wrote:
> On Fri, Nov 11, 2016 at 11:19:04AM +0800, Dennis Chen wrote:
> > On Fri, Nov 11, 2016 at 11:50:50AM +0900, AKASHI Takahiro wrote:
> > > On Thu, Nov 10, 2016 at 05:27:20PM +, Will Deacon wrote:
> > > > On Wed, Nov 02, 2016 at 01:51:53PM +0900, AKASHI Takahiro wrote:
> > > > > +void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t 
> > > > > size)
> > > > > +{
> > > > > + int start_rgn, end_rgn;
> > > > > + int i, ret;
> > > > > +
> > > > > + if (!size)
> > > > > + return;
> > > > > +
> > > > > + ret = memblock_isolate_range(, base, size,
> > > > > + _rgn, _rgn);
> > > > > + if (ret)
> > > > > + return;
> > > > > +
> > > > > + /* remove all the MAP regions */
> > > > > + for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
> > > > > + if (!memblock_is_nomap([i]))
> > > > > + memblock_remove_region(, i);
> > > > > +
> > > > > + for (i = start_rgn - 1; i >= 0; i--)
> > > > > + if (!memblock_is_nomap([i]))
> > > > > + memblock_remove_region(, i);
> > > > > +
> > > > > + /* truncate the reserved regions */
> > > > > + memblock_remove_range(, 0, base);
> > > > > + memblock_remove_range(,
> > > > > + base + size, (phys_addr_t)ULLONG_MAX);
> > > > > +}
> > > > 
> > > > This duplicates a bunch of the logic in memblock_mem_limit_remove_map. 
> > > > Can
> > > > you not implement that in terms of your new, more general, function? 
> > > > e.g.
> > > > by passing base == 0, and size == limit?
> > > 
> > > Obviously it's possible.
> > > I actually talked to Dennis before about merging them,
> > > but he was against my idea.
> > >
> > Oops! I thought we have reached agreement in the 
> > thread:http://lists.infradead.org/pipermail/linux-arm-kernel/2016-July/442817.html
> > So feel free to do that as Will'll do
> 
> OK, but I found that the two functions have a bit different semantics
> in clipping memory range, in particular, when the range [base,base+size)
> goes across several regions with a gap.
> (This does not happen in my arm64 kdump, though.)
> That is, 'limit' in memblock_mem_limit_remove_map() means total size of
> available memory, while 'size' in memblock_cap_memory_range() indicates
> the size of _continuous_ memory range.

I thought limit was just a physical address, and then
memblock_mem_limit_remove_map operated on the end of the nearest memblock?
You could leave the __find_max_addr call in memblock_mem_limit_remove_map,
given that I don't think you need/want it for memblock_cap_memory_range.

> So I added an extra argument, exact, to a common function to specify
> distinct behaviors. Confusing? Please see the patch below.

Oh yikes, this certainly wasn't what I had in mind! My observation was
just that memblock_mem_limit_remove_map(limit) does:


  1. memblock_isolate_range(limit - limit+ULLONG_MAX)
  2. memblock_remove_region(all non-nomap regions in the isolated region)
  3. truncate reserved regions to limit

and your memblock_cap_memory_range(base, size) does:

  1. memblock_isolate_range(base - base+size)
  2, memblock_remove_region(all non-nomap regions above and below the
 isolated region)
  3. truncate reserved regions around the isolated region

so, assuming we can invert the isolation in one of the cases, then they
could share the same underlying implementation.

I'm probably just missing something here, because the patch you've ended
up with is far more involved than I anticipated...

Will

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] iommu/vt-d: Flush old iotlb for kdump when the device gets context mapped

2016-11-16 Thread Myron Stowe
On Wed, Nov 16, 2016 at 2:13 AM, Xunlei Pang  wrote:
> Ccing David
> On 2016/11/16 at 17:02, Xunlei Pang wrote:
>> We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
>> under kdump, it can be steadily reproduced on several different machines,
>> the dmesg log is like:
>> HP HPSA Driver (v 3.4.16-0)
>> hpsa :02:00.0: using doorbell to reset controller
>> hpsa :02:00.0: board ready after hard reset.
>> hpsa :02:00.0: Waiting for controller to respond to no-op
>> DMAR: Setting identity map for device :02:00.0 [0xe8000 - 0xe8fff]
>> DMAR: Setting identity map for device :02:00.0 [0xf4000 - 0xf4fff]
>> DMAR: Setting identity map for device :02:00.0 [0xbdf6e000 - 0xbdf6efff]
>> DMAR: Setting identity map for device :02:00.0 [0xbdf6f000 - 0xbdf7efff]
>> DMAR: Setting identity map for device :02:00.0 [0xbdf7f000 - 0xbdf82fff]
>> DMAR: Setting identity map for device :02:00.0 [0xbdf83000 - 0xbdf84fff]
>> DMAR: DRHD: handling fault status reg 2
>> DMAR: [DMA Read] Request device [02:00.0] fault addr f000 [fault reason 
>> 06] PTE Read access is not set
>> hpsa :02:00.0: controller message 03:00 timed out
>> hpsa :02:00.0: no-op failed; re-trying
>>
>> After some debugging, we found that the corresponding pte entry value
>> is correct, and the value of the iommu caching mode is 0, the fault is
>> probably due to the old iotlb cache of the in-flight DMA.
>>
>> Thus need to flush the old iotlb after context mapping is setup for the
>> device, where the device is supposed to finish reset at its driver probe
>> stage and no in-flight DMA exists hereafter.
>>
>> With this patch, all our problematic machines can survive the kdump tests.
>>
>> CC: Myron Stowe 
>> CC: Don Brace 
>> CC: Baoquan He 
>> CC: Dave Young 
>> Tested-by: Joseph Szczypek 
>> Signed-off-by: Xunlei Pang 
>> ---
>>  drivers/iommu/intel-iommu.c | 11 +--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 3965e73..eb79288 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -2067,9 +2067,16 @@ static int domain_context_mapping_one(struct 
>> dmar_domain *domain,
>>* It's a non-present to present mapping. If hardware doesn't cache
>>* non-present entry we only need to flush the write-buffer. If the
>>* _does_ cache non-present entries, then it does so in the special

If this does get accepted then we should fix the above grammar also -
  "If the _does_ cache ..." -> "If the hardware _does_ cache ..."

>> -  * domain #0, which we have to flush:
>> +  * domain #0, which we have to flush.
>> +  *
>> +  * For kdump cases, present entries may be cached due to the in-flight
>> +  * DMA and copied old pgtable, but there is no unmapping behaviour for
>> +  * them, so we need an explicit iotlb flush for the newly-mapped 
>> device.
>> +  * For kdump, at this point, the device is supposed to finish reset at
>> +  * the driver probe stage, no in-flight DMA will exist, thus we do not
>> +  * need to worry about that anymore hereafter.
>>*/
>> - if (cap_caching_mode(iommu->cap)) {
>> + if (is_kdump_kernel() || cap_caching_mode(iommu->cap)) {
>>   iommu->flush.flush_context(iommu, 0,
>>  (((u16)bus) << 8) | devfn,
>>  DMA_CCMD_MASK_NOBIT,
>
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 2/2] kexec: Support -S paramter to return whether the kexec payload is loaded.

2016-11-16 Thread Simon Horman
On Mon, Nov 14, 2016 at 05:12:53PM -0500, Konrad Rzeszutek Wilk wrote:
> Instead of the scripts having to poke at various fields we can
> provide that functionality via the -S parameter.
> 
> Returns 0 if the payload is loaded. Can be used in combination
> with -l or -p to get the state of the proper kexec image.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
> v0: First version (internal product).
> v1: Posted on kexec mailing list. Changed -s to -S
> 
> CC: kexec@lists.infradead.org
> CC: xen-de...@lists.xenproject.org
> CC: Daniel Kiper 
> ---
>  kexec/kexec-xen.c | 20 ++
>  kexec/kexec.8 |  5 +
>  kexec/kexec.c | 61 
> ++-
>  kexec/kexec.h |  5 -
>  4 files changed, 85 insertions(+), 6 deletions(-)
> 
> diff --git a/kexec/kexec-xen.c b/kexec/kexec-xen.c
> index 24a4191..5a1e9d2 100644
> --- a/kexec/kexec-xen.c
> +++ b/kexec/kexec-xen.c
> @@ -105,6 +105,26 @@ int xen_kexec_unload(uint64_t kexec_flags)
>   return ret;
>  }
>  
> +int xen_kexec_status(uint64_t kexec_flags)
> +{
> + xc_interface *xch;
> + uint8_t type;
> + int ret;
> +
> + xch = xc_interface_open(NULL, NULL, 0);
> + if (!xch)
> + return -1;
> +
> + type = (kexec_flags & KEXEC_ON_CRASH) ? KEXEC_TYPE_CRASH
> + : KEXEC_TYPE_DEFAULT;
> +
> + ret = xc_kexec_status(xch, type);
> +
> + xc_interface_close(xch);
> +
> + return ret;
> +}
> +
>  void xen_kexec_exec(void)
>  {
>   xc_interface *xch;
> diff --git a/kexec/kexec.8 b/kexec/kexec.8
> index 4d0c1d1..02f4ccf 100644
> --- a/kexec/kexec.8
> +++ b/kexec/kexec.8
> @@ -107,6 +107,11 @@ command:
>  .B \-d\ (\-\-debug)
>  Enable debugging messages.
>  .TP
> +.B \-D\ (\-\-status)
> +Return 0 if the type (by default crash) is loaded. Can be used in conjuction
> +with -l or -p to toggle the type. Note it will

Is there a mnemonic to map -D to --status?
If not perhaps it would be best to drop the short option.

> +.BR not\ load\ the\ kernel.
> +.TP
>  .B \-e\ (\-\-exec)
>  Run the currently loaded kernel. Note that it will reboot into the loaded 
> kernel without calling shutdown(8).
>  .TP
> diff --git a/kexec/kexec.c b/kexec/kexec.c
> index 500e5a9..bc72688 100644
> --- a/kexec/kexec.c
> +++ b/kexec/kexec.c
> @@ -855,6 +855,32 @@ static int k_unload (unsigned long kexec_flags)
>   return result;
>  }
>  
> +static int kexec_loaded(void);
> +static int __kexec_loaded(const char *f);

I wonder if we could shuffle the location of functions
in this file to avoid the need to forward declarations all together.

> +
> +static int k_status(unsigned long kexec_flags)
> +{
> + int result;
> + long native_arch;
> +
> + /* set the arch */
> + native_arch = physical_arch();
> + if (native_arch < 0) {
> + return -1;
> + }
> + kexec_flags |= native_arch;
> +
> + if (xen_present())
> + result = xen_kexec_status(kexec_flags);
> + else {
> + if (kexec_flags & KEXEC_ON_CRASH)
> + result = 
> __kexec_loaded("/sys/kernel/kexec_crash_loaded");
> + else
> + result = kexec_loaded();
> + }
> + return result;
> +}
> +
>  /*
>   *   Start a reboot.
>   */
> @@ -890,8 +916,6 @@ static int my_exec(void)
>   return -1;
>  }
>  
> -static int kexec_loaded(void);
> -
>  static int load_jump_back_helper_image(unsigned long kexec_flags, void 
> *entry)
>  {
>   int result;
> @@ -970,6 +994,7 @@ void usage(void)
>  "  to original kernel.\n"
>  " -s, --kexec-file-syscall Use file based syscall for kexec 
> operation\n"
>  " -d, --debug   Enable debugging to help spot a 
> failure.\n"
> +" -S, --status Return 0 if the type (by default crash) 
> is loaded.\n"
>  "\n"
>  "Supported kernel file types and options: \n");
>   for (i = 0; i < file_types; i++) {
> @@ -981,7 +1006,7 @@ void usage(void)
>   printf("\n");
>  }
>  
> -static int kexec_loaded(void)
> +static int __kexec_loaded(const char *file)
>  {
>   long ret = -1;
>   FILE *fp;
> @@ -992,7 +1017,7 @@ static int kexec_loaded(void)
>   if (xen_present())
>   return 1;
>  
> - fp = fopen("/sys/kernel/kexec_loaded", "r");
> + fp = fopen(file, "r");
>   if (fp == NULL)
>   return -1;
>  
> @@ -1015,6 +1040,12 @@ static int kexec_loaded(void)
>   return (int)ret;
>  }
>  
> +static int kexec_loaded(void)
> +{
> + return __kexec_loaded("/sys/kernel/kexec_loaded");
> +}
> +
> +
>  /*
>   * Remove parameter from a kernel command line. Helper function by 
> get_command_line().
>   */
> @@ -1204,6 +1235,7 @@ int main(int argc, char *argv[])
>   int do_unload = 0;
>   int do_reuse_initrd = 0;
> + int do_status = 0;
>   void *entry = 0;
>   char *type = 0;
> 

Re: [PATCH] iommu/vt-d: Flush old iotlb for kdump when the device gets context mapped

2016-11-16 Thread Xunlei Pang
Ccing David
On 2016/11/16 at 17:02, Xunlei Pang wrote:
> We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
> under kdump, it can be steadily reproduced on several different machines,
> the dmesg log is like:
> HP HPSA Driver (v 3.4.16-0)
> hpsa :02:00.0: using doorbell to reset controller
> hpsa :02:00.0: board ready after hard reset.
> hpsa :02:00.0: Waiting for controller to respond to no-op
> DMAR: Setting identity map for device :02:00.0 [0xe8000 - 0xe8fff]
> DMAR: Setting identity map for device :02:00.0 [0xf4000 - 0xf4fff]
> DMAR: Setting identity map for device :02:00.0 [0xbdf6e000 - 0xbdf6efff]
> DMAR: Setting identity map for device :02:00.0 [0xbdf6f000 - 0xbdf7efff]
> DMAR: Setting identity map for device :02:00.0 [0xbdf7f000 - 0xbdf82fff]
> DMAR: Setting identity map for device :02:00.0 [0xbdf83000 - 0xbdf84fff]
> DMAR: DRHD: handling fault status reg 2
> DMAR: [DMA Read] Request device [02:00.0] fault addr f000 [fault reason 
> 06] PTE Read access is not set
> hpsa :02:00.0: controller message 03:00 timed out
> hpsa :02:00.0: no-op failed; re-trying
>
> After some debugging, we found that the corresponding pte entry value
> is correct, and the value of the iommu caching mode is 0, the fault is
> probably due to the old iotlb cache of the in-flight DMA.
>
> Thus need to flush the old iotlb after context mapping is setup for the
> device, where the device is supposed to finish reset at its driver probe
> stage and no in-flight DMA exists hereafter.
>
> With this patch, all our problematic machines can survive the kdump tests.
>
> CC: Myron Stowe 
> CC: Don Brace 
> CC: Baoquan He 
> CC: Dave Young 
> Tested-by: Joseph Szczypek 
> Signed-off-by: Xunlei Pang 
> ---
>  drivers/iommu/intel-iommu.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3965e73..eb79288 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2067,9 +2067,16 @@ static int domain_context_mapping_one(struct 
> dmar_domain *domain,
>* It's a non-present to present mapping. If hardware doesn't cache
>* non-present entry we only need to flush the write-buffer. If the
>* _does_ cache non-present entries, then it does so in the special
> -  * domain #0, which we have to flush:
> +  * domain #0, which we have to flush.
> +  *
> +  * For kdump cases, present entries may be cached due to the in-flight
> +  * DMA and copied old pgtable, but there is no unmapping behaviour for
> +  * them, so we need an explicit iotlb flush for the newly-mapped device.
> +  * For kdump, at this point, the device is supposed to finish reset at
> +  * the driver probe stage, no in-flight DMA will exist, thus we do not
> +  * need to worry about that anymore hereafter.
>*/
> - if (cap_caching_mode(iommu->cap)) {
> + if (is_kdump_kernel() || cap_caching_mode(iommu->cap)) {
>   iommu->flush.flush_context(iommu, 0,
>  (((u16)bus) << 8) | devfn,
>  DMA_CCMD_MASK_NOBIT,


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] iommu/vt-d: Flush old iotlb for kdump when the device gets context mapped

2016-11-16 Thread Xunlei Pang
We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
under kdump, it can be steadily reproduced on several different machines,
the dmesg log is like:
HP HPSA Driver (v 3.4.16-0)
hpsa :02:00.0: using doorbell to reset controller
hpsa :02:00.0: board ready after hard reset.
hpsa :02:00.0: Waiting for controller to respond to no-op
DMAR: Setting identity map for device :02:00.0 [0xe8000 - 0xe8fff]
DMAR: Setting identity map for device :02:00.0 [0xf4000 - 0xf4fff]
DMAR: Setting identity map for device :02:00.0 [0xbdf6e000 - 0xbdf6efff]
DMAR: Setting identity map for device :02:00.0 [0xbdf6f000 - 0xbdf7efff]
DMAR: Setting identity map for device :02:00.0 [0xbdf7f000 - 0xbdf82fff]
DMAR: Setting identity map for device :02:00.0 [0xbdf83000 - 0xbdf84fff]
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [02:00.0] fault addr f000 [fault reason 06] 
PTE Read access is not set
hpsa :02:00.0: controller message 03:00 timed out
hpsa :02:00.0: no-op failed; re-trying

After some debugging, we found that the corresponding pte entry value
is correct, and the value of the iommu caching mode is 0, the fault is
probably due to the old iotlb cache of the in-flight DMA.

Thus need to flush the old iotlb after context mapping is setup for the
device, where the device is supposed to finish reset at its driver probe
stage and no in-flight DMA exists hereafter.

With this patch, all our problematic machines can survive the kdump tests.

CC: Myron Stowe 
CC: Don Brace 
CC: Baoquan He 
CC: Dave Young 
Tested-by: Joseph Szczypek 
Signed-off-by: Xunlei Pang 
---
 drivers/iommu/intel-iommu.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3965e73..eb79288 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2067,9 +2067,16 @@ static int domain_context_mapping_one(struct dmar_domain 
*domain,
 * It's a non-present to present mapping. If hardware doesn't cache
 * non-present entry we only need to flush the write-buffer. If the
 * _does_ cache non-present entries, then it does so in the special
-* domain #0, which we have to flush:
+* domain #0, which we have to flush.
+*
+* For kdump cases, present entries may be cached due to the in-flight
+* DMA and copied old pgtable, but there is no unmapping behaviour for
+* them, so we need an explicit iotlb flush for the newly-mapped device.
+* For kdump, at this point, the device is supposed to finish reset at
+* the driver probe stage, no in-flight DMA will exist, thus we do not
+* need to worry about that anymore hereafter.
 */
-   if (cap_caching_mode(iommu->cap)) {
+   if (is_kdump_kernel() || cap_caching_mode(iommu->cap)) {
iommu->flush.flush_context(iommu, 0,
   (((u16)bus) << 8) | devfn,
   DMA_CCMD_MASK_NOBIT,
-- 
1.8.3.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH Makedumpfile 00/10] arm64 cleanup and kaslr support

2016-11-16 Thread Pratyush Anand
On Wed, Nov 16, 2016 at 11:14 AM, Atsushi Kumagai
 wrote:
> Hello Pratyush,
>
>>On Tue, Nov 15, 2016 at 12:04 PM, Pratyush Anand  wrote:
>>> Hi Atsushi,
>>>
>>> There would be a conflict because of following patch while applying
>>> these patches. Other than that I also see a an issue with --config
>>> option. So I will fix that as well and repost the series soon.
>>
>>Sorry, for the noise.That was a bad compilation of the code.
>>
>>Anyway, I resolved the conflict because of below patch in upstream
>>devel branch and r5pushed the patches in my git tree:
>>https://github.com/pratyushanand/makedumpfile.git : arm64_devel
>>
>>There is no changes other than conflict resolution w.r.t. this patch series.
>>
>>~Pratyush
>>
>>>
>>> 0068010b9b83 [PATCH v2 2/2] Clean up unused KERNEL_IMAGE_SIZE
>
> I'm sorry ! I misunderstood that I already pushed this series
> in the devel branch. Should I wait for v2 series where the
> conflict is resolved ?

As you wish. I can send v2 (there is no difference other than conflict
resolution for rebasing on top of current devel, conflict is minor),
or you can pick them from
https://github.com/pratyushanand/makedumpfile.git : arm64_devel.

I am OK with whatever is convenient for you.

Thanks
Pratyush

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [PATCH Makedumpfile 00/10] arm64 cleanup and kaslr support

2016-11-16 Thread Atsushi Kumagai
Hello Pratyush,

>On Tue, Nov 15, 2016 at 12:04 PM, Pratyush Anand  wrote:
>> Hi Atsushi,
>>
>> There would be a conflict because of following patch while applying
>> these patches. Other than that I also see a an issue with --config
>> option. So I will fix that as well and repost the series soon.
>
>Sorry, for the noise.That was a bad compilation of the code.
>
>Anyway, I resolved the conflict because of below patch in upstream
>devel branch and r5pushed the patches in my git tree:
>https://github.com/pratyushanand/makedumpfile.git : arm64_devel
>
>There is no changes other than conflict resolution w.r.t. this patch series.
>
>~Pratyush
>
>>
>> 0068010b9b83 [PATCH v2 2/2] Clean up unused KERNEL_IMAGE_SIZE

I'm sorry ! I misunderstood that I already pushed this series
in the devel branch. Should I wait for v2 series where the
conflict is resolved ?


Thanks,
Atsushi Kumagai
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec