Re: [PATCH 4.14 v2 ] platform/x86: Corrects warning: missing braces around initializer

2020-10-30 Thread John Donnelly



> On Oct 30, 2020, at 11:52 AM, Andy Shevchenko 
>  wrote:
> 
> On Fri, Oct 30, 2020 at 08:55:01AM -0700, john.p.donne...@oracle.com wrote:
>> From: John Donnelly 
>> 
>> The initializer of the local variable "struct tp_nvram_state s[2] =
>> { 0 };"
>> is not warning-free with all compiler versions (it can trigger
>> -Wmissing-braces).
> 
> I don't get the subject. Is it a backport of an existing change to v4.14, or
> are you trying to fix v4.14? If the latter is the case, that is not the
> correct order. Try the latest vanilla kernel first (v5.10-rc1 as of today)
> and, if there is still an issue, submit a patch.

Hi,

 It is only intended for 4.14. Why would you backport a commit to a stable
tree that emits warnings?
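
For reference, a minimal sketch of the warning in question; the struct here is
a stand-in, not the real tp_nvram_state, and the exact behavior depends on the
compiler version and flags:

	/* Illustrative stand-in for tp_nvram_state. */
	struct state_like { int a; int b; };

	/* Older GCC with -Wmissing-braces warns here: the first array element
	 * is itself a struct, but the braces are not nested. */
	struct state_like s1[2] = { 0 };

	/* Fully braced: zero-initializes everything without the warning. */
	struct state_like s2[2] = { { 0 } };

	/* Or sidestep the initializer entirely, as the patch does:
	 * memset(&s, 0, sizeof(s)); */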




> 
>> Fixes: 515ded02bc4b ("platform/x86: thinkpad_acpi: initialize tp_nvram_state variable")
>> 
>> Signed-off-by: John Donnelly 
> 
> There should not be a blank line in between.
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 



Re: [PATCH 4.14 ] platform/x86: Corrects warning: missing braces around initializer

2020-10-30 Thread John Donnelly



> On Oct 30, 2020, at 10:52 AM, john.p.donne...@oracle.com wrote:
> 
> From: John Donnelly 
> 
> The initializer of the local variable "struct tp_nvram_state s[2] =
> { 0 };"
> is not warning-free with all compiler versions (e.g. UEK6 on OL7).
> 
> Fixes: 515ded02bc4b ("platform/x86: thinkpad_acpi: initialize tp_nvram_state variable")
> 
> Signed-off-by: John Donnelly 
> ---
> drivers/platform/x86/thinkpad_acpi.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/platform/x86/thinkpad_acpi.c 
> b/drivers/platform/x86/thinkpad_acpi.c
> index ffaaccded34e..c41ac0385304 100644
> --- a/drivers/platform/x86/thinkpad_acpi.c
> +++ b/drivers/platform/x86/thinkpad_acpi.c
> @@ -2477,7 +2477,7 @@ static void hotkey_compare_and_issue_event(struct 
> tp_nvram_state *oldn,
>  */
> static int hotkey_kthread(void *data)
> {
> - struct tp_nvram_state s[2] = { 0 };
> + struct tp_nvram_state s[2];
>   u32 poll_mask, event_mask;
>   unsigned int si, so;
>   unsigned long t;
> @@ -2488,6 +2488,8 @@ static int hotkey_kthread(void *data)
>   if (tpacpi_lifecycle == TPACPI_LIFE_EXITING)
>   goto exit;
> 
> + memset(&s, 0, sizeof(s));
> +
>   set_freezable();
> 
>   so = 0;
> -- 
> 2.27.0
> 

Please ignore this and use:

[PATCH 4.14 v2] platform/x86: Corrects warning: missing braces around
initializer



Re: [PATCH ] scsi: page warning: 'page' may be used uninitialized

2020-10-02 Thread John Donnelly



> On Oct 2, 2020, at 1:01 PM, Mike Christie  wrote:
> 
> On 9/23/20 7:19 PM, john.p.donne...@oracle.com wrote:
>> From: John Donnelly 
>> 
>> Corrects: drivers/target/target_core_user.c:688:6: warning: 'page' may be
>> used uninitialized
>> 
>> Fixes: 3c58f737231e ("scsi: target: tcmu: Optimize use of flush_dcache_page")
>> 
>> To: linux-s...@vger.kernel.org
>> Cc: Mike Christie 
>> Signed-off-by: John Donnelly 
>> ---
>> drivers/target/target_core_user.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/target/target_core_user.c 
>> b/drivers/target/target_core_user.c
>> index 9b7592350502..86b28117787e 100644
>> --- a/drivers/target/target_core_user.c
>> +++ b/drivers/target/target_core_user.c
>> @@ -681,7 +681,7 @@ static void scatter_data_area(struct tcmu_dev *udev,
>>  void *from, *to = NULL;
>>  size_t copy_bytes, to_offset, offset;
>>  struct scatterlist *sg;
>> -struct page *page;
>> +struct page *page = NULL;
>> 
>>  for_each_sg(data_sg, sg, data_nents, i) {
>>  int sg_remaining = sg->length;
>> 
> 
> Looks ok for now. In the next kernel we can do the more invasive approach and
> add a new struct/helpers to make the code cleaner and fix it properly.
> 
> Acked-by: Mike Christie 


Hi,

Thank you.

I am not always on the email lists, so please do the right thing.
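
For reference, a minimal sketch (not the driver code itself) of the pattern the
compiler flags: 'page' is only assigned inside a conditional branch of the
loop, so -Wmaybe-uninitialized cannot prove it is set before its first use, and
the NULL initialization gives it a defined value on every path. lookup() is a
hypothetical helper:

	struct page;
	struct page *lookup(int i);		/* hypothetical */

	struct page *demo(int n, int need_lookup)
	{
		struct page *page = NULL;	/* the fix: defined on all paths */
		int i;

		for (i = 0; i < n; i++) {
			if (need_lookup)
				page = lookup(i);
			/* 'page' is consumed here on some paths */
		}
		return page;
	}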





Re: [PATCH v12 0/9] support reserving crashkernel above 4G on arm64 kdump

2020-09-23 Thread John Donnelly



> On Sep 15, 2020, at 2:16 AM, chenzhou  wrote:
> 
> 
> 
> On 2020/9/7 21:47, Chen Zhou wrote:
>> There are the following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>> will fail when there is not enough low memory.
>> 2. If crashkernel is reserved above 4G, the crash dump
>> kernel will fail to boot because there is no low memory available
>> for allocation.
>> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
>> if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
>> devices in the crash dump kernel that need ZONE_DMA will fail to
>> allocate.
>> 
>> To solve these issues, change the behavior of crashkernel=X.
>> crashkernel=X tries a low allocation in the DMA zone, and falls back to
>> a high allocation if that fails.
>> If the requested size X is too large and would leave very little low
>> memory in the DMA zone after the low allocation, the system may not work
>> normally. So add a threshold and go for a high allocation directly if the
>> required size is too large. The threshold is set to half of
>> the low memory.
>> 
>> We can also use "crashkernel=X,high" to select a high region above
>> the DMA zone, which also tries to allocate at least 256M of low memory in
>> the DMA zone automatically.
>> "crashkernel=Y,low" can be used to allocate a specified amount of low memory.
>> For non-RPi4 platforms, read the DMA zone mentioned above as the DMA32 zone.
>> 
>> When reserving crashkernel in high memory, some low memory is reserved
>> for crash dump kernel devices, so there may be two regions reserved for
>> the crash dump kernel.
>> To distinguish the low region from the high one without affecting the use
>> of existing kexec-tools, rename the low region "Crash kernel (low)"
>> and pass it by reusing the DT property
>> "linux,usable-memory-range". We made the low memory region the last
>> range of "linux,usable-memory-range" to keep compatibility with existing
>> user space and older kdump kernels.
>> 
>> Besides, we need to modify kexec-tools:
>> arm64: support more than one crash kernel regions(see [1])
>> 
>> Another update is document about DT property 'linux,usable-memory-range':
>> schemas: update 'linux,usable-memory-range' node schema(see [2])
>> 
>> This patchset contains the following nine patches:
>> 0001-x86-kdump-move-CRASH_ALIGN-to-2M.patch
>> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
>> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
>> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
>> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
>> 0006-arm64-kdump-reimplement-crashkernel-X.patch
>> 0007-kdump-add-threshold-for-the-required-memory.patch
>> 0008-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
>> 0009-kdump-update-Documentation-about-crashkernel.patch
>> 
>> 0001-0003 are some x86 cleanups which prepare for making the
>> functions reserve_crashkernel[_low]() generic.
>> 
>> 0004 makes functions reserve_crashkernel[_low]() generic.
>> 0005-0006 reimplements crashkernel=X.
>> 0007 adds threshold for the required memory.
>> 0008 adds memory for devices by DT property linux,usable-memory-range.
>> 0009 updates the doc.
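
As a rough illustration of the crashkernel=X policy summarized above, a sketch
in C-style pseudocode; the helper names and limits here are made up for
illustration and are not the patch code:

	#define CRASH_ADDR_LOW_MAX	0x100000000ULL	/* 4G, illustrative */
	#define CRASH_ADDR_HIGH_MAX	(~0ULL)		/* illustrative */

	unsigned long long low_memory_size(void);	/* hypothetical */
	unsigned long long alloc_below(unsigned long long limit,
				       unsigned long long size); /* hypothetical */

	static unsigned long long reserve_crashkernel_sketch(unsigned long long size)
	{
		unsigned long long base = 0;

		/* Threshold: a low reservation larger than half of low memory
		 * would leave too little in the DMA zone, so go high directly. */
		if (size <= low_memory_size() / 2)
			base = alloc_below(CRASH_ADDR_LOW_MAX, size);

		if (!base)	/* low allocation failed or was skipped */
			base = alloc_below(CRASH_ADDR_HIGH_MAX, size);

		return base;
	}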
> Hi Catalin and Dave,

  Hi,

   This patch set has been going on since May 2019. When will it be
accepted and integrated into an rc build?

  



> 
> Any other suggestions about this patchset? Let me know if you have any 
> questions.
> 
> Thanks,
> Chen Zhou
>> 
>> Changes since [v11]
>> - Rebased on top of 5.9-rc4.
>> - Make the function reserve_crashkernel() of x86 generic.
>> As suggested by Catalin, make the x86 reserve_crashkernel() generic
>> and have arm64 use the generic version to reimplement crashkernel=X.
>> 
>> Changes since [v10]
>> - Reimplement crashkernel=X as suggested by Catalin. Many thanks to Catalin.
>> 
>> Changes since [v9]
>> - Patch 1 add Acked-by from Dave.
>> - Update patch 5 according to Dave's comments.
>> - Update chosen schema.
>> 
>> Changes since [v8]
>> - Reuse DT property "linux,usable-memory-range".
>> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the 
>> low
>> memory region.
>> - Fix kdump broken with ZONE_DMA reintroduced.
>> - Update chosen schema.
>> 
>> Changes since [v7]
>> - Move x86 CRASH_ALIGN to 2M
>> As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
>> - Update Documentation/devicetree/bindings/chosen.txt.
>> Add corresponding documentation to 
>> Documentation/devicetree/bindings/chosen.txt
>> suggested by Arnd.
>> - Add Tested-by from John and pk.
>> 
>> Changes since [v6]
>> - Fix build errors reported by kbuild test robot.
>> 
>> Changes since [v5]
>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>> - Delete crashkernel=X,high.
>> - Modify crashkernel=X,low.
>> If crashkernel=X,low is specified simultaneously, first reserve the
>> specified amount of low memory for crash dump kernel devices, then reserve
>> memory above 4G.
>> In 

Re: [PATCH v12 0/9] support reserving crashkernel above 4G on arm64 kdump

2020-09-12 Thread John Donnelly
nel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() I added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions, for two crash kernel regions
case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150

Chen Zhou (9):
   x86: kdump: move CRASH_ALIGN to 2M
   x86: kdump: make the lower bound of crash kernel reservation
 consistent
   x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
 reserve_crashkernel[_low]()
   x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
   arm64: kdump: introduce some macroes for crash kernel reservation
   arm64: kdump: reimplement crashkernel=X
   kdump: add threshold for the required memory
   arm64: kdump: add memory for devices by DT property
 linux,usable-memory-range
   kdump: update Documentation about crashkernel

  Documentation/admin-guide/kdump/kdump.rst |  25 ++-
  .../admin-guide/kernel-parameters.txt |  13 +-
  arch/arm64/include/asm/kexec.h|  15 ++
  arch/arm64/include/asm/processor.h|   1 +
  arch/arm64/kernel/setup.c |  13 +-
  arch/arm64/mm/init.c  | 105 --
  arch/arm64/mm/mmu.c   |   4 +
  arch/x86/include/asm/kexec.h  |  28 +++
  arch/x86/kernel/setup.c   | 165 +--
  include/linux/crash_core.h|   4 +
  include/linux/kexec.h |   2 -
  kernel/crash_core.c   | 192 ++
  kernel/kexec_core.c   |  17 --
  13 files changed, 328 insertions(+), 256 deletions(-)




I did a brief unit test on 5.9-rc4.

Please add:


Tested-by: John Donnelly 


This activity is over a year old. It needs to be accepted.







Re: [PATCH v4: {linux-4.14.y} ] dm cache: submit writethrough writes in parallel to origin and cache

2020-08-04 Thread John Donnelly



> On Aug 4, 2020, at 7:07 PM, john.p.donne...@oracle.com wrote:
> 
> From: Mike Snitzer 
> 
> Discontinue issuing writethrough write IO in series to the origin and
> then cache.
> 
> Use bio_clone_fast() to create a new origin clone bio that will be
> mapped to the origin device and then bio_chain() it to the bio that gets
> remapped to the cache device.  The origin clone bio does _not_ have a
> copy of the per_bio_data -- as such check_if_tick_bio_needed() will not
> be called.
> 
> The cache bio (parent bio) will not complete until the origin bio has
> completed -- this fulfills bio_clone_fast()'s requirements as well as
> the requirement to not complete the original IO until the write IO has
> completed to both the origin and cache device.
> 
> Signed-off-by: Mike Snitzer 
> 
> (cherry picked from commit 2df3bae9a6543e90042291707b8db0cbfbae9ee9)
> 
> Fixes: 4ec34f2196d125ff781170ddc6c3058c08ec5e73 (dm bio record:
> save/restore bi_end_io and bi_integrity )
> 
> 4ec34f21 introduced a mkfs.ext4 hang on an LVM device that has been
> modified with lvconvert --cachemode=writethrough.
> 
> CC:sta...@vger.kernel.org for 4.14.y
> 
> Signed-off-by: John Donnelly 
> Reviewed-by: Somasundaram Krishnasamy 
> ---
> drivers/md/dm-cache-target.c | 92 ++--
> 1 file changed, 37 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
> index 69cdb29ef6be..2732d1df05fa 100644
> --- a/drivers/md/dm-cache-target.c
> +++ b/drivers/md/dm-cache-target.c
> @@ -450,6 +450,7 @@ struct cache {
>   struct work_struct migration_worker;
>   struct delayed_work waker;
>   struct dm_bio_prison_v2 *prison;
> + struct bio_set *bs;
> 
>   mempool_t *migration_pool;
> 
> @@ -537,11 +538,6 @@ static void wake_deferred_bio_worker(struct cache *cache)
>   queue_work(cache->wq, &cache->deferred_bio_worker);
> }
> 
> -static void wake_deferred_writethrough_worker(struct cache *cache)
> -{
> - queue_work(cache->wq, &cache->deferred_writethrough_worker);
> -}
> -
> static void wake_migration_worker(struct cache *cache)
> {
>   if (passthrough_mode(&cache->features))
> @@ -868,16 +864,23 @@ static void check_if_tick_bio_needed(struct cache 
> *cache, struct bio *bio)
>   spin_unlock_irqrestore(&cache->lock, flags);
> }
> 
> -static void remap_to_origin_clear_discard(struct cache *cache, struct bio 
> *bio,
> -   dm_oblock_t oblock)
> +static void __remap_to_origin_clear_discard(struct cache *cache, struct bio 
> *bio,
> + dm_oblock_t oblock, bool 
> bio_has_pbd)
> {
> - // FIXME: this is called way too much.
> - check_if_tick_bio_needed(cache, bio);
> + if (bio_has_pbd)
> + check_if_tick_bio_needed(cache, bio);
>   remap_to_origin(cache, bio);
>   if (bio_data_dir(bio) == WRITE)
>   clear_discard(cache, oblock_to_dblock(cache, oblock));
> }
> 
> +static void remap_to_origin_clear_discard(struct cache *cache, struct bio 
> *bio,
> +   dm_oblock_t oblock)
> +{
> + // FIXME: check_if_tick_bio_needed() is called way too much through 
> this interface
> + __remap_to_origin_clear_discard(cache, bio, oblock, true);
> +}
> +
> static void remap_to_cache_dirty(struct cache *cache, struct bio *bio,
>dm_oblock_t oblock, dm_cblock_t cblock)
> {
> @@ -937,57 +940,26 @@ static void issue_op(struct bio *bio, void *context)
>   accounted_request(cache, bio);
> }
> 
> -static void defer_writethrough_bio(struct cache *cache, struct bio *bio)
> -{
> - unsigned long flags;
> -
> - spin_lock_irqsave(&cache->lock, flags);
> - bio_list_add(&cache->deferred_writethrough_bios, bio);
> - spin_unlock_irqrestore(&cache->lock, flags);
> -
> - wake_deferred_writethrough_worker(cache);
> -}
> -
> -static void writethrough_endio(struct bio *bio)
> -{
> - struct per_bio_data *pb = get_per_bio_data(bio, PB_DATA_SIZE_WT);
> -
> - dm_unhook_bio(&pb->hook_info, bio);
> -
> - if (bio->bi_status) {
> - bio_endio(bio);
> - return;
> - }
> -
> - dm_bio_restore(&pb->bio_details, bio);
> - remap_to_cache(pb->cache, bio, pb->cblock);
> -
> - /*
> -  * We can't issue this bio directly, since we're in interrupt
> -  * context.  So it gets put on a bio list for processing by the
> -  * worker thread.
> -  */
> - defer_writethrough_bio(pb->cache, bio);
> -}
> -
> /*
> - * FIXME: send in parallel, h

[(resend) PATCH v3: {linux-4.14.y} ] dm cache: submit writethrough writes in parallel to origin and cache

2020-08-04 Thread John Donnelly
From: Mike Snitzer 

Discontinue issuing writethrough write IO in series to the origin and
then cache.

Use bio_clone_fast() to create a new origin clone bio that will be
mapped to the origin device and then bio_chain() it to the bio that gets
remapped to the cache device.  The origin clone bio does _not_ have a
copy of the per_bio_data -- as such check_if_tick_bio_needed() will not
be called.

The cache bio (parent bio) will not complete until the origin bio has
completed -- this fulfills bio_clone_fast()'s requirements as well as
the requirement to not complete the original IO until the write IO has
completed to both the origin and cache device.

Signed-off-by: Mike Snitzer 

(cherry picked from commit 2df3bae9a6543e90042291707b8db0cbfbae9ee9)

Fixes: 4ec34f2196d125ff781170ddc6c3058c08ec5e73 (dm bio record:
save/restore bi_end_io and bi_integrity )

4ec34f21 introduced a mkfs.ext4 hang on an LVM device that has been
modified with lvconvert --cachemode=writethrough.

CC:sta...@vger.kernel.org for 4.14.y

Signed-off-by: John Donnelly 
Reviewed-by: Somasundaram Krishnasamy 

conflicts:
drivers/md/dm-cache-target.c - Corrected the usage of
writethrough_mode(&cache->features) that was caught by the
compiler, and removed the unused static functions writethrough_endio(),
defer_writethrough_bio() and wake_deferred_writethrough_worker(),
which generated warnings.
---
drivers/md/dm-cache-target.c | 92 ++--
1 file changed, 37 insertions(+), 55 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 69cdb29ef6be..2732d1df05fa 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -450,6 +450,7 @@ struct cache {
struct work_struct migration_worker;
struct delayed_work waker;
struct dm_bio_prison_v2 *prison;
+   struct bio_set *bs;

mempool_t *migration_pool;

@@ -537,11 +538,6 @@ static void wake_deferred_bio_worker(struct cache *cache)
queue_work(cache->wq, &cache->deferred_bio_worker);
}

-static void wake_deferred_writethrough_worker(struct cache *cache)
-{
-   queue_work(cache->wq, &cache->deferred_writethrough_worker);
-}
-
static void wake_migration_worker(struct cache *cache)
{
if (passthrough_mode(&cache->features))
@@ -868,16 +864,23 @@ static void check_if_tick_bio_needed(struct cache *cache, 
struct bio *bio)
spin_unlock_irqrestore(&cache->lock, flags);
}

-static void remap_to_origin_clear_discard(struct cache *cache, struct bio *bio,
- dm_oblock_t oblock)
+static void __remap_to_origin_clear_discard(struct cache *cache, struct bio 
*bio,
+   dm_oblock_t oblock, bool 
bio_has_pbd)
{
-   // FIXME: this is called way too much.
-   check_if_tick_bio_needed(cache, bio);
+   if (bio_has_pbd)
+   check_if_tick_bio_needed(cache, bio);
remap_to_origin(cache, bio);
if (bio_data_dir(bio) == WRITE)
clear_discard(cache, oblock_to_dblock(cache, oblock));
}

+static void remap_to_origin_clear_discard(struct cache *cache, struct bio *bio,
+ dm_oblock_t oblock)
+{
+   // FIXME: check_if_tick_bio_needed() is called way too much through 
this interface
+   __remap_to_origin_clear_discard(cache, bio, oblock, true);
+}
+
static void remap_to_cache_dirty(struct cache *cache, struct bio *bio,
 dm_oblock_t oblock, dm_cblock_t cblock)
{
@@ -937,57 +940,26 @@ static void issue_op(struct bio *bio, void *context)
accounted_request(cache, bio);
}

-static void defer_writethrough_bio(struct cache *cache, struct bio *bio)
-{
-   unsigned long flags;
-
-   spin_lock_irqsave(&cache->lock, flags);
-   bio_list_add(&cache->deferred_writethrough_bios, bio);
-   spin_unlock_irqrestore(&cache->lock, flags);
-
-   wake_deferred_writethrough_worker(cache);
-}
-
-static void writethrough_endio(struct bio *bio)
-{
-   struct per_bio_data *pb = get_per_bio_data(bio, PB_DATA_SIZE_WT);
-
-   dm_unhook_bio(&pb->hook_info, bio);
-
-   if (bio->bi_status) {
-   bio_endio(bio);
-   return;
-   }
-
-   dm_bio_restore(&pb->bio_details, bio);
-   remap_to_cache(pb->cache, bio, pb->cblock);
-
-   /*
-* We can't issue this bio directly, since we're in interrupt
-* context.  So it gets put on a bio list for processing by the
-* worker thread.
-*/
-   defer_writethrough_bio(pb->cache, bio);
-}
-
/*
- * FIXME: send in parallel, huge latency as is.
 * When running in writethrough mode we need to send writes to clean blocks
- * to both the cache and origin devices.  In future we'd like to clone the
- * bio and send them in parallel, but for now we're doing them in
- * series as this is easier.
+ * to both the cache and origin devices.  Clone the 

[PATCH v3: {linux-4.14.y} ] dm cache: submit writethrough writes in parallel to origin and cache

2020-08-03 Thread John Donnelly

From: Mike Snitzer 

Discontinue issuing writethrough write IO in series to the origin and
then cache.

Use bio_clone_fast() to create a new origin clone bio that will be
mapped to the origin device and then bio_chain() it to the bio that gets
remapped to the cache device.  The origin clone bio does _not_ have a
copy of the per_bio_data -- as such check_if_tick_bio_needed() will not
be called.

The cache bio (parent bio) will not complete until the origin bio has
completed -- this fulfills bio_clone_fast()'s requirements as well as
the requirement to not complete the original IO until the write IO has
completed to both the origin and cache device.

Signed-off-by: Mike Snitzer 

(cherry picked from commit 2df3bae9a6543e90042291707b8db0cbfbae9ee9)

Fixes: 4ec34f2196d125ff781170ddc6c3058c08ec5e73 (dm bio record:
save/restore bi_end_io and bi_integrity )

4ec34f21 introduced a mkfs.ext4 hang on an LVM device that has been
modified with lvconvert --cachemode=writethrough.

CC:sta...@vger.kernel.org for 4.14.y

Signed-off-by: John Donnelly 
Reviewed-by: Somasundaram Krishnasamy 

conflicts:
drivers/md/dm-cache-target.c - Corrected the usage of
writethrough_mode(&cache->features) that was caught by the
compiler, and removed the unused static functions writethrough_endio(),
defer_writethrough_bio() and wake_deferred_writethrough_worker(),
which generated warnings.
---
 drivers/md/dm-cache-target.c | 92 
++--

 1 file changed, 37 insertions(+), 55 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 69cdb29ef6be..2732d1df05fa 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -450,6 +450,7 @@ struct cache {
struct work_struct migration_worker;
struct delayed_work waker;
struct dm_bio_prison_v2 *prison;
+   struct bio_set *bs;
mempool_t *migration_pool;
 @@ -537,11 +538,6 @@ static void wake_deferred_bio_worker(struct cache 
*cache)

queue_work(cache->wq, &cache->deferred_bio_worker);
 }
 -static void wake_deferred_writethrough_worker(struct cache *cache)
-{
-   queue_work(cache->wq, &cache->deferred_writethrough_worker);
-}
-
 static void wake_migration_worker(struct cache *cache)
 {
if (passthrough_mode(&cache->features))
@@ -868,16 +864,23 @@ static void check_if_tick_bio_needed(struct cache 
*cache, struct bio *bio)

spin_unlock_irqrestore(&cache->lock, flags);
 }
 -static void remap_to_origin_clear_discard(struct cache *cache, struct 
bio *bio,

- dm_oblock_t oblock)
+static void __remap_to_origin_clear_discard(struct cache *cache, struct 
bio *bio,

+   dm_oblock_t oblock, bool 
bio_has_pbd)
 {
-   // FIXME: this is called way too much.
-   check_if_tick_bio_needed(cache, bio);
+   if (bio_has_pbd)
+   check_if_tick_bio_needed(cache, bio);
remap_to_origin(cache, bio);
if (bio_data_dir(bio) == WRITE)
clear_discard(cache, oblock_to_dblock(cache, oblock));
 }
 +static void remap_to_origin_clear_discard(struct cache *cache, struct 
bio *bio,

+ dm_oblock_t oblock)
+{
+	// FIXME: check_if_tick_bio_needed() is called way too much through 
this interface

+   __remap_to_origin_clear_discard(cache, bio, oblock, true);
+}
+
 static void remap_to_cache_dirty(struct cache *cache, struct bio *bio,
 dm_oblock_t oblock, dm_cblock_t cblock)
 {
@@ -937,57 +940,26 @@ static void issue_op(struct bio *bio, void *context)
accounted_request(cache, bio);
 }
 -static void defer_writethrough_bio(struct cache *cache, struct bio *bio)
-{
-   unsigned long flags;
-
-   spin_lock_irqsave(&cache->lock, flags);
-   bio_list_add(&cache->deferred_writethrough_bios, bio);
-   spin_unlock_irqrestore(&cache->lock, flags);
-
-   wake_deferred_writethrough_worker(cache);
-}
-
-static void writethrough_endio(struct bio *bio)
-{
-   struct per_bio_data *pb = get_per_bio_data(bio, PB_DATA_SIZE_WT);
-
-   dm_unhook_bio(&pb->hook_info, bio);
-
-   if (bio->bi_status) {
-   bio_endio(bio);
-   return;
-   }
-
-   dm_bio_restore(&pb->bio_details, bio);
-   remap_to_cache(pb->cache, bio, pb->cblock);
-
-   /*
-* We can't issue this bio directly, since we're in interrupt
-* context.  So it gets put on a bio list for processing by the
-* worker thread.
-*/
-   defer_writethrough_bio(pb->cache, bio);
-}
-
 /*
- * FIXME: send in parallel, huge latency as is.
  * When running in writethrough mode we need to send writes to clean 
blocks

- * to both the cache and origin devices.  In future we'd like to clone the
- * bio and send them in parallel, but for now we're doing them in
- * series as this is easier.
+ * to both the cache and origin devices.  Clone t

Re: [PATCH v2: [linux-4.14.y] ] dm cache: submit writethrough writes in parallel to origin and cache

2020-07-31 Thread John Donnelly




On 7/31/20 1:54 AM, Greg KH wrote:

On Thu, Jul 30, 2020 at 03:33:42PM -0500, John Donnelly wrote:

From: Mike Snitzer 

Discontinue issuing writethrough write IO in series to the origin and
then cache.

Use bio_clone_fast() to create a new origin clone bio that will be
mapped to the origin device and then bio_chain() it to the bio that gets
remapped to the cache device.  The origin clone bio does _not_ have a
copy of the per_bio_data -- as such check_if_tick_bio_needed() will not
be called.

The cache bio (parent bio) will not complete until the origin bio has
completed -- this fulfills bio_clone_fast()'s requirements as well as
the requirement to not complete the original IO until the write IO has
completed to both the origin and cache device.

Signed-off-by: Mike Snitzer 

(cherry picked from commit 2df3bae9a6543e90042291707b8db0cbfbae9ee9)

Fixes: 4ec34f2196d125ff781170ddc6c3058c08ec5e73 (dm bio record:
save/restore bi_end_io and bi_integrity )

4ec34f21 introduced a mkfs.ext4 hang on an LVM device that has been
modified with lvconvert --cachemode=writethrough.

CC:sta...@vger.kernel.org for 4.14.x .

Signed-off-by: John Donnelly 
Reviewed-by: Somasundaram Krishnasamy 

conflicts:
drivers/md/dm-cache-target.c - Corrected the usage of
writethrough_mode(&cache->features) that was caught by the
compiler, and removed the unused static functions writethrough_endio(),
defer_writethrough_bio() and wake_deferred_writethrough_worker(),
which generated warnings.
---
  drivers/md/dm-cache-target.c | 94
++--


Line wrapped?


 ummm .. that is weird




  1 file changed, 38 insertions(+), 56 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 69cdb29ef6be..2d9566bfe08b 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -1,5 +1,5 @@
  /*
- * Copyright (C) 2012 Red Hat. All rights reserved.
+ i Copyright (C) 2012 Red Hat. All rights reserved.


What happened here?


 spurious character from vi ;-(



This patch can't be applied as-is :(

greg k-h



 I will repost next week. Thank you.



[PATCH v2: [linux-4.14.y] ] dm cache: submit writethrough writes in parallel to origin and cache

2020-07-30 Thread John Donnelly

From: Mike Snitzer 

Discontinue issuing writethrough write IO in series to the origin and
then cache.

Use bio_clone_fast() to create a new origin clone bio that will be
mapped to the origin device and then bio_chain() it to the bio that gets
remapped to the cache device.  The origin clone bio does _not_ have a
copy of the per_bio_data -- as such check_if_tick_bio_needed() will not
be called.

The cache bio (parent bio) will not complete until the origin bio has
completed -- this fulfills bio_clone_fast()'s requirements as well as
the requirement to not complete the original IO until the write IO has
completed to both the origin and cache device.

Signed-off-by: Mike Snitzer 

(cherry picked from commit 2df3bae9a6543e90042291707b8db0cbfbae9ee9)

Fixes: 4ec34f2196d125ff781170ddc6c3058c08ec5e73 (dm bio record:
save/restore bi_end_io and bi_integrity )

4ec34f21 introduced a mkfs.ext4 hang on an LVM device that has been
modified with lvconvert --cachemode=writethrough.

CC:sta...@vger.kernel.org for 4.14.x .

Signed-off-by: John Donnelly 
Reviewed-by: Somasundaram Krishnasamy 

conflicts:
drivers/md/dm-cache-target.c - Corrected the usage of
writethrough_mode(&cache->features) that was caught by the
compiler, and removed the unused static functions writethrough_endio(),
defer_writethrough_bio() and wake_deferred_writethrough_worker(),
which generated warnings.
---
 drivers/md/dm-cache-target.c | 94 
++--

 1 file changed, 38 insertions(+), 56 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 69cdb29ef6be..2d9566bfe08b 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2012 Red Hat. All rights reserved.
+ i Copyright (C) 2012 Red Hat. All rights reserved.
  *
  * This file is released under the GPL.
  */
@@ -450,6 +450,7 @@ struct cache {
struct work_struct migration_worker;
struct delayed_work waker;
struct dm_bio_prison_v2 *prison;
+   struct bio_set *bs;
mempool_t *migration_pool;
 @@ -537,11 +538,6 @@ static void wake_deferred_bio_worker(struct cache 
*cache)

queue_work(cache->wq, &cache->deferred_bio_worker);
 }
 -static void wake_deferred_writethrough_worker(struct cache *cache)
-{
-   queue_work(cache->wq, &cache->deferred_writethrough_worker);
-}
-
 static void wake_migration_worker(struct cache *cache)
 {
if (passthrough_mode(&cache->features))
@@ -868,16 +864,23 @@ static void check_if_tick_bio_needed(struct cache 
*cache, struct bio *bio)

spin_unlock_irqrestore(&cache->lock, flags);
 }
 -static void remap_to_origin_clear_discard(struct cache *cache, struct 
bio *bio,

- dm_oblock_t oblock)
+static void __remap_to_origin_clear_discard(struct cache *cache, struct 
bio *bio,

+   dm_oblock_t oblock, bool 
bio_has_pbd)
 {
-   // FIXME: this is called way too much.
-   check_if_tick_bio_needed(cache, bio);
+   if (bio_has_pbd)
+   check_if_tick_bio_needed(cache, bio);
remap_to_origin(cache, bio);
if (bio_data_dir(bio) == WRITE)
clear_discard(cache, oblock_to_dblock(cache, oblock));
 }
 +static void remap_to_origin_clear_discard(struct cache *cache, struct 
bio *bio,

+ dm_oblock_t oblock)
+{
+	// FIXME: check_if_tick_bio_needed() is called way too much through 
this interface

+   __remap_to_origin_clear_discard(cache, bio, oblock, true);
+}
+
 static void remap_to_cache_dirty(struct cache *cache, struct bio *bio,
 dm_oblock_t oblock, dm_cblock_t cblock)
 {
@@ -937,57 +940,26 @@ static void issue_op(struct bio *bio, void *context)
accounted_request(cache, bio);
 }
 -static void defer_writethrough_bio(struct cache *cache, struct bio *bio)
-{
-   unsigned long flags;
-
-   spin_lock_irqsave(&cache->lock, flags);
-   bio_list_add(&cache->deferred_writethrough_bios, bio);
-   spin_unlock_irqrestore(&cache->lock, flags);
-
-   wake_deferred_writethrough_worker(cache);
-}
-
-static void writethrough_endio(struct bio *bio)
-{
-   struct per_bio_data *pb = get_per_bio_data(bio, PB_DATA_SIZE_WT);
-
-   dm_unhook_bio(&pb->hook_info, bio);
-
-   if (bio->bi_status) {
-   bio_endio(bio);
-   return;
-   }
-
-   dm_bio_restore(&pb->bio_details, bio);
-   remap_to_cache(pb->cache, bio, pb->cblock);
-
-   /*
-* We can't issue this bio directly, since we're in interrupt
-* context.  So it gets put on a bio list for processing by the
-* worker thread.
-*/
-   defer_writethrough_bio(pb->cache, bio);
-}
-
 /*
- * FIXME: send in parallel, huge latency as is.
  * When running in writethrough mode we need to send writes to clean 
blocks

- * to both the cache and origin devices.  I

Re: (resend) [PATCH [linux-4.14.y]] dm cache: submit writethrough writes in parallel to origin and cache

2020-07-30 Thread John Donnelly



> On Jul 30, 2020, at 12:21 AM, Greg KH  wrote:
> 
> On Wed, Jul 29, 2020 at 06:45:46PM -0500, John Donnelly wrote:
>> 
>> 
>> On 7/29/20 9:16 AM, Mike Snitzer wrote:
>>> On Wed, Jul 29 2020 at  7:55am -0400,
>>> Greg KH  wrote:
>>> 
>>>> On Wed, Jul 29, 2020 at 01:51:19PM +0200, Greg KH wrote:
>>>>> On Mon, Jul 27, 2020 at 11:00:14AM -0400, Mike Snitzer wrote:
>>>>>> This mail needs to be sent to sta...@vger.kernel.org (now cc'd).
>>>>>> 
>>>>>> Greg et al: please backport 2df3bae9a6543e90042291707b8db0cbfbae9ee9
>>>>> 
>>>>> Now backported, thanks.
>>>> 
>>>> Nope, it broke the build, I need something that actually works :)
>>>> 
>>> 
>>> OK, I'll defer to John Donnelly to get back with you (and rest of
>>> stable@).  He is more invested due to SUSE also having this issue.  I
>>> can put focus to it if John cannot sort this out.
>>> 
>>> Mike
>>> 
>> 
>> 
>> Hi.
>> 
>> 
>> Thank you for reaching out.
>> 
>> What specifically is broken? Is it that applying
>> 2df3bae9a6543e90042291707b8db0cbfbae9ee9 to 4.14.y is failing?
> 
> yes, try it yourself and see!

 Hi.

 Yes.

  2df3bae9a6543e90042291707b8db0cbfbae9ee9

 needs to be refactored to work in 4.14.y (now .190), as there is a conflict
in arguments, as noted in my original submittal ;-)
 I also noticed there are "defined but not used" [-Wunused-function]
warnings too.

 Do you want another PATCH v2 message in a new email thread, or can I append
it to this thread?

Please advise.

Thanks.
JD.
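
For context, a minimal sketch of the argument conflict described above. The
4.14-era definition shown is an assumption pieced together from the conflict
notes in this thread, not a quote of either tree:

	/* 4.14.y-style helper: takes the features struct. */
	static bool writethrough_mode(struct cache_features *f)
	{
		return f->io_mode == CM_IO_WRITETHROUGH;
	}

	/* Upstream 2df3bae9-era call site:  writethrough_mode(cache)            */
	/* Backported 4.14.y call site:      writethrough_mode(&cache->features) */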





 





Re: (resend) [PATCH [linux-4.14.y]] dm cache: submit writethrough writes in parallel to origin and cache

2020-07-29 Thread John Donnelly




On 7/29/20 9:16 AM, Mike Snitzer wrote:

On Wed, Jul 29 2020 at  7:55am -0400,
Greg KH  wrote:


On Wed, Jul 29, 2020 at 01:51:19PM +0200, Greg KH wrote:

On Mon, Jul 27, 2020 at 11:00:14AM -0400, Mike Snitzer wrote:

This mail needs to be sent to sta...@vger.kernel.org (now cc'd).

Greg et al: please backport 2df3bae9a6543e90042291707b8db0cbfbae9ee9


Now backported, thanks.


Nope, it broke the build, I need something that actually works :)



OK, I'll defer to John Donnelly to get back with you (and rest of
stable@).  He is more invested due to SUSE also having this issue.  I
can put focus to it if John cannot sort this out.

Mike




Hi.


Thank you for reaching out.

What specifically is broken? Is it that applying
2df3bae9a6543e90042291707b8db0cbfbae9ee9 to 4.14.y is failing?


JD.



Re: (resend) [PATCH [linux-4.14.y]] dm cache: submit writethrough writes in parallel to origin and cache

2020-07-27 Thread John Donnelly



> On Jul 27, 2020, at 3:17 PM, Sasha Levin  wrote:
> 
> On Mon, Jul 27, 2020 at 02:38:52PM -0500, John Donnelly wrote:
>> 
>> 
>>> On Jul 27, 2020, at 2:18 PM, Sasha Levin  wrote:
>>> 
>>> On Mon, Jul 27, 2020 at 11:00:14AM -0400, Mike Snitzer wrote:
>>>> This mail needs to be sent to sta...@vger.kernel.org (now cc'd).
>>>> 
>>>> Greg et al: please backport 2df3bae9a6543e90042291707b8db0cbfbae9ee9
>>> 
>>> Hm, what's the issue that this patch addresses? It's not clear from the
>>> commit message.
>>> 
>>> --
>>> Thanks,
>>> Sasha
>> 
>> HI Sasha ,
>> 
>> In an off-line conversation I had with Mike, he indicated that:
>> 
>> 
>> commit 1b17159e52bb31f982f82a6278acd7fab1d3f67b
>> Author: Mike Snitzer 
>> Date:   Fri Feb 28 18:00:53 2020 -0500
>> 
>>  dm bio record: save/restore bi_end_io and bi_integrity
>> 
>> 
>> commit 248aa2645aa7fc9175d1107c2593cc90d4af5a4e
>> Author: Mike Snitzer 
>> Date:   Fri Feb 28 18:11:53 2020 -0500
>> 
>>  dm integrity: use dm_bio_record and dm_bio_restore
>> 
>> 
>> were picked up in "stable" kernels even though
>> neither was marked for sta...@vger.kernel.org.
>> 
>> Adding this missing  commit :
>> 
>> 2df3bae9a6543e90042291707b8db0cbfbae9ee9
>> 
>> 
>> Completes the series
> 
> Should we just revert those two commits instead if they're not needed?
> 
> -- 
> Thanks,
> Sasha

  As I stated above:

> Fixes: 705559706d62038b74c5088114c1799cf2c9dce8 (dm bio record:
> save/restore bi_end_io and bi_integrity, version 4.14.188)
> 
> 70555970 introduced a mkfs.ext4 hang on an LVM device that has been
> modified with lvconvert --cachemode=writethrough.

   It corrects an issue we discovered in 4.14.188. Any other branches those
two commits have migrated to will likely have the same regression.

I am confident linux-4.14.y will be better off with it ;-)






re: (resend) [PATCH [linux-4.14.y]] dm cache: submit writethrough writes in parallel to origin and cache

2020-07-27 Thread John Donnelly



> On Jul 27, 2020, at 2:18 PM, Sasha Levin  wrote:
> 
> On Mon, Jul 27, 2020 at 11:00:14AM -0400, Mike Snitzer wrote:
>> This mail needs to be sent to sta...@vger.kernel.org (now cc'd).
>> 
>> Greg et al: please backport 2df3bae9a6543e90042291707b8db0cbfbae9ee9
> 
> Hm, what's the issue that this patch addresses? It's not clear from the
> commit message.
> 
> -- 
> Thanks,
> Sasha

HI Sasha ,

In an off-line conversation I had with Mike, he indicated that:


commit 1b17159e52bb31f982f82a6278acd7fab1d3f67b
Author: Mike Snitzer 
Date:   Fri Feb 28 18:00:53 2020 -0500

   dm bio record: save/restore bi_end_io and bi_integrity


commit 248aa2645aa7fc9175d1107c2593cc90d4af5a4e
Author: Mike Snitzer 
Date:   Fri Feb 28 18:11:53 2020 -0500

   dm integrity: use dm_bio_record and dm_bio_restore


were picked up in "stable" kernels even though
neither was marked for sta...@vger.kernel.org.

Adding this missing commit:

 2df3bae9a6543e90042291707b8db0cbfbae9ee9


Completes the series 


Thank you ,


John.




(resend) [PATCH [linux-4.14.y]] dm cache: submit writethrough writes in parallel to origin and cache

2020-07-27 Thread John Donnelly

From: Mike Snitzer 

Discontinue issuing writethrough write IO in series to the origin and
then cache.

Use bio_clone_fast() to create a new origin clone bio that will be
mapped to the origin device and then bio_chain() it to the bio that gets
remapped to the cache device. The origin clone bio does _not_ have a
copy of the per_bio_data -- as such check_if_tick_bio_needed() will not
be called.

The cache bio (parent bio) will not complete until the origin bio has
completed -- this fulfills bio_clone_fast()'s requirements as well as
the requirement to not complete the original IO until the write IO has
completed to both the origin and cache device.
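
As an aside, a minimal sketch of the clone-and-chain pattern described above,
consolidating what the (line-wrapped) diff below implements in
remap_to_origin_and_cache():

	static void remap_to_origin_and_cache(struct cache *cache, struct bio *bio,
					      dm_oblock_t oblock, dm_cblock_t cblock)
	{
		/* Fast clone shares the data pages; drawn from the cache's bio_set. */
		struct bio *origin_bio = bio_clone_fast(bio, GFP_NOIO, cache->bs);

		BUG_ON(!origin_bio);

		/* Chain: the parent 'bio' completes only after 'origin_bio' ends. */
		bio_chain(origin_bio, bio);

		/* false: the clone carries no per_bio_data, so skip code using it. */
		__remap_to_origin_clear_discard(cache, origin_bio, oblock, false);
		submit_bio(origin_bio);

		/* The parent is remapped to the cache device. */
		remap_to_cache(cache, bio, cblock);
	}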

Signed-off-by: Mike Snitzer 

(cherry picked from commit 2df3bae9a6543e90042291707b8db0cbfbae9ee9)

Fixes: 705559706d62038b74c5088114c1799cf2c9dce8 (dm bio record:
save/restore bi_end_io and bi_integrity, version 4.14.188)

70555970 introduced a mkfs.ext4 hang on an LVM device that has been
modified with lvconvert --cachemode=writethrough.

Signed-off-by: John Donnelly 
Tested-by: John Donnelly 
Reviewed-by: Somasundaram Krishnasamy 

conflict: drivers/md/dm-cache-target.c - Corrected the syntax of
writethrough_mode(&cache->features) that was caught by the
arm compiler.

cc: sta...@vger.kernel.org
cc: snit...@redhat.com
---
drivers/md/dm-cache-target.c | 54 
1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 69cdb29ef6be..8241b7c36655 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -450,6 +450,7 @@ struct cache {
struct work_struct migration_worker;
struct delayed_work waker;
struct dm_bio_prison_v2 *prison;
+ struct bio_set *bs;
mempool_t *migration_pool;
@@ -868,16 +869,23 @@ static void check_if_tick_bio_needed(struct cache 
*cache, struct bio *bio)

spin_unlock_irqrestore(&cache->lock, flags);
}
-static void remap_to_origin_clear_discard(struct cache *cache, struct 
bio *bio,

- dm_oblock_t oblock)
+static void __remap_to_origin_clear_discard(struct cache *cache, struct 
bio *bio,

+ dm_oblock_t oblock, bool bio_has_pbd)
{
- // FIXME: this is called way too much.
- check_if_tick_bio_needed(cache, bio);
+ if (bio_has_pbd)
+ check_if_tick_bio_needed(cache, bio);
remap_to_origin(cache, bio);
if (bio_data_dir(bio) == WRITE)
clear_discard(cache, oblock_to_dblock(cache, oblock));
}
+static void remap_to_origin_clear_discard(struct cache *cache, struct 
bio *bio,

+ dm_oblock_t oblock)
+{
+ // FIXME: check_if_tick_bio_needed() is called way too much through 
this interface

+ __remap_to_origin_clear_discard(cache, bio, oblock, true);
+}
+
static void remap_to_cache_dirty(struct cache *cache, struct bio *bio,
dm_oblock_t oblock, dm_cblock_t cblock)
{
@@ -971,23 +979,25 @@ static void writethrough_endio(struct bio *bio)
}
/*
- * FIXME: send in parallel, huge latency as is.
* When running in writethrough mode we need to send writes to clean blocks
- * to both the cache and origin devices. In future we'd like to clone the
- * bio and send them in parallel, but for now we're doing them in
- * series as this is easier.
+ * to both the cache and origin devices. Clone the bio and send them in 
parallel.

*/
-static void remap_to_origin_then_cache(struct cache *cache, struct bio 
*bio,

- dm_oblock_t oblock, dm_cblock_t cblock)
+static void remap_to_origin_and_cache(struct cache *cache, struct bio *bio,
+ dm_oblock_t oblock, dm_cblock_t cblock)
{
- struct per_bio_data *pb = get_per_bio_data(bio, PB_DATA_SIZE_WT);
+ struct bio *origin_bio = bio_clone_fast(bio, GFP_NOIO, cache->bs);
- pb->cache = cache;
- pb->cblock = cblock;
- dm_hook_bio(&pb->hook_info, bio, writethrough_endio, NULL);
- dm_bio_record(&pb->bio_details, bio);
+ BUG_ON(!origin_bio);
- remap_to_origin_clear_discard(pb->cache, bio, oblock);
+ bio_chain(origin_bio, bio);
+ /*
+ * Passing false to __remap_to_origin_clear_discard() skips
+ * all code that might use per_bio_data (since clone doesn't have it)
+ */
+ __remap_to_origin_clear_discard(cache, origin_bio, oblock, false);
+ submit_bio(origin_bio);
+
+ remap_to_cache(cache, bio, cblock);
}
/*
@@ -1873,7 +1883,7 @@ static int map_bio(struct cache *cache, struct bio 
*bio, dm_oblock_t block,

} else {
if (bio_data_dir(bio) == WRITE && writethrough_mode(&cache->features) &&
!is_dirty(cache, cblock)) {
- remap_to_origin_then_cache(cache, bio, block, cblock);
+ remap_to_origin_and_cache(cache, bio, block, cblock);
accounted_begin(cache, bio);
} else
remap_to_cache_dirty(cache, bio, block, cblock);
@@ -2132,6 +2142,9 @@ static void destroy(struct cache *cache)
kfree(cache->ctr_args[i]);
kfree(cache->ctr_args);
+ if (cache->bs)
+ bioset_free(cache->bs);
+
kfree(cache);
}
@@ -2589,6 +2602,13 @@ static int cache_create(struct cache_args *ca, 
struct cache **result)

cache->features = ca->features;
ti->per_io_data_size =

Re: [PATCH v10 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-07-27 Thread John Donnelly



On 7/3/20 3:38 AM, chenzhou wrote:

Hi Bhupesh,


On 2020/7/3 15:26, Bhupesh Sharma wrote:

Hi Chen,

On Fri, Jul 3, 2020 at 9:24 AM Chen Zhou  wrote:

This patch series enable reserving crashkernel above 4G in arm64.

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
when there is not enough low memory.
2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G;
in this case, if swiotlb or DMA buffers are required, the crash dump kernel
will fail to boot because there is no low memory available for allocation.
3. Commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32") broke
arm64 kdump. If the memory reserved for the crash dump kernel falls in
ZONE_DMA32, devices in the crash dump kernel that need ZONE_DMA will fail
to allocate.

To solve these issues, introduce crashkernel=X,low to reserve a specified
amount of low memory.
crashkernel=X tries to reserve memory for the crash dump kernel under
4G. If crashkernel=Y,low is specified simultaneously, first reserve the
specified amount of low memory for crash dump kernel devices, then reserve
memory above 4G.

When crashkernel is reserved above 4G and crashkernel=X,low
is specified simultaneously, the kernel should reserve the specified amount
of low memory for crash dump kernel devices, so there may be two crash
kernel regions: one below 4G and the other above 4G.
To distinguish the low region from the high one without affecting the use of
kexec-tools, rename the low region "Crash kernel (low)" and pass the
low region by reusing the DT property "linux,usable-memory-range". We made
the low memory region the last range of "linux,usable-memory-range" to keep
compatibility with existing user space and older kdump kernels.
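
For illustration, the two reservations would then show up along these lines in
/proc/iomem (the addresses below are made up):

	09000000-0fffffff : Crash kernel (low)
	2000000000-203fffffff : Crash kernel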

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions(see [1])

Another update is document about DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema(see [2])

The previous changes and discussions can be retrieved from:

Changes since [v9]
- Patch 1 add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
suggested by Arnd.
- Add Tested-by from John and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, first reserve the specified
amount of low memory for crash dump kernel devices, then reserve memory above 4G.
In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
pass to crash dump kernel by DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() I added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions, for two crash kernel regions
case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: 

[PATCH [linux-4.14.y]] dm cache: submit writethrough writes in parallel to origin and cache

2020-07-15 Thread John Donnelly

From: Mike Snitzer 

Discontinue issuing writethrough write IO in series to the origin and
then cache.

Use bio_clone_fast() to create a new origin clone bio that will be
mapped to the origin device and then bio_chain() it to the bio that gets
remapped to the cache device. The origin clone bio does _not_ have a
copy of the per_bio_data -- as such check_if_tick_bio_needed() will not
be called.

The cache bio (parent bio) will not complete until the origin bio has
completed -- this fulfills bio_clone_fast()'s requirements as well as
the requirement to not complete the original IO until the write IO has
completed to both the origin and cache device.

Signed-off-by: Mike Snitzer 

(cherry picked from commit 2df3bae9a6543e90042291707b8db0cbfbae9ee9)

Fixes: 705559706d62038b74c5088114c1799cf2c9dce8 (dm bio record:
save/restore bi_end_io and bi_integrity, version 4.14.188)

70555970 introduced a mkfs.ext4 hang on an LVM device that has been
modified with lvconvert --cachemode=writethrough.

Signed-off-by: John Donnelly 
Tested-by: John Donnelly 
Reviewed-by: Somasundaram Krishnasamy 

conflict: drivers/md/dm-cache-target.c - Corrected the syntax of
writethrough_mode(&cache->features) that was caught by the
arm compiler.

cc: sta...@vger.kernel.org
cc: snit...@redhat.com
---
drivers/md/dm-cache-target.c | 54 
1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 69cdb29ef6be..8241b7c36655 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -450,6 +450,7 @@ struct cache {
struct work_struct migration_worker;
struct delayed_work waker;
struct dm_bio_prison_v2 *prison;
+ struct bio_set *bs;
mempool_t *migration_pool;
@@ -868,16 +869,23 @@ static void check_if_tick_bio_needed(struct cache 
*cache, struct bio *bio)

spin_unlock_irqrestore(&cache->lock, flags);
}
-static void remap_to_origin_clear_discard(struct cache *cache, struct 
bio *bio,

- dm_oblock_t oblock)
+static void __remap_to_origin_clear_discard(struct cache *cache, struct 
bio *bio,

+ dm_oblock_t oblock, bool bio_has_pbd)
{
- // FIXME: this is called way too much.
- check_if_tick_bio_needed(cache, bio);
+ if (bio_has_pbd)
+ check_if_tick_bio_needed(cache, bio);
remap_to_origin(cache, bio);
if (bio_data_dir(bio) == WRITE)
clear_discard(cache, oblock_to_dblock(cache, oblock));
}
+static void remap_to_origin_clear_discard(struct cache *cache, struct 
bio *bio,

+ dm_oblock_t oblock)
+{
+ // FIXME: check_if_tick_bio_needed() is called way too much through 
this interface

+ __remap_to_origin_clear_discard(cache, bio, oblock, true);
+}
+
static void remap_to_cache_dirty(struct cache *cache, struct bio *bio,
dm_oblock_t oblock, dm_cblock_t cblock)
{
@@ -971,23 +979,25 @@ static void writethrough_endio(struct bio *bio)
}
/*
- * FIXME: send in parallel, huge latency as is.
* When running in writethrough mode we need to send writes to clean blocks
- * to both the cache and origin devices. In future we'd like to clone the
- * bio and send them in parallel, but for now we're doing them in
- * series as this is easier.
+ * to both the cache and origin devices. Clone the bio and send them in 
parallel.

*/
-static void remap_to_origin_then_cache(struct cache *cache, struct bio 
*bio,

- dm_oblock_t oblock, dm_cblock_t cblock)
+static void remap_to_origin_and_cache(struct cache *cache, struct bio *bio,
+ dm_oblock_t oblock, dm_cblock_t cblock)
{
- struct per_bio_data *pb = get_per_bio_data(bio, PB_DATA_SIZE_WT);
+ struct bio *origin_bio = bio_clone_fast(bio, GFP_NOIO, cache->bs);
- pb->cache = cache;
- pb->cblock = cblock;
- dm_hook_bio(&pb->hook_info, bio, writethrough_endio, NULL);
- dm_bio_record(&pb->bio_details, bio);
+ BUG_ON(!origin_bio);
- remap_to_origin_clear_discard(pb->cache, bio, oblock);
+ bio_chain(origin_bio, bio);
+ /*
+ * Passing false to __remap_to_origin_clear_discard() skips
+ * all code that might use per_bio_data (since clone doesn't have it)
+ */
+ __remap_to_origin_clear_discard(cache, origin_bio, oblock, false);
+ submit_bio(origin_bio);
+
+ remap_to_cache(cache, bio, cblock);
}
/*
@@ -1873,7 +1883,7 @@ static int map_bio(struct cache *cache, struct bio 
*bio, dm_oblock_t block,

} else {
if (bio_data_dir(bio) == WRITE && writethrough_mode(&cache->features) &&
!is_dirty(cache, cblock)) {
- remap_to_origin_then_cache(cache, bio, block, cblock);
+ remap_to_origin_and_cache(cache, bio, block, cblock);
accounted_begin(cache, bio);
} else
remap_to_cache_dirty(cache, bio, block, cblock);
@@ -2132,6 +2142,9 @@ static void destroy(struct cache *cache)
kfree(cache->ctr_args[i]);
kfree(cache->ctr_args);
+ if (cache->bs)
+ bioset_free(cache->bs);
+
kfree(cache);
}
@@ -2589,6 +2602,13 @@ static int cache_create(struct cache_args *ca, 
struct cache **result)

cache->features = ca->features;
ti->per_io_data_size =

Re: [PATCH v9 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-06-29 Thread John Donnelly
Hi,

> On Jun 28, 2020, at 3:34 AM, Chen Zhou  wrote:
> 
> This patch series enable reserving crashkernel above 4G in arm64.
> 
> There are the following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> when there is not enough low memory.
> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G;
> in this case, if swiotlb or DMA buffers are required, the crash dump kernel
> will fail to boot because there is no low memory available for allocation.
> 3. Commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32") broke
> arm64 kdump. If the memory reserved for the crash dump kernel falls in
> ZONE_DMA32, devices in the crash dump kernel that need ZONE_DMA will fail
> to allocate.
> 
> To solve these issues, introduce crashkernel=X,low to reserve a specified
> amount of low memory.
> crashkernel=X tries to reserve memory for the crash dump kernel under
> 4G. If crashkernel=Y,low is specified simultaneously, first reserve the
> specified amount of low memory for crash dump kernel devices, then reserve
> memory above 4G.
> 
> When crashkernel is reserved above 4G and crashkernel=X,low
> is specified simultaneously, the kernel should reserve the specified amount
> of low memory for crash dump kernel devices, so there may be two crash
> kernel regions: one below 4G and the other above 4G.
> To distinguish the low region from the high one without affecting the use of
> kexec-tools, rename the low region "Crash kernel (low)" and pass the
> low region by reusing the DT property "linux,usable-memory-range". We made
> the low memory region the last range of "linux,usable-memory-range" to keep
> compatibility with existing user space and older kdump kernels.
> 
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
> 
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
> 
> The previous changes and discussions can be retrieved from:
> 
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the 
> low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.

  
  Nice job, Chen.

  Does this need an Acked-by from the Raspberry Pi maintainers for this
ZONE_DMA fix?

  This activity has been going on for over a year now. Can we please get this
finalized and merged?
 
 Thank you,
 John.



> 
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to 
> Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from John and pk.
> 
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
> 
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve the specified size
> of low memory for crash kdump kernel devices first and then reserve memory
> above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass it to the crash dump kernel via the DT property "linux,usable-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
> 
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> 
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
> 
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
> 
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() I added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions; in the two-region case, we cap
> the memory range [min(regs[*].start), max(regs[*].end)] and then remove
> the memory range in the middle (see the sketch below).
> 
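
The two-region capping described in the last changelog entry, as a
stand-alone sketch; cap()/remove_range() stand in for the kernel's
memblock_cap_memory_range()/memblock_remove():

#include <stdio.h>

struct range { unsigned long long start, end; };

static void cap(unsigned long long start, unsigned long long end)
{
	printf("keep   [%#llx-%#llx]\n", start, end);
}

static void remove_range(unsigned long long start, unsigned long long end)
{
	printf("remove [%#llx-%#llx]\n", start, end);
}

int main(void)
{
	/* At most two crash kernel regions, one low and one high. */
	struct range regs[2] = {
		{ 0x60000000ULL,  0x70000000ULL },	/* Crash kernel (low) */
		{ 0x200000000ULL, 0x220000000ULL },	/* Crash kernel */
	};

	/* Cap usable memory to [min(start), max(end)] ... */
	cap(regs[0].start, regs[1].end);
	/* ... then punch out the gap between the two regions. */
	remove_range(regs[0].end, regs[1].start);
	return 0;
}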
> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
> [2]: https://github.com/robherring/dt-schema/pull/19
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: 
> 

Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-06-19 Thread John Donnelly



On 6/19/20 3:21 AM, chenzhou wrote:

On 2020/6/19 10:32, John Donnelly wrote:

On 6/4/20 12:01 PM, Nicolas Saenz Julienne wrote:

On Thu, 2020-06-04 at 01:17 +0530, Bhupesh Sharma wrote:

Hi All,

On Wed, Jun 3, 2020 at 9:03 PM John Donnelly 
wrote:

On Jun 3, 2020, at 8:20 AM, chenzhou  wrote:

Hi,


On 2020/6/3 19:47, Prabhakar Kushwaha wrote:

Hi Chen,

On Tue, Jun 2, 2020 at 8:12 PM John Donnelly 
wrote:


On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <
prabhakar.p...@gmail.com> wrote:

On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <
john.p.donne...@oracle.com> wrote:

Hi. See below!


On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma 
wrote:

Hi John,

On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <
john.p.donne...@oracle.com> wrote:

Hi,


On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:

Hi Chen,

On Thu, May 21, 2020 at 3:05 PM Chen Zhou <
chenzho...@huawei.com> wrote:

This patch series enables reserving crashkernel above 4G in arm64.

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
when there is not enough low memory.
2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G;
in this case, if swiotlb or DMA buffers are required, the crash dump kernel
will fail to boot because there is no low memory available for allocation.


We are getting a "warn_alloc" [1] warning during boot of the kdump
kernel with bootargs as [2] of the primary kernel.
This error was observed on the ThunderX2 ARM64 platform.

It is observed with the latest upstream tag (v5.7-rc3) with this
patch set and


https://lists.infradead.org/pipermail/kexec/2020-May/025128.html

Also **without** this patch-set
"https://www.spinics.net/lists/arm-kernel/msg806882.html"

This issue comes whenever crashkernel memory is reserved
after 0xc000_0000.
More details were discussed earlier in
https://www.spinics.net/lists/arm-kernel/msg806882.html
without any solution.

This patch set is expected to solve a similar kind of issue,
i.e. low memory is only targeted for DMA and swiotlb, so the
above-mentioned observation should be considered/fixed (see the
sketch below).

--pk
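
The failure mode pk reports in [1] reduces to a simple pattern: an early
order-6 GFP_DMA request (dma_atomic_pool_init() in the trace) has nowhere
to go when the crash kernel's ZONE_DMA holds no managed pages. A hedged,
stand-alone sketch, with the zone size taken from the "managed:0kB" line
in the log:

#include <stdio.h>

#define PAGE_SIZE        4096ULL
#define DMA_ZONE_MANAGED 0ULL	/* "Node 0 DMA ... managed:0kB" */

static int alloc_pages_dma_sketch(unsigned int order)
{
	unsigned long long need = PAGE_SIZE << order;	/* order 6 = 256K */

	if (need > DMA_ZONE_MANAGED)
		return -1;	/* the kernel warns here:
				 * "page allocation failure: order:6 ... (GFP_DMA)" */
	return 0;
}

int main(void)
{
	printf("order-6 GFP_DMA allocation %s\n",
	       alloc_pages_dma_sketch(6) ? "fails: no low memory" : "succeeds");
	return 0;
}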

[1]
[   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/
[   30.367696] NET: Registered protocol family 16
[   30.369973] swapper/0: page allocation failure: order:6,
mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
[   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.7.0-rc3+ #121
[   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/
[   30.369984] Call trace:
[   30.369989]  dump_backtrace+0x0/0x1f8
[   30.369991]  show_stack+0x20/0x30
[   30.369997]  dump_stack+0xc0/0x10c
[   30.370001]  warn_alloc+0x10c/0x178
[   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0
xb50
[   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
[   30.370008]  alloc_page_interleave+0x24/0x98
[   30.370011]  alloc_pages_current+0xe4/0x108
[   30.370017]  dma_atomic_pool_init+0x44/0x1a4
[   30.370020]  do_one_initcall+0x54/0x228
[   30.370027]  kernel_init_freeable+0x228/0x2cc
[   30.370031]  kernel_init+0x1c/0x110
[   30.370034]  ret_from_fork+0x10/0x18
[   30.370036] Mem-Info:
[   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
[   30.370064]  active_file:0 inactive_file:0
isolated_file:0
[   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
[   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
[   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
[   30.370064]  free:1537719 free_pcp:219 free_cma:0
[   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB
isolated(anon):0kB
isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
shmem:0kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB
isolated(anon):0kB
isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
shmem:0kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB
writepending:0kB
present:128kB managed:0kB mlocked:0kB kernel_stack:0kB
pagetables:0kB
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   30.370084] lowmem_reserve[]: 0 250 6063 6063
[   30.370090] Node 0 DMA32 free:256000kB min:408kB
low:664kB
high:920kB reserved_highatomic:0KB activ

Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-06-04 Thread John Donnelly



On 6/4/20 12:01 PM, Nicolas Saenz Julienne wrote:

On Thu, 2020-06-04 at 01:17 +0530, Bhupesh Sharma wrote:

Hi All,

On Wed, Jun 3, 2020 at 9:03 PM John Donnelly 
wrote:



On Jun 3, 2020, at 8:20 AM, chenzhou  wrote:

Hi,


On 2020/6/3 19:47, Prabhakar Kushwaha wrote:

Hi Chen,

On Tue, Jun 2, 2020 at 8:12 PM John Donnelly 
wrote:


On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <
prabhakar.p...@gmail.com> wrote:

On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <
john.p.donne...@oracle.com> wrote:

Hi. See below!


On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma 
wrote:

Hi John,

On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <
john.p.donne...@oracle.com> wrote:

Hi,


On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:

Hi Chen,

On Thu, May 21, 2020 at 3:05 PM Chen Zhou <
chenzho...@huawei.com> wrote:

This patch series enables reserving crashkernel above 4G in arm64.

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
when there is not enough low memory.
2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G;
in this case, if swiotlb or DMA buffers are required, the crash dump kernel
will fail to boot because there is no low memory available for allocation.


We are getting a "warn_alloc" [1] warning during boot of the kdump
kernel with bootargs as [2] of the primary kernel.
This error was observed on the ThunderX2 ARM64 platform.

It is observed with the latest upstream tag (v5.7-rc3) with this
patch set and


https://lists.infradead.org/pipermail/kexec/2020-May/025128.html

Also **without** this patch-set
"https://www.spinics.net/lists/arm-kernel/msg806882.html"

This issue comes whenever crashkernel memory is reserved
after 0xc000_0000.
More details were discussed earlier in
https://www.spinics.net/lists/arm-kernel/msg806882.html
without any solution.

This patch set is expected to solve a similar kind of issue,
i.e. low memory is only targeted for DMA and swiotlb, so the
above-mentioned observation should be considered/fixed.

--pk

[1]
[   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/
[   30.367696] NET: Registered protocol family 16
[   30.369973] swapper/0: page allocation failure: order:6,
mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
[   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.7.0-rc3+ #121
[   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/
[   30.369984] Call trace:
[   30.369989]  dump_backtrace+0x0/0x1f8
[   30.369991]  show_stack+0x20/0x30
[   30.369997]  dump_stack+0xc0/0x10c
[   30.370001]  warn_alloc+0x10c/0x178
[   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0
xb50
[   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
[   30.370008]  alloc_page_interleave+0x24/0x98
[   30.370011]  alloc_pages_current+0xe4/0x108
[   30.370017]  dma_atomic_pool_init+0x44/0x1a4
[   30.370020]  do_one_initcall+0x54/0x228
[   30.370027]  kernel_init_freeable+0x228/0x2cc
[   30.370031]  kernel_init+0x1c/0x110
[   30.370034]  ret_from_fork+0x10/0x18
[   30.370036] Mem-Info:
[   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
[   30.370064]  active_file:0 inactive_file:0
isolated_file:0
[   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
[   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
[   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
[   30.370064]  free:1537719 free_pcp:219 free_cma:0
[   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB
isolated(anon):0kB
isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
shmem:0kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB
isolated(anon):0kB
isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB
shmem:0kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
writeback_tmp:0kB
unstable:0kB all_unreclaimable? no
[   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB
writepending:0kB
present:128kB managed:0kB mlocked:0kB kernel_stack:0kB
pagetables:0kB
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   30.370084] lowmem_reserve[]: 0 250 6063 6063
[   30.370090] Node 0 DMA32 free:256000kB min:408kB
low:664kB
high:920kB reserved_highatomic:0KB active_anon:0kB
inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB
writep

Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-06-03 Thread John Donnelly



> On Jun 3, 2020, at 8:20 AM, chenzhou  wrote:
> 
> Hi,
> 
> 
> On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
>> Hi Chen,
>> 
>> On Tue, Jun 2, 2020 at 8:12 PM John Donnelly  
>> wrote:
>>> 
>>> 
>>>> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha  
>>>> wrote:
>>>> 
>>>> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly  
>>>> wrote:
>>>>> Hi. See below!
>>>>> 
>>>>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma  wrote:
>>>>>> 
>>>>>> Hi John,
>>>>>> 
>>>>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly 
>>>>>>  wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> 
>>>>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>>>>>>> Hi Chen,
>>>>>>>> 
>>>>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou  
>>>>>>>> wrote:
>>>>>>>>> This patch series enables reserving crashkernel above 4G in arm64.
>>>>>>>>> 
>>>>>>>>> There are the following issues in arm64 kdump:
>>>>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will
>>>>>>>>> fail when there is not enough low memory.
>>>>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel
>>>>>>>>> above 4G; in this case, if swiotlb or DMA buffers are required, the
>>>>>>>>> crash dump kernel will fail to boot because there is no low memory
>>>>>>>>> available for allocation.
>>>>>>>>> 
>>>>>>>> We are getting a "warn_alloc" [1] warning during boot of the kdump kernel
>>>>>>>> with bootargs as [2] of the primary kernel.
>>>>>>>> This error was observed on the ThunderX2 ARM64 platform.
>>>>>>>> 
>>>>>>>> It is observed with the latest upstream tag (v5.7-rc3) with this patch set
>>>>>>>> and
>>>>>>>> https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>>>>>>>> Also **without** this patch-set
>>>>>>>> "https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$;
>>>>>>>> 
>>>>>>>> This issue comes whenever crashkernel memory is reserved after
>>>>>>>> 0xc000_0000.
>>>>>>>> More details were discussed earlier in
>>>>>>>> https://www.spinics.net/lists/arm-kernel/msg806882.html
>>>>>>>> without any solution.
>>>>>>>> 
>>>>>>>> This patch set is expected to solve a similar kind of issue,
>>>>>>>> i.e. low memory is only targeted for DMA and swiotlb, so the
>>>>>>>> above-mentioned observation should be considered/fixed.
>>>>>>>> 
>>>>>>>> --pk
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/
>>>>>>>> [   30.367696] NET: Registered protocol family 16
>>>>>>>> [   30.369973] swapper/0: page allocation failure: order:6,
>>>>>>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>>>>>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3+ 
>>>>>>>> #121
>>>>>>>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/
>>>>>>>> [   30.369984] Call trace:
>>>>>>>> [   30.369989]  dump_backtrace+0x0/0x1f8
>>>>>>>> [   30.369991]  show_stack+0x20/0x30

Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-06-02 Thread John Donnelly



> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha  
> wrote:
> 
> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly  
> wrote:
>> 
>> Hi. See below!
>> 
>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma  wrote:
>>> 
>>> Hi John,
>>> 
>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly  
>>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> 
>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>>>> Hi Chen,
>>>>> 
>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou  wrote:
>>>>>> This patch series enables reserving crashkernel above 4G in arm64.
>>>>>> 
>>>>>> There are the following issues in arm64 kdump:
>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>>>>>> when there is not enough low memory.
>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above
>>>>>> 4G; in this case, if swiotlb or DMA buffers are required, the crash dump
>>>>>> kernel will fail to boot because there is no low memory available for
>>>>>> allocation.
>>>>>> 
>>>>>> To solve these issues, introduce crashkernel=X,low to reserve a
>>>>>> specified size of low memory.
>>>>>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>>>>>> 4G. If crashkernel=Y,low is specified simultaneously, reserve the
>>>>>> specified size of low memory for crash kdump kernel devices first and
>>>>>> then reserve memory above 4G.
>>>>>> 
>>>>>> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
>>>>>> is specified simultaneously, the kernel should reserve the specified size
>>>>>> of low memory for crash dump kernel devices. So there may be two crash
>>>>>> kernel regions, one below 4G and the other above 4G.
>>>>>> In order to distinguish it from the high region and avoid affecting the
>>>>>> use of kexec-tools, rename the low region "Crash kernel (low)", and add
>>>>>> the DT property "linux,low-memory-range" to the crash dump kernel's dtb
>>>>>> to pass the low region.
>>>>>> 
>>>>>> Besides, we need to modify kexec-tools:
>>>>>> arm64: kdump: add another DT property to crash dump kernel's dtb (see [1])
>>>>>> 
>>>>>> The previous changes and discussions can be retrieved from:
>>>>>> 
>>>>>> Changes since [v7]
>>>>>> - Move x86 CRASH_ALIGN to 2M
>>>>>> Suggested by Dave; after some testing, move x86 CRASH_ALIGN to 2M.
>>>>>> - Update Documentation/devicetree/bindings/chosen.txt
>>>>>> Add corresponding documentation to
>>>>>> Documentation/devicetree/bindings/chosen.txt, as suggested by Arnd.
>>>>>> - Add Tested-by from John and pk
>>>>>> 
>>>>>> Changes since [v6]
>>>>>> - Fix build errors reported by kbuild test robot.
>>>>>> 
>>>>>> Changes since [v5]
>>>>>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>>>>>> - Delete crashkernel=X,high.
>>>>>> - Modify crashkernel=X,low.
>>>>>> If crashkernel=X,low is specified simultaneously, reserve the specified
>>>>>> size of low memory for crash kdump kernel devices first and then reserve
>>>>>> memory above 4G.
>>>>>> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64,
>>>>>> and then pass it to the crash dump kernel via the DT property
>>>>>> "linux,low-memory-range".
>>>>>> - Update Documentation/admin-guide/kdump/kdump.rst.
>>>>>> 
>>>>>> Changes since [v4]
>>>>>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>>>>>> 
>>>>>> Changes since [v3]
>>>>>> - Add memblock_cap_memory_ranges back for multiple ranges.
>>>>>> - Fix some compiling warnings.
>>>>>> 
>>>>>> Changes since [v2]
>>>>>> - Split patch "arm64: kdump: 

Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-06-01 Thread John Donnelly
Hi. See below!

> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma  wrote:
> 
> Hi John,
> 
> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly  
> wrote:
>> 
>> Hi,
>> 
>> 
>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>> Hi Chen,
>>> 
>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou  wrote:
>>>> This patch series enables reserving crashkernel above 4G in arm64.
>>>> 
>>>> There are the following issues in arm64 kdump:
>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>>>> when there is not enough low memory.
>>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G;
>>>> in this case, if swiotlb or DMA buffers are required, the crash dump kernel
>>>> will fail to boot because there is no low memory available for allocation.
>>>> 
>>>> To solve these issues, introduce crashkernel=X,low to reserve a specified
>>>> size of low memory.
>>>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>>>> 4G. If crashkernel=Y,low is specified simultaneously, reserve the specified
>>>> size of low memory for crash kdump kernel devices first and then reserve
>>>> memory above 4G.
>>>> 
>>>> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
>>>> is specified simultaneously, the kernel should reserve the specified size
>>>> of low memory for crash dump kernel devices. So there may be two crash
>>>> kernel regions, one below 4G and the other above 4G.
>>>> In order to distinguish it from the high region and avoid affecting the
>>>> use of kexec-tools, rename the low region "Crash kernel (low)", and add
>>>> the DT property "linux,low-memory-range" to the crash dump kernel's dtb
>>>> to pass the low region.
>>>> 
>>>> Besides, we need to modify kexec-tools:
>>>> arm64: kdump: add another DT property to crash dump kernel's dtb (see [1])
>>>> 
>>>> The previous changes and discussions can be retrieved from:
>>>> 
>>>> Changes since [v7]
>>>> - Move x86 CRASH_ALIGN to 2M
>>>> Suggested by Dave; after some testing, move x86 CRASH_ALIGN to 2M.
>>>> - Update Documentation/devicetree/bindings/chosen.txt
>>>> Add corresponding documentation to
>>>> Documentation/devicetree/bindings/chosen.txt, as suggested by Arnd.
>>>> - Add Tested-by from John and pk
>>>> 
>>>> Changes since [v6]
>>>> - Fix build errors reported by kbuild test robot.
>>>> 
>>>> Changes since [v5]
>>>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>>>> - Delete crashkernel=X,high.
>>>> - Modify crashkernel=X,low.
>>>> If crashkernel=X,low is specified simultaneously, reserve the specified
>>>> size of low memory for crash kdump kernel devices first and then reserve
>>>> memory above 4G.
>>>> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64,
>>>> and then pass it to the crash dump kernel via the DT property
>>>> "linux,low-memory-range".
>>>> - Update Documentation/admin-guide/kdump/kdump.rst.
>>>> 
>>>> Changes since [v4]
>>>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>>>> 
>>>> Changes since [v3]
>>>> - Add memblock_cap_memory_ranges back for multiple ranges.
>>>> - Fix some compiling warnings.
>>>> 
>>>> Changes since [v2]
>>>> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
>>>> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
>>>> patch.
>>>> 
>>>> Changes since [v1]:
>>>> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
>>>> - Remove memblock_cap_memory_ranges() I added in v1 and implement that
>>>> in fdt_enforce_memory_region().
>>>> There are at most two crash kernel regions, for two crash kernel regions
>>>> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
>>>> and then remove the memory range in the middle.
>>>> 
>>>> [1]: 
>>>> http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>>>> [v1]: 

Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-06-01 Thread John Donnelly

Hi,


On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:

Hi Chen,

On Thu, May 21, 2020 at 3:05 PM Chen Zhou  wrote:

This patch series enables reserving crashkernel above 4G in arm64.

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
when there is not enough low memory.
2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G;
in this case, if swiotlb or DMA buffers are required, the crash dump kernel
will fail to boot because there is no low memory available for allocation.

To solve these issues, introduce crashkernel=X,low to reserve a specified
size of low memory.
Crashkernel=X tries to reserve memory for the crash dump kernel under
4G. If crashkernel=Y,low is specified simultaneously, reserve the specified
size of low memory for crash kdump kernel devices first and then reserve
memory above 4G.

When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
is specified simultaneously, the kernel should reserve the specified size of
low memory for crash dump kernel devices. So there may be two crash kernel
regions, one below 4G and the other above 4G.
In order to distinguish it from the high region and avoid affecting the use of
kexec-tools, rename the low region "Crash kernel (low)", and add the DT property
"linux,low-memory-range" to the crash dump kernel's dtb to pass the low region.

Besides, we need to modify kexec-tools:
arm64: kdump: add another DT property to crash dump kernel's dtb (see [1])

The previous changes and discussions can be retrieved from:

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
Suggested by Dave; after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt,
as suggested by Arnd.
- Add Tested-by from John and pk

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, reserve the specified size
of low memory for crash kdump kernel devices first and then reserve memory
above 4G.
In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
pass it to the crash dump kernel via the DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() I added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions, for two crash kernel regions
case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411

Chen Zhou (5):
   x86: kdump: move reserve_crashkernel_low() into crash_core.c
   arm64: kdump: reserve crashkenel above 4G for crash dump kernel
   arm64: kdump: add memory for devices by DT property, low-memory-range
   kdump: update Documentation about crashkernel on arm64
   dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump


We are getting a "warn_alloc" [1] warning during boot of the kdump kernel
with bootargs as [2] of the primary kernel.
This error was observed on the ThunderX2 ARM64 platform.

It is observed with latest 

Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-05-28 Thread John Donnelly



On 5/25/20 8:42 PM, Baoquan He wrote:

On 05/21/20 at 05:38pm, Chen Zhou wrote:

This patch series enables reserving crashkernel above 4G in arm64.

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
when there is not enough low memory.
2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G;
in this case, if swiotlb or DMA buffers are required, the crash dump kernel
will fail to boot because there is no low memory available for allocation.

To solve these issues, introduce crashkernel=X,low to reserve a specified
size of low memory.
Crashkernel=X tries to reserve memory for the crash dump kernel under
4G. If crashkernel=Y,low is specified simultaneously, reserve the specified
size of low memory for crash kdump kernel devices first and then reserve
memory above 4G.

When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
is specified simultaneously, the kernel should reserve the specified size of
low memory for crash dump kernel devices. So there may be two crash kernel
regions, one below 4G and the other above 4G.
In order to distinguish it from the high region and avoid affecting the use of
kexec-tools, rename the low region "Crash kernel (low)", and add the DT property
"linux,low-memory-range" to the crash dump kernel's dtb to pass the low region.

Besides, we need to modify kexec-tools:
arm64: kdump: add another DT property to crash dump kernel's dtb (see [1])

The previous changes and discussions can be retrieved from:

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
Suggested by Dave; after some testing, move x86 CRASH_ALIGN to 2M.

OK, moving x86 CRASH_ALIGN to 2M was suggested by Dave. Because
CONFIG_PHYSICAL_ALIGN can be selected from 2M to 16M, 2M seems good.
But, anyway, we should state the reason why it needs to be changed in the
commit log.


arch/x86/Kconfig:
config PHYSICAL_ALIGN
	hex "Alignment value to which kernel should be aligned"
	default "0x200000"
	range 0x2000 0x1000000 if X86_32
	range 0x200000 0x1000000 if X86_64
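
A small stand-alone illustration of why 2M is the natural choice: it is the
default (and minimum x86_64) PHYSICAL_ALIGN above, so a 2M-aligned
reservation base suits a default-configured crash kernel. ALIGN() mirrors
the kernel macro; the candidate base is arbitrary:

#include <stdio.h>

#define ALIGN(x, a)	(((x) + (a) - 1) & ~((unsigned long long)(a) - 1))
#define CRASH_ALIGN	0x200000ULL	/* 2M, matching the default above */

int main(void)
{
	unsigned long long base = 0x63f00000ULL;	/* arbitrary candidate */

	printf("reserve crash kernel at %#llx\n", ALIGN(base, CRASH_ALIGN));
	return 0;
}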


- Update Documentation/devicetree/bindings/chosen.txt
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt,
as suggested by Arnd.
- Add Tested-by from John and pk

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.

And the crashkernel=X,high being deleted needs to be explained too; otherwise
people reading the commit have to work out why themselves. I didn't follow
the old version, so I can't see why ,high can't be specified explicitly.


- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, reserve the specified size
of low memory for crash kdump kernel devices first and then reserve memory
above 4G.
In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
pass it to the crash dump kernel via the DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() I added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions, for two crash kernel regions
case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: 

Re: [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump

2020-05-19 Thread John Donnelly



> On May 19, 2020, at 5:21 AM, Arnd Bergmann  wrote:
> 
> On Thu, Mar 26, 2020 at 4:10 AM Chen Zhou  wrote:
>> 
>> Hi all,
>> 
>> Friendly ping...
> 
> I was asked about this patch series, and see that you last posted it in
> December. I think you should rebase it to linux-5.7-rc6 and post the
> entire series again to make progress, as it's unlikely that any maintainer
> would pick up the patches from last year.
> 
> For the contents, everything seems reasonable to me, but I noticed that
> you are adding a property to the /chosen node without adding the
> corresponding documentation to
> Documentation/devicetree/bindings/chosen.txt
> 
> Please add that, and Cc the devicetree maintainers on the updated
> patch.
> 
> Arnd
> 
>> On 2019/12/23 23:23, Chen Zhou wrote:
>>> This patch series enables reserving crashkernel above 4G in arm64.
>>> 
>>> There are the following issues in arm64 kdump:
>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>>> when there is not enough low memory.
>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G;
>>> in this case, if swiotlb or DMA buffers are required, the crash dump kernel
>>> will fail to boot because there is no low memory available for allocation.
>>> 
>>> The previous changes and discussions can be retrieved from:
>>> 
>>> Changes since [v6]
>>> - Fix build errors reported by kbuild test robot.
> ...


 Hi,

We found

https://lkml.org/lkml/2020/4/30/1583

has cured our Out-Of-Memory kdump failures.

From: Henry Willard
Subject: [PATCH] mm: Limit boost_watermark on small zones.

I am currently not on the linux-kernel@vger.kernel.org list for all to see this
message, so you may want to rebase and see if this cures your OoM issue and
share the results.










[PATCH v3 ] iommu/vt-d: Fix panic after kexec -p for kdump

2019-10-21 Thread John Donnelly




This cures a panic on restart after a kexec operation on 5.3 and 5.4 
kernels.


The underlying state of the iommu registers (iommu->flags &
VTD_FLAG_TRANS_PRE_ENABLED) on a restart results in a domain being marked as
"DEFER_DEVICE_DOMAIN_INFO" that produces an Oops in identity_mapping().

[   43.654737] BUG: kernel NULL pointer dereference, address:
0056
[   43.655720] #PF: supervisor read access in kernel mode
[   43.655720] #PF: error_code(0x) - not-present page
[   43.655720] PGD 0 P4D 0
[   43.655720] Oops:  [#1] SMP PTI
[   43.655720] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.3.2-1940.el8uek.x86_64 #1
[   43.655720] Hardware name: Oracle Corporation ORACLE SERVER
X5-2/ASM,MOTHERBOARD,1U, BIOS 30140300 09/20/2018
[   43.655720] RIP: 0010:iommu_need_mapping+0x29/0xd0
[   43.655720] Code: 00 0f 1f 44 00 00 48 8b 97 70 02 00 00 48 83 fa ff
74 53 48 8d 4a ff b8 01 00 00 00 48 83 f9 fd 76 01 c3 48 8b 35 7f 58 e0
01 <48> 39 72 58 75 f2 55 48 89 e5 41 54 53 48 8b 87 28 02 00 00 4c 8b
[   43.655720] RSP: 0018:c901b9b0 EFLAGS: 00010246
[   43.655720] RAX: 0001 RBX: 1000 RCX:
fffd
[   43.655720] RDX: fffe RSI: 8880719b8000 RDI:
8880477460b0
[   43.655720] RBP: c901b9e8 R08:  R09:
888047c01700
[   43.655720] R10: 2194036fc692 R11:  R12:

[   43.655720] R13: 8880477460b0 R14: 0cc0 R15:
888072d2b558
[   43.655720] FS:  () GS:888071c0()
knlGS:
[   43.655720] CS:  0010 DS:  ES:  CR0: 80050033
[   43.655720] CR2: 0056 CR3: 7440a002 CR4:
001606b0
[   43.655720] Call Trace:
[   43.655720]  ? intel_alloc_coherent+0x2a/0x180
[   43.655720]  ? __schedule+0x2c2/0x650
[   43.655720]  dma_alloc_attrs+0x8c/0xd0
[   43.655720]  dma_pool_alloc+0xdf/0x200
[   43.655720]  ehci_qh_alloc+0x58/0x130
[   43.655720]  ehci_setup+0x287/0x7ba
[   43.655720]  ? _dev_info+0x6c/0x83
[   43.655720]  ehci_pci_setup+0x91/0x436
[   43.655720]  usb_add_hcd.cold.48+0x1d4/0x754
[   43.655720]  usb_hcd_pci_probe+0x2bc/0x3f0
[   43.655720]  ehci_pci_probe+0x39/0x40
[   43.655720]  local_pci_probe+0x47/0x80
[   43.655720]  pci_device_probe+0xff/0x1b0
[   43.655720]  really_probe+0xf5/0x3a0
[   43.655720]  driver_probe_device+0xbb/0x100
[   43.655720]  device_driver_attach+0x58/0x60
[   43.655720]  __driver_attach+0x8f/0x150
[   43.655720]  ? device_driver_attach+0x60/0x60
[   43.655720]  bus_for_each_dev+0x74/0xb0
[   43.655720]  driver_attach+0x1e/0x20
[   43.655720]  bus_add_driver+0x151/0x1f0
[   43.655720]  ? ehci_hcd_init+0xb2/0xb2
[   43.655720]  ? do_early_param+0x95/0x95
[   43.655720]  driver_register+0x70/0xc0
[   43.655720]  ? ehci_hcd_init+0xb2/0xb2
[   43.655720]  __pci_register_driver+0x57/0x60
[   43.655720]  ehci_pci_init+0x6a/0x6c
[   43.655720]  do_one_initcall+0x4a/0x1fa
[   43.655720]  ? do_early_param+0x95/0x95
[   43.655720]  kernel_init_freeable+0x1bd/0x262
[   43.655720]  ? rest_init+0xb0/0xb0
[   43.655720]  kernel_init+0xe/0x110
[   43.655720]  ret_from_fork+0x24/0x50


Fixes: 8af46c784ecfe ("iommu/vt-d: Implement is_attach_deferred iommu ops entry")

Cc: sta...@vger.kernel.org # v5.3+

Signed-off-by: John Donnelly 
Reviewed-by: Lu Baolu 


---
 drivers/iommu/intel-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c4e0e4a9ee9e..f83a9a302f8e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2783,7 +2783,7 @@ static int identity_mapping(struct device *dev)
struct device_domain_info *info;

info = dev->archdata.iommu;
-   if (info && info != DUMMY_DEVICE_DOMAIN_INFO)
+	if (info && info != DUMMY_DEVICE_DOMAIN_INFO && info != DEFER_DEVICE_DOMAIN_INFO)

return (info->domain == si_domain);

return 0;
--
2.20.1


--
Thank You,
John


[PATCH v2 ] iommu/vt-d: Fix panic after kexec -p for kdump

2019-10-21 Thread John Donnelly
This cures a panic on restart after a kexec -p operation on 5.3 and 5.4 kernels.

Fixes: 8af46c784ecfe ("iommu/vt-d: Implement is_attach_deferred iommu ops entry")

The underlying state of the iommu registers (iommu->flags &
VTD_FLAG_TRANS_PRE_ENABLED) on a restart results in a domain being marked as
"DEFER_DEVICE_DOMAIN_INFO" that produces an Oops in identity_mapping().

[   43.654737] BUG: kernel NULL pointer dereference, address:
0056
[   43.655720] #PF: supervisor read access in kernel mode
[   43.655720] #PF: error_code(0x) - not-present page
[   43.655720] PGD 0 P4D 0
[   43.655720] Oops:  [#1] SMP PTI
[   43.655720] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.3.2-1940.el8uek.x86_64 #1
[   43.655720] Hardware name: Oracle Corporation ORACLE SERVER
X5-2/ASM,MOTHERBOARD,1U, BIOS 30140300 09/20/2018
[   43.655720] RIP: 0010:iommu_need_mapping+0x29/0xd0
[   43.655720] Code: 00 0f 1f 44 00 00 48 8b 97 70 02 00 00 48 83 fa ff
74 53 48 8d 4a ff b8 01 00 00 00 48 83 f9 fd 76 01 c3 48 8b 35 7f 58 e0
01 <48> 39 72 58 75 f2 55 48 89 e5 41 54 53 48 8b 87 28 02 00 00 4c 8b
[   43.655720] RSP: 0018:c901b9b0 EFLAGS: 00010246
[   43.655720] RAX: 0001 RBX: 1000 RCX:
fffd
[   43.655720] RDX: fffe RSI: 8880719b8000 RDI:
8880477460b0
[   43.655720] RBP: c901b9e8 R08:  R09:
888047c01700
[   43.655720] R10: 2194036fc692 R11:  R12:

[   43.655720] R13: 8880477460b0 R14: 0cc0 R15:
888072d2b558
[   43.655720] FS:  () GS:888071c0()
knlGS:
[   43.655720] CS:  0010 DS:  ES:  CR0: 80050033
[   43.655720] CR2: 0056 CR3: 7440a002 CR4:
001606b0
[   43.655720] Call Trace:
[   43.655720]  ? intel_alloc_coherent+0x2a/0x180
[   43.655720]  ? __schedule+0x2c2/0x650
[   43.655720]  dma_alloc_attrs+0x8c/0xd0
[   43.655720]  dma_pool_alloc+0xdf/0x200
[   43.655720]  ehci_qh_alloc+0x58/0x130
[   43.655720]  ehci_setup+0x287/0x7ba
[   43.655720]  ? _dev_info+0x6c/0x83
[   43.655720]  ehci_pci_setup+0x91/0x436
[   43.655720]  usb_add_hcd.cold.48+0x1d4/0x754
[   43.655720]  usb_hcd_pci_probe+0x2bc/0x3f0
[   43.655720]  ehci_pci_probe+0x39/0x40
[   43.655720]  local_pci_probe+0x47/0x80
[   43.655720]  pci_device_probe+0xff/0x1b0
[   43.655720]  really_probe+0xf5/0x3a0
[   43.655720]  driver_probe_device+0xbb/0x100
[   43.655720]  device_driver_attach+0x58/0x60
[   43.655720]  __driver_attach+0x8f/0x150
[   43.655720]  ? device_driver_attach+0x60/0x60
[   43.655720]  bus_for_each_dev+0x74/0xb0
[   43.655720]  driver_attach+0x1e/0x20
[   43.655720]  bus_add_driver+0x151/0x1f0
[   43.655720]  ? ehci_hcd_init+0xb2/0xb2
[   43.655720]  ? do_early_param+0x95/0x95
[   43.655720]  driver_register+0x70/0xc0
[   43.655720]  ? ehci_hcd_init+0xb2/0xb2
[   43.655720]  __pci_register_driver+0x57/0x60
[   43.655720]  ehci_pci_init+0x6a/0x6c
[   43.655720]  do_one_initcall+0x4a/0x1fa
[   43.655720]  ? do_early_param+0x95/0x95
[   43.655720]  kernel_init_freeable+0x1bd/0x262
[   43.655720]  ? rest_init+0xb0/0xb0
[   43.655720]  kernel_init+0xe/0x110
[   43.655720]  ret_from_fork+0x24/0x50
 

Signed-off-by: John Donnelly 
Reviewed-by: Lu Baolu 


---
 drivers/iommu/intel-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c4e0e4a9ee9e..f83a9a302f8e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2783,7 +2783,7 @@ static int identity_mapping(struct device *dev)
struct device_domain_info *info;
 
info = dev->archdata.iommu;
-   if (info && info != DUMMY_DEVICE_DOMAIN_INFO)
+	if (info && info != DUMMY_DEVICE_DOMAIN_INFO && info != DEFER_DEVICE_DOMAIN_INFO)
return (info->domain == si_domain);
 
return 0;
-- 
2.20.1

[PATCH] iommu/vt-d: Fix panic after kexec -p for kdump

2019-10-18 Thread John Donnelly
This cures a panic on restart after a kexec -p operation on 5.3 and 5.4
kernels.

The underlying state of the iommu registers (iommu->flags &
VTD_FLAG_TRANS_PRE_ENABLED) on a restart results in a domain being marked as
"DEFER_DEVICE_DOMAIN_INFO" that produces an Oops in identity_mapping().

[   43.654737] BUG: kernel NULL pointer dereference, address:
0056
[   43.655720] #PF: supervisor read access in kernel mode
[   43.655720] #PF: error_code(0x) - not-present page
[   43.655720] PGD 0 P4D 0
[   43.655720] Oops:  [#1] SMP PTI
[   43.655720] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.3.2-1940.el8uek.x86_64 #1
[   43.655720] Hardware name: Oracle Corporation ORACLE SERVER
X5-2/ASM,MOTHERBOARD,1U, BIOS 30140300 09/20/2018
[   43.655720] RIP: 0010:iommu_need_mapping+0x29/0xd0
[   43.655720] Code: 00 0f 1f 44 00 00 48 8b 97 70 02 00 00 48 83 fa ff
74 53 48 8d 4a ff b8 01 00 00 00 48 83 f9 fd 76 01 c3 48 8b 35 7f 58 e0
01 <48> 39 72 58 75 f2 55 48 89 e5 41 54 53 48 8b 87 28 02 00 00 4c 8b
[   43.655720] RSP: 0018:c901b9b0 EFLAGS: 00010246
[   43.655720] RAX: 0001 RBX: 1000 RCX:
fffd
[   43.655720] RDX: fffe RSI: 8880719b8000 RDI:
8880477460b0
[   43.655720] RBP: c901b9e8 R08:  R09:
888047c01700
[   43.655720] R10: 2194036fc692 R11:  R12:

[   43.655720] R13: 8880477460b0 R14: 0cc0 R15:
888072d2b558
[   43.655720] FS:  () GS:888071c0()
knlGS:
[   43.655720] CS:  0010 DS:  ES:  CR0: 80050033
[   43.655720] CR2: 0056 CR3: 7440a002 CR4:
001606b0
[   43.655720] Call Trace:
[   43.655720]  ? intel_alloc_coherent+0x2a/0x180
[   43.655720]  ? __schedule+0x2c2/0x650
[   43.655720]  dma_alloc_attrs+0x8c/0xd0
[   43.655720]  dma_pool_alloc+0xdf/0x200
[   43.655720]  ehci_qh_alloc+0x58/0x130
[   43.655720]  ehci_setup+0x287/0x7ba
[   43.655720]  ? _dev_info+0x6c/0x83
[   43.655720]  ehci_pci_setup+0x91/0x436
[   43.655720]  usb_add_hcd.cold.48+0x1d4/0x754
[   43.655720]  usb_hcd_pci_probe+0x2bc/0x3f0
[   43.655720]  ehci_pci_probe+0x39/0x40
[   43.655720]  local_pci_probe+0x47/0x80
[   43.655720]  pci_device_probe+0xff/0x1b0
[   43.655720]  really_probe+0xf5/0x3a0
[   43.655720]  driver_probe_device+0xbb/0x100
[   43.655720]  device_driver_attach+0x58/0x60
[   43.655720]  __driver_attach+0x8f/0x150
[   43.655720]  ? device_driver_attach+0x60/0x60
[   43.655720]  bus_for_each_dev+0x74/0xb0
[   43.655720]  driver_attach+0x1e/0x20
[   43.655720]  bus_add_driver+0x151/0x1f0
[   43.655720]  ? ehci_hcd_init+0xb2/0xb2
[   43.655720]  ? do_early_param+0x95/0x95
[   43.655720]  driver_register+0x70/0xc0
[   43.655720]  ? ehci_hcd_init+0xb2/0xb2
[   43.655720]  __pci_register_driver+0x57/0x60
[   43.655720]  ehci_pci_init+0x6a/0x6c
[   43.655720]  do_one_initcall+0x4a/0x1fa
[   43.655720]  ? do_early_param+0x95/0x95
[   43.655720]  kernel_init_freeable+0x1bd/0x262
[   43.655720]  ? rest_init+0xb0/0xb0
[   43.655720]  kernel_init+0xe/0x110
[   43.655720]  ret_from_fork+0x24/0x50

Signed-off-by: John Donnelly 
---
drivers/iommu/intel-iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c4e0e4a9ee9e..f83a9a302f8e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2783,7 +2783,7 @@ static int identity_mapping(struct device *dev)
struct device_domain_info *info;

info = dev->archdata.iommu;
-   if (info && info != DUMMY_DEVICE_DOMAIN_INFO)
+	if (info && info != DUMMY_DEVICE_DOMAIN_INFO && info != DEFER_DEVICE_DOMAIN_INFO)
return (info->domain == si_domain);

return 0;
-- 
2.20.1