Re: [PATCHv5 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents

2020-08-26 Thread Pingfan Liu
Hello guys. Do you have further comments on this version?

Thanks,
Pingfan

On Mon, Aug 10, 2020 at 4:53 PM Pingfan Liu  wrote:
>
> A bug is observed on pseries by taking the following steps on rhel:
> -1. drmgr -c mem -r -q 5
> -2. echo c > /proc/sysrq-trigger
>
> And then, the failure looks like:
> kdump: saving to /sysroot//var/crash/127.0.0.1-2020-01-16-02:06:14/
> kdump: saving vmcore-dmesg.txt
> kdump: saving vmcore-dmesg.txt complete
> kdump: saving vmcore
>  Checking for memory holes : [  0.0 %] /  
>  Checking for memory holes : [100.0 %] |  
>  Excluding unnecessary pages   : [100.0 %] \  
>  Copying data  : [  0.3 %] -  
> eta: 38s[   44.337636] hash-mmu: mm: Hashing failure ! 
> EA=0x7fffba40 access=0x8004 current=makedumpfile
> [   44.337663] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 
> psize 2 pte=0xc0005504
> [   44.337677] hash-mmu: mm: Hashing failure ! EA=0x7fffba40 
> access=0x8004 current=makedumpfile
> [   44.337692] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 
> psize 2 pte=0xc0005504
> [   44.337708] makedumpfile[469]: unhandled signal 7 at 7fffba40 nip 
> 7fffbbc4d7fc lr 00011356ca3c code 2
> [   44.338548] Core dump to |/bin/false pipe failed
> /lib/kdump-lib-initramfs.sh: line 98:   469 Bus error   
> $CORE_COLLECTOR /proc/vmcore 
> $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete
> kdump: saving vmcore failed
>
> * Root cause *
>   After analyzing, it turns out that in the current implementation,
> when hot-removing lmb, the KOBJ_REMOVE event ejects before the dt updating as
> the code __remove_memory() comes before drmem_update_dt().
> So in kdump kernel, when read_from_oldmem() resorts to
> pSeries_lpar_hpte_insert() to install hpte, but fails with -2 due to
> non-exist pfn. And finally, low_hash_fault() raise SIGBUS to process, as it
> can be observed "Bus error"
>
> From a viewpoint of listener and publisher, the publisher notifies the
> listener before data is ready.  This introduces a problem where udev
> launches kexec-tools (due to KOBJ_REMOVE) and loads a stale dt before
> updating. And in capture kernel, makedumpfile will access the memory based
> on the stale dt info, and hit a SIGBUS error due to an un-existed lmb.
>
> * Fix *
> This bug is introduced by commit 063b8b1251fd
> ("powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR
> request"), which tried to combine all the dt updating into one.
>
> To fix this issue, meanwhile not to introduce a quadratic runtime
> complexity by the model:
>   dlpar_memory_add_by_count
> for_each_drmem_lmb <--
>   dlpar_add_lmb
> drmem_update_dt(_v1|_v2)
>   for_each_drmem_lmb   <--
> The dt should still be only updated once, and just before the last memory
> online/offline event is ejected to user space. Achieve this by tracing the
> num of lmb added or removed.
>
> Signed-off-by: Pingfan Liu 
> Cc: Michael Ellerman 
> Cc: Hari Bathini 
> Cc: Nathan Lynch 
> Cc: Nathan Fontenot 
> Cc: Laurent Dufour 
> To: linuxppc-...@lists.ozlabs.org
> Cc: kexec@lists.infradead.org
> ---
> v4 -> v5: change dlpar_add_lmb()/dlpar_remove_lmb() prototype to report
>   whether dt is updated successfully.
>   Fix a condition boundary check bug
> v3 -> v4: resolve a quadratic runtime complexity issue.
>   This series is applied on next-test branch
>  arch/powerpc/platforms/pseries/hotplug-memory.c | 102 
> +++-
>  1 file changed, 80 insertions(+), 22 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
> b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 46cbcd1..1567d9f 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -350,13 +350,22 @@ static bool lmb_is_removable(struct drmem_lmb *lmb)
> return true;
>  }
>
> -static int dlpar_add_lmb(struct drmem_lmb *);
> +enum dt_update_status {
> +   DT_NOUPDATE,
> +   DT_TOUPDATE,
> +   DT_UPDATED,
> +};
> +
> +/* "*dt_update" returns DT_UPDATED if updated */
> +static int dlpar_add_lmb(struct drmem_lmb *lmb,
> +   enum dt_update_status *dt_update);
>
> -static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> +static int dlpar_remove_lmb(struct drmem_lmb *lmb,
> +   enum dt_update_status *dt_update)
>  {
> unsigned long block_sz;
> phys_addr_t base_addr;
> -   int rc, nid;
> +   int rc, ret, nid;
>
> if (!lmb_is_removable(lmb))
> return -EINVAL;
> @@ -372,6 +381,13 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> invalidate_lmb_associativity_index(lmb);
> lmb_clear_nid(lmb);
> lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> +  

Re: [PATCH v2] kexec/kexec.c: Add missing close() call

2020-08-26 Thread Khalid Aziz
On 8/25/20 6:51 PM, Youling Tang wrote:
> Add missing close() call.
> 
> Signed-off-by: Youling Tang 
> ---
>  kexec/kexec.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kexec/kexec.c b/kexec/kexec.c
> index a62b362..bb88caa 100644
> --- a/kexec/kexec.c
> +++ b/kexec/kexec.c
> @@ -585,6 +585,7 @@ static char *slurp_file_generic(const char *filename, 
> off_t *r_size,
>   die("Read on %s ended before stat said it should\n", filename);
>  
>   *r_size = size;
> + close(fd);
>   return buf;
>  }
>  
> @@ -1257,12 +1258,14 @@ static int do_kexec_file_load(int fileind, int argc, 
> char **argv,
>   if (i == file_types) {
>   fprintf(stderr, "Cannot determine the file type " "of %s\n",
>   kernel);
> + close(kernel_fd);
>   return EFAILED;
>   }
>  
>   ret = file_type[i].load(argc, argv, kernel_buf, kernel_size, &info);
>   if (ret < 0) {
>   fprintf(stderr, "Cannot load %s\n", kernel);
> + close(kernel_fd);
>   return ret;
>   }
>  
> 

This looks good to me.

Reviewed-by: Khalid Aziz 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support

2020-08-26 Thread Petr Mladek
On Wed 2020-08-26 14:43:22, John Ogness wrote:
> On 2020-08-26, Sergey Senozhatsky  wrote:
> >>> @@ -1157,6 +1431,14 @@ bool prb_reserve(struct prb_reserved_entry *e, 
> >>> struct printk_ringbuffer *rb,
> >>>   goto fail;
> >>>   }
> >>>  
> >>> + /*
> >>> +  * New data is about to be reserved. Once that happens, previous
> >>> +  * descriptors are no longer able to be extended. Finalize the
> >>> +  * previous descriptor now so that it can be made available to
> >>> +  * readers (when committed).
> >>> +  */
> >>> + desc_finalize(desc_ring, DESC_ID(id - 1));
> >>> +
> >>>   d = to_desc(desc_ring, id);
> >>>  
> >>>   /*
> >> 
> >> Apparently this is not enough to guarantee that past descriptors are
> >> finalized. I am able to reproduce a scenario where the finalization
> >> of a certain descriptor never happens. That leaves the descriptor
> >> permanently in the reserved queried state, which prevents any new
> >> records from being created. I am investigating.
> >
> > Good to know. I also run into problems:
> > - broken dmesg (and broken journalctl -f /dev/kmsg poll) and broken
> >   syslog read
> >
> > $ strace dmesg
> >
> > ...
> > openat(AT_FDCWD, "/dev/kmsg", O_RDONLY|O_NONBLOCK) = 3
> > lseek(3, 0, SEEK_DATA)  = 0
> > read(3, 0x55dda8c240a8, 8191)   = -1 EAGAIN (Resource temporarily 
> > unavailable)
> > close(3)= 0
> > syslog(10 /* SYSLOG_ACTION_SIZE_BUFFER */) = 524288
> > mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
> > = 0x7f43ea847000
> > syslog(3 /* SYSLOG_ACTION_READ_ALL */, "", 524296) = 0
> 
> Yes, this a consequence of the problem. The tail is in the reserved
> queried state, so readers will not advance beyond it.
> 
> This series makes a very naive assumption that the previous descriptor
> is either in the reserved or committed queried states. The fact is, it
> can be in any of the 4 queried states. Adding support for finalization
> of all the states then gets quite complex, since any state transition
> (cmpxchg) may have to deal with an unexpected FINAL flag.
>
> The ringbuffer was designed so that descriptors are completely
> self-contained. So adding logic where an action on one descriptor should
> affect another descriptor is far more complex than I initially expected.
> 
> Keep in mind the finalization concept satisfies 3 things:
> 
> - denote if a record can be extended (i.e. transition back to reserved)
> - denote if a reader may read the record
> - denote if a writer may recycle a record
>
> I have not yet given up on the idea of finalization (particularly
> because it allows mainline LOG_CONT behavior to be preserved locklessy),

In each case, we need to finalize the descriptor also in prb_commit()
when the descriptor is not longer the last reserved one.

It has to be done in two steps to avoid race:

prb_commit()

   + set PRB_COMMIT_MASK
   + check if it is still the last descriptor in the array
   + set PRB_FINAL_MASK when it is not the last descriptor

It should work because prb_reserve() finalizes the previous
descriptor after the new one is reserved. As a result:

   + prb_reserve() should either see PRB_COMMIT_MASK in the previous
 descriptor and be able to finalize it.

   + or prb_commit() will see that the head moved and it is not
 longer the last reserved one.

I keep my fingers crossed.

Best Regards,
Petr

PS: I am still in the middle of review. And I had a feeling that
this situatuation was not handled.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support

2020-08-26 Thread John Ogness
On 2020-08-26, Sergey Senozhatsky  wrote:
>>> @@ -1157,6 +1431,14 @@ bool prb_reserve(struct prb_reserved_entry *e, 
>>> struct printk_ringbuffer *rb,
>>> goto fail;
>>> }
>>>  
>>> +   /*
>>> +* New data is about to be reserved. Once that happens, previous
>>> +* descriptors are no longer able to be extended. Finalize the
>>> +* previous descriptor now so that it can be made available to
>>> +* readers (when committed).
>>> +*/
>>> +   desc_finalize(desc_ring, DESC_ID(id - 1));
>>> +
>>> d = to_desc(desc_ring, id);
>>>  
>>> /*
>> 
>> Apparently this is not enough to guarantee that past descriptors are
>> finalized. I am able to reproduce a scenario where the finalization
>> of a certain descriptor never happens. That leaves the descriptor
>> permanently in the reserved queried state, which prevents any new
>> records from being created. I am investigating.
>
> Good to know. I also run into problems:
> - broken dmesg (and broken journalctl -f /dev/kmsg poll) and broken
>   syslog read
>
> $ strace dmesg
>
> ...
> openat(AT_FDCWD, "/dev/kmsg", O_RDONLY|O_NONBLOCK) = 3
> lseek(3, 0, SEEK_DATA)  = 0
> read(3, 0x55dda8c240a8, 8191)   = -1 EAGAIN (Resource temporarily 
> unavailable)
> close(3)= 0
> syslog(10 /* SYSLOG_ACTION_SIZE_BUFFER */) = 524288
> mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
> 0x7f43ea847000
> syslog(3 /* SYSLOG_ACTION_READ_ALL */, "", 524296) = 0

Yes, this a consequence of the problem. The tail is in the reserved
queried state, so readers will not advance beyond it.

This series makes a very naive assumption that the previous descriptor
is either in the reserved or committed queried states. The fact is, it
can be in any of the 4 queried states. Adding support for finalization
of all the states then gets quite complex, since any state transition
(cmpxchg) may have to deal with an unexpected FINAL flag.

The ringbuffer was designed so that descriptors are completely
self-contained. So adding logic where an action on one descriptor should
affect another descriptor is far more complex than I initially expected.

Keep in mind the finalization concept satisfies 3 things:

- denote if a record can be extended (i.e. transition back to reserved)
- denote if a reader may read the record
- denote if a writer may recycle a record

I have not yet given up on the idea of finalization (particularly
because it allows mainline LOG_CONT behavior to be preserved locklessy),
but I am no longer sure if this is the direction we want to take.

John Ogness

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support

2020-08-26 Thread Sergey Senozhatsky
On (20/08/26 10:45), John Ogness wrote:
> On 2020-08-24, John Ogness  wrote:
> > @@ -1157,6 +1431,14 @@ bool prb_reserve(struct prb_reserved_entry *e, 
> > struct printk_ringbuffer *rb,
> > goto fail;
> > }
> >  
> > +   /*
> > +* New data is about to be reserved. Once that happens, previous
> > +* descriptors are no longer able to be extended. Finalize the
> > +* previous descriptor now so that it can be made available to
> > +* readers (when committed).
> > +*/
> > +   desc_finalize(desc_ring, DESC_ID(id - 1));
> > +
> > d = to_desc(desc_ring, id);
> >  
> > /*
> 
> Apparently this is not enough to guarantee that past descriptors are
> finalized. I am able to reproduce a scenario where the finalization of a
> certain descriptor never happens. That leaves the descriptor permanently
> in the reserved queried state, which prevents any new records from being
> created. I am investigating.

Good to know. I also run into problems:
- broken dmesg (and broken journalctl -f /dev/kmsg poll) and broken
  syslog read

$ strace dmesg

...
openat(AT_FDCWD, "/dev/kmsg", O_RDONLY|O_NONBLOCK) = 3
lseek(3, 0, SEEK_DATA)  = 0
read(3, 0x55dda8c240a8, 8191)   = -1 EAGAIN (Resource temporarily 
unavailable)
close(3)= 0
syslog(10 /* SYSLOG_ACTION_SIZE_BUFFER */) = 524288
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f43ea847000
syslog(3 /* SYSLOG_ACTION_READ_ALL */, "", 524296) = 0
munmap(0x7f43ea847000, 528384)  = 0
...

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support

2020-08-26 Thread John Ogness
On 2020-08-24, John Ogness  wrote:
> @@ -1157,6 +1431,14 @@ bool prb_reserve(struct prb_reserved_entry *e, struct 
> printk_ringbuffer *rb,
>   goto fail;
>   }
>  
> + /*
> +  * New data is about to be reserved. Once that happens, previous
> +  * descriptors are no longer able to be extended. Finalize the
> +  * previous descriptor now so that it can be made available to
> +  * readers (when committed).
> +  */
> + desc_finalize(desc_ring, DESC_ID(id - 1));
> +
>   d = to_desc(desc_ring, id);
>  
>   /*

Apparently this is not enough to guarantee that past descriptors are
finalized. I am able to reproduce a scenario where the finalization of a
certain descriptor never happens. That leaves the descriptor permanently
in the reserved queried state, which prevents any new records from being
created. I am investigating.

John Ogness

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec