Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-20 Thread Rahul Lakkireddy
On Friday, April 04/20/18, 2018 at 19:06:09 +0530, Eric W. Biederman wrote:
> Rahul Lakkireddy  writes:
> 
> > On Thursday, April 04/19/18, 2018 at 20:23:37 +0530, Eric W. Biederman 
> > wrote:
> >> Rahul Lakkireddy  writes:
> >> 
> >> > On Thursday, April 04/19/18, 2018 at 07:10:30 +0530, Dave Young wrote:
> >> >> On 04/18/18 at 06:01pm, Rahul Lakkireddy wrote:
> >> >> > On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young 
> >> >> > wrote:
> >> >> > > Hi Rahul,
> >> >> > > On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> >> >> > > > On production servers running variety of workloads over time, 
> >> >> > > > kernel
> >> >> > > > panic can happen sporadically after days or even months. It is
> >> >> > > > important to collect as much debug logs as possible to root cause
> >> >> > > > and fix the problem, that may not be easy to reproduce. Snapshot 
> >> >> > > > of
> >> >> > > > underlying hardware/firmware state (like register dump, firmware
> >> >> > > > logs, adapter memory, etc.), at the time of kernel panic will be 
> >> >> > > > very
> >> >> > > > helpful while debugging the culprit device driver.
> >> >> > > > 
> >> >> > > > This series of patches add new generic framework that enable 
> >> >> > > > device
> >> >> > > > drivers to collect device specific snapshot of the 
> >> >> > > > hardware/firmware
> >> >> > > > state of the underlying device in the crash recovery kernel. In 
> >> >> > > > crash
> >> >> > > > recovery kernel, the collected logs are added as elf notes to
> >> >> > > > /proc/vmcore, which is copied by user space scripts for 
> >> >> > > > post-analysis.
> >> >> > > > 
> >> >> > > > The sequence of actions done by device drivers to append their 
> >> >> > > > device
> >> >> > > > specific hardware/firmware logs to /proc/vmcore are as follows:
> >> >> > > > 
> >> >> > > > 1. During probe (before hardware is initialized), device drivers
> >> >> > > > register to the vmcore module (via vmcore_add_device_dump()), with
> >> >> > > > callback function, along with buffer size and log name needed for
> >> >> > > > firmware/hardware log collection.
> >> >> > > 
> >> >> > > I assumed the elf notes info should be prepared while 
> >> >> > > kexec_[file_]load
> >> >> > > phase. But I did not read the old comment, not sure if it has been 
> >> >> > > discussed
> >> >> > > or not.
> >> >> > > 
> >> >> > 
> >> >> > We must not collect dumps in crashing kernel. Adding more things in
> >> >> > crash dump path risks not collecting vmcore at all. Eric had
> >> >> > discussed this in more detail at:
> >> >> > 
> >> >> > https://lkml.org/lkml/2018/3/24/319
> >> >> > 
> >> >> > We are safe to collect dumps in the second kernel. Each device dump
> >> >> > will be exported as an elf note in /proc/vmcore.
> >> >> 
> >> >> I understand that we should avoid adding anything in crash path.  And I 
> >> >> also
> >> >> agree to collect device dump in second kernel.  I just assumed device
> >> >> dump use some memory area to store the debug info and the memory
> >> >> is persistent so that this can be done in 2 steps, first register the
> >> >> address in elf header in kexec_load, then collect the dump in 2nd
> >> >> kernel.  But it seems the driver is doing some other logic to collect
> >> >> the info instead of just that simple like I thought. 
> >> >> 
> >> >
> >> > It seems simpler, but I'm concerned with waste of memory area, if
> >> > there are no device dumps being collected in second kernel. In
> >> > approach proposed in these series, we dynamically allocate memory
> >> > for the device dumps from second kernel's available memory.
> >> 
> >> Don't count that kernel having more than about 128MiB.
> >> 
> >
> > If large dump is expected, Administrator can increase the memory
> > allocated to the second kernel (using crashkernel boot param), to
> > ensure device dumps get collected.
> 
> Except 128MiB is already a already a huge amount to reserve.  I
> typically have run crash dumps with 16MiB of memory and thought it was
> overkill.  Looking below 32MiB seems a bit high but it is small enough
> that it is still doable.  I am baffled at how 2GiB can be guaranteed to fit
> in 32MiB (sparse register space?) but if it works reliably.
> 

Yes, we skip portions in on-chip memory dump that do not add to debug
value (such as the large regions reserved for holding Payload data
going through the device) and hence the overall dump size reduces
significantly.

> >> For that reason if for no other it would be nice if it was possible to
> >> have the driver to not initialize the device and just stand there
> >> handing out the data a piece at a time as it is read from /proc/vmcore.
> >> 
> >
> > Since cxgb4 is a network driver, it can be used to transfer the dumps
> > over the network. So we must ensure the dumps get collected and
> > stored, before device gets initialized to transfer dumps over
> > the network.
> 
> Good point.  For some reason 

Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-20 Thread Eric W. Biederman
Rahul Lakkireddy  writes:

> On Thursday, April 04/19/18, 2018 at 20:23:37 +0530, Eric W. Biederman wrote:
>> Rahul Lakkireddy  writes:
>> 
>> > On Thursday, April 04/19/18, 2018 at 07:10:30 +0530, Dave Young wrote:
>> >> On 04/18/18 at 06:01pm, Rahul Lakkireddy wrote:
>> >> > On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
>> >> > > Hi Rahul,
>> >> > > On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
>> >> > > > On production servers running variety of workloads over time, kernel
>> >> > > > panic can happen sporadically after days or even months. It is
>> >> > > > important to collect as much debug logs as possible to root cause
>> >> > > > and fix the problem, that may not be easy to reproduce. Snapshot of
>> >> > > > underlying hardware/firmware state (like register dump, firmware
>> >> > > > logs, adapter memory, etc.), at the time of kernel panic will be 
>> >> > > > very
>> >> > > > helpful while debugging the culprit device driver.
>> >> > > > 
>> >> > > > This series of patches add new generic framework that enable device
>> >> > > > drivers to collect device specific snapshot of the hardware/firmware
>> >> > > > state of the underlying device in the crash recovery kernel. In 
>> >> > > > crash
>> >> > > > recovery kernel, the collected logs are added as elf notes to
>> >> > > > /proc/vmcore, which is copied by user space scripts for 
>> >> > > > post-analysis.
>> >> > > > 
>> >> > > > The sequence of actions done by device drivers to append their 
>> >> > > > device
>> >> > > > specific hardware/firmware logs to /proc/vmcore are as follows:
>> >> > > > 
>> >> > > > 1. During probe (before hardware is initialized), device drivers
>> >> > > > register to the vmcore module (via vmcore_add_device_dump()), with
>> >> > > > callback function, along with buffer size and log name needed for
>> >> > > > firmware/hardware log collection.
>> >> > > 
>> >> > > I assumed the elf notes info should be prepared while 
>> >> > > kexec_[file_]load
>> >> > > phase. But I did not read the old comment, not sure if it has been 
>> >> > > discussed
>> >> > > or not.
>> >> > > 
>> >> > 
>> >> > We must not collect dumps in crashing kernel. Adding more things in
>> >> > crash dump path risks not collecting vmcore at all. Eric had
>> >> > discussed this in more detail at:
>> >> > 
>> >> > https://lkml.org/lkml/2018/3/24/319
>> >> > 
>> >> > We are safe to collect dumps in the second kernel. Each device dump
>> >> > will be exported as an elf note in /proc/vmcore.
>> >> 
>> >> I understand that we should avoid adding anything in crash path.  And I 
>> >> also
>> >> agree to collect device dump in second kernel.  I just assumed device
>> >> dump use some memory area to store the debug info and the memory
>> >> is persistent so that this can be done in 2 steps, first register the
>> >> address in elf header in kexec_load, then collect the dump in 2nd
>> >> kernel.  But it seems the driver is doing some other logic to collect
>> >> the info instead of just that simple like I thought. 
>> >> 
>> >
>> > It seems simpler, but I'm concerned with waste of memory area, if
>> > there are no device dumps being collected in second kernel. In
>> > approach proposed in these series, we dynamically allocate memory
>> > for the device dumps from second kernel's available memory.
>> 
>> Don't count that kernel having more than about 128MiB.
>> 
>
> If large dump is expected, Administrator can increase the memory
> allocated to the second kernel (using crashkernel boot param), to
> ensure device dumps get collected.

Except 128MiB is already a already a huge amount to reserve.  I
typically have run crash dumps with 16MiB of memory and thought it was
overkill.  Looking below 32MiB seems a bit high but it is small enough
that it is still doable.  I am baffled at how 2GiB can be guaranteed to fit
in 32MiB (sparse register space?) but if it works reliably.

>> For that reason if for no other it would be nice if it was possible to
>> have the driver to not initialize the device and just stand there
>> handing out the data a piece at a time as it is read from /proc/vmcore.
>> 
>
> Since cxgb4 is a network driver, it can be used to transfer the dumps
> over the network. So we must ensure the dumps get collected and
> stored, before device gets initialized to transfer dumps over
> the network.

Good point.  For some reason I was thinking it was an infiniband and not
an 10GiB ethernet device.

>> The 2GiB number I read earlier concerns me for working in a limited
>> environment.
>> 
>
> All dumps, including the 2GB on-chip memory dump, is compressed by
> the cxgb4 driver as they are collected. The overall compressed dump
> comes out at max 32 MB.
>
>> It might even make sense to separate this into a completely separate
>> module (depended upon the main driver if it makes sense to share
>> the functionality) so that people performing crash dumps would not
>> hesitate 

Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-20 Thread Rahul Lakkireddy
On Thursday, April 04/19/18, 2018 at 20:23:37 +0530, Eric W. Biederman wrote:
> Rahul Lakkireddy  writes:
> 
> > On Thursday, April 04/19/18, 2018 at 07:10:30 +0530, Dave Young wrote:
> >> On 04/18/18 at 06:01pm, Rahul Lakkireddy wrote:
> >> > On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
> >> > > Hi Rahul,
> >> > > On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> >> > > > On production servers running variety of workloads over time, kernel
> >> > > > panic can happen sporadically after days or even months. It is
> >> > > > important to collect as much debug logs as possible to root cause
> >> > > > and fix the problem, that may not be easy to reproduce. Snapshot of
> >> > > > underlying hardware/firmware state (like register dump, firmware
> >> > > > logs, adapter memory, etc.), at the time of kernel panic will be very
> >> > > > helpful while debugging the culprit device driver.
> >> > > > 
> >> > > > This series of patches add new generic framework that enable device
> >> > > > drivers to collect device specific snapshot of the hardware/firmware
> >> > > > state of the underlying device in the crash recovery kernel. In crash
> >> > > > recovery kernel, the collected logs are added as elf notes to
> >> > > > /proc/vmcore, which is copied by user space scripts for 
> >> > > > post-analysis.
> >> > > > 
> >> > > > The sequence of actions done by device drivers to append their device
> >> > > > specific hardware/firmware logs to /proc/vmcore are as follows:
> >> > > > 
> >> > > > 1. During probe (before hardware is initialized), device drivers
> >> > > > register to the vmcore module (via vmcore_add_device_dump()), with
> >> > > > callback function, along with buffer size and log name needed for
> >> > > > firmware/hardware log collection.
> >> > > 
> >> > > I assumed the elf notes info should be prepared while kexec_[file_]load
> >> > > phase. But I did not read the old comment, not sure if it has been 
> >> > > discussed
> >> > > or not.
> >> > > 
> >> > 
> >> > We must not collect dumps in crashing kernel. Adding more things in
> >> > crash dump path risks not collecting vmcore at all. Eric had
> >> > discussed this in more detail at:
> >> > 
> >> > https://lkml.org/lkml/2018/3/24/319
> >> > 
> >> > We are safe to collect dumps in the second kernel. Each device dump
> >> > will be exported as an elf note in /proc/vmcore.
> >> 
> >> I understand that we should avoid adding anything in crash path.  And I 
> >> also
> >> agree to collect device dump in second kernel.  I just assumed device
> >> dump use some memory area to store the debug info and the memory
> >> is persistent so that this can be done in 2 steps, first register the
> >> address in elf header in kexec_load, then collect the dump in 2nd
> >> kernel.  But it seems the driver is doing some other logic to collect
> >> the info instead of just that simple like I thought. 
> >> 
> >
> > It seems simpler, but I'm concerned with waste of memory area, if
> > there are no device dumps being collected in second kernel. In
> > approach proposed in these series, we dynamically allocate memory
> > for the device dumps from second kernel's available memory.
> 
> Don't count that kernel having more than about 128MiB.
> 

If large dump is expected, Administrator can increase the memory
allocated to the second kernel (using crashkernel boot param), to
ensure device dumps get collected.

> For that reason if for no other it would be nice if it was possible to
> have the driver to not initialize the device and just stand there
> handing out the data a piece at a time as it is read from /proc/vmcore.
> 

Since cxgb4 is a network driver, it can be used to transfer the dumps
over the network. So we must ensure the dumps get collected and
stored, before device gets initialized to transfer dumps over
the network.

> The 2GiB number I read earlier concerns me for working in a limited
> environment.
> 

All dumps, including the 2GB on-chip memory dump, is compressed by
the cxgb4 driver as they are collected. The overall compressed dump
comes out at max 32 MB.

> It might even make sense to separate this into a completely separate
> module (depended upon the main driver if it makes sense to share
> the functionality) so that people performing crash dumps would not
> hesitate to include the code in their initramfs images.
> 
> I can see splitting a device up into a portion only to be used in case
> of a crash dump and a normal portion like we do for main memory but I
> doubt that makes sense in practice.
> 

This is not required, especially in case of network drivers, which
must collect underlying device dump and initialize the device to
transfer dumps over the network.

> >> > > If do this in 2nd kernel a question is driver can be loaded later than 
> >> > > vmcore init.
> >> > 
> >> > Yes, drivers will add their device dumps after vmcore init.
> >> > 
> >> > > How to guarantee the function works if vmcore 

Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-19 Thread Eric W. Biederman
Rahul Lakkireddy  writes:

> On Thursday, April 04/19/18, 2018 at 07:10:30 +0530, Dave Young wrote:
>> On 04/18/18 at 06:01pm, Rahul Lakkireddy wrote:
>> > On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
>> > > Hi Rahul,
>> > > On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
>> > > > On production servers running variety of workloads over time, kernel
>> > > > panic can happen sporadically after days or even months. It is
>> > > > important to collect as much debug logs as possible to root cause
>> > > > and fix the problem, that may not be easy to reproduce. Snapshot of
>> > > > underlying hardware/firmware state (like register dump, firmware
>> > > > logs, adapter memory, etc.), at the time of kernel panic will be very
>> > > > helpful while debugging the culprit device driver.
>> > > > 
>> > > > This series of patches add new generic framework that enable device
>> > > > drivers to collect device specific snapshot of the hardware/firmware
>> > > > state of the underlying device in the crash recovery kernel. In crash
>> > > > recovery kernel, the collected logs are added as elf notes to
>> > > > /proc/vmcore, which is copied by user space scripts for post-analysis.
>> > > > 
>> > > > The sequence of actions done by device drivers to append their device
>> > > > specific hardware/firmware logs to /proc/vmcore are as follows:
>> > > > 
>> > > > 1. During probe (before hardware is initialized), device drivers
>> > > > register to the vmcore module (via vmcore_add_device_dump()), with
>> > > > callback function, along with buffer size and log name needed for
>> > > > firmware/hardware log collection.
>> > > 
>> > > I assumed the elf notes info should be prepared while kexec_[file_]load
>> > > phase. But I did not read the old comment, not sure if it has been 
>> > > discussed
>> > > or not.
>> > > 
>> > 
>> > We must not collect dumps in crashing kernel. Adding more things in
>> > crash dump path risks not collecting vmcore at all. Eric had
>> > discussed this in more detail at:
>> > 
>> > https://lkml.org/lkml/2018/3/24/319
>> > 
>> > We are safe to collect dumps in the second kernel. Each device dump
>> > will be exported as an elf note in /proc/vmcore.
>> 
>> I understand that we should avoid adding anything in crash path.  And I also
>> agree to collect device dump in second kernel.  I just assumed device
>> dump use some memory area to store the debug info and the memory
>> is persistent so that this can be done in 2 steps, first register the
>> address in elf header in kexec_load, then collect the dump in 2nd
>> kernel.  But it seems the driver is doing some other logic to collect
>> the info instead of just that simple like I thought. 
>> 
>
> It seems simpler, but I'm concerned with waste of memory area, if
> there are no device dumps being collected in second kernel. In
> approach proposed in these series, we dynamically allocate memory
> for the device dumps from second kernel's available memory.

Don't count that kernel having more than about 128MiB.

For that reason if for no other it would be nice if it was possible to
have the driver to not initialize the device and just stand there
handing out the data a piece at a time as it is read from /proc/vmcore.

The 2GiB number I read earlier concerns me for working in a limited
environment.

It might even make sense to separate this into a completely separate
module (depended upon the main driver if it makes sense to share
the functionality) so that people performing crash dumps would not
hesitate to include the code in their initramfs images.

I can see splitting a device up into a portion only to be used in case
of a crash dump and a normal portion like we do for main memory but I
doubt that makes sense in practice.

>> > > If do this in 2nd kernel a question is driver can be loaded later than 
>> > > vmcore init.
>> > 
>> > Yes, drivers will add their device dumps after vmcore init.
>> > 
>> > > How to guarantee the function works if vmcore reading happens before
>> > > the driver is loaded?
>> > > 
>> > > Also it is possible that kdump initramfs does not contains the driver
>> > > module.
>> > > 
>> > > Am I missing something?
>> > > 
>> > 
>> > Yes, driver must be in initramfs if it wants to collect and add device
>> > dump to /proc/vmcore in second kernel.
>> 
>> In RH/Fedora kdump scripts we only add the things are required to
>> bring up the dump target, so that we can use as less memory as we can.
>> 
>> For example, if a net driver panicked, and the dump target is rootfs
>> which is a scsi disk, then no network related stuff will be added in
>> initramfs.
>> 
>> In this case the device dump info will be not collected..
>
> Correct. If the driver is not present in initramfs, it can't collect
> its underlying device's dump. Administrator is expected to add the 
> driver to initramfs, if device dump needs to be collected.

That makes sense, as most people won't have that need.  Still if we 

Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-19 Thread Rahul Lakkireddy
On Thursday, April 04/19/18, 2018 at 07:10:30 +0530, Dave Young wrote:
> On 04/18/18 at 06:01pm, Rahul Lakkireddy wrote:
> > On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
> > > Hi Rahul,
> > > On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> > > > On production servers running variety of workloads over time, kernel
> > > > panic can happen sporadically after days or even months. It is
> > > > important to collect as much debug logs as possible to root cause
> > > > and fix the problem, that may not be easy to reproduce. Snapshot of
> > > > underlying hardware/firmware state (like register dump, firmware
> > > > logs, adapter memory, etc.), at the time of kernel panic will be very
> > > > helpful while debugging the culprit device driver.
> > > > 
> > > > This series of patches add new generic framework that enable device
> > > > drivers to collect device specific snapshot of the hardware/firmware
> > > > state of the underlying device in the crash recovery kernel. In crash
> > > > recovery kernel, the collected logs are added as elf notes to
> > > > /proc/vmcore, which is copied by user space scripts for post-analysis.
> > > > 
> > > > The sequence of actions done by device drivers to append their device
> > > > specific hardware/firmware logs to /proc/vmcore are as follows:
> > > > 
> > > > 1. During probe (before hardware is initialized), device drivers
> > > > register to the vmcore module (via vmcore_add_device_dump()), with
> > > > callback function, along with buffer size and log name needed for
> > > > firmware/hardware log collection.
> > > 
> > > I assumed the elf notes info should be prepared while kexec_[file_]load
> > > phase. But I did not read the old comment, not sure if it has been 
> > > discussed
> > > or not.
> > > 
> > 
> > We must not collect dumps in crashing kernel. Adding more things in
> > crash dump path risks not collecting vmcore at all. Eric had
> > discussed this in more detail at:
> > 
> > https://lkml.org/lkml/2018/3/24/319
> > 
> > We are safe to collect dumps in the second kernel. Each device dump
> > will be exported as an elf note in /proc/vmcore.
> 
> I understand that we should avoid adding anything in crash path.  And I also
> agree to collect device dump in second kernel.  I just assumed device
> dump use some memory area to store the debug info and the memory
> is persistent so that this can be done in 2 steps, first register the
> address in elf header in kexec_load, then collect the dump in 2nd
> kernel.  But it seems the driver is doing some other logic to collect
> the info instead of just that simple like I thought. 
> 

It seems simpler, but I'm concerned with waste of memory area, if
there are no device dumps being collected in second kernel. In
approach proposed in these series, we dynamically allocate memory
for the device dumps from second kernel's available memory.

> > 
> > > If do this in 2nd kernel a question is driver can be loaded later than 
> > > vmcore init.
> > 
> > Yes, drivers will add their device dumps after vmcore init.
> > 
> > > How to guarantee the function works if vmcore reading happens before
> > > the driver is loaded?
> > > 
> > > Also it is possible that kdump initramfs does not contains the driver
> > > module.
> > > 
> > > Am I missing something?
> > > 
> > 
> > Yes, driver must be in initramfs if it wants to collect and add device
> > dump to /proc/vmcore in second kernel.
> 
> In RH/Fedora kdump scripts we only add the things are required to
> bring up the dump target, so that we can use as less memory as we can.
> 
> For example, if a net driver panicked, and the dump target is rootfs
> which is a scsi disk, then no network related stuff will be added in
> initramfs.
> 
> In this case the device dump info will be not collected..

Correct. If the driver is not present in initramfs, it can't collect
its underlying device's dump. Administrator is expected to add the 
driver to initramfs, if device dump needs to be collected.

> > 
> > > > 
> > > > 2. vmcore module allocates the buffer with requested size. It adds
> > > > an elf note and invokes the device driver's registered callback
> > > > function.
> > > > 
> > > > 3. Device driver collects all hardware/firmware logs into the buffer
> > > > and returns control back to vmcore module.
> > > > 
> > > > The device specific hardware/firmware logs can be seen as elf notes:
> > > > 
> > > > # readelf -n /proc/vmcore
> > > > 
> > > > Displaying notes found at file offset 0x1000 with length 0x04003288:
> > > >   Owner Data size   Description
> > > >   VMCOREDD_cxgb4_:02:00.4 0x02000fd8Unknown note type: 
> > > > (0x0700)
> > > >   VMCOREDD_cxgb4_:04:00.4 0x02000fd8Unknown note type: 
> > > > (0x0700)
> > > >   CORE 0x0150   NT_PRSTATUS (prstatus structure)
> > > >   CORE 0x0150   NT_PRSTATUS (prstatus structure)
> > > >   CORE 0x0150 

Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-18 Thread Dave Young
On 04/18/18 at 06:01pm, Rahul Lakkireddy wrote:
> On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
> > Hi Rahul,
> > On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> > > On production servers running variety of workloads over time, kernel
> > > panic can happen sporadically after days or even months. It is
> > > important to collect as much debug logs as possible to root cause
> > > and fix the problem, that may not be easy to reproduce. Snapshot of
> > > underlying hardware/firmware state (like register dump, firmware
> > > logs, adapter memory, etc.), at the time of kernel panic will be very
> > > helpful while debugging the culprit device driver.
> > > 
> > > This series of patches add new generic framework that enable device
> > > drivers to collect device specific snapshot of the hardware/firmware
> > > state of the underlying device in the crash recovery kernel. In crash
> > > recovery kernel, the collected logs are added as elf notes to
> > > /proc/vmcore, which is copied by user space scripts for post-analysis.
> > > 
> > > The sequence of actions done by device drivers to append their device
> > > specific hardware/firmware logs to /proc/vmcore are as follows:
> > > 
> > > 1. During probe (before hardware is initialized), device drivers
> > > register to the vmcore module (via vmcore_add_device_dump()), with
> > > callback function, along with buffer size and log name needed for
> > > firmware/hardware log collection.
> > 
> > I assumed the elf notes info should be prepared while kexec_[file_]load
> > phase. But I did not read the old comment, not sure if it has been discussed
> > or not.
> > 
> 
> We must not collect dumps in crashing kernel. Adding more things in
> crash dump path risks not collecting vmcore at all. Eric had
> discussed this in more detail at:
> 
> https://lkml.org/lkml/2018/3/24/319
> 
> We are safe to collect dumps in the second kernel. Each device dump
> will be exported as an elf note in /proc/vmcore.

I understand that we should avoid adding anything in crash path.  And I also
agree to collect device dump in second kernel.  I just assumed device
dump use some memory area to store the debug info and the memory
is persistent so that this can be done in 2 steps, first register the
address in elf header in kexec_load, then collect the dump in 2nd
kernel.  But it seems the driver is doing some other logic to collect
the info instead of just that simple like I thought. 

> 
> > If do this in 2nd kernel a question is driver can be loaded later than 
> > vmcore init.
> 
> Yes, drivers will add their device dumps after vmcore init.
> 
> > How to guarantee the function works if vmcore reading happens before
> > the driver is loaded?
> > 
> > Also it is possible that kdump initramfs does not contains the driver
> > module.
> > 
> > Am I missing something?
> > 
> 
> Yes, driver must be in initramfs if it wants to collect and add device
> dump to /proc/vmcore in second kernel.

In RH/Fedora kdump scripts we only add the things are required to
bring up the dump target, so that we can use as less memory as we can.

For example, if a net driver panicked, and the dump target is rootfs
which is a scsi disk, then no network related stuff will be added in
initramfs.

In this case the device dump info will be not collected..
> 
> > > 
> > > 2. vmcore module allocates the buffer with requested size. It adds
> > > an elf note and invokes the device driver's registered callback
> > > function.
> > > 
> > > 3. Device driver collects all hardware/firmware logs into the buffer
> > > and returns control back to vmcore module.
> > > 
> > > The device specific hardware/firmware logs can be seen as elf notes:
> > > 
> > > # readelf -n /proc/vmcore
> > > 
> > > Displaying notes found at file offset 0x1000 with length 0x04003288:
> > >   Owner Data size Description
> > >   VMCOREDD_cxgb4_:02:00.4 0x02000fd8  Unknown note type: (0x0700)
> > >   VMCOREDD_cxgb4_:04:00.4 0x02000fd8  Unknown note type: (0x0700)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   VMCOREINFO   0x074f Unknown note type: (0x)
> > > 
> > > Patch 1 adds API to vmcore module to allow drivers to register callback
> > > to collect the device specific hardware/firmware logs.  The logs will
> > > be added to /proc/vmcore as elf notes.
> > > 
> > > Patch 2 updates read and mmap logic to append device specific hardware/
> > > 

Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-18 Thread Rahul Lakkireddy
On Wednesday, April 04/18/18, 2018 at 19:58:01 +0530, Eric W. Biederman wrote:
> Rahul Lakkireddy  writes:
> 
> > On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
> >> Hi Rahul,
> >> On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> >> > On production servers running variety of workloads over time, kernel
> >> > panic can happen sporadically after days or even months. It is
> >> > important to collect as much debug logs as possible to root cause
> >> > and fix the problem, that may not be easy to reproduce. Snapshot of
> >> > underlying hardware/firmware state (like register dump, firmware
> >> > logs, adapter memory, etc.), at the time of kernel panic will be very
> >> > helpful while debugging the culprit device driver.
> >> > 
> >> > This series of patches add new generic framework that enable device
> >> > drivers to collect device specific snapshot of the hardware/firmware
> >> > state of the underlying device in the crash recovery kernel. In crash
> >> > recovery kernel, the collected logs are added as elf notes to
> >> > /proc/vmcore, which is copied by user space scripts for post-analysis.
> >> > 
> >> > The sequence of actions done by device drivers to append their device
> >> > specific hardware/firmware logs to /proc/vmcore are as follows:
> >> > 
> >> > 1. During probe (before hardware is initialized), device drivers
> >> > register to the vmcore module (via vmcore_add_device_dump()), with
> >> > callback function, along with buffer size and log name needed for
> >> > firmware/hardware log collection.
> >> 
> >> I assumed the elf notes info should be prepared while kexec_[file_]load
> >> phase. But I did not read the old comment, not sure if it has been 
> >> discussed
> >> or not.
> >> 
> >
> > We must not collect dumps in crashing kernel. Adding more things in
> > crash dump path risks not collecting vmcore at all. Eric had
> > discussed this in more detail at:
> >
> > https://lkml.org/lkml/2018/3/24/319
> >
> > We are safe to collect dumps in the second kernel. Each device dump
> > will be exported as an elf note in /proc/vmcore.
> 
> It just occurred to me there is one variation that is worth
> considering.
> 
> Is the area you are looking at dumping part of a huge mmio area?
> I think someone said 2GB?
> 
> If that is the case it could be worth it to simply add the needed
> addresses to the range of memory we need to dump, and simply having a
> elf note saying that is what happened.
> 

We are _not_ dumping mmio area. However, one part of the dump
collection involves reading 2 GB on-chip memory via PIO access,
which is compressed and stored.

Thanks,
Rahul


Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-18 Thread Eric W. Biederman
Rahul Lakkireddy  writes:

> On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
>> Hi Rahul,
>> On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
>> > On production servers running variety of workloads over time, kernel
>> > panic can happen sporadically after days or even months. It is
>> > important to collect as much debug logs as possible to root cause
>> > and fix the problem, that may not be easy to reproduce. Snapshot of
>> > underlying hardware/firmware state (like register dump, firmware
>> > logs, adapter memory, etc.), at the time of kernel panic will be very
>> > helpful while debugging the culprit device driver.
>> > 
>> > This series of patches add new generic framework that enable device
>> > drivers to collect device specific snapshot of the hardware/firmware
>> > state of the underlying device in the crash recovery kernel. In crash
>> > recovery kernel, the collected logs are added as elf notes to
>> > /proc/vmcore, which is copied by user space scripts for post-analysis.
>> > 
>> > The sequence of actions done by device drivers to append their device
>> > specific hardware/firmware logs to /proc/vmcore are as follows:
>> > 
>> > 1. During probe (before hardware is initialized), device drivers
>> > register to the vmcore module (via vmcore_add_device_dump()), with
>> > callback function, along with buffer size and log name needed for
>> > firmware/hardware log collection.
>> 
>> I assumed the elf notes info should be prepared while kexec_[file_]load
>> phase. But I did not read the old comment, not sure if it has been discussed
>> or not.
>> 
>
> We must not collect dumps in crashing kernel. Adding more things in
> crash dump path risks not collecting vmcore at all. Eric had
> discussed this in more detail at:
>
> https://lkml.org/lkml/2018/3/24/319
>
> We are safe to collect dumps in the second kernel. Each device dump
> will be exported as an elf note in /proc/vmcore.

It just occurred to me there is one variation that is worth
considering.

Is the area you are looking at dumping part of a huge mmio area?
I think someone said 2GB?

If that is the case it could be worth it to simply add the needed
addresses to the range of memory we need to dump, and simply having a
elf note saying that is what happened.

>> If do this in 2nd kernel a question is driver can be loaded later than 
>> vmcore init.
>
> Yes, drivers will add their device dumps after vmcore init.
>
>> How to guarantee the function works if vmcore reading happens before
>> the driver is loaded?
>> 
>> Also it is possible that kdump initramfs does not contains the driver
>> module.
>> 
>> Am I missing something?
>> 
>
> Yes, driver must be in initramfs if it wants to collect and add device
> dump to /proc/vmcore in second kernel.

Eric


Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-18 Thread Rahul Lakkireddy
On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
> Hi Rahul,
> On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> > On production servers running variety of workloads over time, kernel
> > panic can happen sporadically after days or even months. It is
> > important to collect as much debug logs as possible to root cause
> > and fix the problem, that may not be easy to reproduce. Snapshot of
> > underlying hardware/firmware state (like register dump, firmware
> > logs, adapter memory, etc.), at the time of kernel panic will be very
> > helpful while debugging the culprit device driver.
> > 
> > This series of patches add new generic framework that enable device
> > drivers to collect device specific snapshot of the hardware/firmware
> > state of the underlying device in the crash recovery kernel. In crash
> > recovery kernel, the collected logs are added as elf notes to
> > /proc/vmcore, which is copied by user space scripts for post-analysis.
> > 
> > The sequence of actions done by device drivers to append their device
> > specific hardware/firmware logs to /proc/vmcore are as follows:
> > 
> > 1. During probe (before hardware is initialized), device drivers
> > register to the vmcore module (via vmcore_add_device_dump()), with
> > callback function, along with buffer size and log name needed for
> > firmware/hardware log collection.
> 
> I assumed the elf notes info should be prepared while kexec_[file_]load
> phase. But I did not read the old comment, not sure if it has been discussed
> or not.
> 

We must not collect dumps in crashing kernel. Adding more things in
crash dump path risks not collecting vmcore at all. Eric had
discussed this in more detail at:

https://lkml.org/lkml/2018/3/24/319

We are safe to collect dumps in the second kernel. Each device dump
will be exported as an elf note in /proc/vmcore.

> If do this in 2nd kernel a question is driver can be loaded later than vmcore 
> init.

Yes, drivers will add their device dumps after vmcore init.

> How to guarantee the function works if vmcore reading happens before
> the driver is loaded?
> 
> Also it is possible that kdump initramfs does not contains the driver
> module.
> 
> Am I missing something?
> 

Yes, driver must be in initramfs if it wants to collect and add device
dump to /proc/vmcore in second kernel.

> > 
> > 2. vmcore module allocates the buffer with requested size. It adds
> > an elf note and invokes the device driver's registered callback
> > function.
> > 
> > 3. Device driver collects all hardware/firmware logs into the buffer
> > and returns control back to vmcore module.
> > 
> > The device specific hardware/firmware logs can be seen as elf notes:
> > 
> > # readelf -n /proc/vmcore
> > 
> > Displaying notes found at file offset 0x1000 with length 0x04003288:
> >   Owner Data size   Description
> >   VMCOREDD_cxgb4_:02:00.4 0x02000fd8Unknown note type: (0x0700)
> >   VMCOREDD_cxgb4_:04:00.4 0x02000fd8Unknown note type: (0x0700)
> >   CORE 0x0150   NT_PRSTATUS (prstatus structure)
> >   CORE 0x0150   NT_PRSTATUS (prstatus structure)
> >   CORE 0x0150   NT_PRSTATUS (prstatus structure)
> >   CORE 0x0150   NT_PRSTATUS (prstatus structure)
> >   CORE 0x0150   NT_PRSTATUS (prstatus structure)
> >   CORE 0x0150   NT_PRSTATUS (prstatus structure)
> >   CORE 0x0150   NT_PRSTATUS (prstatus structure)
> >   CORE 0x0150   NT_PRSTATUS (prstatus structure)
> >   VMCOREINFO   0x074f   Unknown note type: (0x)
> > 
> > Patch 1 adds API to vmcore module to allow drivers to register callback
> > to collect the device specific hardware/firmware logs.  The logs will
> > be added to /proc/vmcore as elf notes.
> > 
> > Patch 2 updates read and mmap logic to append device specific hardware/
> > firmware logs as elf notes.
> > 
> > Patch 3 shows a cxgb4 driver example using the API to collect
> > hardware/firmware logs in crash recovery kernel, before hardware is
> > initialized.
> > 
> > Thanks,
> > Rahul
> > 
> > RFC v1: https://lkml.org/lkml/2018/3/2/542
> > RFC v2: https://lkml.org/lkml/2018/3/16/326
> > 
[...]

Thanks,
Rahul


Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-18 Thread Dave Young
Hi Rahul,
On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> On production servers running variety of workloads over time, kernel
> panic can happen sporadically after days or even months. It is
> important to collect as much debug logs as possible to root cause
> and fix the problem, that may not be easy to reproduce. Snapshot of
> underlying hardware/firmware state (like register dump, firmware
> logs, adapter memory, etc.), at the time of kernel panic will be very
> helpful while debugging the culprit device driver.
> 
> This series of patches add new generic framework that enable device
> drivers to collect device specific snapshot of the hardware/firmware
> state of the underlying device in the crash recovery kernel. In crash
> recovery kernel, the collected logs are added as elf notes to
> /proc/vmcore, which is copied by user space scripts for post-analysis.
> 
> The sequence of actions done by device drivers to append their device
> specific hardware/firmware logs to /proc/vmcore are as follows:
> 
> 1. During probe (before hardware is initialized), device drivers
> register to the vmcore module (via vmcore_add_device_dump()), with
> callback function, along with buffer size and log name needed for
> firmware/hardware log collection.

I assumed the elf notes info should be prepared while kexec_[file_]load
phase. But I did not read the old comment, not sure if it has been discussed
or not.

If do this in 2nd kernel a question is driver can be loaded later than vmcore 
init.
How to guarantee the function works if vmcore reading happens before
the driver is loaded?

Also it is possible that kdump initramfs does not contains the driver
module.

Am I missing something?

> 
> 2. vmcore module allocates the buffer with requested size. It adds
> an elf note and invokes the device driver's registered callback
> function.
> 
> 3. Device driver collects all hardware/firmware logs into the buffer
> and returns control back to vmcore module.
> 
> The device specific hardware/firmware logs can be seen as elf notes:
> 
> # readelf -n /proc/vmcore
> 
> Displaying notes found at file offset 0x1000 with length 0x04003288:
>   Owner Data size Description
>   VMCOREDD_cxgb4_:02:00.4 0x02000fd8  Unknown note type: (0x0700)
>   VMCOREDD_cxgb4_:04:00.4 0x02000fd8  Unknown note type: (0x0700)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   VMCOREINFO   0x074f Unknown note type: (0x)
> 
> Patch 1 adds API to vmcore module to allow drivers to register callback
> to collect the device specific hardware/firmware logs.  The logs will
> be added to /proc/vmcore as elf notes.
> 
> Patch 2 updates read and mmap logic to append device specific hardware/
> firmware logs as elf notes.
> 
> Patch 3 shows a cxgb4 driver example using the API to collect
> hardware/firmware logs in crash recovery kernel, before hardware is
> initialized.
> 
> Thanks,
> Rahul
> 
> RFC v1: https://lkml.org/lkml/2018/3/2/542
> RFC v2: https://lkml.org/lkml/2018/3/16/326
> 
> ---
> v4:
> - Made __vmcore_add_device_dump() static.
> - Moved compile check to define vmcore_add_device_dump() to
>   crash_dump.h to fix compilation when vmcore.c is not compiled in.
> - Convert ---help--- to help in Kconfig as indicated by checkpatch.
> - Rebased to tip.
> 
> v3:
> - Dropped sysfs crashdd module.
> - Exported dumps as elf notes. Suggested by Eric Biederman
>   .  Added as patch 2 in this version.
> - Added CONFIG_PROC_VMCORE_DEVICE_DUMP to allow configuring device
>   dump support.
> - Moved logic related to adding dumps from crashdd to vmcore module.
> - Rename all crashdd* to vmcoredd*.
> - Updated comments.
> 
> v2:
> - Added ABI Documentation for crashdd.
> - Directly use octal permission instead of macro.
> 
> Changes since rfc v2:
> - Moved exporting crashdd from procfs to sysfs. Suggested by
>   Stephen Hemminger 
> - Moved code from fs/proc/crashdd.c to fs/crashdd/ directory.
> - Replaced all proc API with sysfs API and updated comments.
> - Calling driver callback before creating the binary file under
>   crashdd sysfs.
> - Changed binary dump file permission from S_IRUSR to S_IRUGO.
> - Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP.
> 
> rfc v2:
> - Collecting logs in 2nd kernel instead of during kernel panic.
>   Suggested by Eric Biederman .
> - Added new crashdd