Re: [PATCH net-next v8 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-05-14 Thread David Miller
From: ebied...@xmission.com (Eric W. Biederman)
Date: Mon, 14 May 2018 08:11:24 -0500

> David Miller writes:
> 
>> I'm deferring this patch series.
>>
>> If we can't get a reasonable review from an interested party in 10+
>> days, that is not reasonable.
>>
>> Resubmit this once someone reviews it properly.
> 
> David, I am out on vacation this week and was last week (the reason for
> the delay).
> 
> I gave my ack to the last version of this that I looked at.  All of my ABI
> concerns had been addressed. The only outstanding item, I believe, was
> Eric Dumazet asking about something being reviewed.
> 
> I just glanced over it again and I don't see any new issues introduced
> by the last round of changes.
> 
> From a 10,000-foot flyover design perspective and from an ABI perspective
> this patchset seems fine.
> 
> Acked-by: "Eric W. Biederman" 

Ok, thanks for reviewing, Eric.

Series applied.



Re: [PATCH net-next v8 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-05-14 Thread Eric W. Biederman
David Miller writes:

> I'm deferring this patch series.
>
> If we can't get a reasonable review from an interested party in 10+
> days, that is not reasonable.
>
> Resubmit this once someone reviews it properly.

David, I am out on vacation this week and was last week (the reason for
the delay).

I gave my ack to the last version of this that I looked at.  All of my ABI
concerns had been addressed. The only outstanding item, I believe, was
Eric Dumazet asking about something being reviewed.

I just glanced over it again and I don't see any new issues introduced
by the last round of changes.

From a 10,000-foot flyover design perspective and from an ABI perspective
this patchset seems fine.

Acked-by: "Eric W. Biederman" 

Eric


Re: [PATCH net-next v8 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-05-13 Thread David Miller

I'm deferring this patch series.

If we can't get a reasonable review from an interested party in 10+
days, that is not reasonable.

Resubmit this once someone reviews it properly.

Thank you.


Re: [PATCH net-next v8 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-05-07 Thread David Miller
From: Rahul Lakkireddy 
Date: Wed,  2 May 2018 15:17:16 +0530

> This series of patches adds a new generic framework that enables device
> drivers to collect a device-specific snapshot of the hardware/firmware
> state of the underlying device in the crash recovery kernel. In the crash
> recovery kernel, the collected logs are added as ELF notes to
> /proc/vmcore, which is copied by user space scripts for post-analysis.

Eric B., since you've been giving very useful and active feedback on
this series, could you please give it a review?

Thank you.


[PATCH net-next v8 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-05-02 Thread Rahul Lakkireddy
On production servers running a variety of workloads over time, a kernel
panic can happen sporadically after days or even months. It is
important to collect as many debug logs as possible to root-cause
and fix the problem, which may not be easy to reproduce. A snapshot of
the underlying hardware/firmware state (register dump, firmware
logs, adapter memory, etc.) at the time of the kernel panic is very
helpful when debugging the culprit device driver.

This series of patches adds a new generic framework that enables device
drivers to collect a device-specific snapshot of the hardware/firmware
state of the underlying device in the crash recovery kernel. In the crash
recovery kernel, the collected logs are added as ELF notes to
/proc/vmcore, which is copied by user space scripts for post-analysis.

The sequence of actions performed by a device driver to append its
device-specific hardware/firmware logs to /proc/vmcore is as follows:

1. During probe (before hardware is initialized), the device driver
registers with the vmcore module (via vmcore_add_device_dump()), passing
a callback function along with the buffer size and log name needed for
firmware/hardware log collection.

2. The vmcore module allocates a buffer of the requested size. It adds
an ELF note and invokes the device driver's registered callback
function.

3. The device driver collects all hardware/firmware logs into the buffer
and returns control to the vmcore module.
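
The three steps above can be sketched as a small userspace mock. This is
only an illustration of the control flow: every name in it
(mock_dump_req, mock_add_device_dump(), fake_driver_collect(),
mock_probe()) is hypothetical, and it does not reproduce the signature of
vmcore_add_device_dump(), which this cover letter does not show.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; NOT the kernel API from this series. */
#define MOCK_NAME_LEN 32

struct mock_dump_req {
	char name[MOCK_NAME_LEN];	/* log name supplied by the driver */
	unsigned int size;		/* buffer size requested by the driver */
	int (*collect)(struct mock_dump_req *req, void *buf); /* callback */
};

struct mock_dump {
	char name[MOCK_NAME_LEN];
	unsigned int size;
	unsigned char *buf;		/* owned by the mock "vmcore" side */
};

/* Steps 2 and 3: allocate the buffer, then let the driver fill it. */
static int mock_add_device_dump(struct mock_dump_req *req,
				struct mock_dump *out)
{
	unsigned char *buf;
	int ret;

	assert(req && out);
	if (!req->collect || req->size == 0)
		return -1;

	buf = calloc(1, req->size);
	if (!buf)
		return -1;

	ret = req->collect(req, buf);	/* driver collects its logs */
	if (ret) {
		free(buf);		/* nothing is kept on failure */
		return ret;
	}

	snprintf(out->name, sizeof(out->name), "%s", req->name);
	out->size = req->size;
	out->buf = buf;
	return 0;
}

/* Pretend driver callback: fills the buffer with a fixed byte pattern. */
static int fake_driver_collect(struct mock_dump_req *req, void *buf)
{
	memset(buf, 0xab, req->size);
	return 0;
}

/* Step 1: what a probe routine might do in this mock. */
static int mock_probe(struct mock_dump *out)
{
	struct mock_dump_req req = { .size = 16,
				     .collect = fake_driver_collect };

	snprintf(req.name, sizeof(req.name), "%s", "fake_dev");
	return mock_add_device_dump(&req, out);
}
```

In this mock the caller of mock_probe() owns the returned buffer and
frees it with free(); how the real vmcore module manages the buffer
lifetime is not described here.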

The device-specific hardware/firmware logs can be seen as ELF notes
with note type 0x700, as shown below:

# readelf -n /proc/vmcore

Displaying notes found at file offset 0x1000 with length 0x040032c0:
  Owner        Data size    Description
  LINUX        0x02000fec   Unknown note type: (0x0700)
  LINUX        0x02000fec   Unknown note type: (0x0700)
  CORE         0x0150       NT_PRSTATUS (prstatus structure)
  CORE         0x0150       NT_PRSTATUS (prstatus structure)
  CORE         0x0150       NT_PRSTATUS (prstatus structure)
  CORE         0x0150       NT_PRSTATUS (prstatus structure)
  CORE         0x0150       NT_PRSTATUS (prstatus structure)
  CORE         0x0150       NT_PRSTATUS (prstatus structure)
  CORE         0x0150       NT_PRSTATUS (prstatus structure)
  CORE         0x0150       NT_PRSTATUS (prstatus structure)
  VMCOREINFO   0x0785       Unknown note type: (0x)
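
A post-analysis tool could pick out these notes by walking the note
segment and matching owner "LINUX" with type 0x700. The sketch below is
a minimal userspace example, assuming the usual ELF note layout (three
32-bit header words followed by the name and the descriptor, each padded
to 4 bytes). The names note_hdr, put_note() and
count_device_dump_notes() are hypothetical and not part of this series;
put_note() exists only to build test input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEVICE_DUMP_NOTE_TYPE 0x700u	/* note type quoted above */

/* Generic ELF note header: namesz, descsz, type (all 32-bit words). */
struct note_hdr {
	uint32_t namesz;
	uint32_t descsz;
	uint32_t type;
};

static size_t align4(size_t n)
{
	return (n + 3u) & ~(size_t)3u;
}

/* Count notes in buf[0..len) owned by "LINUX" with type 0x700. */
static size_t count_device_dump_notes(const unsigned char *buf, size_t len)
{
	size_t off = 0, count = 0;

	assert(buf);
	while (len - off >= sizeof(struct note_hdr)) {
		struct note_hdr h;
		size_t name_off, desc_off, next;

		memcpy(&h, buf + off, sizeof(h));
		name_off = off + sizeof(h);
		desc_off = name_off + align4(h.namesz);
		next = desc_off + align4(h.descsz);
		if (desc_off > len || next > len)
			break;	/* truncated or malformed note */

		if (h.type == DEVICE_DUMP_NOTE_TYPE &&
		    h.namesz == sizeof("LINUX") &&
		    memcmp(buf + name_off, "LINUX", sizeof("LINUX")) == 0)
			count++;
		off = next;
	}
	return count;
}

/*
 * Test helper: write one note at dst and return the bytes written.
 * The caller must provide enough room and a non-NULL desc.
 */
static size_t put_note(unsigned char *dst, const char *name, uint32_t type,
		       const void *desc, uint32_t descsz)
{
	struct note_hdr h = { (uint32_t)strlen(name) + 1, descsz, type };
	size_t off = sizeof(h);

	memcpy(dst, &h, sizeof(h));
	memset(dst + off, 0, align4(h.namesz));	/* zero the name padding */
	memcpy(dst + off, name, h.namesz);
	off += align4(h.namesz);
	memset(dst + off, 0, align4(descsz));	/* zero the desc padding */
	memcpy(dst + off, desc, descsz);
	return off + align4(descsz);
}
```

Against the readelf listing above, such a walk would be expected to
report the two LINUX entries and skip the CORE and VMCOREINFO notes.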

Patch 1 adds an API to the vmcore module that allows drivers to register
a callback to collect device-specific hardware/firmware logs.  The logs
are added to /proc/vmcore as ELF notes.

Patch 2 updates the read and mmap logic to append device-specific
hardware/firmware logs as ELF notes.

Patch 3 shows a cxgb4 driver example that uses the API to collect
hardware/firmware logs in the crash recovery kernel, before hardware is
initialized.

Thanks,
Rahul

---
v8:
- Added missing linux/types.h header include.
- Removed __vmcore_add_device_dump().

v7:
- Removed "CHELSIO" vendor identifier in Elf Note name. Instead,
  writing "LINUX".
- Moved vmcoredd_header to new file include/uapi/linux/vmcore.h
- Reworked vmcoredd_header to include Elf Note as part of the header
  itself.
- Removed vmcoredd_get_note_size().
- Renamed vmcoredd_write_note() to vmcoredd_write_header().
- Replaced all "unsigned long" with "unsigned int" for device dump
  size since max size of Elf Word is u32.

v6:
- Reworked device dump elf note name to contain vendor identifier.
- Added vmcoredd_header that precedes actual dump in the Elf Note.
- Device dump's name is moved inside vmcoredd_header.
- Added "CHELSIO" string as vendor identifier in the Elf Note name
  for cxgb4 device dumps.

v5:
- Removed enabling CONFIG_PROC_VMCORE_DEVICE_DUMP by default and
  updated help message.

v4:
- Made __vmcore_add_device_dump() static.
- Moved compile check to define vmcore_add_device_dump() to
  crash_dump.h to fix compilation when vmcore.c is not compiled in.
- Convert ---help--- to help in Kconfig as indicated by checkpatch.
- Rebased to tip.

v3:
- Dropped sysfs crashdd module.
- Exported dumps as elf notes. Suggested by Eric Biederman.
  Added as patch 2 in this version.
- Added CONFIG_PROC_VMCORE_DEVICE_DUMP to allow configuring device
  dump support.
- Moved logic related to adding dumps from crashdd to vmcore module.
- Rename all crashdd* to vmcoredd*.
- Updated comments.

v2:
- Added ABI Documentation for crashdd.
- Directly use octal permission instead of macro.

Changes since rfc v2:
- Moved exporting crashdd from procfs to sysfs. Suggested by
  Stephen Hemminger 
- Moved code from fs/proc/crashdd.c to fs/crashdd/ directory.
- Replaced all proc API with sysfs API and updated comments.
- Calling driver callback before creating the binary file under
  crashdd sysfs.
- Changed binary dump file permission from S_IRUSR to S_IRUGO.
- Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP.

rfc v2:
- Collecting logs in 2nd