Re: [PATCH V8 0/5] Intel Platform Monitoring Technology

2020-10-28 Thread David E. Box
On Tue, 2020-10-27 at 12:28 +0100, Hans de Goede wrote:
> Hi,
> 
> On 10/3/20 3:31 AM, David E. Box wrote:
> > Intel Platform Monitoring Technology (PMT) is an architecture for
> > enumerating and accessing hardware monitoring capabilities on a
> > device.
> > With customers increasingly asking for hardware telemetry,
> > engineers not
> > only have to figure out how to measure and collect data, but also
> > how to
> > deliver it and make it discoverable. The latter may be through some
> > device
> > specific method requiring device specific tools to collect the
> > data. This
> > in turn requires customers to manage a suite of different tools in
> > order to
> > collect the differing assortment of monitoring data on their
> > systems.  Even
> > when such information can be provided in kernel drivers, they may
> > require
> > constant maintenance to update register mappings as they change
> > with
> > firmware updates and new versions of hardware. PMT provides a
> > solution for
> > discovering and reading telemetry from a device through a hardware
> > agnostic
> > framework that allows for updates to systems without requiring
> > patches to
> > the kernel or software tools.
> > 
> > PMT defines several capabilities to support collecting monitoring
> > data from
> > hardware. All are discoverable as separate instances of the PCIE
> > Designated
> > Vendor extended capability (DVSEC) with the Intel vendor code. The
> > DVSEC ID
> > field uniquely identifies the capability. Each DVSEC also provides
> > a BAR
> > offset to a header that defines capability-specific attributes,
> > including
> > GUID, feature type, offset and length, as well as configuration
> > settings
> > where applicable. The GUID uniquely identifies the register space
> > of any
> > monitor data exposed by the capability. The GUID is associated with
> > an XML
> > file from the vendor that describes the mapping of the register
> > space along
> > with properties of the monitor data. This allows vendors to perform
> > firmware updates that can change the mapping (e.g. add new metrics)
> > without
> > requiring any changes to drivers or software tools. The new mapping
> > is
> > confirmed by an updated GUID, read from the hardware, which
> > software uses
> > with a new XML.
> > 
> > The current capabilities defined by PMT are Telemetry, Watcher, and
> > Crashlog.  The Telemetry capability provides access to a continuous
> > block
> > of read only data. The Watcher capability provides access to
> > hardware
> > sampling and tracing features. Crashlog provides access to device
> > crash
> > dumps.  While there is some relationship between capabilities
> > (Watcher can
> > be configured to sample from the Telemetry data set) each exists as
> > stand
> > alone features with no dependency on any other. The design
> > therefore splits
> > them into individual, capability specific drivers. MFD is used to
> > create
> > platform devices for each capability so that they may be managed by
> > their
> > own driver. The PMT architecture is (for the most part) agnostic to
> > the
> > type of device it can collect from. Software can determine which
> > devices
> > support a PMT feature by searching through each device node entry
> > in the
> > sysfs class folder. It can additionally determine if a particular
> > device
> > supports a PMT feature by checking for a PMT class folder in the
> > device
> > folder.
> > 
> > This patch set provides support for the PMT framework, along with
> > support
> > for Telemetry on Tiger Lake.
> 
> The entire series looks good to me, so you may add my:
> 
> Reviewed-by: Hans de Goede 

Thanks. I did send out a V9 after rebasing and noticing that I
accidentally dropped a change in the MFD driver between V7 and V8. This
was added back. I also took the opportunity to get rid of a nuisance
dev_warn.

David



Re: [PATCH V8 0/5] Intel Platform Monitoring Technology

2020-10-27 Thread Hans de Goede
Hi,

On 10/3/20 3:31 AM, David E. Box wrote:
> Intel Platform Monitoring Technology (PMT) is an architecture for
> enumerating and accessing hardware monitoring capabilities on a device.
> With customers increasingly asking for hardware telemetry, engineers not
> only have to figure out how to measure and collect data, but also how to
> deliver it and make it discoverable. The latter may be through some device
> specific method requiring device specific tools to collect the data. This
> in turn requires customers to manage a suite of different tools in order to
> collect the differing assortment of monitoring data on their systems.  Even
> when such information can be provided in kernel drivers, they may require
> constant maintenance to update register mappings as they change with
> firmware updates and new versions of hardware. PMT provides a solution for
> discovering and reading telemetry from a device through a hardware agnostic
> framework that allows for updates to systems without requiring patches to
> the kernel or software tools.
> 
> PMT defines several capabilities to support collecting monitoring data from
> hardware. All are discoverable as separate instances of the PCIE Designated
> Vendor extended capability (DVSEC) with the Intel vendor code. The DVSEC ID
> field uniquely identifies the capability. Each DVSEC also provides a BAR
> offset to a header that defines capability-specific attributes, including
> GUID, feature type, offset and length, as well as configuration settings
> where applicable. The GUID uniquely identifies the register space of any
> monitor data exposed by the capability. The GUID is associated with an XML
> file from the vendor that describes the mapping of the register space along
> with properties of the monitor data. This allows vendors to perform
> firmware updates that can change the mapping (e.g. add new metrics) without
> requiring any changes to drivers or software tools. The new mapping is
> confirmed by an updated GUID, read from the hardware, which software uses
> with a new XML.
> 
> The current capabilities defined by PMT are Telemetry, Watcher, and
> Crashlog.  The Telemetry capability provides access to a continuous block
> of read only data. The Watcher capability provides access to hardware
> sampling and tracing features. Crashlog provides access to device crash
> dumps.  While there is some relationship between capabilities (Watcher can
> be configured to sample from the Telemetry data set) each exists as stand
> alone features with no dependency on any other. The design therefore splits
> them into individual, capability specific drivers. MFD is used to create
> platform devices for each capability so that they may be managed by their
> own driver. The PMT architecture is (for the most part) agnostic to the
> type of device it can collect from. Software can determine which devices
> support a PMT feature by searching through each device node entry in the
> sysfs class folder. It can additionally determine if a particular device
> supports a PMT feature by checking for a PMT class folder in the device
> folder.
> 
> This patch set provides support for the PMT framework, along with support
> for Telemetry on Tiger Lake.

The entire series looks good to me, so you may add my:

Reviewed-by: Hans de Goede 

To the entire series.

Lee, in the discussion about previous versions you indicated that you
would be happy to merge the entire series through the MFD tree.

>From my pov this is ready for merging, so if you can pick up the entire
series and then provide me with an immutable branch to merge into the
pdx86 tree that would be great.

Regards,

Hans



[PATCH V8 0/5] Intel Platform Monitoring Technology

2020-10-02 Thread David E. Box
Intel Platform Monitoring Technology (PMT) is an architecture for
enumerating and accessing hardware monitoring capabilities on a device.
With customers increasingly asking for hardware telemetry, engineers not
only have to figure out how to measure and collect data, but also how to
deliver it and make it discoverable. The latter may be through some device
specific method requiring device specific tools to collect the data. This
in turn requires customers to manage a suite of different tools in order to
collect the differing assortment of monitoring data on their systems.  Even
when such information can be provided in kernel drivers, they may require
constant maintenance to update register mappings as they change with
firmware updates and new versions of hardware. PMT provides a solution for
discovering and reading telemetry from a device through a hardware agnostic
framework that allows for updates to systems without requiring patches to
the kernel or software tools.

PMT defines several capabilities to support collecting monitoring data from
hardware. All are discoverable as separate instances of the PCIE Designated
Vendor extended capability (DVSEC) with the Intel vendor code. The DVSEC ID
field uniquely identifies the capability. Each DVSEC also provides a BAR
offset to a header that defines capability-specific attributes, including
GUID, feature type, offset and length, as well as configuration settings
where applicable. The GUID uniquely identifies the register space of any
monitor data exposed by the capability. The GUID is associated with an XML
file from the vendor that describes the mapping of the register space along
with properties of the monitor data. This allows vendors to perform
firmware updates that can change the mapping (e.g. add new metrics) without
requiring any changes to drivers or software tools. The new mapping is
confirmed by an updated GUID, read from the hardware, which software uses
with a new XML.

The current capabilities defined by PMT are Telemetry, Watcher, and
Crashlog.  The Telemetry capability provides access to a continuous block
of read only data. The Watcher capability provides access to hardware
sampling and tracing features. Crashlog provides access to device crash
dumps.  While there is some relationship between capabilities (Watcher can
be configured to sample from the Telemetry data set) each exists as stand
alone features with no dependency on any other. The design therefore splits
them into individual, capability specific drivers. MFD is used to create
platform devices for each capability so that they may be managed by their
own driver. The PMT architecture is (for the most part) agnostic to the
type of device it can collect from. Software can determine which devices
support a PMT feature by searching through each device node entry in the
sysfs class folder. It can additionally determine if a particular device
supports a PMT feature by checking for a PMT class folder in the device
folder.

This patch set provides support for the PMT framework, along with support
for Telemetry on Tiger Lake.

Changes from V7:
Link: 
https://lore.kernel.org/lkml/20201001014250.26987-1-david.e@linux.intel.com/

- Refactor to minimize code duplication by putting more setup code 
  in the common intel_pmt_dev_create(). 
- Add and use a function pointer to handle capability specific
  header decoding.
- Add comment on usage of early_client_pci_ids list and add
  Alder Lake PCI ID to the list.
- Remove unneeded check on the count variable in intel_pmt_read().
- Add missing header functions.
- Specify module names in Kconfig.
- Fix spelling errors across patch set.

Changes from V6:
- Use NULL for OOBMSM driver data instead of an empty struct.
  Rewrite the code to check for NULL driver_data.
- Fix spelling and formatting in Kconfig.
- Use MKDEV(0,0) to prevent unneeded device node from being
  created.

Changes from V5:
- Add Alder Lake and the "Out of Band Management Services
  Module (OOBMSM)" ids to the MFD driver. Transferred to this
  patch set.
- Use a single class for all PMT capabilities as suggested by
  Hans.
- Add binary attribute for telemetry driver to allow read
  syscall as suggested by Hans.
- Use the class file to hold attributes and other common code
  used by all PMT drivers.
- Add the crashlog driver to the patch set and add a mutex to
  protect access to the enable control and trigger files as
  suggested by Hans.

Changes from V4:
- Replace MFD with PMT in driver title
- Fix commit tags in chronological order
- Fix includes in alphabetical order
- Use 'raw' string instead of defines for device names
- Add an error message when returning an error code for
  unrecognized capability id
- Use