RE: qed*: debug infrastructures

2017-04-26 Thread Elior, Ariel
Jiri, Florian, Jakub,
Thanks all for your suggestions.

Some answers to the questions posted: the signal tracing in our device can be
used for tracing things like loads/stores/program counter from our fastpath
processors (which handle every packet), which can then be re-run in a
simulation environment (recreating the recorded scenario). Other interesting
uses for this feature are partial PCI recording or partial network recording
(a poor man's analyzer), which can also be very effective where full-blown lab
equipment is unavailable.

I reviewed the code of the drivers under drivers/hwtracing (thanks Florian) and
I think we might be a good fit there.

Jiri indicated dpipe was not intended for this sort of thing and suggested an
additional devlink object, although it seems to me that this would have to be
either a very generic object, which would be susceptible to abuse similar to
debugfs, or one tailored to our device so much that no one else would use it,
so I am somewhat less inclined to go down this path (the code abstracting our
debug feature is accessed via ~20 API functions accepting ~10 params each, i.e.
quite a handful of configuration to generalize).
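
To give a purely hypothetical sense of that shape (every name and parameter
below is invented for illustration and is not the actual out-of-tree API):

struct qed_dbg_trace_cfg {		/* invented, illustrative only */
	u32 group_id;			/* which of the ~100 signal groups */
	u32 signal_mask[3];		/* which of the ~80 signals in the group */
	u32 trigger_mode;		/* free-running vs. triggered recording */
	u32 trigger_signal;
	u32 pre_trigger_samples;
	u32 post_trigger_samples;
	u32 sample_rate_div;
	u32 buffer_wrap;		/* cyclic vs. one-shot buffer */
	u32 compression;
	u32 timestamping;
};

int qed_dbg_trace_configure(struct qed_dev *cdev,
			    const struct qed_dbg_trace_cfg *cfg);

Multiply that by ~20 such entry points and a faithful generalization risks
turning into debugfs by another name.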

The ethtool debug dump presets (thanks Jakub) are far too narrow to encompass
the full flexibility required here.

Dave, I think my next step would be to send an RFC adding the necessary APIs to
our core module (qed), mostly to bring some concrete detail to this rather
abstract discussion. I plan to connect those to a new hwtracing driver I'll
create for this purpose, unless a different direction is suggested.

Thanks,
Ariel


Re: qed*: debug infrastructures

2017-04-25 Thread Florian Fainelli
On 04/24/2017 10:38 AM, Elior, Ariel wrote:
> Hi Dave,
> 
> According to the recent messages on the list indicating debugfs is not the
> way to go, I am looking for some guidance on what is. The dpipe approach was
> mentioned as favorable, but I wanted to make sure molding our debug features
> to this infrastructure will result in something acceptable. A few points:
> 
> 1.
> One of our HW debug features is a signal recording feature. HW is configured
> to output specific signals, which are then continuously dumped into a cyclic
> buffer on the host. There are ~8000 signals, which can be logically divided
> into ~100 groups. I believe this could be modeled in dpipe (or a similar tool)
> as a set of ~100 tables with ~80 entries each, and the user would be able to
> see them all and choose what they like. The output data itself is binary, and
> meaningless to "the user". The amount of data is basically as large a
> contiguous buffer as the driver can allocate, i.e. usually 4MB. When the user
> selects the signals and sets metadata regarding the mode of operation, some
> device configuration will have to take place. Does that sound reasonable?
> This debug feature already exists out of tree for the bnx2x and qed* drivers
> and is *very* effective in field deployments. I would very much like to see
> this as an in-tree feature via some infrastructure or another.

By signals do you mean actual electrical signals coming from your chip's
pins/balls? FWIW, there is something somewhat similar with ARM Ltd's Embedded
Trace Macrocell (ETM), which allows capturing a SoC's CPU activity
(loads/stores/PC progression etc.) using a high-speed interface. You may want
to look at drivers/hwtracing/{coresight,stm}/ for examples of how this
interfaces with Linux's perf events subsystem. There are existing trace buffer
infrastructures, and it sounds like your use case fits nicely within the perf
events framework here, where you may want the entire set or a subset of these
8000 signals to be recorded in a buffer.
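
As a very rough sketch of how a trace source hooks into one of those
frameworks (the stm class here; all qed_* names are made up, and whether stm
or coresight is the better match depends on how your trace data is produced):

/* Rough sketch only: registering a software trace interface with the
 * stm class (drivers/hwtracing/stm/).  The qed_* names are invented. */
#include <linux/module.h>
#include <linux/stm.h>

static ssize_t qed_stm_packet(struct stm_data *stm_data, unsigned int master,
			      unsigned int channel, unsigned int packet,
			      unsigned int flags, unsigned int size,
			      const unsigned char *payload)
{
	/* hand the payload off to the device's trace machinery here */
	return size;
}

static struct stm_data qed_stm_data = {
	.name		= "qed_trace",	/* made-up name */
	.sw_start	= 0,		/* first master ID */
	.sw_end		= 127,		/* last master ID */
	.sw_nchannels	= 128,		/* channels per master */
	.packet		= qed_stm_packet,
};

/* from the core module's probe path:
 *	err = stm_register_device(&pdev->dev, &qed_stm_data, THIS_MODULE);
 */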


> 
> 2.
> Sometimes we want to debug the probe or removal flow of the driver. ethtool
> has the disadvantage of only being available once the network device is
> available. If a bug stops the load flow before the ethtool debug paths are
> available, there is no way to collect a dump. Similarly, removal flows which
> hit a bug but do remove the network device can't be debugged from ethtool.
> Does dpipe suffer from the same problem? qed* (like mlx*) has a
> common-functionality module. This allows creating debugfs nodes even before
> the network drivers are probed, providing a solution for this (debug nodes
> are also available after network driver removal). If dpipe does hold an
> answer here (e.g. providing preconfiguration which would kick in when the
> network device registers) then we might want to port all of our register dump
> logic over there for this advantage. Does that sound reasonable?

Can you consider using tracing for that particular purpose, with trace filters
that are specific to the device of interest? That should allow you to get
events from the Linux device driver model all the way down to the point where
your driver actually gets initialized. Is that an option?
-- 
Florian


Re: qed*: debug infrastructures

2017-04-25 Thread Jiri Pirko
Tue, Apr 25, 2017 at 05:44:10AM CEST, kubak...@wp.pl wrote:
>On Mon, 24 Apr 2017 17:38:57 +, Elior, Ariel wrote:
>> Hi Dave,
>
>Hi Ariel!
>
>I'm not Dave but let me share my perspective :)
>
>> According to the recent messages on the list indicating debugfs is not the
>> way to go, I am looking for some guidance on what is. The dpipe approach was
>> mentioned as favorable, but I wanted to make sure molding our debug features
>> to this infrastructure will result in something acceptable. A few points:
>> 
>> 1.
>> One of our HW debug features is a signal recording feature. HW is configured
>> to output specific signals, which are then continuously dumped into a cyclic
>> buffer on the host. There are ~8000 signals, which can be logically divided
>> into ~100 groups. I believe this could be modeled in dpipe (or a similar
>> tool) as a set of ~100 tables with ~80 entries each, and the user would be
>> able to see them all and choose what they like. The output data itself is
>> binary, and meaningless to "the user". The amount of data is basically as
>> large a contiguous buffer as the driver can allocate, i.e. usually 4MB. When
>> the user selects the signals and sets metadata regarding the mode of
>> operation, some device configuration will have to take place. Does that
>> sound reasonable?
>> This debug feature already exists out of tree for the bnx2x and qed* drivers
>> and is *very* effective in field deployments. I would very much like to see
>> this as an in-tree feature via some infrastructure or another.
>
>Sorry for even mentioning it; new debug interfaces would be cool, but
>for FW/HW state dumps which are meaningless to the user, why not just
>use ethtool get-dump/set-dump?  Do you really need the ability to
>toggle those 8k signals one-by-one, or are there reasonable sets you
>could provide from the driver that you could encode in the available
>32 bits of flags?
>
>What could be useful would be some form of start/stop commands for
>debugging, to tell the driver/FW when to record events selected by
>set-dump, and maybe a way for the user to discover what dumps the
>driver can provide (a la ethtool private flags).
>
>> 2.
>> Sometimes we want to debug the probe or removal flow of the driver. ethtool
>> has the disadvantage of only being available once the network device is
>> available. If a bug stops the load flow before the ethtool debug paths are
>> available, there is no way to collect a dump. Similarly, removal flows which
>> hit a bug but do remove the network device can't be debugged from ethtool.
>> Does dpipe suffer from the same problem? qed* (like mlx*) has a
>> common-functionality module. This allows creating debugfs nodes even before
>> the network drivers are probed, providing a solution for this (debug nodes
>> are also available after network driver removal). If dpipe does hold an
>> answer here (e.g. providing preconfiguration which would kick in when the
>> network device registers) then we might want to port all of our register
>> dump logic over there for this advantage. Does that sound reasonable?
>
>Porting the debug/dump infrastructure to devlink would be very much
>appreciated.  I'm not sure whether it would fit into dpipe or should be
>a separate command.

Yeah. dpipe was designed to provide a HW pipeline abstraction based on a
match/action model. I think that for stats and debugging, it would make sense
to introduce another devlink object.
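
Nothing like that exists yet, so purely as a strawman (every identifier below
is invented just to make the shape concrete; none of this is a real devlink
API):

/* Strawman only, not a real API: a possible driver-facing shape for a
 * devlink "debug region" object covering dumps and trace recording. */
struct devlink_dbg_region_ops {
	int	(*start)(struct devlink *devlink, void *priv);
	int	(*stop)(struct devlink *devlink, void *priv);
	ssize_t	(*snapshot)(struct devlink *devlink, void *priv,
			    void *buf, size_t len);
};

/* hypothetical registration, one call per dump/trace region */
int devlink_dbg_region_register(struct devlink *devlink, const char *name,
				const struct devlink_dbg_region_ops *ops,
				void *priv);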


>
>> 3.
>> Yuval mentioned this, but I wanted to reiterate that the same is necessary
>> for our storage drivers (qedi/qedf). debugfs does have the advantage of not
>> being sub-system specific. Is there perhaps another non-subsystem-specific
>> debug infrastructure which *is* acceptable to the networking subsystem? My
>> guess is that the storage drivers will turn to debugfs (in their own
>> subsystem).
>
>devlink is not Ethernet-specific; it should be a good fit for iSCSI and
>FCoE drivers, too.

Yes. During the devlink design, I had non-network use cases in mind. Devlink is
the interface to be used by all who need it.
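
For example, since registration only takes a struct device, the common qed
module could instantiate devlink in its PCI probe path, before (and
independently of) any netdev. A minimal sketch against the current devlink
API, with error handling trimmed and the qed_* wrapper name made up:

#include <linux/pci.h>
#include <net/devlink.h>

static const struct devlink_ops qed_devlink_ops = {
	/* dump/debug callbacks would go here */
};

/* called from the common module's PCI probe, not from the netdev driver */
static struct devlink *qed_devlink_create(struct pci_dev *pdev)
{
	struct devlink *dl;

	dl = devlink_alloc(&qed_devlink_ops, 0);
	if (!dl)
		return NULL;

	if (devlink_register(dl, &pdev->dev)) {
		devlink_free(dl);
		return NULL;
	}
	return dl;
}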



Re: qed*: debug infrastructures

2017-04-24 Thread Jakub Kicinski
On Mon, 24 Apr 2017 17:38:57 +, Elior, Ariel wrote:
> Hi Dave,

Hi Ariel!

I'm not Dave but let me share my perspective :)

> According to the recent messages on the list indicating debugfs is not the
> way to go, I am looking for some guidance on what is. The dpipe approach was
> mentioned as favorable, but I wanted to make sure molding our debug features
> to this infrastructure will result in something acceptable. A few points:
> 
> 1.
> One of our HW debug features is a signal recording feature. HW is configured
> to output specific signals, which are then continuously dumped into a cyclic
> buffer on the host. There are ~8000 signals, which can be logically divided
> into ~100 groups. I believe this could be modeled in dpipe (or a similar tool)
> as a set of ~100 tables with ~80 entries each, and the user would be able to
> see them all and choose what they like. The output data itself is binary, and
> meaningless to "the user". The amount of data is basically as large a
> contiguous buffer as the driver can allocate, i.e. usually 4MB. When the user
> selects the signals and sets metadata regarding the mode of operation, some
> device configuration will have to take place. Does that sound reasonable?
> This debug feature already exists out of tree for the bnx2x and qed* drivers
> and is *very* effective in field deployments. I would very much like to see
> this as an in-tree feature via some infrastructure or another.

Sorry for even mentioning it; new debug interfaces would be cool, but
for FW/HW state dumps which are meaningless to the user, why not just
use ethtool get-dump/set-dump?  Do you really need the ability to
toggle those 8k signals one-by-one, or are there reasonable sets you
could provide from the driver that you could encode in the available
32 bits of flags?

What could be useful would be some form of start/stop commands for
debugging, to tell the driver/FW when to record events selected by
set-dump, and maybe a way for the user to discover what dumps the
driver can provide (a la ethtool private flags).
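
For reference, the driver side of get-dump/set-dump is just three ethtool_ops
callbacks; a rough sketch (the qed_* names and the values filled in are made
up):

/* Sketch of the existing ethtool dump hooks (struct ethtool_ops); the
 * qed_* helpers and the flag/length values are made up for illustration. */
#include <linux/ethtool.h>
#include <linux/netdevice.h>

static int qed_set_dump(struct net_device *dev, struct ethtool_dump *dump)
{
	/* dump->flag carries the 32 bits the user passed via set-dump;
	 * select one of the predefined signal sets / dump presets here. */
	return 0;
}

static int qed_get_dump_flag(struct net_device *dev, struct ethtool_dump *dump)
{
	dump->version = 1;		/* made-up values */
	dump->flag = 0;			/* currently selected preset */
	dump->len = 4 * 1024 * 1024;	/* size of the buffer to expect */
	return 0;
}

static int qed_get_dump_data(struct net_device *dev,
			     struct ethtool_dump *dump, void *buffer)
{
	/* copy up to dump->len bytes of the recorded trace into buffer */
	return 0;
}

static const struct ethtool_ops qed_ethtool_ops = {
	.set_dump	= qed_set_dump,
	.get_dump_flag	= qed_get_dump_flag,
	.get_dump_data	= qed_get_dump_data,
};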

> 2.
> Sometimes we want to debug the probe or removal flow of the driver. ethtool
> has the disadvantage of only being available once the network device is
> available. If a bug stops the load flow before the ethtool debug paths are
> available, there is no way to collect a dump. Similarly, removal flows which
> hit a bug but do remove the network device can't be debugged from ethtool.
> Does dpipe suffer from the same problem? qed* (like mlx*) has a
> common-functionality module. This allows creating debugfs nodes even before
> the network drivers are probed, providing a solution for this (debug nodes
> are also available after network driver removal). If dpipe does hold an
> answer here (e.g. providing preconfiguration which would kick in when the
> network device registers) then we might want to port all of our register dump
> logic over there for this advantage. Does that sound reasonable?

Porting the debug/dump infrastructure to devlink would be very much
appreciated.  I'm not sure whether it would fit into dpipe or should be
a separate command.

> 3.
> Yuval mentioned this, but I wanted to reiterate that the same is necessary
> for our storage drivers (qedi/qedf). debugfs does have the advantage of not
> being sub-system specific. Is there perhaps another non-subsystem-specific
> debug infrastructure which *is* acceptable to the networking subsystem? My
> guess is that the storage drivers will turn to debugfs (in their own
> subsystem).

devlink is not Ethernet-specific; it should be a good fit for iSCSI and
FCoE drivers, too.


qed*: debug infrastructures

2017-04-24 Thread Elior, Ariel
Hi Dave,

According to the recent messages on the list indicating debugfs is not the way
to go, I am looking for some guidance on what is. The dpipe approach was
mentioned as favorable, but I wanted to make sure molding our debug features to
this infrastructure will result in something acceptable. A few points:

1.
One of our HW debug features is a signal recording feature. HW is configured to
output specific signals, which are then continuously dumped into a cyclic
buffer on the host. There are ~8000 signals, which can be logically divided
into ~100 groups. I believe this could be modeled in dpipe (or a similar tool)
as a set of ~100 tables with ~80 entries each, and the user would be able to
see them all and choose what they like. The output data itself is binary, and
meaningless to "the user". The amount of data is basically as large a
contiguous buffer as the driver can allocate, i.e. usually 4MB. When the user
selects the signals and sets metadata regarding the mode of operation, some
device configuration will have to take place. Does that sound reasonable?
This debug feature already exists out of tree for the bnx2x and qed* drivers
and is *very* effective in field deployments. I would very much like to see
this as an in-tree feature via some infrastructure or another.
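
(For the buffer itself the driver simply tries for the largest DMA-able
contiguous chunk it can get, along the lines of the sketch below; the sizes
and fallback policy shown are only illustrative.)

/* Illustrative only: grab the largest contiguous DMA buffer available,
 * halving the request on failure; 4MB is the usual best case. */
#include <linux/dma-mapping.h>
#include <linux/mm.h>

static void *qed_alloc_trace_buf(struct device *dev, size_t *actual_size,
				 dma_addr_t *phys)
{
	size_t size = 4 * 1024 * 1024;		/* start at 4MB */
	void *buf;

	while (size >= PAGE_SIZE) {
		buf = dma_alloc_coherent(dev, size, phys, GFP_KERNEL);
		if (buf) {
			*actual_size = size;
			return buf;
		}
		size >>= 1;			/* halve and retry */
	}
	return NULL;
}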

2.
Sometimes we want to debug the probe or removal flow of the driver. ethtool has
the disadvantage of only being available once the network device is available.
If a bug stops the load flow before the ethtool debug paths are available,
there is no way to collect a dump. Similarly, removal flows which hit a bug but
do remove the network device can't be debugged from ethtool. Does dpipe suffer
from the same problem? qed* (like mlx*) has a common-functionality module. This
allows creating debugfs nodes even before the network drivers are probed,
providing a solution for this (debug nodes are also available after network
driver removal). If dpipe does hold an answer here (e.g. providing
preconfiguration which would kick in when the network device registers) then we
might want to port all of our register dump logic over there for this
advantage. Does that sound reasonable?

3.
Yuval mentioned this, but I wanted to reiterate that the same is necessary for
our storage drivers (qedi/qedf). debugfs does have the advantage of not being
sub-system specific. Is there perhaps another non-subsystem-specific debug
infrastructure which *is* acceptable to the networking subsystem? My guess is
that the storage drivers will turn to debugfs (in their own subsystem).

Thanks,
Ariel