RE: qed*: debug infrastructures
Jiri, Florian, Jakub, thanks for all your suggestions. Some answers to the questions posted:

The signal tracing in our device can be used for tracing things like load/store/program_counter from our fastpath processors (which handle every packet), which can then be re-run in a simulation environment (recreating the recorded scenario). Other interesting uses for this feature are partial PCI recording or partial network recording (a poor man's analyzer), which can be very effective where full-blown lab equipment is unavailable.

I reviewed the code of the drivers under hwtracing (thanks Florian) and I think it might be a good fit.

Jiri indicated dpipe was not intended for this sort of thing and suggested an additional devlink object, although it seems to me that this would have to be either a very generic object, susceptible to abuse much like debugfs, or so tailored to our device that no one else would use it, so I am somewhat less inclined to go down this path (the code abstracting our debug feature is accessed via ~20 API functions accepting ~10 params each, i.e. quite a handful of configuration to generalize). The ethtool debug dump presets (thanks Jakub) are far too narrow to encompass the full flexibility required here.

Dave, I think my next step would be to send an RFC adding the necessary APIs to our core module (qed), mostly to give some concrete detail to this rather abstract discussion. I plan to connect those to a new hwtracing driver I'll create for this purpose, unless a different direction is suggested.

Thanks,
Ariel
Re: qed*: debug infrastructures
On 04/24/2017 10:38 AM, Elior, Ariel wrote:
> Hi Dave,
>
> According to the recent messages on the list indicating debugfs is not the
> way to go, I am looking for some guidance on what is. [...]
>
> 1. One of our HW debug features is a signal recording feature. HW is
> configured to output specific signals, which are then continuously dumped
> into a cyclic buffer on the host. There are ~8000 signals, which can be
> logically divided into ~100 groups. [...]
> This debug feature already exists out of tree for the bnx2x and qed*
> drivers and is *very* effective in field deployments. I would very much
> like to see this as an in-tree feature via some infrastructure or another.

By signals do you mean actual electrical signals coming from your chip's pins/balls?

FWIW, there is something similar in ARM Ltd's Embedded Trace Module, which allows capturing a SoC's CPU activity (loads/stores/PC progression, etc.) over a high-speed interface. You may want to look at drivers/hwtracing/{coresight,stm}/ for examples of how this interfaces with Linux's perf events subsystem.

There are existing trace buffer infrastructures, and it sounds like your use case fits nicely within the perf events framework, where you may want the entire set or a subset of these 8000 signals recorded in a buffer.
> 2. Sometimes we want to debug the probe or removal flow of the driver.
> ethtool has the disadvantage of only being available once the network
> device is available. If a bug stops the load flow before the ethtool debug
> paths are available, there is no way to collect a dump. Similarly, removal
> flows which hit a bug but do remove the network device can't be debugged
> from ethtool. [...]

Can you consider using tracing for that particular purpose, with trace filters that are specific to the device of interest? That should allow you to get events from the Linux device driver model all the way down to when your driver actually gets initialized. Is that an option?

-- 
Florian
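[Editor's illustration] Florian's suggestion of device-specific trace filters can be sketched in miniature. The following is a hypothetical Python model of filtering a stream of trace events down to one device, analogous to writing a predicate into a kernel trace event's filter file; the event names and PCI device IDs are invented for illustration, not taken from any driver.

```python
# Toy model of device-scoped trace filtering: keep only the events that
# match a filter predicate on the device field, so probe/remove flows of
# a single device can be followed. All names here are illustrative.
events = [
    {"event": "driver_probe_start", "dev": "0000:01:00.0"},
    {"event": "driver_probe_start", "dev": "0000:02:00.0"},
    {"event": "driver_probe_done",  "dev": "0000:01:00.0"},
    {"event": "driver_remove",      "dev": "0000:02:00.0"},
]

def filter_events(events, dev):
    """Keep only events matching the device filter, like dev == "..."."""
    return [e for e in events if e["dev"] == dev]

# Usage: follow the probe flow of one device, ignoring all others.
probe_trace = filter_events(events, "0000:01:00.0")
assert [e["event"] for e in probe_trace] == ["driver_probe_start", "driver_probe_done"]
```

The appeal of this approach for the probe/removal problem is that the tracing machinery exists independently of the network device's lifetime, unlike ethtool.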
Re: qed*: debug infrastructures
Tue, Apr 25, 2017 at 05:44:10AM CEST, kubak...@wp.pl wrote:
>On Mon, 24 Apr 2017 17:38:57 +, Elior, Ariel wrote:
>> Hi Dave,
>
>Hi Ariel!
>
>I'm not Dave but let me share my perspective :)
>
>> 1. One of our HW debug features is a signal recording feature. [...]
>
>Sorry for even mentioning it, new debug interfaces would be cool, but
>for FW/HW state dumps which are meaningless to the user, why not just
>use ethtool get-dump/set-dump? Do you really need the ability to
>toggle those 8k signals one-by-one, or are there reasonable sets you
>could provide from the driver that you could encode in the available
>32 bits of flags?
>
>What could be useful would be some form of start/stop commands for
>debugging, to tell the driver/FW when to record events selected by
>set-dump, and maybe a way for the user to discover what dumps the
>driver can provide (a la ethtool private flags).
>
>> 2. Sometimes we want to debug the probe or removal flow of the driver.
>> [...] If dpipe does hold an answer here (e.g. provide preconfiguration
>> which would kick in when the network device registers) then we might
>> want to port all of our register dump logic over there for this
>> advantage. Does that sound reasonable?
>
>Porting the debug/dump infrastructure to devlink would be very much
>appreciated. I'm not sure if it would fit into dpipe or be a separate
>command.

Yeah. dpipe was designed to provide a HW pipeline abstraction, based on the match/action model. I think that for stats and debugging it would make sense to introduce another devlink object.

>> 3. Yuval mentioned this, but I wanted to reiterate that the same is
>> necessary for our storage drivers (qedi/qedf). [...]
>
>devlink is not ethernet-specific, it should be a good fit for iSCSI and
>FCoE drivers, too.

Yes. During the devlink design, I had the non-network use case in mind. Devlink is the interface to be used by all who need it.
Re: qed*: debug infrastructures
On Mon, 24 Apr 2017 17:38:57 +, Elior, Ariel wrote:
> Hi Dave,

Hi Ariel!

I'm not Dave but let me share my perspective :)

> According to the recent messages on the list indicating debugfs is not the
> way to go, I am looking for some guidance on what is. [...]
>
> 1. One of our HW debug features is a signal recording feature. HW is
> configured to output specific signals, which are then continuously dumped
> into a cyclic buffer on the host. There are ~8000 signals, which can be
> logically divided into ~100 groups. [...]
> This debug feature already exists out of tree for the bnx2x and qed*
> drivers and is *very* effective in field deployments. I would very much
> like to see this as an in-tree feature via some infrastructure or another.

Sorry for even mentioning it, new debug interfaces would be cool, but for FW/HW state dumps which are meaningless to the user, why not just use ethtool get-dump/set-dump? Do you really need the ability to toggle those 8k signals one-by-one, or are there reasonable sets you could provide from the driver that you could encode in the available 32 bits of flags?

What could be useful would be some form of start/stop commands for debugging, to tell the driver/FW when to record events selected by set-dump, and maybe a way for the user to discover what dumps the driver can provide (a la ethtool private flags).

> 2. Sometimes we want to debug the probe or removal flow of the driver.
> ethtool has the disadvantage of only being available once the network
> device is available. [...] Does that sound reasonable?

Porting the debug/dump infrastructure to devlink would be very much appreciated. I'm not sure if it would fit into dpipe or be a separate command.

> 3. Yuval mentioned this, but I wanted to reiterate that the same is
> necessary for our storage drivers (qedi/qedf). [...]

devlink is not ethernet-specific; it should be a good fit for iSCSI and FCoE drivers, too.
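[Editor's illustration] Jakub's point about the 32 bits of flags carried by ethtool set-dump can be made concrete with a sketch. This is a hypothetical Python model, not the qed* or ethtool implementation: with only 32 flag bits, ~8000 individual signals (or even ~100 groups) cannot each get a bit, but a handful of driver-defined preset dumps can. The preset names below are invented.

```python
# Hypothetical encoding of driver-defined dump presets into the 32-bit
# flag word that ethtool set-dump passes to the driver. Preset names
# are invented for illustration.
DUMP_PRESET_REGISTERS   = 1 << 0
DUMP_PRESET_FW_TRACE    = 1 << 1
DUMP_PRESET_PCI_SIGNALS = 1 << 2
DUMP_PRESET_NET_SIGNALS = 1 << 3

def decode_flags(flags: int) -> list:
    """Return the preset names selected in a 32-bit flag word."""
    names = {
        DUMP_PRESET_REGISTERS:   "registers",
        DUMP_PRESET_FW_TRACE:    "fw-trace",
        DUMP_PRESET_PCI_SIGNALS: "pci-signals",
        DUMP_PRESET_NET_SIGNALS: "net-signals",
    }
    return [name for bit, name in names.items() if flags & bit]

# A 32-bit word holds at most 32 one-hot selections, which is why
# per-signal (or even per-group) toggling needs something richer,
# while coarse presets fit comfortably.
assert decode_flags(DUMP_PRESET_REGISTERS | DUMP_PRESET_FW_TRACE) == ["registers", "fw-trace"]
assert len(decode_flags(0xFFFFFFFF)) == 4
```

This is the trade-off under discussion: presets are simple and fit the existing ethtool interface, but cannot express the fine-grained selection Ariel describes.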
qed*: debug infrastructures
Hi Dave,

According to the recent messages on the list indicating debugfs is not the way to go, I am looking for some guidance on what is. The dpipe approach was mentioned as favorable, but I wanted to make sure molding our debug features to this infrastructure will result in something acceptable. A few points:

1. One of our HW debug features is a signal recording feature. HW is configured to output specific signals, which are then continuously dumped into a cyclic buffer on the host. There are ~8000 signals, which can be logically divided into ~100 groups. I believe this can be modeled in dpipe (or a similar tool) as a set of ~100 tables with ~80 entries each, and the user would be able to see them all and choose what they like. The output data itself is binary, and meaningless to "the user". The amount of data is basically as large a contiguous buffer as the driver can allocate, i.e. usually 4MB. When the user selects the signals and sets metadata regarding the mode of operation, some device configuration will have to take place. Does that sound reasonable?
This debug feature already exists out of tree for the bnx2x and qed* drivers and is *very* effective in field deployments. I would very much like to see this as an in-tree feature via some infrastructure or another.

2. Sometimes we want to debug the probe or removal flow of the driver. ethtool has the disadvantage of only being available once the network device is available. If a bug stops the load flow before the ethtool debug paths are available, there is no way to collect a dump. Similarly, removal flows which hit a bug but do remove the network device can't be debugged from ethtool. Does dpipe suffer from the same problem? qed* (like mlx*) has a common-functionality module. This allows creating debugfs nodes even before the network drivers are probed, providing a solution for this (debug nodes are also available after network driver removal). If dpipe does hold an answer here (e.g. provide preconfiguration which would kick in when the network device registers) then we might want to port all of our register dump logic over there for this advantage. Does that sound reasonable?

3. Yuval mentioned this, but I wanted to reiterate that the same is necessary for our storage drivers (qedi/qedf). debugfs does have the advantage of being non-subsystem-specific. Is there perhaps another non-subsystem-specific debug infrastructure which *is* acceptable to the networking subsystem? My guess is that the storage drivers will turn to debugfs (in their own subsystem).

Thanks,
Ariel
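[Editor's illustration] The cyclic host buffer described in point 1 can be modeled conceptually. This is a hypothetical Python sketch, not the qed* implementation: selected signal samples are appended to a bounded buffer, and once the buffer is full the oldest data is overwritten, so the buffer always holds the most recent window of activity. Sizes and names are illustrative (the real buffer is ~4MB of contiguous memory).

```python
# Hypothetical model of the signal-recording scheme described above:
# selected signals are continuously dumped into a cyclic buffer of
# bounded size, overwriting the oldest data once full.

class CyclicRecorder:
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.buf = bytearray()

    def record(self, sample: bytes):
        """Append a sample, dropping the oldest bytes once the buffer is full."""
        self.buf.extend(sample)
        overflow = len(self.buf) - self.capacity
        if overflow > 0:
            del self.buf[:overflow]  # cyclic: oldest data is overwritten

# Usage: a tiny 16-byte buffer for illustration. Ten 4-byte samples are
# recorded; only the most recent four survive.
rec = CyclicRecorder(16)
for i in range(10):
    rec.record(bytes([i]) * 4)
assert len(rec.buf) == 16             # never exceeds capacity
assert rec.buf[:4] == bytes([6]) * 4  # oldest surviving sample is #6
```

The binary contents are opaque to the user, as the message says; only the selection of which signal groups feed the buffer, and the buffer size, are user-visible configuration.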