On Mon, Nov 23, 2015 at 02:48:35PM +0200, Martin Peres wrote: > On 02/11/15 16:57, [email protected] wrote: > >This adds a policy which advises when user should reboot system to avoid > >noisy test results due to system becoming unstable, for instance, and > >therefore continues testing successfully. > >To do this, a new Dmesg class is proposed which is not filtering dmesg and > >monitors whether or not one of the following event occurs: > >- gpu reset failed (not just gpu reset happened, that happens way too > >often and many tests even provoke hangs intentionally) > >- gpu crash, > >- Oops: > >- BUG > >- lockdep splat that causes the locking validator to get disabled > >If one of these issues happen, piglit test execution is stopped > >-terminating test thread pool- and exit with code 3 to inform that reboot > >is advised. > >Then test execution resume, after rebooting system or not, is done like > >usually with command line parameter "resume". > > > >To call it, use command line parameter: --dmesg monitored > > Hello Yann, > > The rationale behind this patch is very sound and we need something like > this. Here are however a list of nitpicks: > > - Please send patches with git send-email, otherwise, it makes it > impossible for us to comment inline which is the usual process for patch > review. Please re-send :) > > - varaiable -> variable; double space after "when a reboot may be required" > > - I am not a big fan of changing the semantic of arguments that have been > there forever. Can you think of a case where the user would not want the > test to abort if we reach a state where we cannot trust the result? I am > including Dylan on this. Also, if we are to keep these modes, can we rename > the "simple" mode to "warning" and "monitored" to "abort"? This would make > more sense and clearly state the goal of the modes.
Ilia, Daniel, Thomas, Glenn, I know that y'all use the dmesg support. What do you think? > > - I see a potential issue with the dmesg-monitoring class, it can generate > false positives. Indeed, it completely ignores the fact that there may be > more than one GPU installed and that it may come from a different vendor. It > would be good to know which gpu the test got run on and only track this gpu > for bugs. This is however a complex task and would require env_dump to be > done properly. I guess we could pass an argument to env_dump to track the > gpu-vendor-dependent codepath and leave the general cases (Oops, BUG and > lockdep) up to python. Anyway, better safe than sorry in the mean time! > > - I am not a big fan of using dmesg to get information out of the kernel > because the output format may change. How about listening for udev events > coming from the gpu or reading i915_error_state/i915_wedged ? > > That's it for the first round :) > > Thanks for the proposal! > Martin
signature.asc
Description: PGP signature
_______________________________________________ Piglit mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/piglit
