Re: [Piglit] [PATCH][RFC] Add dmesg option for reboot policy

Martin Peres Mon, 23 Nov 2015 04:49:35 -0800

On 02/11/15 16:57, [email protected] wrote:

This adds a policy which advises when user should reboot system to avoid
noisy test results due to system becoming unstable, for instance, and
therefore continues testing successfully.
To do this, a new Dmesg class is proposed which is not filtering dmesg and
monitors whether or not one of the following event occurs:
- gpu reset failed (not just gpu reset happened, that happens way too
often and many tests even provoke hangs intentionally)
- gpu crash,
- Oops:
- BUG
- lockdep splat that causes the locking validator to get disabled
If one of these issues happen, piglit test execution is stopped
-terminating test thread pool- and exit with code 3 to inform that reboot
is advised.
Then test execution resume, after rebooting system or not, is done like
usually with command line parameter "resume".


To call it, use command line parameter: --dmesg monitored


Hello Yann,

The rationale behind this patch is very sound and we need something likethis. Here are however a list of nitpicks:

- Please send patches with git send-email, otherwise, it makes itimpossible for us to comment inline which is the usual process for patchreview. Please re-send :)


- varaiable -> variable; double space after "when a reboot may be required"

- I am not a big fan of changing the semantic of arguments that havebeen there forever. Can you think of a case where the user would notwant the test to abort if we reach a state where we cannot trust theresult? I am including Dylan on this. Also, if we are to keep thesemodes, can we rename the "simple" mode to "warning" and "monitored" to"abort"? This would make more sense and clearly state the goal of themodes.

- I see a potential issue with the dmesg-monitoring class, it cangenerate false positives. Indeed, it completely ignores the fact thatthere may be more than one GPU installed and that it may come from adifferent vendor. It would be good to know which gpu the test got run onand only track this gpu for bugs. This is however a complex task andwould require env_dump to be done properly. I guess we could pass anargument to env_dump to track the gpu-vendor-dependent codepath andleave the general cases (Oops, BUG and lockdep) up to python. Anyway,better safe than sorry in the mean time!

- I am not a big fan of using dmesg to get information out of thekernel because the output format may change. How about listening forudev events coming from the gpu or reading i915_error_state/i915_wedged ?


That's it for the first round :)

Thanks for the proposal!
Martin
_______________________________________________
Piglit mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/piglit

Re: [Piglit] [PATCH][RFC] Add dmesg option for reboot policy

Reply via email to