On Fri, May 16, 2025 at 03:34:55PM -0700, Marc Herbert wrote: > On 2025-05-15 23:08, Dan Williams wrote: > > alison.schofield@ wrote: > > >> > >> +wait_for_logfile_update() > >> +{ > >> + local file="$1" > >> + local prev_size="$2" > >> + local timeout=30 > >> + local i=0 > >> + > >> + # prev_size is always zero because start_monitor truncates it. > >> + # Set and check against it anyway to future proof. > >> + while [ $i -lt $timeout ]; do > >> + local new_size=$(stat -c%s "$file" 2>/dev/null || echo 0) > >> + if [ "$new_size" -gt "$prev_size" ]; then > >> + return 0 > >> + fi > >> + sleep 0.1 > >> + i=$((i+1)) > >> + done > >> + > >> + echo "logfile not updated within 3 seconds" > >> + err "$LINENO" > > > > Hmm... not a fan of this open coded "wait for file to change" bash > > function. This feels like something that a tool can do... (searches) > > > > Does inotifywait fit the bill here? > > > > https://linux.die.net/man/1/inotifywait > > If inotify works, go for it. Blocking is always better than polling. It > might be tricky because the file does not exist yet. Create an empty > file yourself first, would that work? Probably not if ndctl monitor > creates a brand new file. > > If inotify does not work, consider adding to test/common this generic > polling function that lets you poll in bash literally anything: > > https://github.com/pmem/run_qemu/pull/177/files > > It would require making `prev_size` global which does not look like an > issue to me. > > Before adding it run_qemu, that polling function has been used for years > and thousands of runs in > https://github.com/thesofproject/sof-test/blob/main/case-lib/lib.sh > I mean it's been extremely well tested. > > Even if you don't need polling here, it's unfortunately fairly common to > have to poll from shell scripts. Why I'm suggesting a test/common > location.
Thanks. I'm not sure what other polling type work is going on in the tests, but I'm sure if any, it'll come out of the woodwork now that we're mentioning it. I did drop the polling. Not that I thought polling was so horrid, but mostly because I found something better to 'look' for, so using 'tail -F' to watch the logfile made more sense. > > >> +} > >> + > >> start_monitor() > >> { > >> logfile=$(mktemp) > >> $NDCTL monitor -c "$monitor_conf" -l "$logfile" $1 & > >> monitor_pid=$! > >> - sync; sleep 3 > >> + sync > >> + for i in {1..30}; do > >> + if ps -p "$monitor_pid" > /dev/null; then > >> + sleep 0.1 > >> + break > >> + fi > >> + sleep 0.1 > >> + done > >> + if ! ps -p "$monitor_pid" > /dev/null; then > >> + echo "monitor not ready within 3 seconds" > >> + err "$LINENO" > >> + fi > > > > This does not make sense to. The shell got the pid from the launching > > the executable. This is effectively testing that bash command execution > > works. About the only use I can imagine for this is checking that the > > monitor did not die early, but that should be determined by other parts > > of the test. > > Agreed: I'm afraid the only thing this code does is sleeping 0.1s only > once instead of 3s. Because not sleeping at all worked for you, no > surprise a single sleep 0.1 works too. > > I suspect the only case where the "for" loop actually iterates is when the > "monitor" process crashes extremely fast, faster than the > "sync". Basically racing with its parent to crash before the latter > notices. That race does not look like a "feature" to me. > > I agree this should be replaced by observing side-effects from the > monitor. Dunno what. grep something in the ndctl monitor -v output? > > > By the way, UNTESTED: > Is this because in your testing the monitor was left dangling and the test hanging? I suspect the that the cleanup on err was missing, and I added it my v3 patch. > --- a/test/monitor.sh > +++ b/test/monitor.sh > @@ -67,7 +67,19 @@ check_result() > > stop_monitor() > { > - kill $monitor_pid > + kill $monitor_pid || die "monitor $monitor_pid was already dead" > + > + local ret=0 > + timeout 3 wait $monitor_pid || ret=$? > + case "$ret") in > + 124) # timeout > + die "monitor $monitor_pid ignored SIGTERM" ;; > + 0|127) # either success or killed fast > + : ;; > + *) > + die "unexpected monitor exit status:$ret" ;; > + esac > + > rm "$logfile" > } > > > Something like that... > > This is all assuming the monitor does not have any kind of "remote > control"; that would be much better. > > I don't know the "monitor", but if it has neither obvious > side-effects nor any kind of "remote control", then it does a "great" > job... getting in the way of tests! :-( Maybe the monitor is what should > be fixed rather than adding shell script "creativity"?