On Fri, May 16, 2025 at 03:34:55PM -0700, Marc Herbert wrote:
> On 2025-05-15 23:08, Dan Williams wrote:
> > alison.schofield@ wrote:
> 
> >>  
> >> +wait_for_logfile_update()
> >> +{
> >> +  local file="$1"
> >> +  local prev_size="$2"
> >> +  local timeout=30
> >> +  local i=0
> >> +
> >> +  # prev_size is always zero because start_monitor truncates it.
> >> +  # Set and check against it anyway to future proof.
> >> +  while [ $i -lt $timeout ]; do
> >> +          local new_size=$(stat -c%s "$file" 2>/dev/null || echo 0)
> >> +          if [ "$new_size" -gt "$prev_size" ]; then
> >> +                  return 0
> >> +          fi
> >> +          sleep 0.1
> >> +          i=$((i+1))
> >> +  done
> >> +
> >> +  echo "logfile not updated within 3 seconds"
> >> +  err "$LINENO"
> > 
> > Hmm... not a fan of this open coded "wait for file to change" bash
> > function. This feels like something that a tool can do... (searches)
> > 
> > Does inotifywait fit the bill here?
> > 
> > https://linux.die.net/man/1/inotifywait
> 
> If inotify works, go for it. Blocking is always better than polling.  It
> might be tricky because the file does not exist yet. Create an empty
> file yourself first, would that work? Probably not if ndctl monitor
> creates a brand new file.
> 
> If inotify does not work, consider adding to test/common this generic
> polling function that lets you poll in bash literally anything:
> 
> https://github.com/pmem/run_qemu/pull/177/files
> 
> It would require making `prev_size` global which does not look like an
> issue to me.
> 
> Before adding it run_qemu, that polling function has been used for years
> and thousands of runs in
> https://github.com/thesofproject/sof-test/blob/main/case-lib/lib.sh
> I mean it's been extremely well tested.
> 
> Even if you don't need polling here, it's unfortunately fairly common to
> have to poll from shell scripts. Why I'm suggesting a test/common
> location.

Thanks. I'm not sure what other polling type work is going on in
the tests, but I'm sure if any, it'll come out of the woodwork
now that we're mentioning it.

I did drop the polling. Not that I thought polling was so horrid,
but mostly because I found something better to 'look' for, so using
'tail -F' to watch the logfile made more sense.



> 
> >> +}
> >> +
> >>  start_monitor()
> >>  {
> >>    logfile=$(mktemp)
> >>    $NDCTL monitor -c "$monitor_conf" -l "$logfile" $1 &
> >>    monitor_pid=$!
> >> -  sync; sleep 3
> >> +  sync
> >> +  for i in {1..30}; do
> >> +          if ps -p "$monitor_pid" > /dev/null; then
> >> +                  sleep 0.1
> >> +                  break
> >> +          fi
> >> +          sleep 0.1
> >> +  done
> >> +  if ! ps -p "$monitor_pid" > /dev/null; then
> >> +          echo "monitor not ready within 3 seconds"
> >> +          err "$LINENO"
> >> +  fi
> > 
> > This does not make sense to. The shell got the pid from the launching
> > the executable. This is effectively testing that bash command execution
> > works. About the only use I can imagine for this is checking that the
> > monitor did not die early, but that should be determined by other parts
> > of the test.
> 
> Agreed: I'm afraid the only thing this code does is sleeping 0.1s only
> once instead of 3s. Because not sleeping at all worked for you, no
> surprise a single sleep 0.1 works too.
> 
> I suspect the only case where the "for" loop actually iterates is when the
> "monitor" process crashes extremely fast, faster than the
> "sync". Basically racing with its parent to crash before the latter
> notices. That race does not look like a "feature" to me.
> 
> I agree this should be replaced by observing side-effects from the
> monitor. Dunno what. grep something in the ndctl monitor -v output?
> 
> 
> By the way, UNTESTED:
> 

Is this because in your testing the monitor was left dangling and
the test hanging? I suspect the that the cleanup on err was missing,
and I added it my v3 patch.



> --- a/test/monitor.sh
> +++ b/test/monitor.sh
> @@ -67,7 +67,19 @@ check_result()
>  
>  stop_monitor()
>  {
> -     kill $monitor_pid
> +     kill $monitor_pid || die "monitor $monitor_pid was already dead"
> +
> +     local ret=0
> +     timeout 3 wait $monitor_pid || ret=$?
> +     case "$ret") in
> +     124) # timeout
> +             die "monitor $monitor_pid ignored SIGTERM" ;;
> +     0|127) # either success or killed fast
> +             : ;;
> +     *)
> +             die "unexpected monitor exit status:$ret" ;;
> +     esac
> +
>       rm "$logfile"
>  }
> 
> 
> Something like that...
> 
> This is all assuming the monitor does not have any kind of "remote
> control"; that would be much better.
> 
> I don't know the "monitor", but if it has neither obvious
> side-effects nor any kind of "remote control", then it does a "great"
> job... getting in the way of tests! :-( Maybe the monitor is what should
> be fixed rather than adding shell script "creativity"?

Reply via email to