Richard Purdie <[email protected]> writes:

> On Tue, 2023-12-19 at 16:44 +0000, Alex Bennée wrote:
>> Richard Purdie <[email protected]> writes:
>> 
>> > On Tue, 2023-12-19 at 15:10 +0000, Alex Bennée wrote:
>> > > "Richard Purdie" <[email protected]> writes:
>> > > 
>> > > > Similar to stdout in the previous commit, we need to ensure serial 
>> > > > output
>> > > > if written is read and put somewhere, else qemu might block on writes 
>> > > > to
>> > > > the serial port leading to hangs in the kernel. Use our existing 
>> > > > logging
>> > > > thread to log data when run_serial is not in use.
>> > > > 
>> > > > Signed-off-by: Richard Purdie <[email protected]>
>> > > 
>> > > I'm still seeing hanging with my test case:
>> > > 
>> > >   o regression_1010.5 Invalid memory access 5
>> > >     regression_1010.5 OK
>> > >   o regression_1010.6 Invalid memory access 6
>> > > 
>> > >   Stdout:
>> > >   Tried to dump info from target but serial console failed
>> > >   Failed CMD: top -bn1
>> > >   Tried to dump info from target but serial console failed
>> > >   Failed CMD: ps
>> > >   Tried to dump info from target but serial console failed
>> > >   Failed CMD: free
>> > > 
>> > >   NOTE: test_soafee (soafeetestsuite.SoafeeTestSuite.test_soafee)
>> > >   DEBUG: Checking if at least one of soafee-test-suite is installed
>> > >   DEBUG: Setting up a 400 second(s) timeout
>> > >   DEBUG: [Running]$ ssh -l root -o ServerAliveCountMax=2 -o 
>> > > ServerAliveInterval=30 -o UserKnownHostsFile=/dev/null -o 
>> > > StrictHostKeyChecking=no -o LogLevel=ERROR -p 2222 127.0.0.1 export 
>> > > PATH=/usr/sbin:/sbin:/usr/bin:/bin; sudo -i -u trs soafee-test-suite run 
>> > > -r -t -T ; rm -rf /tmp/soafee-test-suite
>> > >   DEBUG: Waiting for process output: time: 1702998407.0291, endtime: 
>> > > 1702998707.0285847
>> > >   DEBUG: Waiting for process output: time: 1702998412.0338314, endtime: 
>> > > 1702998707.0285847
>> > >   DEBUG: Waiting for process output: time: 1702998417.0378022, endtime: 
>> > > 1702998707.0285847
>> > >   DEBUG: Waiting for process output: time: 1702998422.0417945, endtime: 
>> > > 1702998707.0285847
>> > >   <snip>
>> > > 
>> > > The regression tests are hanging because that has triggered output on
>> > > the serial port as the firmware attempted to service the test and the
>> > > port isn't being drained. Then when that times out we start spinning
>> > > waiting for more output.
>> > 
>> > I don't really understand where it might be hanging unfortunately as in
>> > my local tests it is draining stdout and the serial ports. Does it
>> > login to the serial console correctly? Are there any WARNING messages?
>> > 
>> > I did use this debug patch:
>> > 
>> > https://git.yoctoproject.org/poky-contrib/commit/?h=rpurdie/t222&id=0d9029bad610736c114e0770d2433b53133fa620
>> > 
>> > to prove that it can interleave serial commands with left over output
>> > successfully. A similar patch may help work out where your setup may be
>> > hanging?
>> 
>> Yeah I'm only seeing A10/A11:
>> 
>>   o regression_1010.5 Invalid memory access 5
>>     regression_1010.5 OK
>>   o regression_1010.6 Invalid memory access 6
>> 
>>   WARNING: here A10
>>   WARNING: here A11
>>   WARNING: here A10
>>   WARNING: here A11
>>   WARNING: here A10
>>   WARNING: here A11
>>   WARNING: here A10
>>   WARNING: here A11
>>   WARNING: here A10
>>   WARNING: here A11
>>   WARNING: here A10
>>   WARNING: here A11
>>   WARNING: here A10
>>   WARNING: here A11
>>   WARNING: here A10
>>   WARNING: here A11
>>   WARNING: here A10
>>   WARNING: here A11
>>   WARNING: here A10
>>   WARNING: here A11
>>   NOTE:  ... FAIL
>>   DEBUG: Removed SIGALRM handler
>>   Traceback (most recent call last):
>>     File 
>> "/home/alex/lsrc/tests/trs/build/../poky/meta/lib/oeqa/core/decorator/__init__.py",
>>  line 35, in wrapped_f
>>       return func(*args, **kwargs)
>>              ^^^^^^^^^^^^^^^^^^^^^
>>     File 
>> "/home/alex/lsrc/tests/trs/build/../poky/meta/lib/oeqa/core/decorator/__init__.py",
>>  line 35, in wrapped_f
>>       return func(*args, **kwargs)
>>              ^^^^^^^^^^^^^^^^^^^^^
>>     File 
>> "/home/alex/lsrc/tests/trs/meta-trs/lib/oeqa/runtime/cases/opteetest.py", 
>> line 27, in test_opteetest_xtest
>>       self.assertEqual(status, 0, msg='\n'.join([cmd, output]))
>>   AssertionError: 255 != 0 :  xtest               
>>   Run test suite with level=0
>
> How long does it take for the test data to "back up" and timeout? The
> logging thread should take a maximum of 2s to notice data not being
> read/handled on the serial port.

I don't think it goes around the logging loop more than once. I only
see:

  grep "got" 
build/tmp_trs-qemuarm64/work/trs_qemuarm64-trs-linux/trs-image/1.0/temp/log.do_testimage
  DEBUG: got [(12, 1)]

from the poll.

>
> What the code should be doing is noticing the run_serial function isn't
> using the port and if there is data there, reading it in the logging
> thread. You could put further warnings around the code there to see if
> it ever detects data to read on that port. A warning when it exits the
> run_serial command may help too to see if it is waiting for data there
> or not.
>
> I also ended up using:
>
> https://git.yoctoproject.org/poky-contrib/commit/?h=rpurdie/t222&id=febb0d892e4c3648cd88b8ecb9c0260edc45b68e
>
> to show what was going over the serial interface when I found it wasn't
> what I expected.

I'll keep digging.

>
> I suspect/hope my patch is a step in the right direction but it looks
> like there may be cases it doesn't catch yet :/.
>
> Cheers,
>
> Richard

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#192785): 
https://lists.openembedded.org/g/openembedded-core/message/192785
Mute This Topic: https://lists.openembedded.org/mt/103244415/21656
Group Owner: [email protected]
Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub 
[[email protected]]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to