On Thu, May 21, 2015 at 9:47 PM, Antti Kantee wrote:
> On 21/05/15 03:31, Ryota Ozaki wrote:
>>>
>>> As it happens, last night I needed to get the dmesg from a rump kernel
>>> running on bare metal, so I made a trivial adjustment to make the
>>> kern.msgbuf sysctl node available. Now you can get the log slightly more
>>> easily than with gdb by using sysctl -r kern.msgbuf (I assume hijacking
>>> dmesg(1) would also work on NetBSD).
>>
>>
>> rump.sysctl -r kern.msgbuf works (though some NUL bytes appear before the
>> actual kernel log),
>
>
> That's expected; that's how kern.msgbuf works (before the ring buffer fills,
> after which you encounter another weird issue).
I see.
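A minimal sketch of the rump.sysctl route discussed above, as POSIX sh helpers: `rump_msgbuf` reads the raw kern.msgbuf node and pipes it through `strip_nul`, which drops the NUL padding that precedes the log until the ring buffer fills. Both function names are hypothetical, and the socket URL passed as `$1` (for example `unix://sock`) is an assumption about how the server was started.

```shell
#!/bin/sh
# Hypothetical helpers; not part of rump or ATF.

# Delete NUL bytes from stdin. kern.msgbuf is NUL-padded until the
# message ring buffer has filled for the first time.
strip_nul() {
	tr -d '\000'
}

# Print the kernel log of the rump kernel listening at socket URL $1
# (e.g. unix://sock). Assumes rump.sysctl is installed on the host.
rump_msgbuf() {
	env RUMP_SERVER="$1" rump.sysctl -r kern.msgbuf | strip_nul
}
```

Typical use would be `rump_msgbuf unix://sock` after the server has been started on that socket.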
>
>> however, dmesg with hijacking doesn't: dmesg still shows the host's kernel
>> log.
>
>
> Are you hijacking the sysctl system call, i.e. RUMPHIJACK=sysctl dmesg?
> (Unlike with path- or fd-based system calls, hijacking sysctl is
> all-or-nothing. It might be possible to add some pseudo-MIB path handling
> alongside the sysctl hijacker so that only rump.foo.bar gets hijacked, but
> I don't really see the point of spending a few days on that.)
It works! I didn't know about that option.
Fine-grained sysctl MIB hijacking isn't something I need at this point.
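For the hijacking route confirmed to work above, a small wrapper can keep the environment variables in one place. This is a sketch under assumptions: the library path /usr/lib/librumphijack.so and the `sysctl=yes` value follow our reading of librumphijack(3) (the thread abbreviates it as RUMPHIJACK=sysctl), and the `rump_dmesg` name and its DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run the host's dmesg(1), but send its sysctl
# requests to the rump kernel at socket URL $1 via librumphijack.
# ASSUMPTION: library path and "sysctl=yes" syntax per librumphijack(3).
# If DRY_RUN is set to a non-empty value, the command is only printed.
rump_dmesg() {
	${DRY_RUN:+echo} env RUMP_SERVER="$1" \
	    LD_PRELOAD=/usr/lib/librumphijack.so \
	    RUMPHIJACK=sysctl=yes \
	    dmesg
}
```

Because sysctl hijacking is all-or-nothing, every sysctl request dmesg makes goes to the rump kernel rather than the host.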
>
>>> Can you sketch a bit how you'd integrate the feature with ATF? For example,
>>> do you plan to always include the log of all rump kernels started by ATF in
>>> the test output (how?) and leave parsing to a human reading the logs, or
>>> will the logcat be executed only by a failure handler, or something
>>> different?
>>
>>
>> I was thinking of using kernel logs for debugging a kernel with ATF tests,
>> so I wouldn't output kernel logs by default in ATF (though it may be useful).
>>
>> I imagined the following scenario:
>> - Modify the code of the kernel and test it with ATF tests
>> - Find a regression via an ATF test
>> - Want to debug the kernel with the ATF test
>> - Try printf debugging of the kernel (add printf to the code)
>> - Let rump_server(s) dump kernel logs in the cleanup phase (or elsewhere);
>>   this could be done by just enabling the test's debug flag if it has one,
>>   or by adding some code to the test if not
>> - Run the ATF test again, see the output and debug it
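One way the "dump kernel logs at the cleanup phase" step could look in a test's cleanup, gated on a debug flag so the output only appears when asked for. The DEBUG variable, the `dump_kernel_logs` name and the socket URLs are assumptions for illustration; the log is read with rump.sysctl -r kern.msgbuf as described at the start of the thread.

```shell
#!/bin/sh
# Hypothetical cleanup helper: print the kernel log of every rump
# kernel whose socket URL is passed as an argument, but only when the
# test is run with DEBUG=1 (flag name is an assumption).
dump_kernel_logs() {
	[ "${DEBUG:-0}" = 1 ] || return 0
	for sock in "$@"; do
		echo "### kernel log of ${sock}"
		# -r prints the raw node; tr drops the NUL padding.
		env RUMP_SERVER="$sock" rump.sysctl -r kern.msgbuf | tr -d '\000'
		echo
	done
}
```

A cleanup routine would call, say, `dump_kernel_logs unix://sock1 unix://sock2` before halting the servers. In an atf-sh test the flag could instead come from a configuration variable read with atf_config_get, so that debugging can be switched on without modifying the test.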
>>
>> If printf debugging is not sufficient, move on to gdb, though I have no
>> idea how to use gdb with ATF tests easily.
>
>
> Using gdb with ATF is very easy. I added support to ATF so that it
> internally uses gdb to print a stack trace in case a C test program creates
> a coredump. So every time a test program crashes, you are using gdb with
> ATF ;)
Heh, I didn't know that.
Inspired by that, I'm adding this code in the cleanup phase:
    # Print a backtrace if rump_server dumped core during the test.
    if [ -f rump_server.core ]; then
        gdb -batch -ex bt /usr/bin/rump_server rump_server.core
    fi
It's helpful :)
>
> More seriously though, IMO the biggest usability problem with ATF is that
> while it works great when the tests work or fail in expected ways, it's
> difficult to debug the tests (or system under test) when that's not the
> case. It used to be completely impossible to use gdb with ATF tests, at
> least now it's somewhat possible.
>
> So, yes, I completely agree that being able to iterate with printf debugging
> would solve >50% of "debugging the test" problems, at least for tests which
> use rump kernels and when the problem is in a kernel component.
>
>> Support of kernel log output in ATF would be like this:
>> http://www.netbsd.org/~ozaki-r/atf-dmesg.diff
>
>
> Isn't it very inconvenient to do that dance individually for every test?
I imagined we would change only the failing tests, one at a time as they
fail, but of course changing every test is painful!
> Also, I can imagine log-cat support going out of date in tests if the test
> is normally run without DEBUG. Furthermore, it would be desirable to be
> able to enable dmesg output in any [conforming] test without having to start
> modifying the test.
Well, yes, I'm sometimes annoyed that the output of failed tests doesn't
help me understand what happened. More (debug) output by default may be better.
>
> So, I am thinking that maybe there should be some higher-level construct for
> running rump_server in ATF tests, something like atf_rump_server. And I'm
> thinking that once that higher-level construct has been specified after
> some short experimentation, we might notice that -L is not really what we wanted
> (or we might notice that it is).
I'm not sure yet how we would implement such functions, but I do feel we
need some support for writing ATF tests with rump kernels. I repeat myself a
lot when writing tests: defining sockets (and buses) for each server,
starting rump_servers with them, selecting a server via RUMP_SERVER for
each operation, setting LD_PRELOAD, halting servers, etc.
If there were utility functions, we could add the log-cat feature to them
as well.
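A rough sketch of what such utility functions might look like, covering some of the repeated steps listed above (a socket per server, starting, selecting via RUMP_SERVER, halting). Everything here is hypothetical: the function names, the `unix://NAME.sock` naming scheme, and the use of atf_check (available only inside atf-sh test programs) are assumptions, not an existing ATF or rump API.

```shell
#!/bin/sh
# Hypothetical helpers for atf-sh tests that drive several rump kernels.

# Socket URL used for the server called $1 (naming scheme assumed).
rump_sock() {
	echo "unix://$1.sock"
}

# Start server $1; remaining arguments are rump_server options,
# e.g. -lrumpnet -lrumpnet_net -lrumpnet_netinet.
# Only usable inside an atf-sh test, where atf_check is defined.
rump_start() {
	name=$1
	shift
	atf_check -s exit:0 rump_server "$@" "$(rump_sock "$name")"
}

# Direct subsequent rump.* commands at server $1.
rump_use() {
	RUMP_SERVER=$(rump_sock "$1")
	export RUMP_SERVER
}

# Halt server $1; a log-cat step could be added here before halting.
rump_halt() {
	env RUMP_SERVER="$(rump_sock "$1")" rump.halt
}
```

Per-server buses and LD_PRELOAD handling are left out; the point is that once every test goes through helpers like these, a log dump only has to be added in one place.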
>
> Now, understandably, defining such testing abstractions may not be what you
> want to spend time on now, though I think it would quickly start saving a
> lot of time, especially if you want to introduce the log capability to a
> large number of tests. So, if neither you nor anyone else has any ideas on
> what the higher-level construct should be, I can add support for -L. ... anyone?
My opinion at this point is that something like the -L option is needed
anyway, but of course a smarter solution is welcome if one exists.
>
>>> I think -L is fine, but I'd like to see at least one concrete example,
>>> and preferably more than one, of how you plan to use the feature, to make
>>> sure -L is really the best possible way instead of just fine.
>>
>>
>> Well, another example would be to use kernel logs in the test itself,
>> e.g., atf_check -s exit:0 -o match:'something' dmesg. I'm not sure
>> such tests are proper.
>
>
> I think they are not only proper but also desirable. Sometimes there
> is no good way to retrieve information from the kernel. For example, I
> think dev/scsipi/t_cd.c would benefit from that capability.
>
> The counter-argument one hears is that then you can't change kernel printfs,
> but who really changes printfs that often, and even if they are changed, the
> test can be quickly fixed to conform.
Sure. It's easy to change messages compared with changing behavior.
ozaki-r
>
> - antti
>