Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-09 Thread Peter
On Wed, Dec 09, 2020 at 02:00:37PM +1100, Dewayne Geraghty wrote:

! On a jail with config:
! exec.start = "/bin/sh -x /etc/rc";
! exec.stop = "/bin/sh /etc/rc.shutdown";
! exec.clean;
! 
! test_prod  { jid=7; persist; ip4.addr =
! "10.0.7.96,10.0.5.96,127.0.5.96"; devfs_ruleset = "6";
! host.hostuuid=---0001-0302; host.hostid=000302; }
! 
! I successfully performed
! for i in `seq 10`; do jail -vc test_prod; sleep 3; jail -vr test_prod; done

But, this is not a VIMAGE jail, is it?
Old-style jails are unaffected by this issue. Only VIMAGE jails, using
epair or netgraph, might be affected. (In that case, you would not
have an "ip4.addr" configured, and rather a "vnet.interface".)
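
For illustration, a VIMAGE jail definition might look roughly like the
sketch below. It is not taken from any configuration in this thread:
the jail name test_vnet and the epair0 interface are assumptions, and
the parameters used (vnet, vnet.interface, exec.prestart, exec.poststop)
are the ones documented in jail(8).

```
test_vnet {
    persist;
    vnet;                                         # jail gets its own network stack
    vnet.interface = "epair0b";                   # this end of the pair moves into the jail
    exec.prestart  = "ifconfig epair0 create";    # creates epair0a and epair0b on the host
    exec.poststop  = "ifconfig epair0a destroy";  # destroying one end removes both
    exec.start = "/bin/sh /etc/rc";
    exec.stop  = "/bin/sh /etc/rc.shutdown";
}
```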

! I think the normal use of jail.conf is to NOT explicitly use a jid in
! the definition, which may be why this may not have been picked up?
! (Maybe a clue).

This is an interesting point. When you stop a jail, it may stay for
a more or less long time in a "dying" state (visible with "jls -d"),
keeping the jid occupied. During that time, the jail cannot be
restarted with that same jid.
Some time ago I read people complaining about this, and the advice was to
just not define the jid in the definition, so that the jail can be
restarted immediately (and will probably grab another jid).
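
If one keeps a fixed jid anyway, one option is to wait until the old
instance has really disappeared before restarting. A minimal sh sketch,
assuming that "jls -d -j <name>" exits non-zero once the jail is neither
active nor dying; the function name wait_jail_gone and the JLS override
variable are made up for this example:

```shell
#!/bin/sh
# Hypothetical helper, not part of the base system.
# JLS may be overridden (e.g. for testing); it defaults to jls(8).
: "${JLS:=jls}"

# wait_jail_gone NAME [TRIES]
# Poll once per second until NAME is no longer listed, not even as dying.
# Returns 0 when the jail is gone, 1 if it is still there after TRIES polls.
wait_jail_gone() {
    name=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # -d includes dying jails; -j restricts the listing to one jail.
        # Assumption: a failing exit status means "no such jail".
        if ! $JLS -d -j "$name" jid >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

It could then be used like this:
jail -vr test_prod; wait_jail_gone test_prod 60 && jail -vc test_prod
A jail that never leaves the dying state makes the function return 1,
so the problem gets noticed instead of being hidden behind a new jid.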

I did not find a solid explanation for what is happening in that
"dying" state (and why it does take more or less long), even less
an approach to fix that. I found some theories circulating on the net, but
these don't really add up. So I would need to look into the source
myself - and I did postpone that indefinitely. ;)

But what I found out, with the VIMAGE jails (those that can carry
their own network interfaces), when you make a slight mistake with
managing and handling the interfaces, then the jail will stay in the
dying state forever. If you don't make a mistake, then it will finally
die within some time.
So I decided to keep the jid, so that a misconfiguration cannot linger
unnoticed: it shows up right away. (The tradeoff is obviously that
one might have to wait before restarting.)

cheerio,
PMc

P.S. 41 Celsius is fantastic! I envy you! :)
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-09 Thread Kristof Provost

Peter,

I’m not interested in discussing software development methodology here.

Please drop me from this thread. Let me know if/when you have a test
case I can work from.

Regards,
Kristof

On 9 Dec 2020, at 11:54, Peter wrote:


On Tue, Dec 08, 2020 at 07:51:07PM -0600, Kyle Evans wrote:

! You seem to have misinterpreted this; he doesn't want to narrow it
! down to one bug, he wants simple steps that he can follow to reproduce


Maybe I did misinterpret, but then I don't really understand it.
I would suppose, when testing a proposed fix, the fact that it
does break under the exact same conditions as before, is all the
information needed at that point. Put in simple words: that it does
not work.

! any failure, preferably steps that can actually be followed by just
! about anyone and don't require immense amounts of setup time or
! additional hardware.

Engineering does not normally work that way.

I'll try to explain: when a bug is first encountered, it is necessary
to isolate it to the extent that somebody who is knowledgeable about
the code can actually reproduce it, in order to have a look at it and
analyze what causes the malfunction.

If then a remedy is devised, and that does not work as expected, then
the flaw is in the analysis, and we just start over from there.

In fact, I would have expected somebody who is trying to fix such
kind of bug, to already have testing tools available and tell me
exactly which kind of data I might retrieve from the dumps.

The open question now is: am I the only one seeing these failures?
Might they be attributed to a faulty configuration or maybe hardware
issues or whatever?
We cannot know this, we can only watch out what happens at other
sites. And that is why I sent out all these backtraces - because they
appear weird and might be difficult to associate with this issue.

I don't think there is much more we can do at this point, unless we
were willing to actually look into the details.


Am I discouraging? Indeed, I think, engineering is discouraging by
its very nature, and that's the fun of it: to overcome odds and
finally maybe make things better. And when we start to forget about
that, bad things begin to happen (anybody remember Apollo 13?).

But talking about discouragement: I usually try to track down
defects I encounter, and, if possible, do a viable root-cause
analysis. I tended to be very willing to share the outcomes and, if
a solution arises, by all means get that back into the code base;
but I found that even ready-made patches for easy matters would
linger forever in the sendbug system without anybody caring, or, in
more complex cases where I would need some feedback from the original
writer, if only to clarify the purpose of some defaults or verify
that an approach is viable, that communication is very difficult to
establish. And that is what I would call discouraging, and I for
my part have accepted to just leave the developers in their ivory
tower and tend to my own business.


cheerio,
PMc



Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-09 Thread Peter
On Tue, Dec 08, 2020 at 07:51:07PM -0600, Kyle Evans wrote:
 
! You seem to have misinterpreted this; he doesn't want to narrow it
! down to one bug, he wants simple steps that he can follow to reproduce

Maybe I did misinterpret, but then I don't really understand it.
I would suppose, when testing a proposed fix, the fact that it
does break under the exact same conditions as before, is all the
information needed at that point. Put in simple words: that it does
not work.

! any failure, preferably steps that can actually be followed by just
! about anyone and don't require immense amounts of setup time or
! additional hardware.

Engineering does not normally work that way. 

I'll try to explain: when a bug is first encountered, it is necessary
to isolate it to the extent that somebody who is knowledgeable about
the code can actually reproduce it, in order to have a look at it and
analyze what causes the malfunction.

If then a remedy is devised, and that does not work as expected, then
the flaw is in the analysis, and we just start over from there.

In fact, I would have expected somebody who is trying to fix such
kind of bug, to already have testing tools available and tell me
exactly which kind of data I might retrieve from the dumps.

The open question now is: am I the only one seeing these failures?
Might they be attributed to a faulty configuration or maybe hardware
issues or whatever?
We cannot know this, we can only watch out what happens at other
sites. And that is why I sent out all these backtraces - because they
appear weird and might be difficult to associate with this issue.

I don't think there is much more we can do at this point, unless we
were willing to actually look into the details.


Am I discouraging? Indeed, I think, engineering is discouraging by
its very nature, and that's the fun of it: to overcome odds and
finally maybe make things better. And when we start to forget about
that, bad things begin to happen (anybody remember Apollo 13?). 

But talking about discouragement: I usually try to track down
defects I encounter, and, if possible, do a viable root-cause
analysis. I tended to be very willing to share the outcomes and, if
a solution arises, by all means get that back into the code base;
but I found that even ready-made patches for easy matters would
linger forever in the sendbug system without anybody caring, or, in
more complex cases where I would need some feedback from the original
writer, if only to clarify the purpose of some defaults or verify
that an approach is viable, that communication is very difficult to
establish. And that is what I would call discouraging, and I for
my part have accepted to just leave the developers in their ivory
tower and tend to my own business.


cheerio,
PMc