Re: netinet & netpfil tests failing

2022-01-18 Thread Gleb Smirnoff
On Mon, Jan 17, 2022 at 06:07:43PM -0800, Gleb Smirnoff wrote:
T> * Reclaim the memory from pcb zone(s), when jail is destroyed, returning back
T>   the old behaviour that with test suites 'jail -r' is always synchronous.
T>   Some prerequisites for this approach are here: 
https://reviews.freebsd.org/D33868
T> * Protect jails with epoch, bypass the cred pointer in inpcb and in the 
lookup
T>   check inp->inp_prison->pr_foo. After that the crfree() can be moved back 
to the
T>   immediate inpcb free procedure.  Mark has a quick & dirty proof of concept 
for
T>   this approach.
T> * In the test suite destroy the interface from the jail:
T>   'jexec jname ifconfig ${ifname} destroy'.
T> 
T> I'd like to add a few words on the last option.  To me it seems most elegant 
as we
T> are improving the test suite instead of changing kernel to meet demands of 
the suite.
T> However, it doesn't work :( Why? Why does 'jexec jname ifconfig epair0b 
destroy' or
T> 'jexec jname ifconfig lo1 destroy' returns ENXIO? Because the interface was 
created
T> within vnet0 and is linked on vnet0 cloner's list. To repeat: epair0b ifnet 
is linked
T> to the jail's list of network interfaces, but it linked on vnet0 list of 
epair(4)
T> ifcloner.  Likewise, some lo4 interface would also be in the jail list of 
interfaces,
T> but on vnet0 if_loop cloner.  This makes it impossible to destroy such 
interface from
T> inside the jail.  Neither it is possible to destroy it from the outside, for 
obvious
T> reasons.  There are more side effects about this.  For example the only 
reason why we
T> can't create an interface with the same name inside a jail using its cloner 
list is
T> call to ifunit() in the beginning of if_clone_createif().  This definitely 
is a part
T> of design, since if_clone_create()/if_clone_destroy() would lookup vnet0 
cloner list
T> in case if interface is not found on the current vnet list.
T> 
T> To put it short, it is yet another problem created by if_vmove :( Not an 
easy one
T> to fix and makes the third approach to the problem complicated.

Looking more into the problem, I've found pieces of code that confirm that the 
clone
behavior of staying at home vnet cloner when vmoved is known and seems to be 
part of
design. I created a patch to better workaround this fact and be able to properly
destroy a once vmove'd clone:

https://reviews.freebsd.org/D33942

Followed by earlier suggested change to the test suite:

https://reviews.freebsd.org/D33943

With these two patches in place I have all of the test suite fixed. Waiting for
your reviews to push them in.

-- 
Gleb Smirnoff



Re: netinet & netpfil tests failing

2022-01-18 Thread Marko Zec
On Mon, 17 Jan 2022 18:07:43 -0800
Gleb Smirnoff  wrote:

>   Hi,
> 
> just to remind that I'm responsible for multiple tests failing and to
> refresh the context, kind of explaining why the hell they aren't fixed
> yet?! The long old discussion can be found in this thread in December,
> link to last message:
> 
> https://lists.freebsd.org/archives/dev-commits-src-main/2021-December/002581.html
> 
> Summarized refreshed context follows.
> 
> The reason for failing tests is complex. There a constellation of
> factors [bugs] that attribute to it:
> 
> * Jails are reference counted and jail destroy may be delayed. Test
> suite usually didn't trigger delayed jail destroy and expectation of
> many tests is that immediately after 'jail -r' all resources are
> released, especially network interfaces in a jail are if_vmove()'d to
> vnet0.
> * My original change to inpcb database protection ignored the fact
> that inp->inp_cred->cr_prison is dereferenced and read during a fast
> pcb lookup. The prison doesn't have neither network epoch nor SMR
> protection. That was a bug and to fix it me & Mark decided that an
> elegant idea would be to delay crfree() when a pcb is destroyed from
> immediate call to SMR-delayed destructor.  This fixed the race, but
> created another bug. Since every vnet had its own pcb zone, a dying
> jail won't ever free its resources, it will stay forever.  This was
> mitigated by making the pcb zone global.  Now pcbs are correctly
> recycled, but there is no guarantee that upon return from 'jail -r'
> the jail is already fully cleared.
> * Back to tests. As tests expect 'jail -r' to immediately free
> resources. Right after 'jail -r' tests do 'ifconfig ${ifname}
> destroy', where ifname is the interface that was just popped up back
> to vnet0 from the destroyed jail. Now this 'ifconfig destroy' fails,
> but test suite ignores this error. A test succeeds. However, some
> time later, usually after other tests, the jail is indeed destroyed
> and surprise interfaces out of nowhere pop up at vnet0.  Of course
> this is definite memory leak, but not the reason why tests
> are failing.
> * Another factor - scapy.  The python scapy library would emit
> warning to stderr if it sees interface without any IP address.  This
> happens right at 'import scapy'. The test suite considers a test
> failed if it has something on stderr, even if it returned success.
> 
> So, result is that some test (absolutely unrelated to pcbs) leaves a
> jail with interfaces, then jail is released, interfaced pop up at
> vnet0, and then some other test (absolutely unrelated to pcbs) using
> scapy writes a warning to stderr and triggers failure.
> 
> My & Mark are now seeing three approaches to the problem:
> 
> * Reclaim the memory from pcb zone(s), when jail is destroyed,
> returning back the old behaviour that with test suites 'jail -r' is
> always synchronous. Some prerequisites for this approach are here:
> https://reviews.freebsd.org/D33868
> * Protect jails with epoch, bypass the cred pointer in inpcb and in
> the lookup check inp->inp_prison->pr_foo. After that the crfree() can
> be moved back to the immediate inpcb free procedure.  Mark has a
> quick & dirty proof of concept for this approach.
> * In the test suite destroy the interface from the jail:
>   'jexec jname ifconfig ${ifname} destroy'.
> 
> I'd like to add a few words on the last option.  To me it seems most
> elegant as we are improving the test suite instead of changing kernel
> to meet demands of the suite. However, it doesn't work :( Why? Why
> does 'jexec jname ifconfig epair0b destroy' or 'jexec jname ifconfig
> lo1 destroy' returns ENXIO? Because the interface was created within
> vnet0 and is linked on vnet0 cloner's list. To repeat: epair0b ifnet
> is linked to the jail's list of network interfaces, but it linked on
> vnet0 list of epair(4) ifcloner.  Likewise, some lo4 interface would
> also be in the jail list of interfaces, but on vnet0 if_loop cloner.
> This makes it impossible to destroy such interface from inside the
> jail.  Neither it is possible to destroy it from the outside, for
> obvious reasons.  There are more side effects about this.  For
> example the only reason why we can't create an interface with the
> same name inside a jail using its cloner list is call to ifunit() in
> the beginning of if_clone_createif().  This definitely is a part of
> design, since if_clone_create()/if_clone_destroy() would lookup vnet0
> cloner list in case if interface is not found on the current vnet
> list.
> 
> To put it short, it is yet another problem created by if_vmove :( Not
> an easy one to fix and makes the third approach to the problem
> complicated.
> 
> To sum up: I'm sorry for tests broken, I'm working on it, it isn't
> easy problem. Suggestions and help are welcome.

Is there a reason why each test could not be composed in a separate
master jail, with the actual test vnet topology below it?  The setup
routine would create epair interfaces 

Re: netinet & netpfil tests failing

2022-01-18 Thread Kristof Provost
On 18 Jan 2022, at 3:07, Gleb Smirnoff wrote:
> * Another factor - scapy.  The python scapy library would emit warning to 
> stderr
>   if it sees interface without any IP address.  This happens right at 'import 
> scapy'.
>   The test suite considers a test failed if it has something on stderr, even 
> if
>   it returned success.
>
> So, result is that some test (absolutely unrelated to pcbs) leaves a jail with
> interfaces, then jail is released, interfaced pop up at vnet0, and then some
> other test (absolutely unrelated to pcbs) using scapy writes a warning to 
> stderr
> and triggers failure.
>
Several of the pf scapy scripts deal with that issue by setting the scapy log 
level:
https://cgit.freebsd.org/src/tree/tests/sys/netpfil/pf/CVE-2019-5597.py#n30

So that part at least we could probably mitigate easily.

(I’m not overly fond of that decision in scapy, but didn’t want to resort to 
patching scapy to cope with our fairly specific requirements.)

Kristof



netinet & netpfil tests failing

2022-01-17 Thread Gleb Smirnoff
  Hi,

just to remind that I'm responsible for multiple tests failing and to
refresh the context, kind of explaining why the hell they aren't fixed
yet?! The long old discussion can be found in this thread in December,
link to last message:

https://lists.freebsd.org/archives/dev-commits-src-main/2021-December/002581.html

Summarized refreshed context follows.

The reason for failing tests is complex. There a constellation of factors [bugs]
that attribute to it:

* Jails are reference counted and jail destroy may be delayed. Test suite 
usually
  didn't trigger delayed jail destroy and expectation of many tests is that 
immediately
  after 'jail -r' all resources are released, especially network interfaces in a
  jail are if_vmove()'d to vnet0.
* My original change to inpcb database protection ignored the fact that
  inp->inp_cred->cr_prison is dereferenced and read during a fast pcb lookup. 
The
  prison doesn't have neither network epoch nor SMR protection. That was a bug 
and
  to fix it me & Mark decided that an elegant idea would be to delay crfree() 
when
  a pcb is destroyed from immediate call to SMR-delayed destructor.  This fixed 
the
  race, but created another bug. Since every vnet had its own pcb zone, a dying
  jail won't ever free its resources, it will stay forever.  This was mitigated 
by
  making the pcb zone global.  Now pcbs are correctly recycled, but there is no
  guarantee that upon return from 'jail -r' the jail is already fully cleared.
* Back to tests. As tests expect 'jail -r' to immediately free resources. Right
  after 'jail -r' tests do 'ifconfig ${ifname} destroy', where ifname is the
  interface that was just popped up back to vnet0 from the destroyed jail. Now
  this 'ifconfig destroy' fails, but test suite ignores this error. A test
  succeeds. However, some time later, usually after other tests, the jail is
  indeed destroyed and surprise interfaces out of nowhere pop up at vnet0.  Of
  course this is definite memory leak, but not the reason why tests
  are failing.
* Another factor - scapy.  The python scapy library would emit warning to stderr
  if it sees interface without any IP address.  This happens right at 'import 
scapy'.
  The test suite considers a test failed if it has something on stderr, even if
  it returned success.

So, result is that some test (absolutely unrelated to pcbs) leaves a jail with
interfaces, then jail is released, interfaced pop up at vnet0, and then some
other test (absolutely unrelated to pcbs) using scapy writes a warning to stderr
and triggers failure.

My & Mark are now seeing three approaches to the problem:

* Reclaim the memory from pcb zone(s), when jail is destroyed, returning back
  the old behaviour that with test suites 'jail -r' is always synchronous.
  Some prerequisites for this approach are here: 
https://reviews.freebsd.org/D33868
* Protect jails with epoch, bypass the cred pointer in inpcb and in the lookup
  check inp->inp_prison->pr_foo. After that the crfree() can be moved back to 
the
  immediate inpcb free procedure.  Mark has a quick & dirty proof of concept for
  this approach.
* In the test suite destroy the interface from the jail:
  'jexec jname ifconfig ${ifname} destroy'.

I'd like to add a few words on the last option.  To me it seems most elegant as 
we
are improving the test suite instead of changing kernel to meet demands of the 
suite.
However, it doesn't work :( Why? Why does 'jexec jname ifconfig epair0b 
destroy' or
'jexec jname ifconfig lo1 destroy' returns ENXIO? Because the interface was 
created
within vnet0 and is linked on vnet0 cloner's list. To repeat: epair0b ifnet is 
linked
to the jail's list of network interfaces, but it linked on vnet0 list of 
epair(4)
ifcloner.  Likewise, some lo4 interface would also be in the jail list of 
interfaces,
but on vnet0 if_loop cloner.  This makes it impossible to destroy such 
interface from
inside the jail.  Neither it is possible to destroy it from the outside, for 
obvious
reasons.  There are more side effects about this.  For example the only reason 
why we
can't create an interface with the same name inside a jail using its cloner 
list is
call to ifunit() in the beginning of if_clone_createif().  This definitely is a 
part
of design, since if_clone_create()/if_clone_destroy() would lookup vnet0 cloner 
list
in case if interface is not found on the current vnet list.

To put it short, it is yet another problem created by if_vmove :( Not an easy 
one
to fix and makes the third approach to the problem complicated.

To sum up: I'm sorry for tests broken, I'm working on it, it isn't easy problem.
Suggestions and help are welcome.

-- 
Gleb Smirnoff