Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-24 Thread Edgar Fuß
> the request count on the mclpl line is incrementing at a pretty fast rate
Maybe you're running into the same problem as me (see the "mbuf cluster leak?" 
thread on tech-net).
Try a kernel with MBUFTRACE. If that shows you (via netstat -mss) a large
number of tx bufs on a particular vlan interface, try destroying and
re-creating that interface (and reloading ipfilter in case you're using it).
For me, that stops the allocations from rising (for a while).
I still don't know what triggers it, though.


Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Brian Buhrow
hello.  In looking at my vmstat -m output, I see:

mclpl   2112  28146  0  28146  14109  14074  35  187  0  524288  35

I see no failures and the number of nmbclusters is: 524288

yet, this machine has displayed this message about 6 times since it was
rebooted about 5 hours ago.

Am I missing something?
-thanks
-Brian
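
For readers decoding the mclpl line in this message: several columns appear to have run together. The following is a minimal Python sketch, not part of the original post. It assumes the usual vmstat -m pool columns (Name, Size, Requests, Fail, Releases, Pgreq, Pgrel, Npage, Hiwat, Minpg, Maxpg, Idle), and the column split used in SAMPLE is an assumption inferred from the arithmetic (Pgreq minus Pgrel equals Npage), not something the output states.

```python
# Column names assumed to match the pool table printed by `vmstat -m`.
FIELDS = ("name", "size", "requests", "fail", "releases", "pgreq",
          "pgrel", "npage", "hiwat", "minpg", "maxpg", "idle")


def parse_pool_line(line):
    """Split one whitespace-separated pool line into a dict of fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(parts)}")
    row = {"name": parts[0]}
    row.update({key: int(value) for key, value in zip(FIELDS[1:], parts[1:])})
    return row


# Assumed split of the fused line quoted in the message.
SAMPLE = "mclpl 2112 28146 0 28146 14109 14074 35 187 0 524288 35"

row = parse_pool_line(SAMPLE)
in_use = row["requests"] - row["releases"]   # clusters currently handed out
pages_balance = row["pgreq"] - row["pgrel"] == row["npage"]
print(row["fail"], in_use, pages_balance)
```

Under that split, the Fail column is 0 and the page counters balance, which agrees with the observation that the pool itself reports no failures.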



Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Brian Buhrow
hello.  One strange thing I notice on this particular system that seems
to be different from the other systems I'm running is that the request
count on the mclpl line is incrementing at a pretty fast rate, whereas on
other systems, the request rate is, more or less, constant over time,
with occasional bursts of requests.  Even so, there are no failures
noted, even though the driver says it's failed to get an rx cluster a few
times since the system was booted.
For example, since the last message I wrote, the mclpl line now looks like:

mclpl   2112  29471  0  29440  14801  14762  39  187  0  524288  8


Maybe this incrementing thing isn't a big deal, but it jumps right out as
being different.
-thanks
-Brian
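
To put a number on "incrementing at a pretty fast rate", the two mclpl samples in this thread can be compared directly. This is a hedged Python sketch, not from the original posts: the column positions (Requests third, Releases fifth) and the way the fused digits are split in EARLIER and LATER are assumptions.

```python
def requests_and_releases(line):
    """Return (requests, releases) from a whitespace-separated pool line."""
    parts = line.split()
    return int(parts[2]), int(parts[4])


# Assumed splits of the two fused mclpl lines quoted in the thread.
EARLIER = "mclpl 2112 28146 0 28146 14109 14074 35 187 0 524288 35"
LATER = "mclpl 2112 29471 0 29440 14801 14762 39 187 0 524288 8"

req0, rel0 = requests_and_releases(EARLIER)
req1, rel1 = requests_and_releases(LATER)

new_requests = req1 - req0   # requests issued between the two samples
outstanding = req1 - rel1    # clusters not yet released at the later sample
print(f"{new_requests} new requests, {outstanding} clusters outstanding")
```

The interval between the two samples is not stated in the thread, so the sketch reports a difference rather than a rate.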



Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Brian Buhrow
hello.  In looking at the if_xennet_xenbus.c file, I see where the
if_xennetrxbuf_cache is initialized, but I don't see where data is put
into it before it's requested.  Is the idea that the items in the cache
are supposed to be provided by the backend, i.e. the dom0?  Is it
possible that dom0 isn't providing enough rx requests to satisfy the
traffic it's sending us?  I think I understand what's supposed to happen
once traffic begins flowing: rx requests come in, if_xennet_xenbus
processes them and pushes them back into the if_xennetrxbuf_cache
cache.  What I don't understand is how the initial cache gets populated
with free rx requests to use in order to get things started.

-thanks
-Brian



Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Manuel Bouyer
On Thu, Jun 23, 2022 at 01:48:55PM -0700, Brian Buhrow wrote:
>   hello.  In looking at the if_xennet_xenbus.c file, I see where the
> if_xennetrxbuf_cache is initialized, but I don't see where data is put
> into it before it's requested.  Is the idea that the items in the cache
> are supposed to be provided by the backend, i.e. the dom0?  Is it
> possible that dom0 isn't providing enough rx requests to satisfy the
> traffic it's sending us?  I think I understand what's supposed to happen
> once traffic begins flowing: rx requests come in, if_xennet_xenbus
> processes them and pushes them back into the if_xennetrxbuf_cache
> cache.  What I don't understand is how the initial cache gets populated
> with free rx requests to use in order to get things started.

A pool cache has a backing pool. If there's no item in the pool cache, it
gets some memory from its backing pool.
The point of the cache here is to keep the physical address of items around,
so it doesn't have to be computed again.
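
As an illustration of the cache-over-pool idea described above, here is a toy Python model. It is only a conceptual sketch and not NetBSD's pool_cache(9) API: the class names, the fake address arithmetic, and the `limit` parameter are all hypothetical.

```python
import itertools


class BackingPool:
    """Stand-in for the pool that ultimately supplies memory."""

    def __init__(self, limit=None):
        self.limit = limit            # None models "no limit"
        self.allocated = 0
        self._addresses = itertools.count(0x1000, 0x1000)

    def alloc(self):
        if self.limit is not None and self.allocated >= self.limit:
            return None               # allocation failure
        self.allocated += 1
        return next(self._addresses)


class Buf:
    """A cached item that remembers its (pretend) physical address."""

    def __init__(self, vaddr):
        self.vaddr = vaddr
        # Made-up translation, done once when the item is constructed.
        self.paddr = vaddr + 0x80000000


class PoolCache:
    def __init__(self, pool):
        self.pool = pool
        self.free = []
        self.constructed = 0

    def get(self):
        if self.free:
            return self.free.pop()    # reuse: paddr is already known
        vaddr = self.pool.alloc()     # empty cache: ask the backing pool
        if vaddr is None:
            return None
        self.constructed += 1
        return Buf(vaddr)

    def put(self, buf):
        self.free.append(buf)         # keep the constructed item around


cache = PoolCache(BackingPool())
first = cache.get()      # cache is empty, so this comes from the pool
cache.put(first)
second = cache.get()     # served from the cache, nothing new constructed
print(second is first, cache.constructed)
```

In this model nothing needs to pre-populate the cache: the first get() simply falls through to the backing pool, and a get() only fails when the backing pool itself cannot supply memory.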

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Manuel Bouyer
On Thu, Jun 23, 2022 at 12:54:59PM -0700, Brian Buhrow wrote:
>   hello.  In looking at my vmstat -m output, I see:
> 
> mclpl   2112  28146  0  28146  14109  14074  35  187  0  524288  35
> 
> I see no failures and the number of nmbclusters is: 524288
> 
> yet, this machine has displayed this message about 6 times since it was
> rebooted about 5 hours ago.
> 
> Am I missing something?

OK, so this is -current; it is the if_xennetrxbuf_cache pool cache which
is failing. This one has no limits.

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Manuel Bouyer
On Thu, Jun 23, 2022 at 12:29:11PM -0700, Brian Buhrow wrote:
>   Hello.  I'm running a number of NetBSD-9 and -current (as of 99.77)
> amd64 domu machines on a couple of different servers with FreeBSD as
> dom0.  I'm getting the following messages from the kernel:
> xennet0: rx no cluster
> Much of the time, these messages seem harmless, but occasionally, the
> network locks up on machines that display this message.
> 
> In looking at the source code, I get that this is a pool allocation
> failure in if_xennet_xenbus.c, but I don't understand which memory
> resource it's running out of and if there is a way to increase that
> resource.  In general, the domu's in question seem to have plenty of
> memory and I don't see a lot of memory pressure for other tasks on the
> systems.
> 
>   Has anyone else seen these messages on their domu machines and does
> anyone have ideas on how to correct the issue?

It's running out of mbuf clusters; this is the mclpl in vmstat -m

You can try increasing kern.mbuf.nmbclusters, or if that fails, rebuilding
a kernel with
options NMBCLUSTERS=
e.g.
options NMBCLUSTERS=65536
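
For a rough sense of what a given limit allows, the arithmetic can be sketched in Python. This is an illustration only: the 2048-byte cluster size is an assumption (check MCLBYTES for your port), the helper name is hypothetical, and pool overhead is ignored.

```python
MCLBYTES = 2048  # assumed cluster size in bytes; not taken from this thread


def cluster_limit_bytes(nmbclusters, cluster_bytes=MCLBYTES):
    """Upper bound on cluster data if every allowed cluster were in use."""
    return nmbclusters * cluster_bytes


for limit in (65536, 524288):
    mib = cluster_limit_bytes(limit) // (1024 * 1024)
    print(f"nmbclusters={limit}: up to {mib} MiB of cluster data")
```

The limit is a ceiling rather than a reservation, so raising it does not by itself consume that much memory.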

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Brian Buhrow
Hello.  I'm running a number of NetBSD-9 and -current (as of 99.77)
amd64 domu machines on a couple of different servers with FreeBSD as
dom0.  I'm getting the following messages from the kernel:
xennet0: rx no cluster
Much of the time, these messages seem harmless, but occasionally, the
network locks up on machines that display this message.

In looking at the source code, I get that this is a pool allocation
failure in if_xennet_xenbus.c, but I don't understand which memory
resource it's running out of and if there is a way to increase that
resource.  In general, the domu's in question seem to have plenty of
memory and I don't see a lot of memory pressure for other tasks on the
systems.

Has anyone else seen these messages on their domu machines and does
anyone have ideas on how to correct the issue?
-thanks
-Brian