On Wed, Oct 04, 2023 at 11:31:47PM +0200, Alexander Bluhm wrote:
> On Wed, Oct 04, 2023 at 11:03:27PM +0300, Vitaliy Makkoveev wrote:
> > On Wed, Oct 04, 2023 at 09:13:59PM +0200, Alexander Bluhm wrote:
> > > On Wed, Oct 04, 2023 at 08:42:48PM +0200, Kirill Miazine wrote:
> > > > > If it happens again, could you send the 'ps axlww | grep ifconfig'
> > > > > output?  Then we see the wait channel where it hangs in the kernel.
> > > > > 
> > > > > $ ps axlww
> > > > >    UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT   TT       TIME COMMAND
> > > > 
> > > > Here it happened again:
> > > > 
> > > >      0 75339 23922   0  10   0   360   296 wg_ifq  D+U    p0    0:00.00 ifconfig wg1 destroy
> > > 
> > > wg_peer_destroy()
> > >   ...
> > >         NET_LOCK();
> > >         while (!ifq_empty(&sc->sc_if.if_snd)) {
> > >                 NET_UNLOCK();
> > >                 tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
> > >                 NET_LOCK();
> > >         }
> > >         NET_UNLOCK();
> > > 
> > > This net lock dance looks fishy.  And the sleep has a timeout of 1
> > > millisecond.  But that may be per packet.  So if you have a
> > > long queue, or the queue refills somehow, it will take forever.
> > > 
> > > I think the difference in the usage is constant traffic that keeps
> > > the send queue full.  The timeout hides the problem when there are
> > > only a few packets.
> > > 
> > 
> > This should ensure wg_qstart() will not dereference the dying `peer'.
> > It looks crappy and could potentially block forever, but it should
> > work. However, the netlock is unnecessary here: a netlocked
> > wg_output() could fill `if_snd' while the netlock is released before
> > tsleep(), so it serializes nothing and only stops packet processing.
> > 
> > Kirill, does this diff help? 
> 
> I doubt that it changes much.  When the netlock is not taken, the
> queue can still be filled with packets.
> 
> Removing this ugly netlock makes sense anyway.  But without any
> synchronisation, just reading a variable feels wrong.  Can we add a
> read-once like for mq_len in sys/mbuf.h?  And the ifq_set_maxlen()
> also looks very unsafe.  For mbuf queues I added a mutex; interface
> queues should do the same.
> 
> ok?

I reverted this diff, since it breaks the API. There are numerous cases
where ifq_set_maxlen() is called before the mutex is initialized, so
doing this does not work.

I also question this diff in general. ifq_set_maxlen() is not called
concurrently, it is called when the interface is attached. So there is
no need for a mutex here.
 
Also, the added READ_ONCE() calls seem unneeded.
ifiq_len() is unused and ifiq_empty() is only used by ifiq_process().
So maybe that call should be moved into the mutex-protected block; the
task should only run when ifiq_ml has data enqueued.

> Index: net/ifq.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/ifq.c,v
> retrieving revision 1.50
> diff -u -p -r1.50 ifq.c
> --- net/ifq.c 30 Jul 2023 05:39:52 -0000      1.50
> +++ net/ifq.c 4 Oct 2023 21:04:20 -0000
> @@ -529,6 +529,14 @@ ifq_hdatalen(struct ifqueue *ifq)
>       return (len);
>  }
>  
> +void
> +ifq_set_maxlen(struct ifqueue *ifq, unsigned int maxlen)
> +{
> +     mtx_enter(&ifq->ifq_mtx);
> +     ifq->ifq_maxlen = maxlen;
> +     mtx_leave(&ifq->ifq_mtx);
> +}
> +
>  unsigned int
>  ifq_purge(struct ifqueue *ifq)
>  {
> Index: net/ifq.h
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/ifq.h,v
> retrieving revision 1.38
> diff -u -p -r1.38 ifq.h
> --- net/ifq.h 30 Jul 2023 05:39:52 -0000      1.38
> +++ net/ifq.h 4 Oct 2023 21:09:04 -0000
> @@ -435,6 +435,7 @@ void               ifq_deq_commit(struct ifqueue *, 
>  void          ifq_deq_rollback(struct ifqueue *, struct mbuf *);
>  struct mbuf  *ifq_dequeue(struct ifqueue *);
>  int           ifq_hdatalen(struct ifqueue *);
> +void          ifq_set_maxlen(struct ifqueue *, unsigned int);
>  void          ifq_mfreem(struct ifqueue *, struct mbuf *);
>  void          ifq_mfreeml(struct ifqueue *, struct mbuf_list *);
>  unsigned int  ifq_purge(struct ifqueue *);
> @@ -448,9 +449,8 @@ int                ifq_deq_sleep(struct ifqueue *, st
>                    const char *, volatile unsigned int *,
>                    volatile unsigned int *);
>  
> -#define      ifq_len(_ifq)                   ((_ifq)->ifq_len)
> -#define      ifq_empty(_ifq)                 (ifq_len(_ifq) == 0)
> -#define      ifq_set_maxlen(_ifq, _l)        ((_ifq)->ifq_maxlen = (_l))
> +#define ifq_len(_ifq)                READ_ONCE((_ifq)->ifq_len)
> +#define ifq_empty(_ifq)              (ifq_len(_ifq) == 0)
>  
>  static inline int
>  ifq_is_priq(struct ifqueue *ifq)
> @@ -490,8 +490,8 @@ int                ifiq_input(struct ifiqueue *, stru
>  int           ifiq_enqueue(struct ifiqueue *, struct mbuf *);
>  void          ifiq_add_data(struct ifiqueue *, struct if_data *);
>  
> -#define      ifiq_len(_ifiq)                 ml_len(&(_ifiq)->ifiq_ml)
> -#define      ifiq_empty(_ifiq)               ml_empty(&(_ifiq)->ifiq_ml)
> +#define ifiq_len(_ifiq)              READ_ONCE(ml_len(&(_ifiq)->ifiq_ml))
> +#define ifiq_empty(_ifiq)    (ifiq_len(_ifiq) == 0)
>  
>  #endif /* _KERNEL */
>  
> 

-- 
:wq Claudio
