> On 29 Apr 2021, at 01:38, Alexandr Nedvedicky 
> <[email protected]> wrote:
> 
> Hello,
> 
> </snip>
>> Such a diff is below.  I will test extensively towmorrow.  If anyone
>> wants to try now, be my guest.
>> 
>> Is the comment correct?
> 
>    I think comment is not quite right.
> 
> </snip>
> 
> the packet gets inserted into la->la_mq via arpresolve(), which
> is not protected by KERNEL_LOCK().
> 
> arpresolve() runs as a reader on netlock, right?
> 
> 
>> Index: netinet/if_ether.c
>> ===================================================================
>> RCS file: /cvs/src/sys/netinet/if_ether.c,v
>> retrieving revision 1.248
>> diff -u -p -r1.248 if_ether.c
>> --- netinet/if_ether.c       28 Apr 2021 21:21:44 -0000      1.248
>> +++ netinet/if_ether.c       28 Apr 2021 21:44:22 -0000
>> @@ -689,12 +689,15 @@ arpcache(struct ifnet *ifp, struct ether
>>      len = ml_len(&ml);
>>      while ((m = ml_dequeue(&ml)) != NULL)
>>              ifp->if_output(ifp, m, rt_key(rt), rt);
>> -    /* XXXSMP we discard if other CPU enqueues */
>> -    if (mq_len(&la->la_mq) > 0) {
>> -            /* mbuf is back in queue. Discard. */
>> -            atomic_sub_int(&la_hold_total, len + mq_purge(&la->la_mq));
>> -    } else
>> -            atomic_sub_int(&la_hold_total, len);
>> +    atomic_sub_int(&la_hold_total, len);
>> +
>> +    /*
>> +     * This function is protected by kernel lock.  No other CPU can insert
>> +     * to la_mq while we feed the packets to if_output().  If a packet
>> +     * is reinserted from there we have a loop.  This should not happen
>> +     * and we want to find such cases.
>> +     */
>> +    KASSERT(mq_len(&la->la_mq) == 0);
> 
>    tripping the assert here means the ARP cache entry either
>    expired or got deleted, while we were dispatching queued packets 
>    to wire in a while() loop above.
> 
>    we enter dispatch loop above after ARP successfully resolves IP address.
>    if ARP entry gets lost while we dispatching packets to wire we end up
>    queuing packets into la->la_mq queue. This happens in arpresolve().
>    arpresolve() is called on behalf of ifp->if_output()
> 
>    given arpcache() runs under KERNEL_LOCK() we can rule out case user
>    runs 'arp -a -d' command, which would delete ARP cache. the 'arp -a -d'
>    is excluded by KERNEL_LOCK().
> 
>    so if the assert fires something very strange must be happening.

This time arpcache() is only called by netisr() with both kernel and
exclusive net locks held. RTM_DELETE message processing will also call
ifp->if_rtrequest() with exclusive netlock held.

So this while() loop within arpcache() can’t be broken by “arp -a -d”.

Also netlock serializes arpcache() and arpresolve().

> 
> I would suggest comment below:
> 
> /*
> * If la_mq is not empty, then something very strange has happened.
> * We expect la_mq to be empty, because while() loop above dispatches
> * queued packets to wire after ARP resolves IP address.
> *
> * la_mq not being empty means arpresolve() called on behalf of
> * ifp->if_output() could not find ARP entry we've just put to
> * cache. Quite unexpected given we run under KERNEL_LOCK().
> */
> 
> thanks and
> regards
> sashan

Reply via email to