Hi,

I have an application that uses libpcap 1.8.1 in TPACKET_V2 mode on an Azure VM 
with multiple cores (4-8 in my case). The underlying interface is Azure's 
paravirtualized NIC, hv_netvsc.

In this setup, I observe that under heavy ingress traffic the kernel's and user 
space's notions of slot ownership go out of sync and remain inconsistent. The 
user space's current slot, as indicated by tp_status, is marked kernel-owned, 
while slots beyond it are marked user-owned and all contain valid packets. 
These packets are never processed: because the current slot reads as 
kernel-owned, the application perceives no data on the ring and blocks.
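For context, the consumer-side ownership check is roughly the following (a minimal sketch; the helper name and ring parameters are illustrative, not libpcap internals - they assume an already-mmap()ed TPACKET_V2 ring of fixed-size frames):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/if_packet.h>

/* Hypothetical helper: returns nonzero if the frame at `slot` is
 * user-owned, i.e. carries a packet the application may read.
 * `ring` points at the mmap()ed TPACKET_V2 ring; `frame_size` is the
 * tp_frame_size configured at setup time. */
static int slot_is_user_owned(uint8_t *ring, unsigned frame_size,
                              unsigned slot)
{
    struct tpacket2_hdr *hdr =
        (struct tpacket2_hdr *)(ring + (size_t)slot * frame_size);
    /* This is the test that fails in the stall described above: the
     * current slot still reads kernel-owned (TP_STATUS_KERNEL) even
     * though later slots already have TP_STATUS_USER set. */
    return (hdr->tp_status & TP_STATUS_USER) != 0;
}
```

The consumer only ever inspects its current slot, so a single slot with a stale kernel-owned status is enough to block the whole ring behind it.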

To verify this, I dumped the entire ring (256 frames) while in this state and 
examined the kernel-stamped timestamp of each frame. The timestamps increase 
monotonically from the current slot onward, indicating that the ownership 
status of the current slot is erroneous.
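The dump was produced with logic along these lines (a sketch; `dump_ring` and its parameters are hypothetical, assuming an already-mmap()ed TPACKET_V2 ring walked from the consumer's current slot):

```c
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <linux/if_packet.h>

/* Hypothetical diagnostic: walk all `nframes` slots of a mapped
 * TPACKET_V2 ring, starting from `cur` (the slot the consumer is
 * blocked on), printing each frame's ownership bit and the kernel
 * timestamp. Returns the number of user-owned slots seen; a nonzero
 * count while the current slot reads KERNEL is the inconsistency
 * described above. */
static unsigned dump_ring(uint8_t *ring, unsigned frame_size,
                          unsigned nframes, unsigned cur)
{
    unsigned user_slots = 0;
    for (unsigned i = 0; i < nframes; i++) {
        unsigned slot = (cur + i) % nframes;
        struct tpacket2_hdr *hdr =
            (struct tpacket2_hdr *)(ring + (size_t)slot * frame_size);
        int user = (hdr->tp_status & TP_STATUS_USER) != 0;
        user_slots += (unsigned)user;
        printf("slot %3u: %-6s tp_sec=%u tp_nsec=%u tp_len=%u\n",
               slot, user ? "USER" : "KERNEL",
               hdr->tp_sec, hdr->tp_nsec, hdr->tp_len);
    }
    return user_slots;
}
```

In the failing state this prints a kernel-owned current slot followed by user-owned slots whose tp_sec/tp_nsec values strictly increase.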

This causes the application to stall until the ring becomes full, introducing 
considerable latency, out-of-order packets and spurious retransmissions.

Further, I observed that as I increase the number of cores, holes appear in 
the ring: some frames marked "user", followed by frames erroneously marked 
"kernel", then validly empty kernel-owned frames, and then again valid frames 
marked "user".

While I'm not certain, my current theory is that multiple concurrent netif_rx 
softirq invocations of the hv_netvsc receive routine race with each other, 
combined with the current slot's operation not being flagged as in-use, as 
reported here (https://patchwork.ozlabs.org/patch/894816/).

Any help or pointers toward understanding or resolving this issue is much 
appreciated.

Thanks.

-Raghav
_______________________________________________
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers