Hi, I have an application that uses libpcap 1.8.1 in TPACKET_V2 mode on an Azure VM with multiple cores (4-8 in my case). The underlying interface is Azure's paravirtualized NIC, driven by hv_netvsc.
In this setup, I observe that under heavy ingress traffic the kernel/user-space ownership state of the ring goes out of sync and stays inconsistent. The user-space reader's current slot is marked kernel-owned by tp_status, yet the slots beyond it are marked user-owned and all contain valid packets. Because the application decides whether the ring has data by looking only at its current slot, it concludes there is nothing to read and blocks, so those packets are never processed.

To verify this, I dumped the entire ring (256 frames) while in this state and checked the kernel-supplied timestamp of each frame. The timestamps increase monotonically from the current slot onward, indicating that the ownership status of the current slot is wrong. As a result the application stalls until the ring becomes full, introducing considerable latency, out-of-order packets, and spurious retransmissions.

I also observed that as I increase the number of cores, holes appear in the ring: some frames marked "user", followed by frames erroneously marked "kernel" despite holding valid data, then genuinely empty kernel-owned frames, and then more valid frames marked "user".

While I'm unsure, my current theory is that this involves multiple netif_rx softirq invocations of the hv_netvsc rx routine, combined with the current slot not being marked as in-use, as reported here: https://patchwork.ozlabs.org/patch/894816/

Any help or pointers toward understanding or resolving this issue would be much appreciated. Thanks.

-Raghav

_______________________________________________
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers