Hey guys,

Sorry I'm a bit late to this thread. I'm happy to see there's a prototype for benchmarking, though I do wonder whether this is a bit of an overeager optimization. That is, why is this necessary, and does it actually help?

By returning packets to the Wintun ring later, more of the ring winds up being used, which in turn means more cache misses, as the in-use region spans additional cache lines. In other words, it seems like this might be comparing the performance of memcpy+cache vs. no-memcpy+cachemiss. Which is better, and is the difference actually measurable? Is it possible that adding this functionality has zero measurable impact on performance?

Given the complexity this adds, it'd be nice to see some numbers to help make the argument, or perhaps reasoning that's more sophisticated than my own napkin thoughts here.

Jason
