> On 13 Dec 2018, at 00:03, Martin Pieuchot <[email protected]> wrote:
>
> On 12/12/18(Wed) 12:03, David Gwynne wrote:
>> with the previous if_ethersubr.c diff, this allows etherip(4) to output
>> directly to the network stack.
>
> What do you mean with "directly"?
>
> To my understanding ip{,6}_etherip_output() call ip{,6}_send() to enqueue
> packets.
You guess below.
>
>> direct output relies on the interface using priq, since hfsc uses the
>> ifq machinery to work. priq implies you dont want to delay packets, so
>> it lets etherip push the packet straight through.
>
> So "direct output" means skip the queue(s)?
The ifq specifically, yes.
> So are we heading towards a design where ifq_enqueue(9) is going to be
> bypassed for pseudo-drivers unless people use HFSC? If that's the case
> I'm afraid that code path will rot.
I don't have a good answer to that, except to say that bridge maybe be popular
enough to keep pushing packets through if_enqueue().
>
>> i dont think the ifq stuff is slow, but it does allow the stack to
>> concurrently send packets with etherip without having to create an ifq
>> per cpu. it does rely on per cpu counters, but theyre relatively small.
>
> I don't understand how it allows the stack to concurrently send packets.
> All packets seems to be serialized in a single task when ip{,6}_send()
> is called.
In the etherip(4) case right now, yes. Presumably ip{,6}_send() will be fixed
one day though, and etherip and co will be able to take advantage of it.
Similar treatment for vlan(4) has more immediate benefit.
>
>> the other advantage is it stops ether_output recursing in the etherip
>> path. this helps profiling, but that is a fairly niche benefit so maybe
>> just focus on the concurrency.
>
> I don't understand. How is ether_output() recursing?
Hrm. Yes. ether_output doesn't recurse. Thinking about it, maybe I meant
ifq_start.
>
>> ok?
>
> I'm a bit sad about where this is going. Because the design you adopted
> will forces us to modify all pseudo-drivers. Apart the fact that you're
> now breaking etherip(4) into bridge(4) /!\, I believe we can do better.
It doesn't force anything. Hopefully there's some performance benefit from
changes like this to pseudo drivers, but there's no technical code change
requirement because of it.
However, you are right about bridge, which this diff will break, so let's drop
it for now.
>
> We had this design:
>
> STACK DRIVER
>
> ether_output() -> if_enqueue() -> ifq_enqueue() + ifq_start() | drv_start()
> `. |
> `--> bridge_output() |
Sigh. Why does ethernet get tentacles into the stack like this?
> The problem that you want to solve, if I'm not mistaken, is to remove
> the useless stack/driver separation for pseudo-drivers. In other words
> stop queueing & delaying packets. This only makes sense for "physical"
> nics where mbufs are put into DMA buckets, right?
There's two benefits to ifqs on physical nics which do not apply to pseudo
interfaces. Real interfaces have things like rings, and access to those rings
requires serialisation. ifq provides that serialisation so many cpus can try
and queue packets on the ring, but ifq makes sure that only one of the cpus
does the work. vlan(4) and etherip(4) don't have a single thing that they need
to serialise access to, they just need to put a header on and push it down.
The second benefit is that it gives us a nice place to do tx mitigation. The
cost of putting a packet on the wire is shared between the actually filling in
descriptors on a ring, and then actually notifying the chip that the ring has
work to do. Notifying the chip is surprisingly expensive, so it is better to
amortise that cost by putting multiple packets on the ring before notifying the
chip. We aren't doing this currently, but ifqs are the right place to do this.
There's no chip notification overhead on pseudo interfaces, so applying tx mit
to them just adds latency, and likely unpredictable latency as packets bubble
down interfaces.
> In that case I'd suggest modifying if_enqueue() or ifq_enqueue() to treat
> IFXF_CLONED devices as special if they are not using HSFC.
I'll think about it. My first thought is that it is unfair for every interface
to have this check added to the output path, but then I remember that we check
to see if every interface is part of a bridge on output too, even non ethernet
ones like gif and pppoe, and even though that makes me sad it's working ok in
practice.
> In other words I suggest you only need to modify what happens *after*
> if_enqueue(), the driver part or actual drv_start(). Why about having
> a generic:
>
> if (ifp->if_enqueue != NULL && (ifq_is_priq(&ifp->if_snd)) {
> counters_pkt(ifp->if_counters, ifc_opackets, ifc_obytes,
> m->m_pkthdr.len);
> ifp->if_enqueue(ifp, m);
> return (0);
> }
>
> I'm not sure if we can do better than introducing a new per-ifp method.
> This is what you called etherip_send() in your diff. But I believe the
> vlan(4) example is more interesting because if calls if_enqueue(9) directly!
>
> So I disagree with the changes requiring a custom `if_output' and with the
> bypassing of if_enqueue(). That said I could be completely mistaken :)
Let me tinker. I've got other stuff in my head at the moment. Maybe that is why
this diff is so terrible.