Thanks for your suggestions.

One more detail I didn't mention: Roughly speaking, the client is doing "read ahead", but it only reads ahead by a limited amount (about 4 blocks, each of 128KiB). The surprising behavior is that when four readahead threads are allowed to run concurrently their aggregate throughput is much lower than when all the readaheads are serialized through a single thread.
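
For concreteness, here is a minimal sketch of the two client configurations being compared. It is not the actual client code: `read_ahead` and the `fetch_block` callable are hypothetical names, and the real client's request logic is not shown in this thread. With `workers=1` the readahead requests are serialized; with `workers=4` they run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

BLOCK_SIZE = 128 * 1024   # 128 KiB per block, as described above
READAHEAD_BLOCKS = 4      # the client reads ahead by about 4 blocks


def read_ahead(fetch_block: Callable[[int], bytes], first_block: int,
               workers: int) -> List[bytes]:
    """Fetch READAHEAD_BLOCKS consecutive blocks using `workers` threads.

    fetch_block is a hypothetical callable that issues one HTTP request
    for the block with the given index and returns its body.
    """
    indices = range(first_block, first_block + READAHEAD_BLOCKS)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in request order, whatever the completion order
        return list(pool.map(fetch_block, indices))
```

Calling `read_ahead(fetch, n, workers=1)` versus `read_ahead(fetch, n, workers=4)` is the only difference between the fast and the slow case.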

Traces (with strace and/or tcpdump) show frequent stalls of roughly 200ms during which nothing seems to move across the channel and all client-side system calls are waiting. 200ms is suspiciously close to the Linux 'rto_min' parameter, which was the first thing that led me to suspect TCP incast collapse. We get some improvement by reducing rto_min on the server, and we also get some improvement by reducing SO_RCVBUF in the client. But as I said, both have tradeoffs, so I'm interested in whether anyone else has encountered or overcome this particular problem.
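
As an illustration of the client-side mitigation, a minimal sketch of capping SO_RCVBUF on a socket might look like the following. The function name and the 64 KiB value are assumptions chosen for the example, not the settings we actually used. The option is set before connect() so that it can influence the window advertised during the handshake; note that Linux doubles the requested value internally, so the value read back is normally larger than the one requested.

```python
import socket


def make_client_socket(rcvbuf_bytes: int) -> socket.socket:
    """Create an unconnected TCP socket with a capped receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Cap the receive buffer before connecting; a smaller buffer limits how
    # much data the server may have in flight toward this client at once.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return sock


sock = make_client_socket(64 * 1024)  # 64 KiB is an illustrative value only
# Report what the kernel actually granted (Linux reports a doubled value).
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()
```

The tradeoff is the one mentioned above: a smaller receive buffer reduces the burst the switch has to absorb, but it also limits the throughput of a single connection.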

I do not see the dropoff from single-thread to multi-thread when I run client and server on the same host. I.e., I get around 500MB/s with one client and roughly the same total bandwidth with multiple clients. I'm sure that with some tuning, the 500MB/s could be improved, but that's not the issue here.

Here are the ethtool reports:

On the client:
drdws0134$ ethtool eth0
Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: on (auto)
Cannot get wake-on-lan settings: Operation not permitted
    Current message level: 0x00000007 (7)
                   drv probe link
    Link detected: yes
drdws0134$

On the server:

$ ethtool eth0
Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes:   1000baseT/Full
                            10000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: No
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Speed: 10000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: off
    MDI-X: Unknown
Cannot get wake-on-lan settings: Operation not permitted
Cannot get link status: Operation not permitted
$


On 07/06/2017 03:08 AM, Guillaume Quintard wrote:
Two things: do you get the same results when the client is directly on the Varnish server? (ie. not going through the switch) And is each new request opening a new connection?

--
Guillaume Quintard

On Thu, Jul 6, 2017 at 6:45 AM, Andrei wrote:

    Out of curiosity, what does ethtool show for the related nics on
    both servers? I also have Varnish on a 10G server, and can reach
    around 7.7Gbit/s serving anywhere between 6-28k requests/second,
    however it did take some sysctl tuning and the westwood TCP
    congestion control algo

    On Wed, Jul 5, 2017 at 3:09 PM, John Salmon wrote:

        I've been using Varnish in an "intranet" application.  The
        picture is roughly:

          origin <-> Varnish <-- 10G channel ---> switch <-- 1G channel --> client

        The machine running Varnish is a high-performance server.  It can
        easily saturate a 10Gbit channel.  The machine running the client
        is a more modest desktop workstation, but it's fully capable of
        saturating a 1Gbit channel.

        The client makes HTTP requests for objects of size 128kB.

        When the client makes those requests serially, "useful" data is
        transferred at about 80% of the channel bandwidth of the Gigabit
        link, which seems perfectly reasonable.

        But when the client makes the requests in parallel (typically
        4-at-a-time, but it can vary), *total* throughput drops to about
        25% of the channel bandwidth, i.e., about 30Mbyte/sec.

        After looking at traces and doing a fair amount of experimentation,
        we have reached the tentative conclusion that we're seeing "TCP
        Incast Throughput Collapse" (see references below)

        The literature on "TCP Incast Throughput Collapse" typically
        describes scenarios where a large number of servers overwhelm a
        single inbound port.  I haven't found any discussion of incast
        collapse with only one server, but it seems like a natural
        consequence of a 10Gigabit-capable server feeding a 1-Gigabit
        downlink.

        Has anybody else seen anything similar, with Varnish or other
        single servers on 10Gbit to 1Gbit links?

        The literature offers a variety of mitigation strategies, but there
        are non-trivial tradeoffs and none appears to be a silver bullet.

        If anyone has seen TCP Incast Collapse with Varnish, were you able
        to work around it, and if so, how?

        Thanks,
        John Salmon

        References:

        http://www.pdl.cmu.edu/Incast/

        Annotated Bibliography in:
        https://lists.freebsd.org/pipermail/freebsd-net/2015-November/043926.html


        _______________________________________________
        varnish-misc mailing list
        [email protected]
        <mailto:[email protected]>
        https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
        <https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc>



