I'm having trouble understanding the concept of readahead in an HTTP context.
You are using the malloc cache storage, right?

--
Guillaume Quintard

On Thu, Jul 6, 2017 at 7:15 PM, John Salmon <[email protected]> wrote:

> Thanks for your suggestions.
>
> One more detail I didn't mention: roughly speaking, the client is doing
> "read ahead", but it only reads ahead by a limited amount (about 4 blocks,
> each of 128 KiB). The surprising behavior is that when four readahead
> threads are allowed to run concurrently, their aggregate throughput is
> much lower than when all the readaheads are serialized through a single
> thread.
>
> Traces (with strace and/or tcpdump) show frequent stalls of roughly
> 200 ms where nothing seems to move across the channel and all client-side
> system calls are waiting. 200 ms is suspiciously close to the Linux
> 'rto_min' parameter, which was the first thing that led me to suspect TCP
> incast collapse. We get some improvement by reducing rto_min on the
> server, and we also get some improvement by reducing SO_RCVBUF in the
> client. But as I said, both have tradeoffs, so I'm interested in whether
> anyone else has encountered or overcome this particular problem.
>
> I do not see the dropoff from single-thread to multi-thread when I run
> the client and server on the same host. I.e., I get around 500 MB/s with
> one client and roughly the same total bandwidth with multiple clients.
> I'm sure that with some tuning, the 500 MB/s could be improved, but
> that's not the issue here.
>
> Here are the ethtool reports.
>
> On the client:
>
> drdws0134$ ethtool eth0
> Settings for eth0:
>         Supported ports: [ TP ]
>         Supported link modes:   10baseT/Half 10baseT/Full
>                                 100baseT/Half 100baseT/Full
>                                 1000baseT/Full
>         Supported pause frame use: No
>         Supports auto-negotiation: Yes
>         Advertised link modes:  10baseT/Half 10baseT/Full
>                                 100baseT/Half 100baseT/Full
>                                 1000baseT/Full
>         Advertised pause frame use: No
>         Advertised auto-negotiation: Yes
>         Speed: 1000Mb/s
>         Duplex: Full
>         Port: Twisted Pair
>         PHYAD: 1
>         Transceiver: internal
>         Auto-negotiation: on
>         MDI-X: on (auto)
>         Cannot get wake-on-lan settings: Operation not permitted
>         Current message level: 0x00000007 (7)
>                                drv probe link
>         Link detected: yes
> drdws0134$
>
> On the server:
>
> $ ethtool eth0
> Settings for eth0:
>         Supported ports: [ TP ]
>         Supported link modes:   1000baseT/Full
>                                 10000baseT/Full
>         Supported pause frame use: No
>         Supports auto-negotiation: No
>         Advertised link modes:  Not reported
>         Advertised pause frame use: No
>         Advertised auto-negotiation: No
>         Speed: 10000Mb/s
>         Duplex: Full
>         Port: Twisted Pair
>         PHYAD: 0
>         Transceiver: internal
>         Auto-negotiation: off
>         MDI-X: Unknown
>         Cannot get wake-on-lan settings: Operation not permitted
>         Cannot get link status: Operation not permitted
> $
>
> On 07/06/2017 03:08 AM, Guillaume Quintard wrote:
>
> Two things: do you get the same results when the client is directly on
> the Varnish server (i.e., not going through the switch)? And is each new
> request opening a new connection?
>
> --
> Guillaume Quintard
>
> On Thu, Jul 6, 2017 at 6:45 AM, Andrei <[email protected]> wrote:
>
>> Out of curiosity, what does ethtool show for the related NICs on both
>> servers? I also have Varnish on a 10G server, and can reach around
>> 7.7 Gbit/s serving anywhere between 6-28k requests/second; however, it
>> did take some sysctl tuning and the Westwood TCP congestion control
>> algorithm.
>>
>> On Wed, Jul 5, 2017 at 3:09 PM, John Salmon <[email protected]> wrote:
>>
>>> I've been using Varnish in an "intranet" application.
>>> The picture is roughly:
>>>
>>>     origin <-> Varnish <-- 10G channel --> switch <-- 1G channel --> client
>>>
>>> The machine running Varnish is a high-performance server. It can
>>> easily saturate a 10 Gbit channel. The machine running the client is a
>>> more modest desktop workstation, but it's fully capable of saturating
>>> a 1 Gbit channel.
>>>
>>> The client makes HTTP requests for objects of size 128 kB.
>>>
>>> When the client makes those requests serially, "useful" data is
>>> transferred at about 80% of the channel bandwidth of the Gigabit
>>> link, which seems perfectly reasonable.
>>>
>>> But when the client makes the requests in parallel (typically
>>> 4 at a time, but it can vary), *total* throughput drops to about 25%
>>> of the channel bandwidth, i.e., about 30 Mbyte/s.
>>>
>>> After looking at traces and doing a fair amount of experimentation, we
>>> have reached the tentative conclusion that we're seeing "TCP incast
>>> throughput collapse" (see references below).
>>>
>>> The literature on TCP incast throughput collapse typically describes
>>> scenarios where a large number of servers overwhelm a single inbound
>>> port. I haven't found any discussion of incast collapse with only one
>>> server, but it seems like a natural consequence of a 10-Gigabit-capable
>>> server feeding a 1-Gigabit downlink.
>>>
>>> Has anybody else seen anything similar, with Varnish or other single
>>> servers on 10 Gbit to 1 Gbit links?
>>>
>>> The literature offers a variety of mitigation strategies, but there are
>>> non-trivial tradeoffs and none appears to be a silver bullet.
>>>
>>> If anyone has seen TCP incast collapse with Varnish, were you able to
>>> work around it, and if so, how?
>>>
>>> Thanks,
>>> John Salmon
>>>
>>> References:
>>>
>>> http://www.pdl.cmu.edu/Incast/
>>>
>>> Annotated bibliography in:
>>> https://lists.freebsd.org/pipermail/freebsd-net/2015-November/043926.html
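
To make the "readahead in an HTTP context" question at the top of the thread concrete: John's client fetches a stream of 128 KiB objects and keeps about four requests in flight ahead of the reader. The sketch below is not John's code; the host name, URL scheme (/data/block-N) and object count are invented placeholders. It only illustrates the two access patterns being compared, one request at a time versus four concurrent readahead workers.

#!/usr/bin/env python3
"""Minimal sketch of the workload described in the thread: fetch a sequence of
128 KiB objects from Varnish either serially or with 4 concurrent "readahead"
workers.  Host, URL layout and object count are hypothetical placeholders."""
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE = "http://varnish.example.com/data/block-%d"   # placeholder URL scheme
NBLOCKS = 256                                       # 256 objects of ~128 KiB each

def fetch(i):
    # Each urlopen() here opens a fresh TCP connection (no keep-alive),
    # which is one of the things Guillaume asks about in the thread.
    with urllib.request.urlopen(BASE % i, timeout=30) as resp:
        return len(resp.read())

def run(workers):
    """Fetch all blocks with the given number of concurrent workers; return MB/s."""
    t0 = time.time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(fetch, range(NBLOCKS)))
    return total / (time.time() - t0) / 1e6

if __name__ == "__main__":
    print("serial    : %6.1f MB/s" % run(1))   # ~80% of the 1 Gbit link in John's tests
    print("4 parallel: %6.1f MB/s" % run(4))   # collapses to ~25% in John's tests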
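John names two mitigations, each with tradeoffs: lowering rto_min on the server (on Linux this is a per-route attribute, e.g. something like "ip route change <dest> dev eth0 rto_min 20ms") and shrinking SO_RCVBUF in the client, which caps the advertised receive window and therefore how much data the 10 Gbit sender can have in flight toward the switch's 1 Gbit egress port. The sketch below shows only the client-side half and is not taken from John's client; the host, port, path and 64 KiB value are arbitrary illustrations.

#!/usr/bin/env python3
"""Sketch of the client-side mitigation John describes: shrink SO_RCVBUF so the
advertised TCP receive window stays small.  Host, port, path and buffer size
are illustrative, not taken from the thread."""
import socket

HOST, PORT = "varnish.example.com", 80      # placeholders
RCVBUF = 64 * 1024                          # 64 KiB; the "right" value is workload-dependent

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Must be set before connect(): an explicit SO_RCVBUF disables Linux
# receive-buffer autotuning and fixes the window scale negotiated in the SYN.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, RCVBUF)
sock.connect((HOST, PORT))

# The kernel roughly doubles the requested value to account for bookkeeping overhead.
print("effective SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

sock.sendall(b"GET /data/block-0 HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n"
             % HOST.encode())
received = 0
while True:
    chunk = sock.recv(65536)
    if not chunk:
        break
    received += len(chunk)
sock.close()
print("received", received, "bytes (headers included)")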

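Andrei does not list the sysctls he tuned, so nothing below should be read as his configuration; the only concrete knob he names is the Westwood congestion control algorithm. On Linux the system-wide default sender-side algorithm is net.ipv4.tcp_congestion_control, and the sketch below changes it the same way "sysctl -w net.ipv4.tcp_congestion_control=westwood" would. It has to run as root on the sending (Varnish) box, with the tcp_westwood module available.

#!/usr/bin/env python3
"""Sketch of the one tuning knob Andrei names: switch the default TCP
congestion control algorithm to Westwood.  Requires root and the tcp_westwood
kernel module; it is not a reproduction of Andrei's actual sysctl settings."""
import subprocess

SYSCTL = "/proc/sys/net/ipv4/tcp_congestion_control"
AVAILABLE = "/proc/sys/net/ipv4/tcp_available_congestion_control"

# Load the module if westwood is not already selectable.
with open(AVAILABLE) as f:
    if "westwood" not in f.read().split():
        subprocess.run(["modprobe", "tcp_westwood"], check=True)

# Write the sysctl; this is exactly what `sysctl -w` does under the hood.
with open(SYSCTL, "w") as f:
    f.write("westwood\n")

with open(SYSCTL) as f:
    print("default congestion control is now:", f.read().strip())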