The collapsed_forwarding plugin can be configured either as a global plugin (plugin.config) or as a remap plugin (remap.config). We’ve had it set up as a remap plugin because it’s mainly relevant for HLS streaming scenarios, and the same delivery servers also serve other static objects that never run into such concurrency-related problems.
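As a rough sketch of the two placements (the hostnames are made up for illustration and plugin arguments are left out; the plugin doc for your ATS version is the authority on supported options):

```
# remap.config: enable the plugin only on the HLS rule (hypothetical hostnames)
map http://hls.example.com/ http://origin.example.com/ @plugin=collapsed_forwarding.so

# plugin.config: enable the plugin globally, listed after any other plugins
collapsed_forwarding.so
```

Only one of the two placements would normally be used. The remap form keeps the plugin away from rules that don't need it.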
If you’d like to set this up in remap mode as well, simply append it to the relevant remap rules. If you’d like to use it in global mode, it should be okay to add it to plugin.config as the last line. The doc page for the plugin also has the setup instructions:

https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/collapsed_forwarding.en.html

Thanks,

Sudheer

> On Mar 8, 2018, at 6:20 PM, Dunkin, Nick <[email protected]> wrote:
>
> Hi Sudheer
>
> Sorry, one quick follow up question.
>
> We have other plugins declared in the plugin.config file. Should the
> collapsed forwarding plugin precede them or go at the end? Or does it not
> matter?
>
> Thanks
>
> Nick
>
> On Mar 8, 2018, at 8:54 PM, Sudheer Vinukonda <[email protected]>
> wrote:
>
>> I can see how the documentation might be slightly misleading. You are right
>> that these settings by themselves are orthogonal to whether or not you have
>> the collapsed_forwarding plugin enabled. And configuring them would indeed
>> avoid Thundering Herd to the Origins, in the sense that, at most, one
>> request per object is leaked upstream.
>>
>> However, in terms of the net result to the client, as clearly described in
>> the docs, the best these settings can achieve is to return an error to the
>> client or a stale copy, when applicable/available (for example, an older
>> manifest file in case of HLS streaming). This is generally not a desirable
>> behavior for many video solutions and this is where the collapsed_forwarding
>> plugin comes into play. That plugin is essentially built on top of
>> open_write_fail_action: it intercepts the error before it goes back to the
>> client and waits until the cache is filled with the needed object. The net
>> result, in this case, is a clearly better experience for the users and
>> friendlier behavior for the clients (e.g. video players).
>>
>> Technically, using collapsed_forwarding plugin would still be an
>> "out-of-the-box" solution, as long as you compile the plugin and set it up
>> correctly.
>>
>> More info about how the plugin works is at Collapsed Forwarding Plugin
>> (Apache Traffic Server 8.0.0 documentation).
>>
>> Hope this helps.
>>
>> Thanks,
>>
>> Sudheer
>>
>> On Thursday, March 8, 2018, 3:22:56 PM PST, Dunkin, Nick
>> <[email protected]> wrote:
>>
>> Hi Sudheer,
>>
>> I really think I’m missing something. Please allow me to check my
>> understanding from the beginning.
>>
>> We followed the documentation at the link I provided earlier, specifically
>> the section on Reducing Origin Server Requests (Avoiding the Thundering
>> Herd).
>>
>> We added the required prerequisite configurations (as per the documentation):
>>
>> CONFIG proxy.config.cache.enable_read_while_writer INT 1
>> CONFIG proxy.config.http.background_fill_active_timeout INT 0
>> CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
>> CONFIG proxy.config.cache.max_doc_size INT 0
>>
>> And we also chose sensible settings for each of the following configurations
>> (as per the documentation):
>>
>> CONFIG proxy.config.cache.read_while_writer.max_retries INT xxx
>> CONFIG proxy.config.cache.read_while_writer_retry.delay INT xxx
>> CONFIG proxy.config.http.cache.max_open_read_retries INT xxx
>> CONFIG proxy.config.http.cache.open_read_retry_time INT xxx
>> CONFIG proxy.config.http.cache.open_write_fail_action INT xxx
>>
>> The documentation then states: “Once these are enabled, you have something
>> that is very close, but not quite the same, to Squid’s Collapsed Forwarding.”
>>
>> AFAIK none of this involves the Collapsed Forwarding plugin. The
>> documentation doesn’t mention the Collapsed Forwarding plugin. We don’t
>> have the Collapsed Forwarding plugin declared in our plugin.config.
>>
>> It was my understanding that these settings were orthogonal to the Collapsed
>> Forwarding plugin but provided similar functionality “out of the box”.
>>
>> Please can you let me know if I have misunderstood the documentation? Maybe
>> this section of the documentation is outdated?
>>
>> Many thanks for your patience,
>>
>> Nick
>>
>> From: Sudheer Vinukonda <[email protected]>
>> Date: Thursday, March 8, 2018 at 5:37 PM
>> To: "[email protected]" <[email protected]>, Nick
>> Dunkin <[email protected]>
>> Subject: Re: Parent.config and thundering herd.
>>
>> AFAIK, collapsed_forwarding plugin is still actively used in production for
>> live and VOD streaming by a few companies and I'm not aware of any plans to
>> deprecate it (we did agree on deprecating the collapsed_connection plugin,
>> which is somewhat similar in what it does but different in how it does it;
>> perhaps you were referring to that?).
>>
>> I copied an older link earlier for open_write_fail_action, mainly because
>> it hasn't changed much in 7.x in what it does. Please see below 7.x
>> reference.
>>
>> With the open_write_fail_action feature (within the ATS core), ATS would
>> return errors to all but one, on seeing multiple concurrent requests for the
>> same object. For example, if you were doing live streaming and 1000
>> clients requested the same segment file that is not in the Delivery
>> Server's cache yet, enabling the open_write_fail_action feature allows ATS
>> to return 502 to 999 clients, while the other request fetches the segment
>> and populates the cache. As long as the clients retry, this should mostly
>> work. However, if you do not like to return errors to clients (we certainly
>> did not, as it'd make things much worse by causing a retry storm),
>> collapsed_forwarding plugin can hold those requests waiting for the one
>> request that was proxied over to the Origin to fetch the segment and fill
>> the cache.
>> Once the segment is fetched and the writing to cache begins, the
>> other requests can then join the party (that's where read-while-writer
>> comes into the picture), and start streaming to all the clients at the same
>> time.
>>
>> Now, it's possible that you may have never used the collapsed_forwarding
>> plugin and somehow happened to not see the problem of returning 502 errors
>> to clients, but it's always possible depending on the scale, concurrency
>> (and in particular, the origin latency). Perhaps enabling parent proxy may
>> have exposed the problem, by somehow making the latency worse?
>>
>> records.config (Apache Traffic Server 7.0.0 documentation)
>>
>> On Thursday, March 8, 2018, 11:26:13 AM PST, Dunkin, Nick
>> <[email protected]> wrote:
>>
>> Hi Sudheer,
>>
>> I’m not sure we’re quite on the same page but I’m grateful for your input.
>> This is all for ATS ver 7.0 and the documentation I’m talking about is on
>> this page
>>
>> https://docs.trafficserver.apache.org/en/7.0.x/admin-guide/configuration/cache-basics.en.html
>>
>> In the section “Reducing Origin Server Requests (Avoiding the Thundering
>> Herd)”
>>
>> There is nothing in that section about these settings being associated with
>> the Collapsed Forwarding plugin. In fact there is no mention of the
>> Collapsed Forwarding plugin at all. Now I’m a little confused.
>>
>> Is anyone able to clarify this for me? I thought I understood but maybe I
>> don’t.
>>
>> Thanks,
>>
>> Nick
>>
>> From: Sudheer Vinukonda <[email protected]>
>> Date: Thursday, March 8, 2018 at 1:36 PM
>> To: "[email protected]" <[email protected]>, Nick
>> Dunkin <[email protected]>
>> Subject: Re: Parent.config and thundering herd.
>>
>> Hmm..I'm not sure that collapsed_forwarding plugin is deprecated.
>> The plugin in fact is based on the settings you mentioned below and allows
>> blocking multiple parallel requests for the same object from leaking
>> upstream.
>>
>> Using the settings alone, without the plugin, would not actually achieve any
>> request coalescing for cache miss scenarios -- it'd simply result in
>> returning an error back to the client. Is that what you meant by "seeing
>> request coalescing"? Or does your use case involve not cache misses but
>> stale cache (e.g. VOD)?
>>
>> records.config (Apache Traffic Server 6.2.1 documentation)
>>
>> proxy.config.http.cache.open_write_fail_action
>> Scope: CONFIG
>> Type: INT
>> Default: 0
>> Reloadable: Yes
>> Overridable: Yes
>> This setting indicates the action taken on failing to obtain the cache open
>> write lock on either a cache miss or a cache hit stale. This typically
>> happens when there is more than one request to the same cache object
>> simultaneously. During such a scenario, all but one (which goes to the
>> origin) request is served either a stale copy or an error depending on this
>> setting.
>> 0 = default, disable cache and go to origin server
>> 1 = return a 502 error on a cache miss
>> 2 = serve stale if object’s age is under
>>     proxy.config.http.cache.max_stale_age, else go to origin server
>> 3 = return a 502 error on a cache miss or serve stale on a cache revalidate
>>     if object’s age is under proxy.config.http.cache.max_stale_age, else go
>>     to origin server
>> 4 = return a 502 error on either a cache miss or on a revalidation
>>
>> Thanks,
>>
>> Sudheer
>>
>> On Thursday, March 8, 2018, 8:58:46 AM PST, Dunkin, Nick
>> <[email protected]> wrote:
>>
>> Hi Sudheer,
>>
>> Thanks for the reply. I couldn’t think of any reason either, but I wanted
>> to check with the community.
>>
>> Just for clarification.
>> We’re not using the Collapsed-Forwarding plugin
>> explicitly, I understood that that plugin was deprecated in favor of the
>> three configuration areas I mentioned:
>> Read While Writer
>> Open Read Retry Timeout
>> Open Write Fail Action
>> We certainly don’t have the Collapsed-Forwarding plugin in the plugin.config
>> and we are seeing request coalescing.
>>
>> Thanks,
>>
>> Nick
>>
>> From: Sudheer Vinukonda <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Thursday, March 8, 2018 at 11:49 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Parent.config and thundering herd.
>>
>> I haven’t looked at parent proxy setup much, but at a high level, I can’t
>> think of any reason why an origin failover mechanism would impact request
>> coalescing using collapsed forwarding plugin. The open write fail action
>> works based on the cache key for the object and as long as that doesn’t
>> change, it shouldn’t matter which origin it is pulled from. As a matter of
>> fact, we have had origin failover set up using a custom plugin as well as
>> request coalescing enabled in our HLS delivery servers and didn’t see any
>> problems with it.
>>
>> Is it possible the access failures are resulting in preventing the object
>> from being downloaded or being cached somehow? If the object is never
>> cached, then you will see problems with request coalescing.
>>
>> Thanks,
>>
>> Sudheer
>>
>> On Mar 8, 2018, at 7:25 AM, Dunkin, Nick <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We’ve been using the Thundering Herd protection provided by Read While
>>> Writer, Open Read Retry Timeout and Open Write Fail Action and have been
>>> getting some great results. However the behavior seems to change when we
>>> start using parent.config in order to provide some simple origin failover
>>> (i.e. simple Primary/Secondary Origin kind of thing).
>>> My initial tests are
>>> showing multiple access failures and not very much in the way of request
>>> coalescing.
>>>
>>> I don’t have all the details with me now, but at a high level, should we
>>> expect Read While Writer, Open Read Retry Timeout and Open Write Fail
>>> Action to all work in the same way when
>>> proxy.config.http.parent_proxy_routing_enable is enabled and we have a
>>> simple Primary/Secondary Origin configured with "parent_is_proxy=false"?
>>> Especially when most of the time the Primary Origin will be up and
>>> available. Are there any gotchas we should be aware of?
>>>
>>> All this testing is with ATS 7.0 currently.
>>>
>>> Thanks for your insight.
>>>
>>> Nick
>>>
>>
>> Nick Dunkin
>> Principal Engineer
>> o: 678.258.4071
>> e: [email protected]
>> 4375 River Green Pkwy # 100, Duluth, GA 30096, USA
