Re: Backup Cache Group Selection
We should also have some uniqueness constraints on the new table: {primary, fallback} and {primary, order}.

- Eric

> On Mar 13, 2018, at 12:59 PM, Rawlin Peters wrote: > > To clarify, if we got a hit in the CZF for the client's IP, then we > should *not* fail when all specified backup CGs are unavailable, > fallbackToClosest is set to True, and the DS is set to "CZF only". In > that case we should find the closest available CG (and fail if there > are none). If your current implementation does not follow that > behavior, it should be fixed to do so. > > Feel free to close your existing PR and create a new one if that's > easier for you. Just be sure to add a comment in the old PR that > references your new PR, and I'll continue to review. > > Also, you can check traffic_ops_golang/routes.go [1] to see if > existing Perl API endpoints have been rewritten into Go yet. I don't > see the /cachegroup endpoints in that file yet, so you should be > alright to update the existing Perl endpoints. However, any *new* > cachegroup API endpoints needed should be written in > traffic_ops_golang. > > As for DB schema updates for this, I was thinking one new column > (fallback_to_closest - nullable, default true) in the cachegroup table > and a new cachegroup_fallback table with at least 3 columns (primary - > FK of the primary CG, fallback - FK of the fallback CG, order - > integer specifying the order of the fallback list). > > So given that, I imagine we'll need a new API endpoint like > `/api/1.3/cachegroups/{:id}/fallbacks` that we can use to add, remove, > or update ordering of fallbacks for a particular CG. This API should > be restricted to cachegroups of type EDGE_LOC only. > > However, I haven't done much new API work yet, so hopefully the > contributors who've done a lot of API work can chime in and make sure > the new API endpoint is in line with things like Swagger and other > requirements. 
> > - Rawlin > > [1] > https://github.com/apache/incubator-trafficcontrol/blob/master/traffic_ops/traffic_ops_golang/routes.go > > On Tue, Mar 13, 2018 at 10:05 AM, David Neuman > wrote: >> What happens when Geo Limit is set to "CZF Only" with all backup Caches >> unavailable and fallbackToClosest is set to True. Current >> implementation will fail this. Should we do Geo lookup now in this change? >> >> In this case we should fail. If the Geo Limit is set to “CZF Only” then >> that is all we use. >> >> >> On Tue, Mar 13, 2018 at 8:17 AM, Vijay Anand < >> vijayanand.jayaman...@gmail.com> wrote: >> >>> @Rawlin, >>> >>> Let me try and get the changes done for TR according to your suggestions. >>> This change would be like: >>> 1. CZF only contains cache groups which should map back to TO's Cache Group >>> configurations (cr-config) >>> 2. Backup configurations should reach TR via cr-config in the format you >>> detailed. >>> 3. fallbackToClosest will be True by default. If backupLocation config is >>> present, it will be assumed false unless it is explicitly stated as TRUE. >>> 4. This will work irrespective of the coverage Zones in CZF as long as the >>> backup Cache Group specified is in cr-config. >>> >>> I have a doubt in this as well. >>> >>> What happens when Geo Limit is set to "CZF Only" with all backup Caches >>> unavailable and fallbackToClosest is set to True. Current >>> implementation will fail this. Should we do Geo lookup now in this change? >>> >>> Shall I delete my existing PR and create a new one with these changes? >>> >>> I will try to get the necessary changes for TO (Perl Mojo) along with this. >>> Would require your help in TO (Golang) and TP. Will keep you posted. >>> >>> Thanks, >>> Vijayanand S >>> >>> >>> >>> >>> On Tue, Mar 13, 2018 at 3:04 AM, Rawlin Peters >>> wrote: >>> If we start by putting this in the Cache Group API first, then later we really only have to worry about adding the CIDRs to the API. 
The backup config is really just relationships between cache groups, which makes perfect sense to model in a relational DB rather than the CZF. Why put something in the CZF to just tear it out later? - Rawlin On Mon, Mar 12, 2018 at 3:12 PM, Dave Neuman wrote: > Good point Rawlin, but I think it does answer your questions. CZF for now, > whatever the new CZF thing is after that. > > On Mon, Mar 12, 2018 at 1:44 PM, Rawlin Peters < >>> rawlin.pet...@gmail.com> > wrote: > >> The original scope of this thread was determining where the Backup >> Cache Group config should live (API vs CZF), not necessarily about >> building the entire CZF in the database, although I'm +1 on that idea >> as well. I think any decisions made about doing that should probably >> be started in a separate thread. >> >> - Rawlin >> >> On Mon, Mar 12, 2018 at
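Eric's suggestion amounts to two composite uniqueness rules on the proposed cachegroup_fallback table: a primary cache group may list a given fallback only once, and may use a given order position only once. The Go sketch below checks the same rules in application code, for example before a handler writes rows. The `CacheGroupFallback` type and `validateFallbacks` function are hypothetical illustrations, not existing Traffic Ops code; in practice the constraints would also be declared in the database schema.

```go
package main

import "fmt"

// CacheGroupFallback models one row of the proposed cachegroup_fallback table.
// The fields mirror the columns discussed in the thread (primary, fallback,
// order); the type itself is hypothetical.
type CacheGroupFallback struct {
	Primary  int // FK: id of the primary cache group
	Fallback int // FK: id of the fallback cache group
	Order    int // position of Fallback in Primary's fallback list
}

// validateFallbacks checks the two suggested uniqueness constraints:
// each {primary, fallback} pair and each {primary, order} pair may appear once.
func validateFallbacks(rows []CacheGroupFallback) error {
	type key struct{ primary, other int }
	seenFallback := make(map[key]bool)
	seenOrder := make(map[key]bool)
	for _, r := range rows {
		fbKey := key{r.Primary, r.Fallback}
		ordKey := key{r.Primary, r.Order}
		if seenFallback[fbKey] {
			return fmt.Errorf("cache group %d lists fallback %d more than once", r.Primary, r.Fallback)
		}
		if seenOrder[ordKey] {
			return fmt.Errorf("cache group %d uses order %d more than once", r.Primary, r.Order)
		}
		seenFallback[fbKey] = true
		seenOrder[ordKey] = true
	}
	return nil
}

func main() {
	rows := []CacheGroupFallback{
		{Primary: 1, Fallback: 2, Order: 1},
		{Primary: 1, Fallback: 3, Order: 1}, // same order used twice for cache group 1
	}
	fmt.Println(validateFallbacks(rows)) // cache group 1 uses order 1 more than once
}
```

In SQL terms these would be two `UNIQUE` constraints, over (primary, fallback) and (primary, order). Note that `primary` and `order` are reserved words in PostgreSQL, so real column names would need quoting or different names.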
Re: Delivery Service Origin Refactor
replies inline On Mon, Mar 12, 2018 at 5:21 PM, Nir Sopher wrote: > Thank you Rawlin for the clarification :) You're welcome. Anything I can do to help :) > > Still, I feel like I'm missing a piece of the puzzle here. > Maybe I do not understand the relation between "origin" and "steering target" > > As I see it the router's job is to send end users to the optimal cache. It > has 2 tools for doing so: CZF and Geo > Using the CZF is preferable, as it is based on the real network topology. > Geo is a best effort solution, used when we cannot do better. It is not > necessarily optimal, and has GEO misses, but we must use it since we cannot > map all IPs. Yes, the client's location will be found from the CZF first, falling back to GEO upon a CZF-miss. Then the optimal edge cachegroup is chosen for each steering target deliveryservice. Then, the resulting list of target deliveryservices will be sorted by total distance following the path from client -> edge -> origin. > > The cache's job is to fetch the content and serve the user. > It can be optimized to bring the content from the optimal Origin. It can be > configured to do so by specifying the best origin per cache group (in ops > DB). This is intentionally done as a CLIENT_STEERING deliveryservice so that a smart client can make the decision to use a different deliveryservice upon failure. If this decision were made at the caching proxy level, it would end up being like an optimized version of MSO (multi-site origin) where the client only has a single URL to request and the best of multiple origins is chosen by the caching proxy. I don't think that's a bad idea; it's just not the architecture we want for this. By doing it as client steering we can also assign weights/ordering between colocated origins and update those steering assignments at any time. We can form the steering target list very flexibly this way.
> I might be naive here, but as the number of cache groups is reasonable, and > their network location is much clearer than the end user location, the > mapping and configuration would be reasonable. Therefore, using sub-optimal > Geo as a tool for choosing the Origin can be avoided. In practice, you could set the coordinates of the Origin to that of the optimal cachegroup, rather than assigning the Origin directly to said cachegroup. The effect would be the same, I believe. > > I also did not understand whether the suggestion is to use the client location > for choosing the origin, or the cache group location for choosing the > origin. > Using the client location for choosing the origin practically ignores the > accurate information provided by the CZF. It's a combination of the client location, the edge location, and the origin location (total distance from client -> edge -> origin). > > What am I missing? > 10x > Nir > > On Mon, Mar 12, 2018 at 11:19 PM, Rawlin Peters > wrote: > >> Hey Nir, >> >> I think part of the motivation for doing this in Traffic Router rather >> than the Caching Proxy is separation of concerns. TR is already >> concerned with routing a client to the best cache based upon the >> client's location, so TR is already well-equipped to make the decision >> of how Delivery Services (origins) should be prioritized based upon >> the client's location. That way the Caching Proxy (e.g. ATS) doesn't >> need to concern itself with its own location, the client's location, >> and the location of origins; it just needs to know how to get the >> origin's content and cache it. All the client needs to know is that >> they have a prioritized list of URLs to choose from; they don't need >> to be concerned about origin/edge locations because that >> prioritization will be made for them by TR.
>> >> The target DSes will have different origins primarily because they >> will be in different locations, and the origins should be >> interchangeable in terms of the content they provide because a smart >> client may fail over to any of the target DSes in a CLIENT_STEERING DS >> for the same content. >> >> - Rawlin >> >> On Mon, Mar 12, 2018 at 2:37 PM, Nir Sopher wrote: >> > Hi Rawlin, >> > Can you please add a few words about the motivation behind basing the >> steering >> > target selection on the location of the client? >> > As the content goes through the caches, isn't it the job of the cache to >> > select the best origin for the cache? Why should the client be the one >> to >> > take the origin location into consideration? >> > Why do the target DSes have different origins in the first place? Do they >> > have different characteristics in addition to their location? >> > Thanks, >> > Nir >> > >> > -- Forwarded message -- >> > From: Rawlin Peters >> > Date: Mon, Mar 12, 2018 at 9:46 PM >> > Subject: Delivery Service Origin Refactor >> > To: dev@trafficcontrol.incubator.apache.org >> > >> > >> > Hey folks, >> > >> > As promised,
Re: Backup Cache Group Selection
To clarify, if we got a hit in the CZF for the client's IP, then we should *not* fail when all specified backup CGs are unavailable, fallbackToClosest is set to True, and the DS is set to "CZF only". In that case we should find the closest available CG (and fail if there are none). If your current implementation does not follow that behavior, it should be fixed to do so.

Feel free to close your existing PR and create a new one if that's easier for you. Just be sure to add a comment in the old PR that references your new PR, and I'll continue to review.

Also, you can check traffic_ops_golang/routes.go [1] to see if existing Perl API endpoints have been rewritten into Go yet. I don't see the /cachegroup endpoints in that file yet, so you should be alright to update the existing Perl endpoints. However, any *new* cachegroup API endpoints needed should be written in traffic_ops_golang.

As for DB schema updates for this, I was thinking one new column (fallback_to_closest - nullable, default true) in the cachegroup table and a new cachegroup_fallback table with at least 3 columns (primary - FK of the primary CG, fallback - FK of the fallback CG, order - integer specifying the order of the fallback list).

So given that, I imagine we'll need a new API endpoint like `/api/1.3/cachegroups/{:id}/fallbacks` that we can use to add, remove, or update ordering of fallbacks for a particular CG. This API should be restricted to cachegroups of type EDGE_LOC only.

However, I haven't done much new API work yet, so hopefully the contributors who've done a lot of API work can chime in and make sure the new API endpoint is in line with things like Swagger and other requirements.

- Rawlin

[1] https://github.com/apache/incubator-trafficcontrol/blob/master/traffic_ops/traffic_ops_golang/routes.go

On Tue, Mar 13, 2018 at 10:05 AM, David Neuman wrote: > What happens when Geo Limit is set to "CZF Only" with all backup Caches > unavailable and fallbackToClosest is set to True.
Current > implementation will fail this. Should we do Geo lookup now in this change? > > In this case we should fail. If the Geo Limit is set to “CZF Only” then > that is all we use. > > > On Tue, Mar 13, 2018 at 8:17 AM, Vijay Anand < > vijayanand.jayaman...@gmail.com> wrote: > >> @Rawlin, >> >> Let me try and get the changes done for TR according to your suggestions. >> This change would be like: >> 1. CZF only contains cache groups which should map back to TO's Cache Group >> configurations (cr-config) >> 2. Backup configurations should reach TR via cr-config in the format you >> detailed. >> 3. fallbackToClosest will be True by default. If backupLocation config is >> present, it will be assumed false unless it is explicitly stated as TRUE. >> 4. This will work irrespective of the coverage Zones in CZF as long as the >> backup Cache Group specified is in cr-config. >> >> I have a doubt in this as well. >> >> What happens when Geo Limit is set to "CZF Only" with all backup Caches >> unavailable and fallbackToClosest is set to True. Current >> implementation will fail this. Should we do Geo lookup now in this change? >> >> Shall I delete my existing PR and create a new one with these changes? >> >> I will try to get the necessary changes for TO (Perl Mojo) along with this. >> Would require your help in TO (Golang) and TP. Will keep you posted. >> >> Thanks, >> Vijayanand S >> >> >> >> >> On Tue, Mar 13, 2018 at 3:04 AM, Rawlin Peters >> wrote: >> >> > If we start by putting this in the Cache Group API first, then later >> > we really only have to worry about adding the CIDRs to the API. The >> > backup config is really just relationships between cache groups, which >> > makes perfect sense to model in a relational DB rather than the CZF. >> > Why put something in the CZF to just tear it out later? >> > >> > - Rawlin >> > >> > On Mon, Mar 12, 2018 at 3:12 PM, Dave Neuman wrote: >> > > Good point Rawlin, but I think it does answer your questions. 
CZF for >> > now, >> > > whatever the new CZF thing is after that. >> > > >> > > On Mon, Mar 12, 2018 at 1:44 PM, Rawlin Peters < >> rawlin.pet...@gmail.com> >> > > wrote: >> > > >> > >> The original scope of this thread was determining where the Backup >> > >> Cache Group config should live (API vs CZF), not necessarily about >> > >> building the entire CZF in the database, although I'm +1 on that idea >> > >> as well. I think any decisions made about doing that should probably >> > >> be started in a separate thread. >> > >> >> > >> - Rawlin >> > >> >> > >> On Mon, Mar 12, 2018 at 1:11 PM, Dave Neuman >> wrote: >> > >> > +1 on building the CZF in the database. Jan tried to go down that >> > rabbit >> > >> > hole but realized it was a pretty hard problem to solve. I think >> > this is >> > >> > something we might want to re-visit. Maybe this is something we >> > should >> > >> > discuss at
Re: Backup Cache Group Selection
What happens when Geo Limit is set to "CZF Only" with all backup Caches unavailable and fallbackToClosest is set to True. Current implementation will fail this. Should we do Geo lookup now in this change?

In this case we should fail. If the Geo Limit is set to “CZF Only” then that is all we use.

On Tue, Mar 13, 2018 at 8:17 AM, Vijay Anand < vijayanand.jayaman...@gmail.com> wrote: > @Rawlin, > > Let me try and get the changes done for TR according to your suggestions. > This change would be like: > 1. CZF only contains cache groups which should map back to TO's Cache Group > configurations (cr-config) > 2. Backup configurations should reach TR via cr-config in the format you > detailed. > 3. fallbackToClosest will be True by default. If backupLocation config is > present, it will be assumed false unless it is explicitly stated as TRUE. > 4. This will work irrespective of the coverage Zones in CZF as long as the > backup Cache Group specified is in cr-config. > > I have a doubt in this as well. > > What happens when Geo Limit is set to "CZF Only" with all backup Caches > unavailable and fallbackToClosest is set to True. Current > implementation will fail this. Should we do Geo lookup now in this change? > > Shall I delete my existing PR and create a new one with these changes? > > I will try to get the necessary changes for TO (Perl Mojo) along with this. > Would require your help in TO (Golang) and TP. Will keep you posted. > > Thanks, > Vijayanand S > > > > > On Tue, Mar 13, 2018 at 3:04 AM, Rawlin Peters > wrote: > > > If we start by putting this in the Cache Group API first, then later > > we really only have to worry about adding the CIDRs to the API. The > > backup config is really just relationships between cache groups, which > > makes perfect sense to model in a relational DB rather than the CZF. > > Why put something in the CZF to just tear it out later? 
> > > > - Rawlin > > > > On Mon, Mar 12, 2018 at 3:12 PM, Dave Neuman wrote: > > > Good point Rawlin, but I think it does answer your questions. CZF for > > now, > > > whatever the new CZF thing is after that. > > > > > > On Mon, Mar 12, 2018 at 1:44 PM, Rawlin Peters < > rawlin.pet...@gmail.com> > > > wrote: > > > > > >> The original scope of this thread was determining where the Backup > > >> Cache Group config should live (API vs CZF), not necessarily about > > >> building the entire CZF in the database, although I'm +1 on that idea > > >> as well. I think any decisions made about doing that should probably > > >> be started in a separate thread. > > >> > > >> - Rawlin > > >> > > >> On Mon, Mar 12, 2018 at 1:11 PM, Dave Neuman > wrote: > > >> > +1 on building the CZF in the database. Jan tried to go down that > > rabbit > > >> > hole but realized it was a pretty hard problem to solve. I think > > this is > > >> > something we might want to re-visit. Maybe this is something we > > should > > >> > discuss at our meetup and then update this thread with our > decisions? > > >> > > > >> > On Mon, Mar 12, 2018 at 11:25 AM, Rawlin Peters < > > rawlin.pet...@gmail.com > > >> > > > >> > wrote: > > >> > > > >> >> @VijayAnand: > > >> >> > > >> >> Right, a Coverage Zone that doesn't map to a Cache Group in TO > won't > > >> >> be chosen as a backup in case of failure, but you could have a > > >> >> Coverage-Zone-not-in-TO that configures Coverage-Zones-in-TO as > > >> >> backups. But, I think the general sentiment right now is that all > > >> >> Coverage Zones in the CZF should map back to Cache Groups in TO, so > > >> >> the backup config should also be done via the Cache Group API. > > >> >> > > >> >> So from the Traffic Router perspective, the process should become: > > >> >> 1. Rather than parsing from the CZF into the NetworkNode class, > parse > > >> >> Cache Group backup config from the CRConfig into the existing > > >> >> CacheLocation class > > >> >> 2. 
in the DS request flow, the NetworkNode will map back to a > > >> >> registered CacheLocation which would contain the backup CG config > > >> >> > > >> >> The rest of the PR's behavior should stay the same, it's just a > > matter > > >> >> of the config being located in a different class. To give you an > idea > > >> >> of how I would format it in the CRConfig (so you know how to parse > it > > >> >> out), here is a snippet of "edgeLocations" from CRConfig.json: > > >> >> > > >> >> "edgeLocations": { > > >> >> "edge-cg-1": { > > >> >> "latitude": 1.00, > > >> >> "longitude": 2.00, > > >> >> "backupLocations": { > > >> >> "list": ["edge-cg-2"], > > >> >> "fallbackToClosest": false > > >> >> } > > >> >> }, > > >> >> "edge-cg-2": { > > >> >> "latitude": 3.00, > > >> >> "longitude": 4.00 > > >> >> } > > >> >> } > > >> >> > > >> >> The "backupLocations" section would still remain optional (if > > missing, > > >> >> follow existing
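To show how the proposed "edgeLocations" format could be consumed, here is a minimal Go sketch that decodes it with the standard `encoding/json` package and applies the defaults from Vijay's point 3: fallbackToClosest is true when no backup config exists (the existing fall-back-to-closest behavior), and false when "backupLocations" is present unless the flag is explicitly true. The Go types and functions are hypothetical (TR itself is Java and would parse this into its CacheLocation class); the sample wraps the thread's snippet in outer braces so it is a complete JSON document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backupLocations mirrors the optional "backupLocations" object in the
// proposed CRConfig format. FallbackToClosest is a pointer so that an absent
// key can be told apart from an explicit false.
type backupLocations struct {
	List              []string `json:"list"`
	FallbackToClosest *bool    `json:"fallbackToClosest"`
}

// edgeLocation is one entry under "edgeLocations".
type edgeLocation struct {
	Latitude        float64          `json:"latitude"`
	Longitude       float64          `json:"longitude"`
	BackupLocations *backupLocations `json:"backupLocations"`
}

// fallbackToClosest applies the defaults from Vijay's point 3: true when no
// backup config exists, otherwise false unless the key is explicitly true.
func (e edgeLocation) fallbackToClosest() bool {
	if e.BackupLocations == nil {
		return true
	}
	f := e.BackupLocations.FallbackToClosest
	return f != nil && *f
}

// parseEdgeLocations decodes the "edgeLocations" section of a CRConfig document.
func parseEdgeLocations(data []byte) (map[string]edgeLocation, error) {
	var doc struct {
		EdgeLocations map[string]edgeLocation `json:"edgeLocations"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	return doc.EdgeLocations, nil
}

// sample wraps the snippet from the thread in braces so it is a complete JSON document.
const sample = `{
  "edgeLocations": {
    "edge-cg-1": {
      "latitude": 1.00,
      "longitude": 2.00,
      "backupLocations": {"list": ["edge-cg-2"], "fallbackToClosest": false}
    },
    "edge-cg-2": {"latitude": 3.00, "longitude": 4.00}
  }
}`

func main() {
	locs, err := parseEdgeLocations([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, name := range []string{"edge-cg-1", "edge-cg-2"} {
		fmt.Println(name, locs[name].fallbackToClosest())
	}
}
```

Run as-is, this prints `edge-cg-1 false` and then `edge-cg-2 true`. Note that `encoding/json`, like any strict JSON parser, rejects a trailing comma after the last object member, so whatever generates the CRConfig must not emit one.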
Re: Backup Cache Group Selection
@Rawlin,

Let me try and get the changes done for TR according to your suggestions. This change would be like:

1. CZF only contains cache groups which should map back to TO's Cache Group configurations (cr-config)
2. Backup configurations should reach TR via cr-config in the format you detailed.
3. fallbackToClosest will be True by default. If backupLocation config is present, it will be assumed false unless it is explicitly stated as TRUE.
4. This will work irrespective of the coverage Zones in CZF as long as the backup Cache Group specified is in cr-config.

I have a doubt in this as well.

What happens when Geo Limit is set to "CZF Only" with all backup Caches unavailable and fallbackToClosest is set to True. Current implementation will fail this. Should we do Geo lookup now in this change?

Shall I delete my existing PR and create a new one with these changes?

I will try to get the necessary changes for TO (Perl Mojo) along with this. Would require your help in TO (Golang) and TP. Will keep you posted.

Thanks,
Vijayanand S

On Tue, Mar 13, 2018 at 3:04 AM, Rawlin Peters wrote: > If we start by putting this in the Cache Group API first, then later > we really only have to worry about adding the CIDRs to the API. The > backup config is really just relationships between cache groups, which > makes perfect sense to model in a relational DB rather than the CZF. > Why put something in the CZF to just tear it out later? > > - Rawlin > > On Mon, Mar 12, 2018 at 3:12 PM, Dave Neuman wrote: > > Good point Rawlin, but I think it does answer your questions. CZF for > now, > > whatever the new CZF thing is after that. > > > > On Mon, Mar 12, 2018 at 1:44 PM, Rawlin Peters > > wrote: > > > >> The original scope of this thread was determining where the Backup > >> Cache Group config should live (API vs CZF), not necessarily about > >> building the entire CZF in the database, although I'm +1 on that idea > >> as well.
I think any decisions made about doing that should probably > >> be started in a separate thread. > >> > >> - Rawlin > >> > >> On Mon, Mar 12, 2018 at 1:11 PM, Dave Neuman wrote: > >> > +1 on building the CZF in the database. Jan tried to go down that > rabbit > >> > hole but realized it was a pretty hard problem to solve. I think > this is > >> > something we might want to re-visit. Maybe this is something we > should > >> > discuss at our meetup and then update this thread with our decisions? > >> > > >> > On Mon, Mar 12, 2018 at 11:25 AM, Rawlin Peters < > rawlin.pet...@gmail.com > >> > > >> > wrote: > >> > > >> >> @VijayAnand: > >> >> > >> >> Right, a Coverage Zone that doesn't map to a Cache Group in TO won't > >> >> be chosen as a backup in case of failure, but you could have a > >> >> Coverage-Zone-not-in-TO that configures Coverage-Zones-in-TO as > >> >> backups. But, I think the general sentiment right now is that all > >> >> Coverage Zones in the CZF should map back to Cache Groups in TO, so > >> >> the backup config should also be done via the Cache Group API. > >> >> > >> >> So from the Traffic Router perspective, the process should become: > >> >> 1. Rather than parsing from the CZF into the NetworkNode class, parse > >> >> Cache Group backup config from the CRConfig into the existing > >> >> CacheLocation class > >> >> 2. in the DS request flow, the NetworkNode will map back to a > >> >> registered CacheLocation which would contain the backup CG config > >> >> > >> >> The rest of the PR's behavior should stay the same, it's just a > matter > >> >> of the config being located in a different class. 
To give you an idea > >> >> of how I would format it in the CRConfig (so you know how to parse it > >> >> out), here is a snippet of "edgeLocations" from CRConfig.json: > >> >> > >> >> "edgeLocations": { > >> >> "edge-cg-1": { > >> >> "latitude": 1.00, > >> >> "longitude": 2.00, > >> >> "backupLocations": { > >> >> "list": ["edge-cg-2"], > >> >> "fallbackToClosest": false > >> >> } > >> >> }, > >> >> "edge-cg-2": { > >> >> "latitude": 3.00, > >> >> "longitude": 4.00 > >> >> } > >> >> } > >> >> > >> >> The "backupLocations" section would still remain optional (if > missing, > >> >> follow existing behavior of falling back to next closest CG). > Existing > >> >> defaults in the current PR should remain the same. > >> >> > >> >> How would you feel about making those changes in your PR? Feel free > to > >> >> tackle the new TO API and Traffic Portal changes too if you want, but > >> >> I don't want to burden you with this unexpected work if you don't > want > >> >> it. I (or another willing contributor) could work on the necessary TO > >> >> API and Traffic Portal changes sometime in the near future and > >> >> integrate them with your Traffic Router enhancement. > >> >> > >> >> - Rawlin > >> >> > >> >> > >> >>