Re: Backup Cache Group Selection

2018-03-13 Thread Eric Friedrich (efriedri)
We should also have some uniqueness constraints on the new table:
{primary, fallback} and {primary, order}.
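
For illustration only, here is a minimal sketch of how those two constraints
could behave. It assumes the column names from the proposal quoted below
(primary, fallback, order) and uses SQLite through Python purely so the
example is self-contained; Traffic Ops itself uses PostgreSQL, and the table
definitions and helper names here are hypothetical, not an actual migration.

```python
import sqlite3

def make_schema(conn):
    """Create hypothetical cachegroup and cachegroup_fallback tables."""
    conn.executescript("""
        CREATE TABLE cachegroup (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            fallback_to_closest BOOLEAN DEFAULT 1
        );
        CREATE TABLE cachegroup_fallback (
            "primary" INTEGER NOT NULL REFERENCES cachegroup(id),
            fallback INTEGER NOT NULL REFERENCES cachegroup(id),
            "order" INTEGER NOT NULL,
            -- a fallback CG may appear only once per primary CG
            UNIQUE ("primary", fallback),
            -- each position in a primary CG's fallback list is used once
            UNIQUE ("primary", "order")
        );
    """)

def try_insert(conn, primary, fallback, order):
    """Return True if the row was inserted, False if a constraint rejected it."""
    try:
        conn.execute(
            'INSERT INTO cachegroup_fallback ("primary", fallback, "order") '
            "VALUES (?, ?, ?)",
            (primary, fallback, order),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With these constraints, adding the same fallback twice for one primary, or
two fallbacks at the same position, is rejected by the database rather than
by application code.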




—Eric

> On Mar 13, 2018, at 12:59 PM, Rawlin Peters  wrote:
> 
> To clarify, if we got a hit in the CZF for the client's IP, then we
> should *not* fail when all specified backup CGs are unavailable,
> fallbackToClosest is set to True, and the DS is set to "CZF only". In
> that case we should find the closest available CG (and fail if there
> are none). If your current implementation does not follow that
> behavior, it should be fixed to do so.
> 
> Feel free to close your existing PR and create a new one if that's
> easier for you. Just be sure to add a comment in the old PR that
> references your new PR, and I'll continue to review.
> 
> Also, you can check traffic_ops_golang/routes.go [1] to see if
> existing Perl API endpoints have been rewritten into Go yet. I don't
> see the /cachegroup endpoints in that file yet, so you should be
> alright to update the existing Perl endpoints. However, any *new*
> cachegroup API endpoints needed should be written in
> traffic_ops_golang.
> 
> As for DB schema updates for this, I was thinking one new column
> (fallback_to_closest - nullable, default true) in the cachegroup table
> and a new cachegroup_fallback table with at least 3 columns (primary -
> FK of the primary CG, fallback - FK of the fallback CG, order -
> integer specifying the order of the fallback list).
> 
> So given that, I imagine we'll need a new API endpoint like
> `/api/1.3/cachegroups/{:id}/fallbacks` that we can use to add, remove,
> or update ordering of fallbacks for a particular CG. This API should
> be restricted to cachegroups of type EDGE_LOC only.
> 
> However, I haven't done much new API work yet, so hopefully the
> contributors who've done a lot of API work can chime in and make sure
> the new API endpoint is in line with things like Swagger and other
> requirements.
> 
> - Rawlin
> 
> [1] 
> https://github.com/apache/incubator-trafficcontrol/blob/master/traffic_ops/traffic_ops_golang/routes.go
> 
> On Tue, Mar 13, 2018 at 10:05 AM, David Neuman  
> wrote:
>> What happens when Geo Limit is set to "CZF Only"  with all backup Caches
>> are unavailable and fallbackToClosest is set to True. Current
>> implementation will fail this. Should we do Geo lookup now in this change?
>> 
>> In this case we should fail. If the Geo Limit is set to “CZF Only” then
>> that is all we use.
>> 
>> 
>> On Tue, Mar 13, 2018 at 8:17 AM, Vijay Anand <
>> vijayanand.jayaman...@gmail.com> wrote:
>> 
>>> @Rawlin,
>>> 
>>> Let me try and get the changes done for TR according to your suggestions.
>>> This change would be like:
>>> 1. CZF only contains cache groups which should map back to TO's Cache Group
>>> configurations (cr-config)
>>> 2. Backup configurations should reach TR via cr-config in the format you
>>> detailed.
>>> 3. fallbackToClosest will be True by default. If backupLocation config is
>>> present, it will be assumed as false unless otherwise it is stated as TRUE
>>> explicitly.
>>> 4. This will work irrespective of the coverage Zones in CZF as long as the
>>> backup Cache Group specified is in cr-config.
>>> 
>>> I have a doubt in this as well.
>>> 
>>> What happens when Geo Limit is set to "CZF Only"  with all backup Caches
>>> are unavailable and fallbackToClosest is set to True. Current
>>> implementation will fail this. Should we do Geo lookup now in this change?
>>> 
>>> Shall i delete my existing PR and create a new one with these changes?
>>> 
>>> I will try to get the necessary changes for TO (Perl Mojo) along with this.
>>> Would require your help in TO (Golang) and TP. Will keep you posted.
>>> 
>>> Thanks,
>>> Vijayanand S
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Mar 13, 2018 at 3:04 AM, Rawlin Peters 
>>> wrote:
>>> 
>>>> If we start by putting this in the Cache Group API first, then later
>>>> we really only have to worry about adding the CIDRs to the API. The
>>>> backup config is really just relationships between cache groups, which
>>>> makes perfect sense to model in a relational DB rather than the CZF.
>>>> Why put something in the CZF to just tear it out later?
>>>> 
>>>> - Rawlin
>>>> 
>>>> On Mon, Mar 12, 2018 at 3:12 PM, Dave Neuman wrote:
>>>>> Good point Rawlin, but I think it does answer your questions.  CZF for
>>>>> now, whatever the new CZF thing is after that.
>>>>> 
>>>>> On Mon, Mar 12, 2018 at 1:44 PM, Rawlin Peters <rawlin.pet...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> The original scope of this thread was determining where the Backup
>>>>>> Cache Group config should live (API vs CZF), not necessarily about
>>>>>> building the entire CZF in the database, although I'm +1 on that idea
>>>>>> as well. I think any decisions made about doing that should probably
>>>>>> be started in a separate thread.
>>>>>> 
>>>>>> - Rawlin
>>>>>> 
>>>>>> On Mon, Mar 12, 2018 at

Re: Delivery Service Origin Refactor

2018-03-13 Thread Rawlin Peters
replies inline

On Mon, Mar 12, 2018 at 5:21 PM, Nir Sopher  wrote:
> Thank you Rawlin for the clarification:)

You're welcome. Anything I can do to help :)

>
> Still, I feel like I'm missing a piece of the puzzle here.
> Maybe I do not understand the relationship between "origin" and "steering target".
>
> As I see it, the router's job is to send end users to the optimal cache. It
> has 2 tools for doing so: CZF and Geo.
> Using the CZF is preferable, as it is based on the real network topology.
> Geo is a best effort solution, used when we cannot do better. It is not
> necessarily optimal, and has GEO misses, but we must use it since we cannot
> map all IPs.


Yes, the client's location will be found from the CZF first, falling
back to GEO upon a CZF-miss. Then the most optimal edge cachegroup is
chosen for each steering target deliveryservice. Then, the resulting
list of target deliveryservices will be sorted by total distance
following the path from client -> edge -> origin.
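
A minimal sketch of that final sorting step, for illustration only. It
assumes great-circle (haversine) distance between lat/long coordinates and a
hypothetical data shape where each target already has its chosen edge
cachegroup and origin coordinates attached; the function and field names are
made up for this example and are not Traffic Router's actual classes.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def sort_targets(client, targets):
    """Sort steering targets by total distance client -> edge -> origin.

    client:  (lat, lon) of the client, found via CZF or GEO
    targets: list of dicts with hypothetical keys 'name', 'edge', 'origin',
             where 'edge' and 'origin' are (lat, lon) pairs
    """
    def total_distance(target):
        return (haversine_km(client, target["edge"])
                + haversine_km(target["edge"], target["origin"]))
    return sorted(targets, key=total_distance)
```

Because the key is the sum of both legs, a target with a nearby edge but a
distant origin can still rank below one whose edge is slightly farther away.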

>
> The cache's job is to fetch the content and serve the user.
> It can be optimized to bring the content from the optimal Origin. It can be
> configured to do so by specifying the best origin per cache group (in ops
> DB).


This is intentionally done as a CLIENT_STEERING deliveryservice so
that a smart client can make the decision to use a different
deliveryservice upon failure. If this decision was made at the caching
proxy level, it would end up being like an optimized version of MSO
(multi-site origin) where the client only has a single URL to request
and the most optimal origin of multiple origins is chosen by the
caching proxy. I don't think that's a bad idea; it's just not the
architecture we want for this. By doing it as client steering we can
also assign weights/ordering between colocated origins and update
those steering assignments at any time. We can form the steering
target list very flexibly this way.


> I might be naive here, but as the number of cache groups is reasonable, and
> their network location is much clearer than the end user location, the
> mapping and configuration would be reasonable. Therefore, using sub-optimal
> Geo as a tool for choosing the Origin can be avoided.


In practice, you could set the coordinates of the Origin to that of
the most optimal cachegroup, rather than assigning the Origin directly
to said cachegroup. The effect would be the same I believe.

>
> I also did not understand if the suggestion is to use the client location
> for choosing the origin, or the cache group location for choosing the
> origin.
> Using the client location for choosing the origin practically ignores the
> accurate information provided by the CZF.


It's a combination of the client location, the edge location, and the
origin location (total distance from client -> edge -> origin).

>
> What am I missing?
> 10x
> Nir
>
> On Mon, Mar 12, 2018 at 11:19 PM, Rawlin Peters 
> wrote:
>
>> Hey Nir,
>>
>> I think part of the motivation for doing this in Traffic Router rather
>> than the Caching Proxy is separation of concerns. TR is already
>> concerned with routing a client to the best cache based upon the
>> client's location, so TR is already well-equipped to make the decision
>> of how Delivery Services (origins) should be prioritized based upon
>> the client's location. That way the Caching Proxy (e.g. ATS) doesn't
>> need to concern itself with its own location, the client's location,
>> and the location of origins; it just needs to know how to get the
>> origin's content and cache it. All the client needs to know is that
>> they have a prioritized list of URLs to choose from; they don't need
>> to be concerned about origin/edge locations because that
>> prioritization will be made for them by TR.
>>
>> The target DSes will have different origins primarily because they
>> will be in different locations, and the origins should be
>> interchangeable in terms of the content they provide because a smart
>> client may fail over to any of the target DSes in a CLIENT_STEERING DS
>> for the same content.
>>
>> - Rawlin
>>
>> On Mon, Mar 12, 2018 at 2:37 PM, Nir Sopher  wrote:
>> > Hi Rawlin,
>> > Can you please add a few words on the motivation behind basing the
>> > steering target selection on the location of the client?
>> > As the content goes through the caches, isn't it the job of the cache to
>> > select the best origin for the cache? Why should the client be the one
>> > to take the origin location into consideration?
>> > Why do the target DSes have different origins in the first place? Do they
>> > have different characteristics in addition to their location?
>> > Thanks,
>> > Nir
>> >
>> > -- Forwarded message --
>> > From: Rawlin Peters 
>> > Date: Mon, Mar 12, 2018 at 9:46 PM
>> > Subject: Delivery Service Origin Refactor
>> > To: dev@trafficcontrol.incubator.apache.org
>> >
>> >
>> > Hey folks,
>> >
>> > As promised, 

Re: Backup Cache Group Selection

2018-03-13 Thread Rawlin Peters
To clarify, if we got a hit in the CZF for the client's IP, then we
should *not* fail when all specified backup CGs are unavailable,
fallbackToClosest is set to True, and the DS is set to "CZF only". In
that case we should find the closest available CG (and fail if there
are none). If your current implementation does not follow that
behavior, it should be fixed to do so.
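
The behavior described above can be sketched roughly as follows. This is a
simplified illustration, not the PR's code: it assumes the CZF lookup has
already produced a hit for the client's IP, and the dictionary layout,
function name, and distance callback are all hypothetical.

```python
def select_cache_group(primary, groups, distance):
    """Pick a cache group after a CZF hit on `primary`.

    groups:   dict of name -> {"available": bool,
                               "backups": [names],            # optional
                               "fallback_to_closest": bool}   # optional
    distance: callable(name) -> distance from the client to that group
    Returns the chosen cache group name, or None if the request should fail.
    """
    cfg = groups[primary]
    if cfg["available"]:
        return primary

    # Try the configured backups in their listed order.
    for name in cfg.get("backups", []):
        if name in groups and groups[name]["available"]:
            return name

    # All backups are down: only continue if fallbackToClosest allows it.
    if cfg.get("fallback_to_closest", True):
        candidates = [n for n, g in groups.items() if g["available"]]
        if candidates:
            return min(candidates, key=distance)

    # No backups, and no closest group allowed or available: fail.
    return None
```

Note that the "CZF only" setting does not appear in the function at all: once
the client has been located via the CZF, falling back to the closest
available CG is still a CZF-based decision.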

Feel free to close your existing PR and create a new one if that's
easier for you. Just be sure to add a comment in the old PR that
references your new PR, and I'll continue to review.

Also, you can check traffic_ops_golang/routes.go [1] to see if
existing Perl API endpoints have been rewritten into Go yet. I don't
see the /cachegroup endpoints in that file yet, so you should be
alright to update the existing Perl endpoints. However, any *new*
cachegroup API endpoints needed should be written in
traffic_ops_golang.

As for DB schema updates for this, I was thinking one new column
(fallback_to_closest - nullable, default true) in the cachegroup table
and a new cachegroup_fallback table with at least 3 columns (primary -
FK of the primary CG, fallback - FK of the fallback CG, order -
integer specifying the order of the fallback list).

So given that, I imagine we'll need a new API endpoint like
`/api/1.3/cachegroups/{:id}/fallbacks` that we can use to add, remove,
or update ordering of fallbacks for a particular CG. This API should
be restricted to cachegroups of type EDGE_LOC only.

However, I haven't done much new API work yet, so hopefully the
contributors who've done a lot of API work can chime in and make sure
the new API endpoint is in line with things like Swagger and other
requirements.

- Rawlin

[1] 
https://github.com/apache/incubator-trafficcontrol/blob/master/traffic_ops/traffic_ops_golang/routes.go

On Tue, Mar 13, 2018 at 10:05 AM, David Neuman  wrote:
> What happens when Geo Limit is set to "CZF Only"  with all backup Caches
> are unavailable and fallbackToClosest is set to True. Current
> implementation will fail this. Should we do Geo lookup now in this change?
>
> In this case we should fail. If the Geo Limit is set to “CZF Only” then
> that is all we use.
>
>
> On Tue, Mar 13, 2018 at 8:17 AM, Vijay Anand <
> vijayanand.jayaman...@gmail.com> wrote:
>
>> @Rawlin,
>>
>> Let me try and get the changes done for TR according to your suggestions.
>> This change would be like:
>> 1. CZF only contains cache groups which should map back to TO's Cache Group
>> configurations (cr-config)
>> 2. Backup configurations should reach TR via cr-config in the format you
>> detailed.
>> 3. fallbackToClosest will be True by default. If backupLocation config is
>> present, it will be assumed as false unless otherwise it is stated as TRUE
>> explicitly.
>> 4. This will work irrespective of the coverage Zones in CZF as long as the
>> backup Cache Group specified is in cr-config.
>>
>> I have a doubt in this as well.
>>
>> What happens when Geo Limit is set to "CZF Only"  with all backup Caches
>> are unavailable and fallbackToClosest is set to True. Current
>> implementation will fail this. Should we do Geo lookup now in this change?
>>
>> Shall i delete my existing PR and create a new one with these changes?
>>
>> I will try to get the necessary changes for TO (Perl Mojo) along with this.
>> Would require your help in TO (Golang) and TP. Will keep you posted.
>>
>> Thanks,
>> Vijayanand S
>>
>>
>>
>>
>> On Tue, Mar 13, 2018 at 3:04 AM, Rawlin Peters 
>> wrote:
>>
>> > If we start by putting this in the Cache Group API first, then later
>> > we really only have to worry about adding the CIDRs to the API. The
>> > backup config is really just relationships between cache groups, which
>> > makes perfect sense to model in a relational DB rather than the CZF.
>> > Why put something in the CZF to just tear it out later?
>> >
>> > - Rawlin
>> >
>> > On Mon, Mar 12, 2018 at 3:12 PM, Dave Neuman  wrote:
>> > > Good point Rawlin, but I think it does answer your questions.  CZF for
>> > now,
>> > > whatever the new CZF thing is after that.
>> > >
>> > > On Mon, Mar 12, 2018 at 1:44 PM, Rawlin Peters <
>> rawlin.pet...@gmail.com>
>> > > wrote:
>> > >
>> > >> The original scope of this thread was determining where the Backup
>> > >> Cache Group config should live (API vs CZF), not necessarily about
>> > >> building the entire CZF in the database, although I'm +1 on that idea
>> > >> as well. I think any decisions made about doing that should probably
>> > >> be started in a separate thread.
>> > >>
>> > >> - Rawlin
>> > >>
>> > >> On Mon, Mar 12, 2018 at 1:11 PM, Dave Neuman 
>> wrote:
>> > >> > +1 on building the CZF in the database.  Jan tried to go down that
>> > rabbit
>> > >> > hole but realized it was a pretty hard problem to solve.  I think
>> > this is
>> > >> > something we might want to re-visit.  Maybe this is something we
>> > should
>> > >> > discuss at 

Re: Backup Cache Group Selection

2018-03-13 Thread David Neuman
What happens when Geo Limit is set to "CZF Only", all backup caches are
unavailable, and fallbackToClosest is set to True? The current
implementation will fail this. Should we do a Geo lookup now in this change?

In this case we should fail. If the Geo Limit is set to “CZF Only” then
that is all we use.

On Tue, Mar 13, 2018 at 8:17 AM, Vijay Anand <
vijayanand.jayaman...@gmail.com> wrote:

> @Rawlin,
>
> Let me try and get the changes done for TR according to your suggestions.
> This change would be like:
> 1. CZF only contains cache groups which should map back to TO's Cache Group
> configurations (cr-config)
> 2. Backup configurations should reach TR via cr-config in the format you
> detailed.
> 3. fallbackToClosest will be True by default. If backupLocation config is
> present, it will be assumed as false unless otherwise it is stated as TRUE
> explicitly.
> 4. This will work irrespective of the coverage Zones in CZF as long as the
> backup Cache Group specified is in cr-config.
>
> I have a doubt in this as well.
>
> What happens when Geo Limit is set to "CZF Only"  with all backup Caches
> are unavailable and fallbackToClosest is set to True. Current
> implementation will fail this. Should we do Geo lookup now in this change?
>
> Shall i delete my existing PR and create a new one with these changes?
>
> I will try to get the necessary changes for TO (Perl Mojo) along with this.
> Would require your help in TO (Golang) and TP. Will keep you posted.
>
> Thanks,
> Vijayanand S
>
>
>
>
> On Tue, Mar 13, 2018 at 3:04 AM, Rawlin Peters 
> wrote:
>
> > If we start by putting this in the Cache Group API first, then later
> > we really only have to worry about adding the CIDRs to the API. The
> > backup config is really just relationships between cache groups, which
> > makes perfect sense to model in a relational DB rather than the CZF.
> > Why put something in the CZF to just tear it out later?
> >
> > - Rawlin
> >
> > On Mon, Mar 12, 2018 at 3:12 PM, Dave Neuman  wrote:
> > > Good point Rawlin, but I think it does answer your questions.  CZF for
> > now,
> > > whatever the new CZF thing is after that.
> > >
> > > On Mon, Mar 12, 2018 at 1:44 PM, Rawlin Peters <
> rawlin.pet...@gmail.com>
> > > wrote:
> > >
> > >> The original scope of this thread was determining where the Backup
> > >> Cache Group config should live (API vs CZF), not necessarily about
> > >> building the entire CZF in the database, although I'm +1 on that idea
> > >> as well. I think any decisions made about doing that should probably
> > >> be started in a separate thread.
> > >>
> > >> - Rawlin
> > >>
> > >> On Mon, Mar 12, 2018 at 1:11 PM, Dave Neuman 
> wrote:
> > >> > +1 on building the CZF in the database.  Jan tried to go down that
> > rabbit
> > >> > hole but realized it was a pretty hard problem to solve.  I think
> > this is
> > >> > something we might want to re-visit.  Maybe this is something we
> > should
> > >> > discuss at our meetup and then update this thread with our
> decisions?
> > >> >
> > >> > On Mon, Mar 12, 2018 at 11:25 AM, Rawlin Peters <
> > rawlin.pet...@gmail.com
> > >> >
> > >> > wrote:
> > >> >
> > >> >> @VijayAnand:
> > >> >>
> > >> >> Right, a Coverage Zone that doesn't map to a Cache Group in TO
> won't
> > >> >> be chosen as a backup in case of failure, but you could have a
> > >> >> Coverage-Zone-not-in-TO that configures Coverage-Zones-in-TO as
> > >> >> backups. But, I think the general sentiment right now is that all
> > >> >> Coverage Zones in the CZF should map back to Cache Groups in TO, so
> > >> >> the backup config should also be done via the Cache Group API.
> > >> >>
> > >> >> So from the Traffic Router perspective, the process should become:
> > >> >> 1. Rather than parsing from the CZF into the NetworkNode class,
> parse
> > >> >> Cache Group backup config from the CRConfig into the existing
> > >> >> CacheLocation class
> > >> >> 2. in the DS request flow, the NetworkNode will map back to a
> > >> >> registered CacheLocation which would contain the backup CG config
> > >> >>
> > >> >> The rest of the PR's behavior should stay the same, it's just a
> > matter
> > >> >> of the config being located in a different class. To give you an
> idea
> > >> >> of how I would format it in the CRConfig (so you know how to parse
> it
> > >> >> out), here is a snippet of "edgeLocations" from CRConfig.json:
> > >> >>
> > >> >> "edgeLocations": {
> > >> >>   "edge-cg-1": {
> > >> >>     "latitude": 1.00,
> > >> >>     "longitude": 2.00,
> > >> >>     "backupLocations": {
> > >> >>       "list": ["edge-cg-2"],
> > >> >>       "fallbackToClosest": false
> > >> >>     }
> > >> >>   },
> > >> >>   "edge-cg-2": {
> > >> >>     "latitude": 3.00,
> > >> >>     "longitude": 4.00
> > >> >>   }
> > >> >> }
> > >> >>
> > >> >> The "backupLocations" section would still remain optional (if
> > missing,
> > >> >> follow existing 

Re: Backup Cache Group Selection

2018-03-13 Thread Vijay Anand
@Rawlin,

Let me try and get the changes done for TR according to your suggestions.
The change would look like this:
1. The CZF contains only cache groups, which should map back to TO's Cache
Group configurations (cr-config).
2. Backup configurations should reach TR via cr-config in the format you
detailed.
3. fallbackToClosest will be True by default. If a backupLocation config is
present, it will be assumed to be false unless it is explicitly set to
TRUE.
4. This will work irrespective of the coverage zones in the CZF, as long as
the backup Cache Group specified is in cr-config.
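
The defaulting rule in item 3 could be sketched like this, for illustration
only. It assumes the "backupLocations" / "list" / "fallbackToClosest" keys
from the CRConfig snippet quoted further down in this thread; the function
name and return shape are hypothetical and not taken from the PR.

```python
def parse_backup_config(edge_location):
    """Return (backup_list, fallback_to_closest) for one edgeLocations entry.

    edge_location: dict parsed from a single entry under "edgeLocations"
    """
    backup = edge_location.get("backupLocations")
    if backup is None:
        # No backup config at all: keep the existing behavior of falling
        # back to the next closest cache group.
        return [], True
    # Backup config present: fallbackToClosest is treated as false unless
    # it is explicitly set to true.
    return list(backup.get("list", [])), bool(backup.get("fallbackToClosest", False))
```

For example, an entry with only a "list" of backups yields that list with
fallbackToClosest false, while an entry without "backupLocations" yields an
empty list with fallbackToClosest true.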

I also have a question about this.

What happens when Geo Limit is set to "CZF Only", all backup caches are
unavailable, and fallbackToClosest is set to True? The current
implementation will fail this. Should we do a Geo lookup now in this change?

Shall I delete my existing PR and create a new one with these changes?

I will try to make the necessary changes for TO (Perl Mojo) along with this.
I would need your help with TO (Golang) and TP. Will keep you posted.

Thanks,
Vijayanand S




On Tue, Mar 13, 2018 at 3:04 AM, Rawlin Peters 
wrote:

> If we start by putting this in the Cache Group API first, then later
> we really only have to worry about adding the CIDRs to the API. The
> backup config is really just relationships between cache groups, which
> makes perfect sense to model in a relational DB rather than the CZF.
> Why put something in the CZF to just tear it out later?
>
> - Rawlin
>
> On Mon, Mar 12, 2018 at 3:12 PM, Dave Neuman  wrote:
> > Good point Rawlin, but I think it does answer your questions.  CZF for
> now,
> > whatever the new CZF thing is after that.
> >
> > On Mon, Mar 12, 2018 at 1:44 PM, Rawlin Peters 
> > wrote:
> >
> >> The original scope of this thread was determining where the Backup
> >> Cache Group config should live (API vs CZF), not necessarily about
> >> building the entire CZF in the database, although I'm +1 on that idea
> >> as well. I think any decisions made about doing that should probably
> >> be started in a separate thread.
> >>
> >> - Rawlin
> >>
> >> On Mon, Mar 12, 2018 at 1:11 PM, Dave Neuman  wrote:
> >> > +1 on building the CZF in the database.  Jan tried to go down that
> rabbit
> >> > hole but realized it was a pretty hard problem to solve.  I think
> this is
> >> > something we might want to re-visit.  Maybe this is something we
> should
> >> > discuss at our meetup and then update this thread with our decisions?
> >> >
> >> > On Mon, Mar 12, 2018 at 11:25 AM, Rawlin Peters <
> rawlin.pet...@gmail.com
> >> >
> >> > wrote:
> >> >
> >> >> @VijayAnand:
> >> >>
> >> >> Right, a Coverage Zone that doesn't map to a Cache Group in TO won't
> >> >> be chosen as a backup in case of failure, but you could have a
> >> >> Coverage-Zone-not-in-TO that configures Coverage-Zones-in-TO as
> >> >> backups. But, I think the general sentiment right now is that all
> >> >> Coverage Zones in the CZF should map back to Cache Groups in TO, so
> >> >> the backup config should also be done via the Cache Group API.
> >> >>
> >> >> So from the Traffic Router perspective, the process should become:
> >> >> 1. Rather than parsing from the CZF into the NetworkNode class, parse
> >> >> Cache Group backup config from the CRConfig into the existing
> >> >> CacheLocation class
> >> >> 2. in the DS request flow, the NetworkNode will map back to a
> >> >> registered CacheLocation which would contain the backup CG config
> >> >>
> >> >> The rest of the PR's behavior should stay the same, it's just a
> matter
> >> >> of the config being located in a different class. To give you an idea
> >> >> of how I would format it in the CRConfig (so you know how to parse it
> >> >> out), here is a snippet of "edgeLocations" from CRConfig.json:
> >> >>
> >> >> "edgeLocations": {
> >> >>   "edge-cg-1": {
> >> >>     "latitude": 1.00,
> >> >>     "longitude": 2.00,
> >> >>     "backupLocations": {
> >> >>       "list": ["edge-cg-2"],
> >> >>       "fallbackToClosest": false
> >> >>     }
> >> >>   },
> >> >>   "edge-cg-2": {
> >> >>     "latitude": 3.00,
> >> >>     "longitude": 4.00
> >> >>   }
> >> >> }
> >> >>
> >> >> The "backupLocations" section would still remain optional (if
> missing,
> >> >> follow existing behavior of falling back to next closest CG).
> Existing
> >> >> defaults in the current PR should remain the same.
> >> >>
> >> >> How would you feel about making those changes in your PR? Feel free
> to
> >> >> tackle the new TO API and Traffic Portal changes too if you want, but
> >> >> I don't want to burden you with this unexpected work if you don't
> want
> >> >> it. I (or another willing contributor) could work on the necessary TO
> >> >> API and Traffic Portal changes sometime in the near future and
> >> >> integrate them with your Traffic Router enhancement.
> >> >>
> >> >> - Rawlin
> >> >>
> >> >>
> >> >>