Re: Question about the poll model of the Traffic Monitor

2018-04-03 Zhilin Huang (zhilhuan)
Thanks, Eric.

It works. I added the design doc to the wiki, linking to the Google Doc.

Thanks,
Zhilin


On 03/04/2018, 8:19 PM, "Eric Friedrich (efriedri)"  wrote:

Zhilin-
  I added you to the Wiki permissions. Please try again.

—Eric

> On Apr 3, 2018, at 2:00 AM, Zhilin Huang (zhilhuan)  wrote:

Re: Question about the poll model of the Traffic Monitor

2018-04-03 Zhilin Huang (zhilhuan)
Hi Dave,

I could not find the edit button on this page. It looks like I do not have
permission to add the doc.

Thanks,
Zhilin


On 03/04/2018, 2:43 AM, "David Neuman"  wrote:


Re: Question about the poll model of the Traffic Monitor

2018-04-02 David Neuman
Hi Zhilin,
Is it possible to get this design doc added to our wiki? I created a design
docs page here (https://cwiki.apache.org/confluence/display/TC/Design+Docs).
I think it would be good to get the document there so it doesn't get lost
over time.

Thanks!
Dave

On Wed, Mar 28, 2018 at 10:41 PM, Zhilin Huang (zhilhuan) <zhilh...@cisco.com> wrote:


Re: Question about the poll model of the Traffic Monitor

2018-03-28 Zhilin Huang (zhilhuan)
Hi Guys,

Thanks a lot for the discussion. I should have put the design up for review
earlier; sorry for the delay. Here is the link for the design doc:
https://docs.google.com/document/d/1vgq-pGNoLLYf7Y3cu5hWu67TUKpN5hucrp-ZS9nSsd4/edit?usp=sharing

Short summary for the feature design:
---
There is a feature request from the market to add secondary IP support on edge
cache servers, along with the ability to assign a delivery service to a
secondary IP of an edge cache.

This feature requires Traffic Ops to support secondary IP configuration for
edge caches, and assignment of delivery services to secondary IPs.

Traffic Monitor should also monitor the connectivity of configured secondary
IPs, and Traffic Router needs to support resolving the streamer FQDN to the
secondary IP assigned in a delivery service.

Traffic Server should record the IP that served each client request, and
should reject requests to an IP that is not assigned to the delivery service.

This design takes compatibility into consideration: if no secondary IP is
configured, or some parts of the system have not been upgraded to a version
that supports this feature, traffic will be served by primary IPs as before.
---
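To make the compatibility rule in the summary concrete, here is a minimal Go sketch of how a router could pick the IP to hand out for a delivery service on one edge cache. The `Server` and `DSAssignment` types and the `resolveIP` function are hypothetical, not Traffic Ops or Traffic Router APIs. Falling back to the primary IP when an assignment points at an IP the server does not have is an assumption, not something the design summary states.

```go
package main

import "fmt"

// Server is a hypothetical edge cache record: a primary IP plus optional
// secondary IPs. Field names are illustrative, not the Traffic Ops schema.
type Server struct {
	HostName     string
	PrimaryIP    string
	SecondaryIPs []string
}

// DSAssignment is a hypothetical map from delivery service name to the
// secondary IP it is assigned to on one server.
type DSAssignment map[string]string

// resolveIP returns the IP to answer with for ds on s. With no secondary IP
// assigned, it returns the primary IP (the "served by primary IPs as before"
// rule). ASSUMPTION: an assignment naming an IP that is not configured on
// the server also falls back to the primary IP.
func resolveIP(s Server, assignments DSAssignment, ds string) string {
	ip, ok := assignments[ds]
	if !ok {
		return s.PrimaryIP
	}
	for _, sec := range s.SecondaryIPs {
		if sec == ip {
			return ip
		}
	}
	return s.PrimaryIP
}

func main() {
	edge := Server{
		HostName:     "edge-01",
		PrimaryIP:    "192.0.2.10",
		SecondaryIPs: []string{"192.0.2.11", "192.0.2.12"},
	}
	assignments := DSAssignment{"live-tv": "192.0.2.11"}

	fmt.Println(resolveIP(edge, assignments, "live-tv")) // 192.0.2.11
	fmt.Println(resolveIP(edge, assignments, "vod"))     // 192.0.2.10
}
```

Keeping the fallback inside one function means a deployment with no secondary IPs takes exactly the same path as today.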

Replies to Robert's comments are embedded in the email thread below. Any
further comments are much appreciated and welcome.

Thanks,
Zhilin




On 29/03/2018, 10:19 AM, "Neil Hao (nbaoping)"  wrote:


On 3/29/18, 1:16 AM, "Robert Butts"  wrote:

I agree with Nir, it's not as simple as changing a structure to `[]URL`,
it's a bigger architectural design question.

How do you plan to mark caches Unavailable if they're unhealthy on one
interface, but healthy on another?

Right now, Traffic Router needs a boolean for each cache, it doesn't know
anything about multiple network interfaces, IPv4 vs IPv6, etc. It only
knows the FQDN, which is all the clients it's giving DNS records to will
know when they request the cache.

Questions:
Is a cache marked Unavailable when any interface is unreachable? Or all of
them?
ZH> Actually, we care about IP availability rather than interface
availability. Please see section 3.1.2 of the design doc.

What if an interface is reachable, but one interface reports different
stats than another interface? For example, what if someone configures a
different caching proxy (ATS) on each interface?
ZH> We will use only one ATS instance to serve traffic on all configured IPs.

How are stats aggregated? Should the monitor aggregate all stats from
different polls and interfaces together, and consider them the same
"server"? If not, how do we reconcile the different stats with what the
Monitor reports on `CrStates` and `CacheStats`? If so, again, what happens
if different interfaces have different ATS instances, so e.g. the byte
count on one is 100, and the other is 1000, then 101, then 1001. It simply
won't work. Do we handle that? Or just ignore it, and document "all
interfaces must report the same stats"? Do we try to detect that and give a
useful error or warning?
ZH> Bandwidth across interfaces will be aggregated. We will have only one ATS
instance serving traffic on all interfaces. The connectivity check will be
IP-based, and stats collection will be interface-based. Please see section
3.1.2 of the design doc for details.
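A minimal Go sketch of the split described in this reply, assuming a single ATS instance per cache: connectivity is recorded per IP, while bandwidth is collected per interface and summed into one server-level figure. `CachePoll`, `totalKbps`, and `ipAvailable` are hypothetical names, not Traffic Monitor code, and treating an IP that was never checked as unavailable is an assumption.

```go
package main

import "fmt"

// CachePoll is a hypothetical view of one poll of a cache that runs a single
// ATS instance: connectivity is tracked per IP, stats per interface.
type CachePoll struct {
	IPReachable   map[string]bool  // keyed by IP address
	InterfaceKbps map[string]int64 // keyed by interface name, e.g. "eth0"
}

// totalKbps aggregates per-interface bandwidth into the single server-level
// figure the monitor would report for the cache.
func (p CachePoll) totalKbps() int64 {
	var total int64
	for _, kbps := range p.InterfaceKbps {
		total += kbps
	}
	return total
}

// ipAvailable reports whether one specific IP passed the connectivity check.
// ASSUMPTION: an IP with no recorded result is treated as unavailable.
func (p CachePoll) ipAvailable(ip string) bool {
	return p.IPReachable[ip]
}

func main() {
	p := CachePoll{
		IPReachable:   map[string]bool{"192.0.2.10": true, "192.0.2.11": false},
		InterfaceKbps: map[string]int64{"eth0": 400000, "eth1": 250000},
	}
	fmt.Println(p.totalKbps())               // 650000
	fmt.Println(p.ipAvailable("192.0.2.11")) // false
}
```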

In Traffic Ops, servers have specific data used for polling. Traffic
Monitor gets the stats URI path from Parameters, and the URI IP from the
Servers table. It doesn't use the FQDN, Server Host or Server Domain. Where
would these other interfaces come from? Parameters? Or another table linked
to the servers table? (I'd really, really rather we didn't put more data in
unsafe Parameters, which may be missing or improperly formatted, need
safety checks in all code that ever uses them, and are confusing and opaque
to new users.) Would these other interfaces be in addition to using the IP
from the Server table? Or replace it?

Do we have config options for all of these? Only some of them? In the
config file, or Traffic Ops fields?
ZH> Please see section 3.1.1 of the design doc. Basically, we will add new
APIs, or new fields to

Re: Question about the poll model of the Traffic Monitor

2018-03-28 Neil Hao (nbaoping)
Hi Robert/Nir,

Thanks very much for the quick and detailed reply, and sorry that I didn't
explain the whole feature clearly. Actually, it's our Secondary IP feature,
which is a big feature that will bring changes to all the components in
Traffic Control. I thought our teammate had reviewed the design with you guys
before, but it seems not. After some discussion, we will start a design review
of the whole feature with you guys soon; I think it will be better to continue
the discussion after that.

Thanks,
Neil

On 3/29/18, 1:16 AM, "Robert Butts"  wrote:

I agree with Nir, it's not as simple as changing a structure to `[]URL`,
it's a bigger architectural design question.

How do you plan to mark caches Unavailable if they're unhealthy on one
interface, but healthy on another?

Right now, Traffic Router needs a boolean for each cache, it doesn't know
anything about multiple network interfaces, IPv4 vs IPv6, etc. It only
knows the FQDN, which is all the clients it's giving DNS records to will
know when they request the cache.

Questions:
Is a cache marked Unavailable when any interface is unreachable? Or all of
them?
What if an interface is reachable, but one interface reports different
stats than another interface? For example, what if someone configures a
different caching proxy (ATS) on each interface?
How are stats aggregated? Should the monitor aggregate all stats from
different polls and interfaces together, and consider them the same
"server"? If not, how do we reconcile the different stats with what the
Monitor reports on `CrStates` and `CacheStats`? If so, again, what happens
if different interfaces have different ATS instances, so e.g. the byte
count on one is 100, and the other is 1000, then 101, then 1001. It simply
won't work. Do we handle that? Or just ignore it, and document "all
interfaces must report the same stats"? Do we try to detect that and give a
useful error or warning?
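The byte-count example above can be made concrete. Assuming the monitor derives a rate from the difference between successive readings of a cumulative byte counter (an assumption about the approach, not a quote of Traffic Monitor code), alternating polls of two different ATS instances produce alternating positive and negative deltas. The hypothetical `deltas` and `counterWentBackwards` helpers below sketch one way to detect that and emit a warning; a real check would also have to tell it apart from an ordinary counter reset such as an ATS restart.

```go
package main

import "fmt"

// deltas returns successive differences of a cumulative byte counter, which
// is how a per-poll rate would be derived if every poll of a "server" were
// assumed to hit the same ATS instance.
func deltas(samples []int64) []int64 {
	if len(samples) < 2 {
		return nil
	}
	out := make([]int64, 0, len(samples)-1)
	for i := 1; i < len(samples); i++ {
		out = append(out, samples[i]-samples[i-1])
	}
	return out
}

// counterWentBackwards reports whether a supposedly monotonic counter ever
// decreased. This sketch does not distinguish a reset from interleaving.
func counterWentBackwards(samples []int64) bool {
	for _, d := range deltas(samples) {
		if d < 0 {
			return true
		}
	}
	return false
}

func main() {
	// Alternating polls of two ATS instances, as in the example above.
	interleaved := []int64{100, 1000, 101, 1001}
	fmt.Println(deltas(interleaved))               // [900 -899 900]
	fmt.Println(counterWentBackwards(interleaved)) // true

	single := []int64{100, 101, 102}
	fmt.Println(counterWentBackwards(single)) // false
}
```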

In Traffic Ops, servers have specific data used for polling. Traffic
Monitor gets the stats URI path from Parameters, and the URI IP from the
Servers table. It doesn't use the FQDN, Server Host or Server Domain. Where
would these other interfaces come from? Parameters? Or another table linked
to the servers table? (I'd really, really rather we didn't put more data in
unsafe Parameters, which may be missing or improperly formatted, need
safety checks in all code that ever uses them, and are confusing and opaque
to new users.) Would these other interfaces be in addition to using the IP
from the Server table? Or replace it?
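To illustrate the concern about unsafe Parameters above: if secondary IPs lived in a free-form parameter, every consumer would need checks like the ones in this minimal Go sketch. The `secondary_ips` parameter name, its comma-separated format, and the `secondaryIPsFromParams` helper are all hypothetical. A typed column or linked table would let this validation happen once, where the data is written, instead of in every reader.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// secondaryIPsFromParams shows the checks every caller would repeat if
// secondary IPs were stored as a free-form parameter (hypothetical name and
// format): the key may be missing, and any entry may be malformed.
func secondaryIPsFromParams(params map[string]string) ([]string, error) {
	raw, ok := params["secondary_ips"]
	if !ok {
		return nil, errors.New("parameter secondary_ips not set")
	}
	var ips []string
	for _, field := range strings.Split(raw, ",") {
		field = strings.TrimSpace(field)
		if net.ParseIP(field) == nil {
			return nil, fmt.Errorf("malformed IP %q in secondary_ips", field)
		}
		ips = append(ips, field)
	}
	return ips, nil
}

func main() {
	ips, err := secondaryIPsFromParams(map[string]string{"secondary_ips": "192.0.2.11, 192.0.2.12"})
	fmt.Println(ips, err) // [192.0.2.11 192.0.2.12] <nil>

	_, err = secondaryIPsFromParams(map[string]string{"secondary_ips": "192.0.2.11,oops"})
	fmt.Println(err) // malformed IP "oops" in secondary_ips

	_, err = secondaryIPsFromParams(map[string]string{})
	fmt.Println(err) // parameter secondary_ips not set
}
```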

Do we have config options for all of these? Only some of them? In the
config file, or Traffic Ops fields?


I'd like to hear the use case too, and e.g. why it isn't better to simply
make each different interface a different server in Traffic Ops? How is the
Traffic Router routing to them, anyway? Are you setting up the same DNS
record to point to the IPs of all interfaces? How is that configured in
Traffic Ops then? I.e. which interfaces are configured as the Server IP and
IP6? Are we certain there aren't other issues in other Traffic Control
components, with a Server IP and IP6 not having a one-to-one relationship
with the FQDN A/AAAA record?

Do we need to take the bigger step, of having a Traffic Ops Server have an
array of IPs? That's a lot more work (especially making sure it works
everywhere, e.g. Traffic Router), but it solves a lot of questions and
hackery, gives us a lot more flexibility, and matches the physical reality
better.
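A minimal Go sketch of the "array of IPs" option, assuming a hypothetical `Server` type whose `IPs` field replaces the single IP and IP6 fields. The hypothetical `legacyIPs` method derives one IPv4 and one IPv6 value from the addresses marked primary, which is one way the common single-IP case could keep working for components that only understand one address per family.

```go
package main

import (
	"fmt"
	"net"
)

// ServerIP is one address on a hypothetical multi-IP server record.
type ServerIP struct {
	Address   string // textual IPv4 or IPv6 address
	Interface string // NIC name, e.g. "eth0"
	Primary   bool   // the address single-IP components should see
}

// Server is a hypothetical Traffic Ops server carrying an array of IPs
// instead of single IP and IP6 fields.
type Server struct {
	HostName string
	IPs      []ServerIP
}

// legacyIPs returns the first primary IPv4 and first primary IPv6 address,
// or "" for a family with none. Secondary and unparseable entries are skipped.
func (s Server) legacyIPs() (ip4, ip6 string) {
	for _, a := range s.IPs {
		parsed := net.ParseIP(a.Address)
		if !a.Primary || parsed == nil {
			continue
		}
		if parsed.To4() != nil {
			if ip4 == "" {
				ip4 = parsed.String()
			}
		} else if ip6 == "" {
			ip6 = parsed.String()
		}
	}
	return ip4, ip6
}

func main() {
	edge := Server{
		HostName: "edge-01",
		IPs: []ServerIP{
			{Address: "192.0.2.10", Interface: "eth0", Primary: true},
			{Address: "192.0.2.11", Interface: "eth1", Primary: false},
			{Address: "2001:db8::10", Interface: "eth0", Primary: true},
		},
	}
	ip4, ip6 := edge.legacyIPs()
	fmt.Println(ip4, ip6) // 192.0.2.10 2001:db8::10
}
```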


I'm not opposed to the idea, but we need to think through the architecture,
we need to be sure the added complexity is worth it over existing
solutions, we need to make all the options (e.g. Unavailable if any vs all)
configurable, and we need to make sure the common simple case of a single
Server IP and IP6 still works without additional configuration complexity.
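The "Unavailable if any vs all" option could be a single setting that rolls per-IP health up into the one boolean Traffic Router consumes per cache. This is a minimal Go sketch; `AvailabilityPolicy`, its two constants, and `cacheAvailable` are hypothetical names, and reporting a cache with no health results as unavailable is an assumption. With a single IP both policies give the same answer, so the simple case needs no extra configuration.

```go
package main

import "fmt"

// AvailabilityPolicy is a hypothetical config option for how per-IP health
// rolls up into the single per-cache boolean.
type AvailabilityPolicy int

const (
	// UnavailableIfAny marks the cache down when any checked IP is down.
	UnavailableIfAny AvailabilityPolicy = iota
	// UnavailableIfAll marks the cache down only when every checked IP is down.
	UnavailableIfAll
)

// cacheAvailable rolls up per-IP results. ASSUMPTION: with no results at
// all, the cache is reported unavailable under either policy.
func cacheAvailable(policy AvailabilityPolicy, ipUp map[string]bool) bool {
	if len(ipUp) == 0 {
		return false
	}
	anyUp, allUp := false, true
	for _, up := range ipUp {
		anyUp = anyUp || up
		allUp = allUp && up
	}
	if policy == UnavailableIfAny {
		return allUp
	}
	return anyUp
}

func main() {
	// The common single-IP case behaves the same under both policies.
	single := map[string]bool{"192.0.2.10": true}
	fmt.Println(cacheAvailable(UnavailableIfAny, single), cacheAvailable(UnavailableIfAll, single)) // true true

	partial := map[string]bool{"192.0.2.10": true, "192.0.2.11": false}
	fmt.Println(cacheAvailable(UnavailableIfAny, partial)) // false
	fmt.Println(cacheAvailable(UnavailableIfAll, partial)) // true
}
```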


On Wed, Mar 28, 2018 at 10:19 AM, Nir Sopher  wrote:

> Hi Eric/Neil,
> Isn't supporting multiple interfaces per server a much wider question,
> architecture-wise?
> What would be the desired behavior if the monitoring shows that only one of
> the interfaces is down? Will the router send traffic to the healthy
> interfaces? How?
> Nir
>
> On Wed, Mar 28, 2018, 19:10 Eric Friedrich (efriedri) 
> wrote:
>
> > The use case behind this question probably deserves a longer dev@ email.
> >
> > I will oversimplify: we are extending TC to support multiple IPv4 (or
> > multiple IPv6) addresses per edge cache (across 1 or more NICs).
> >
> > Assume all addresses are reachable from the TM.
> >
> > —Eric