Re: Question about the poll model of the Traffic Monitor
Thanks, Eric. It works. I added the design doc, which links to the Google doc.

Thanks,
Zhilin

On 03/04/2018, 8:19 PM, "Eric Friedrich (efriedri)" wrote:

> Zhilin-
> I added you to the Wiki permissions. Please try again.
> -Eric
Re: Question about the poll model of the Traffic Monitor
Hi Dave,

I could not find the edit button on this page. Looks like I do not have the authority to add the doc.

Thanks,
Zhilin
Re: Question about the poll model of the Traffic Monitor
Hi Zhilin,

Is it possible to get this design doc added to our wiki? I created a design docs page here (https://cwiki.apache.org/confluence/display/TC/Design+Docs). I think it would be good to get the document there so it doesn't get lost over time.

Thanks!
Dave
Re: Question about the poll model of the Traffic Monitor
Hi Guys,

Thanks a lot for the discussion. I should have put the design up for review earlier, and sorry for the delay. Here is the link for the design doc:
https://docs.google.com/document/d/1vgq-pGNoLLYf7Y3cu5hWu67TUKpN5hucrp-ZS9nSsd4/edit?usp=sharing

Short summary of the feature design:
---
There is a feature request from the market to add support for secondary IPs on edge cache servers, and the ability to assign a delivery service to a secondary IP of an edge cache.

This feature requires Traffic Ops to support secondary IP configuration for edge caches, and delivery service assignment to a secondary IP.

Traffic Monitor should also monitor the connectivity of the configured secondary IPs, and Traffic Router needs to resolve the streamer FQDN to the secondary IP assigned in a delivery service.

Traffic Server should record the IP serving each client request, and should reject requests to an IP that is not assigned to the delivery service.

This design takes compatibility into consideration: if no secondary IP is configured, or some parts of the system have not been upgraded to a version that supports this feature, traffic will be served by the primary IPs as before.
---

Replies to Robert's comments are embedded in the email thread below. Further comments are welcome and much appreciated.

Thanks,
Zhilin

On 29/03/2018, 10:19 AM, "Neil Hao (nbaoping)" wrote:

> Hi Robert/Nir,
>
> Thanks very much for the quick and detailed reply, and sorry that I didn't make the whole feature clear. Actually, it's our Secondary IP feature, which is a big feature that will bring changes to all the components in Traffic Control. I thought our teammate had reviewed the design with you guys before, but it seems not. After discussion, we will start the whole feature design review with you guys soon; I think it will be better to continue the discussion after that.
>
> Thanks,
> Neil
>
> On 3/29/18, 1:16 AM, "Robert Butts" wrote:
>
>> I agree with Nir, it's not as simple as changing a structure to `[]URL`, it's a bigger architectural design question.
>>
>> How do you plan to mark caches Unavailable if they're unhealthy on one interface, but healthy on another?
>>
>> Right now, Traffic Router needs a boolean for each cache, it doesn't know anything about multiple network interfaces, IPv4 vs IPv6, etc. It only knows the FQDN, which is all the clients it's giving DNS records to will know when they request the cache.
>>
>> Questions:
>> Is a cache marked Unavailable when any interface is unreachable? Or all of them?

ZH> Actually, we will care about IP availability instead of interface availability. Please take a look at 3.1.2 of the design doc.

>> What if an interface is reachable, but one interface reports different stats than another interface? For example, what if someone configures a different caching proxy (ATS) on each interface?

ZH> We will only use 1 ATS to serve traffic from all IPs configured.

>> How are stats aggregated? Should the monitor aggregate all stats from different polls and interfaces together, and consider them the same "server"? If not, how do we reconcile the different stats with what the Monitor reports on `CrStates` and `CacheStats`? If so, again, what happens if different interfaces have different ATS instances, so e.g. the byte count on one is 100, and the other is 1000, then 101, then 1001. It simply won't work. Do we handle that? Or just ignore it, and document "all interfaces must report the same stats"? Do we try to detect that and give a useful error or warning?

ZH> The bandwidth for interfaces will be aggregated. We will only have 1 ATS to serve traffic from all interfaces. The connectivity check is IP based, and the stats collection will be interface based. Please take a look at 3.1.2 of the design doc for details.

>> In Traffic Ops, servers have specific data used for polling. Traffic Monitor gets the stats URI path from Parameters, and the URI IP from the Servers table. It doesn't use the FQDN, Server Host or Server Domain. Where would these other interfaces come from? Parameters? Or another table linked to the servers table? (I'd really, really rather we didn't put more data in unsafe Parameters, which can not exist, not be properly formatted, need safety checks in all code that ever uses them, and are confusing and opaque to new users.) Would these other interfaces be in addition to using the IP from the Server table? Or replace it? Do we have config options for all of these? Only some of them? In the config file, or Traffic Ops fields?

ZH> Please take a look at 3.1.1 of the design doc. Basically, we will add new APIs, or new fields to
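
To make Robert's byte-count example (100, 1000, 101, 1001) concrete, here is a minimal Go sketch. It is not taken from Traffic Monitor; the `sample` type, the interface names, and both helper functions are hypothetical and only illustrate why cumulative counters from different ATS instances cannot be merged into one series, whereas tracking the previous value per source keeps each counter consistent.

```go
package main

import "fmt"

// sample is a hypothetical poll result: a cumulative byte counter
// read through one interface of a cache.
type sample struct {
	iface string
	bytes int64
}

// naiveDeltas treats every sample as if it came from a single counter,
// which is what happens when polls from several interfaces are merged blindly.
func naiveDeltas(samples []sample) []int64 {
	deltas := make([]int64, 0, len(samples))
	for i := 1; i < len(samples); i++ {
		deltas = append(deltas, samples[i].bytes-samples[i-1].bytes)
	}
	return deltas
}

// perSourceDeltas remembers the previous value per interface, so each
// counter is only ever compared with itself.
func perSourceDeltas(samples []sample) map[string][]int64 {
	last := map[string]int64{}
	out := map[string][]int64{}
	for _, s := range samples {
		if prev, ok := last[s.iface]; ok {
			out[s.iface] = append(out[s.iface], s.bytes-prev)
		}
		last[s.iface] = s.bytes
	}
	return out
}

func main() {
	// The sequence from the thread: two independent counters, interleaved.
	samples := []sample{{"eth0", 100}, {"eth1", 1000}, {"eth0", 101}, {"eth1", 1001}}
	fmt.Println(naiveDeltas(samples))     // large positive and negative jumps
	fmt.Println(perSourceDeltas(samples)) // one small delta per interface
}
```

With a single ATS behind all IPs, as ZH describes, the two series would be the same counter and the problem would not arise; the sketch only shows the failure mode Robert is asking about.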
Re: Question about the poll model of the Traffic Monitor
Hi Robert/Nir,

Thanks very much for the quick and detailed reply, and sorry that I didn't make the whole feature clear. Actually, it's our Secondary IP feature, which is a big feature that will bring changes to all the components in Traffic Control. I thought our teammate had reviewed the design with you guys before, but it seems not. After discussion, we will start the whole feature design review with you guys soon; I think it will be better to continue the discussion after that.

Thanks,
Neil

On 3/29/18, 1:16 AM, "Robert Butts" wrote:

> I agree with Nir, it's not as simple as changing a structure to `[]URL`, it's a bigger architectural design question.
>
> How do you plan to mark caches Unavailable if they're unhealthy on one interface, but healthy on another?
>
> Right now, Traffic Router needs a boolean for each cache, it doesn't know anything about multiple network interfaces, IPv4 vs IPv6, etc. It only knows the FQDN, which is all the clients it's giving DNS records to will know when they request the cache.
>
> Questions:
> Is a cache marked Unavailable when any interface is unreachable? Or all of them?
>
> What if an interface is reachable, but one interface reports different stats than another interface? For example, what if someone configures a different caching proxy (ATS) on each interface?
>
> How are stats aggregated? Should the monitor aggregate all stats from different polls and interfaces together, and consider them the same "server"? If not, how do we reconcile the different stats with what the Monitor reports on `CrStates` and `CacheStats`? If so, again, what happens if different interfaces have different ATS instances, so e.g. the byte count on one is 100, and the other is 1000, then 101, then 1001. It simply won't work. Do we handle that? Or just ignore it, and document "all interfaces must report the same stats"? Do we try to detect that and give a useful error or warning?
>
> In Traffic Ops, servers have specific data used for polling. Traffic Monitor gets the stats URI path from Parameters, and the URI IP from the Servers table. It doesn't use the FQDN, Server Host or Server Domain. Where would these other interfaces come from? Parameters? Or another table linked to the servers table? (I'd really, really rather we didn't put more data in unsafe Parameters, which can not exist, not be properly formatted, need safety checks in all code that ever uses them, and are confusing and opaque to new users.) Would these other interfaces be in addition to using the IP from the Server table? Or replace it? Do we have config options for all of these? Only some of them? In the config file, or Traffic Ops fields?
>
> I'd like to hear the use case too, and e.g. why it isn't better to simply make each different interface a different server in Traffic Ops?
>
> How is the Traffic Router routing to them, anyway? Are you setting up the same DNS record to point to the IPs of all interfaces? How is that configured in Traffic Ops then? I.e. which interfaces are configured as the Server IP and IP6? Are we certain there aren't other issues in other Traffic Control components, with a Server IP and IP6 not having a one-to-one relationship with the FQDN A/ record?
>
> Do we need to take the bigger step, of having a Traffic Ops Server have an array of IPs? That's a lot more work (especially making sure it works everywhere, e.g. Traffic Router), but it solves a lot of questions and hackery, gives us a lot more flexibility, and matches the physical reality better.
>
> I'm not opposed to the idea, but we need to think through the architecture, we need to be sure the added complexity is worth it over existing solutions, we need to make all the options (e.g. Unavailable if any vs all) configurable, and we need to make sure the common simple case of a single Server IP and IP6 still works without additional configuration complexity.
>
> On Wed, Mar 28, 2018 at 10:19 AM, Nir Sopher wrote:
>
>> Hi Eric/Neil,
>> Isn't the question of supporting multiple interfaces per server a much wider question, architecture-wise?
>> What would be the desired behavior if the monitoring shows that only one of the interfaces is down? Will the router send traffic to the healthy interfaces? How?
>> Nir
>>
>> On Wed, Mar 28, 2018, 19:10 Eric Friedrich (efriedri) wrote:
>>
>>> The use case behind this question probably deserves a longer dev@ email.
>>>
>>> I will oversimplify: we are extending TC to support multiple IPv4 (or multiple IPv6) addresses per edge cache (across 1 or more NICs).
>>>
>>> Assume all addresses are reachable from the TM.
>>>
>>> -Eric
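
The "Unavailable if any vs all" question raised in this thread can be sketched as a small reduction from per-IP reachability to the single boolean that Traffic Router needs per cache. This is a minimal Go illustration under stated assumptions, not the design doc's method or Traffic Monitor code: the `Policy` type, its two values, and `cacheAvailable` are hypothetical names, and the addresses are documentation-range examples.

```go
package main

import "fmt"

// Policy is a hypothetical setting for how per-IP health rolls up
// into the one boolean Traffic Router consumes per cache.
type Policy int

const (
	UnavailableIfAny Policy = iota // one unreachable IP marks the cache down
	UnavailableIfAll               // the cache is down only when every IP is unreachable
)

// cacheAvailable reduces per-IP reachability to a single boolean.
// A cache with no polled IPs is reported as unavailable.
func cacheAvailable(reachable map[string]bool, p Policy) bool {
	if len(reachable) == 0 {
		return false
	}
	up := 0
	for _, ok := range reachable {
		if ok {
			up++
		}
	}
	if p == UnavailableIfAny {
		return up == len(reachable)
	}
	return up > 0
}

func main() {
	// One primary and one secondary IP; only the primary answers polls.
	ips := map[string]bool{"192.0.2.10": true, "192.0.2.11": false}
	fmt.Println(cacheAvailable(ips, UnavailableIfAny)) // false
	fmt.Println(cacheAvailable(ips, UnavailableIfAll)) // true
}
```

Either policy still loses the per-IP detail, which is why the thread asks whether Traffic Router itself would need to know about individual IPs rather than one flag per cache.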