Re: [PATCH] neighbour.c: Avoid GC directly after state change
On 04/21/2015 05:58 AM, YOSHIFUJI Hideaki wrote: Ulf Samuelsson wrote: How many neighbors do you want to maintain? I guess you have to increase the number of gc_thresh1. The current use cases have up to 2048 entries. This is expected to grow in the future. The 3.4 kernel used in the system today is limited to 1024, but that has been raised to about 10k. The gc_thresh1 test is not implemented in 3.4 but can be backported, but still not convinced it is a good idea. Why? A good solution makes sure that: * equipment which is connected NEVER IS garbage collected * equipment which is disconnected IS garbage collected. The threshold idea does not meet the criteria for a good solution. With this solution you keep unnecessary entries in the table. If you ever pass the limit, then equipment which should not be garbage collected may be. It relies on someone keeping track of traffic loss, so needs more maintenance by the SysOp. The ARP probes should be considered to be NECESSARY traffic to maintain a quality link. Obviously not everyone would want to make this trade-off. To complicate things, one requirement is that for some interfaces you always want to keep things alive, if connected, but for other interfaces you want things to be removed to conserve memory. Actually you would want to do this selection on a subnet level. If you want to introduce per-interface parameter, I am okay with it. Internal discussions resulted in a proposal to change the patch, so that you have a keepalive flag which is tested after it has been decided to exit the REACHABLE state. if the keepalive flag is set, you always go to DELAY state from REACHABLE. No. And why is it a bad idea to have a high quality connection? Best Regards, Ulf Samuelsson -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] neighbour.c: Avoid GC directly after state change
Ulf Samuelsson wrote: On 04/21/2015 05:58 AM, YOSHIFUJI Hideaki wrote: Ulf Samuelsson wrote: How many neighbors do you want to maintain? I guess you have to increase the number of gc_thresh1. The current use cases have up to 2048 entries. This is expected to grow in the future. The 3.4 kernel used in the system today is limited to 1024, but that has been raised to about 10k. The gc_thresh1 test is not implemented in 3.4 but can be backported, but still not convinced it is a good idea. Why? A good solution makes sure that: * equipment which is connected NEVER IS garbage collected * equipment which is disconnected IS garbage collected. The threshold idea does not meet the criteria for a good solution. We try providing good solution if you have less than gc_thresh1 entries only. Otherwise, we try hard to protect ourselves. With this solution you keep unnecessary entries in the table. If you ever pass the limit, then equipment which should not be garbage collected may be. It relies on someone keeping track of traffic loss, so needs more maintenance by the SysOp.try pr The ARP probes should be considered to be NECESSARY traffic to maintain a quality link. Obviously not everyone would want to make this trade-off. To complicate things, one requirement is that for some interfaces you always want to keep things alive, if connected, but for other interfaces you want things to be removed to conserve memory. Actually you would want to do this selection on a subnet level. If you want to introduce per-interface parameter, I am okay with it. Internal discussions resulted in a proposal to change the patch, so that you have a keepalive flag which is tested after it has been decided to exit the REACHABLE state. if the keepalive flag is set, you always go to DELAY state from REACHABLE. No. And why is it a bad idea to have a high quality connection? We reclaim neighbor entries as much as possible to protect ourselves if the number is below gc_thresh1. We could stop purging entries, but the idea was rejected AFAIK. That is our design. Again, you should increase gc_thresh1, first. -- Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com Technical Division, MIRACLE LINUX CORPORATION -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] neighbour.c: Avoid GC directly after state change
On 04/22/2015 12:46 PM, YOSHIFUJI Hideaki wrote: Ulf Samuelsson wrote: On 04/21/2015 05:58 AM, YOSHIFUJI Hideaki wrote: Ulf Samuelsson wrote: How many neighbors do you want to maintain? I guess you have to increase the number of gc_thresh1. The current use cases have up to 2048 entries. This is expected to grow in the future. The 3.4 kernel used in the system today is limited to 1024, but that has been raised to about 10k. The gc_thresh1 test is not implemented in 3.4 but can be backported, but still not convinced it is a good idea. Why? A good solution makes sure that: * equipment which is connected NEVER IS garbage collected * equipment which is disconnected IS garbage collected. The threshold idea does not meet the criteria for a good solution. We try providing good solution if you have less than gc_thresh1 entries only. Otherwise, we try hard to protect ourselves. Protect against what? With this solution you keep unnecessary entries in the table. If you ever pass the limit, then equipment which should not be garbage collected may be. It relies on someone keeping track of traffic loss, so needs more maintenance by the SysOp.try pr The ARP probes should be considered to be NECESSARY traffic to maintain a quality link. Obviously not everyone would want to make this trade-off. To complicate things, one requirement is that for some interfaces you always want to keep things alive, if connected, but for other interfaces you want things to be removed to conserve memory. Actually you would want to do this selection on a subnet level. If you want to introduce per-interface parameter, I am okay with it. Internal discussions resulted in a proposal to change the patch, so that you have a keepalive flag which is tested after it has been decided to exit the REACHABLE state. if the keepalive flag is set, you always go to DELAY state from REACHABLE. No. And why is it a bad idea to have a high quality connection? We reclaim neighbor entries as much as possible to protect ourselves if the number is below gc_thresh1. We could stop purging entries, but the idea was rejected AFAIK. That is our design. No you keep disconnected entries which can be removed in the table as long as you are below gc_tresh1, so you waste memory. This is a trade-off to reduce communication over the wire at the expense of quality of service. If you can live with the drawback of extra bandwidth usage, then doing probes before you remove an entry will reduce memory usage and improve the quality of the communication. Why do you want to decide which trade-off is best? Why not let the system designer make that decision? I am not asking to stop purging entries. Entries which are not needed *should* be purged. But entries which are needed should not be purged, because that means traffic loss. This design does not differentiate between the two providing unnecessary risks and wastes memory. We know beforehand that if a neighbour is connected to a certain interface, it should always be present in the table. You do not want us to use that knowledge when configuring the kernel. I think that the idea that was rejected must have been something different. Again, you should increase gc_thresh1, first. To what number? When equipment is delivered from a factory, it is not known how many entries are needed, and if it becomes known, it can later grow. Why add extra maintenance to the system? Best Regards, Ulf Samuelsson -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] neighbour.c: Avoid GC directly after state change
Ulf Samuelsson wrote: How many neighbors do you want to maintain? I guess you have to increase the number of gc_thresh1. The current use cases have up to 2048 entries. This is expected to grow in the future. The 3.4 kernel used in the system today is limited to 1024, but that has been raised to about 10k. The gc_thresh1 test is not implemented in 3.4 but can be backported, but still not convinced it is a good idea. Why? To complicate things, one requirement is that for some interfaces you always want to keep things alive, if connected, but for other interfaces you want things to be removed to conserve memory. Actually you would want to do this selection on a subnet level. If you want to introduce per-interface parameter, I am okay with it. Internal discussions resulted in a proposal to change the patch, so that you have a keepalive flag which is tested after it has been decided to exit the REACHABLE state. if the keepalive flag is set, you always go to DELAY state from REACHABLE. No. -- Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com Technical Division, MIRACLE LINUX CORPORATION -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] neighbour.c: Avoid GC directly after state change
On 04/20/2015 04:33 AM, YOSHIFUJI Hideaki wrote: Hi, Ulf Samuelsson wrote: From RFC2461: | REACHABLE Roughly speaking, the neighbor is known to have been | reachable recently (within tens of seconds ago). : | STALE The neighbor is no longer known to be reachable but | until traffic is sent to the neighbor, no attempt | should be made to verify its reachability. | DELAY The neighbor is no longer known to be reachable, and | traffic has recently been sent to the neighbor. | Rather than probe the neighbor immediately, however, | delay sending probes for a short while in order to | give upper layer protocols a chance to provide | reachability confirmation. It is all depending on the meaning of the word recently. You imply, that if timeouts have been triggered, then it is no longer recent, but that is not the only interpretation, it is up to the implementer to decide what is recently. That quoted text is just a brief description. The document has detailed state machine. It is not *mandatory* to follow the state machine strictly, Page 85: This appendix contains a summary of the rules specified in Sections 7.2 and 7.3. This document does not mandate that implementations adhere to this model as long as their external behavior is consistent with that described in this document. The kernel does not follow the state machine today. The kernel already have a test which compares neigh-used + timeout with current time, and move the entry to DELAY. This is not documented in the state machine so there is already a precedent to compare neigh-compared + timeout with current time and move the entry into DELAY state. Obviously, some people would not want you to send probes before going STALE, so it needs to be configurable. Therefore, if a timeout occurs due to no traffic, they must be probed before they are garbage collected. It is what we do in PROBE state. Yes, but you have to start by moving it into DELAY state first, to init the probe counter. If you move the entry from REACHABLE to DELAY, then the probe counter may be any value. If this is not acceptable, how do you propose to solve the problem that you cannot make remote units inaccessible for more than a fraction of a second? How many neighbors do you want to maintain? I guess you have to increase the number of gc_thresh1. The current use cases have up to 2048 entries. This is expected to grow in the future. The 3.4 kernel used in the system today is limited to 1024, but that has been raised to about 10k. The gc_thresh1 test is not implemented in 3.4 but can be backported, but still not convinced it is a good idea. To complicate things, one requirement is that for some interfaces you always want to keep things alive, if connected, but for other interfaces you want things to be removed to conserve memory. Actually you would want to do this selection on a subnet level. Internal discussions resulted in a proposal to change the patch, so that you have a keepalive flag which is tested after it has been decided to exit the REACHABLE state. if the keepalive flag is set, you always go to DELAY state from REACHABLE. Best Regards, Ulf Samuelsson -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] neighbour.c: Avoid GC directly after state change
Hi, Ulf Samuelsson wrote: From RFC2461: | REACHABLE Roughly speaking, the neighbor is known to have been | reachable recently (within tens of seconds ago). : | STALE The neighbor is no longer known to be reachable but | until traffic is sent to the neighbor, no attempt | should be made to verify its reachability. | DELAY The neighbor is no longer known to be reachable, and | traffic has recently been sent to the neighbor. | Rather than probe the neighbor immediately, however, | delay sending probes for a short while in order to | give upper layer protocols a chance to provide | reachability confirmation. It is all depending on the meaning of the word recently. You imply, that if timeouts have been triggered, then it is no longer recent, but that is not the only interpretation, it is up to the implementer to decide what is recently. That quoted text is just a brief description. The document has detailed state machine. Therefore, if a timeout occurs due to no traffic, they must be probed before they are garbage collected. It is what we do in PROBE state. If this is not acceptable, how do you propose to solve the problem that you cannot make remote units inaccessible for more than a fraction of a second? How many neighbors do you want to maintain? I guess you have to increase the number of gc_thresh1. -- Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com Technical Division, MIRACLE LINUX CORPORATION -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] neighbour.c: Avoid GC directly after state change
On 04/16/2015 07:16 AM, YOSHIFUJI Hideaki wrote: Hi, Ulf Samuelsson wrote: The desired functionality is that if communication stops, you want to send out ARP probes, before the entry is deleted. The current (pseudo) code of the neigh timer is: if (state NUD_REACHABLE) { if (now = confirmed + reachable_time)) { ... /* We are OK */ } else if (now used + DELAY_PROBE_TIME) {/* Never happens */ state = NUD_DELAY; } else { state = NUD_STALE; notify = 1; } We never see the state beeing changed from REACHABLE to DELAY, so the probes are not beeing sent out, instead you always go from REACHABLE to STALE. That's right. But not acceptable, in telecom. DELAY_PROBE_TIME is set to (5 x HZ) and used seems to be only set by the periodic_work routine when the neigh entry is in STALE state, and then it is too late. It is also set by arp_find which is used by broken devices. In STALE state, neigh-used is set by neigh_event_send(), called by neigh_resolve_output() via neigh-output(). In practice, the second condition: (now used + DELAY_PROBE_TIME) is never used. What is the intention of this test? That's right. It is NOT used in normal condition unless reachable time is too short. By adding a new test + parameter, we would get the desired functionality, and no need to listen for notifications or doing ARP state updates from applications. if (now = confirmed + reachable_time)) { ... /* We are OK */ +else if (now = confirmed + reprobe_time)) { + state = NUD_DELAY; } else if (now used + DELAY_PROBE_TIME))) {/* Never happens */ state = NUD_DELAY; } else { state = NUD_STALE; notify = 1; } This way the entry would remain in REACHABLE while normal communication occurs, then it would enter DELAY state to probe, and if that fails, it goes to STALE state. No, it is not what REACHABLE and DELAY mean. From RFC2461: | REACHABLE Roughly speaking, the neighbor is known to have been | reachable recently (within tens of seconds ago). : | STALE The neighbor is no longer known to be reachable but | until traffic is sent to the neighbor, no attempt | should be made to verify its reachability. | DELAY The neighbor is no longer known to be reachable, and | traffic has recently been sent to the neighbor. | Rather than probe the neighbor immediately, however, | delay sending probes for a short while in order to | give upper layer protocols a chance to provide | reachability confirmation. It is all depending on the meaning of the word recently. You imply, that if timeouts have been triggered, then it is no longer recent, but that is not the only interpretation, it is up to the implementer to decide what is recently. You can argue, that for REACHABLE they define it as (within tens of seconds ago), but in a standards document, that is not enough, so the definition of STALE is perfectly OK due to this ambiguity. We have the situation in that machines enter and exit the network, at unpredictable times, and while traffic is sporadic, they still need to be reachable. They should not enter FAILED state unless they leave the network. I see also in the RFC2461: To reduce unnecessary network traffic, probe messages are only sent to neighbors to which the node is actively sending packets. In telecom applications, as long as the neighbour is present on the network, the node will be sending packets, even if it is not that frequent. These probes are *neccessary* for the system to work properly, due to the long time for garbage collection. The PROBE state need to be entered once, and only when these probes get no answer, the entry should move into STALE. I think that is compliant with the statement above. Since they leave at unpredictable times, it is not good to set them to PERMANENT. Therefore, if a timeout occurs due to no traffic, they must be probed before they are garbage collected. If this is not acceptable, how do you propose to solve the problem that you cannot make remote units inaccessible for more than a fraction of a second? Best Regards, Ulf Samuelsson -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] neighbour.c: Avoid GC directly after state change
No reply on this in net/core/neighbour.c: neigh_timer_handler I see: if (state NUD_REACHABLE) { if (time_before_eq(now, neigh-confirmed + neigh-parms-reachable_time)) { neigh_dbg(2, neigh %p is still alive\n, neigh); next = neigh-confirmed + neigh-parms-reachable_time; } else if (time_before_eq(now, neigh-used + NEIGH_VAR(neigh-parms, DELAY_PROBE_TIME))) { neigh_dbg(2, neigh %p is delayed\n, neigh); neigh-nud_state = NUD_DELAY; neigh-updated = jiffies; neigh_suspect(neigh); next = now + NEIGH_VAR(neigh-parms, DELAY_PROBE_TIME); } else { neigh_dbg(2, neigh %p is suspected\n, neigh); neigh-nud_state = NUD_STALE; neigh-updated = jiffies; neigh_suspect(neigh); notify = 1; } } else ... Why is the second test there in the first place? In the normal case, neigh-used does not get updated until the entry is found STALE in the periodic work. Why not use neigh-confirmed and another sysctl variable? if (state NUD_REACHABLE) { if (time_before_eq(now, neigh-confirmed + neigh-parms-reachable_time)) { neigh_dbg(2, neigh %p is still alive\n, neigh); next = neigh-confirmed + neigh-parms-reachable_time; } else if (time_before_eq(now, - neigh-used + - NEIGH_VAR(neigh-parms, DELAY_PROBE_TIME))) { +neigh-confirmed + + NEIGH_VAR(neigh-parms, DELAY_REPROBE_TIME))) { neigh_dbg(2, neigh %p is delayed\n, neigh); neigh-nud_state = NUD_DELAY; neigh-updated = jiffies; neigh_suspect(neigh); next = now + NEIGH_VAR(neigh-parms, DELAY_PROBE_TIME); } else { neigh_dbg(2, neigh %p is suspected\n, neigh); neigh-nud_state = NUD_STALE; neigh-updated = jiffies; neigh_suspect(neigh); notify = 1; } } else ... if DELAY_REPROBE_TIME is larger than reachable_time, then the kernel will send out ARP probes when it is about to lose communication with a remote machine. This is what we need. If it is smaller, then it will go from REACHABLE to STALE. The initial value of DELAY_REPROBE_TIME needs to be settable in Kconfig to allow the selection of functionality. I am told that setting stuff using sysctl has a performance penalty, when interfaces are dynamically created and deleted in hundreds. Best Regards, Ulf Samuelsson On 04/10/2015 10:26 AM, Ulf Samuelsson wrote: On 03/12/2015 07:26 PM, David Miller wrote: I hate changes like this. By making this a Kconfig options it cannot be dynamic, and every distribution is going to have to scratch their head and decide what to set this to. That's seriously suboptimal. If you want to change behavior based upon whether userspace is managing the damn table, make it so the user doing so has to ask for the new behavior at _RUN TIME_ via the netlink interface or similar. Picking the guard time itself at compile time is also undesirable. And you don't even want a damn timer, what you want is for the state of the entry to be frozen and for the user to release it by either adjusting the state to something else or marking in some other way to allow it to be unfrozen and released again. Why put it to chance with some timeout? Make things explicit. The desired functionality is that if communication stops, you want to send out ARP probes, before the entry is deleted. The current (pseudo) code of the neigh timer is: if (state NUD_REACHABLE) { if (now = confirmed + reachable_time)) { ... /* We are OK */ } else if (now used + DELAY_PROBE_TIME) {/* Never happens */ state = NUD_DELAY; } else { state = NUD_STALE; notify = 1; } We never see the state beeing changed from REACHABLE to DELAY, so the probes are not beeing sent out, instead you always go from REACHABLE to STALE. DELAY_PROBE_TIME is set to (5 x HZ) and used seems to be only set by the periodic_work routine when the neigh entry is in STALE state, and then it is too late. It is also set by arp_find which is used by broken devices. In practice, the second condition: (now used + DELAY_PROBE_TIME) is never used. What is the intention of this test? By adding a new test + parameter, we would get the desired functionality, and no need to listen for notifications or doing ARP state updates from applications. if (now = confirmed + reachable_time)) { ... /* We are OK */ +else if (now = confirmed + reprobe_time)) { + state = NUD_DELAY; } else if (now used + DELAY_PROBE_TIME))) {/* Never happens */ state =
Re: [PATCH] neighbour.c: Avoid GC directly after state change
Hi, Ulf Samuelsson wrote: The desired functionality is that if communication stops, you want to send out ARP probes, before the entry is deleted. The current (pseudo) code of the neigh timer is: if (state NUD_REACHABLE) { if (now = confirmed + reachable_time)) { ... /* We are OK */ } else if (now used + DELAY_PROBE_TIME) {/* Never happens */ state = NUD_DELAY; } else { state = NUD_STALE; notify = 1; } We never see the state beeing changed from REACHABLE to DELAY, so the probes are not beeing sent out, instead you always go from REACHABLE to STALE. That's right. DELAY_PROBE_TIME is set to (5 x HZ) and used seems to be only set by the periodic_work routine when the neigh entry is in STALE state, and then it is too late. It is also set by arp_find which is used by broken devices. In STALE state, neigh-used is set by neigh_event_send(), called by neigh_resolve_output() via neigh-output(). In practice, the second condition: (now used + DELAY_PROBE_TIME) is never used. What is the intention of this test? That's right. It is NOT used in normal condition unless reachable time is too short. By adding a new test + parameter, we would get the desired functionality, and no need to listen for notifications or doing ARP state updates from applications. if (now = confirmed + reachable_time)) { ... /* We are OK */ +else if (now = confirmed + reprobe_time)) { + state = NUD_DELAY; } else if (now used + DELAY_PROBE_TIME))) {/* Never happens */ state = NUD_DELAY; } else { state = NUD_STALE; notify = 1; } This way the entry would remain in REACHABLE while normal communication occurs, then it would enter DELAY state to probe, and if that fails, it goes to STALE state. No, it is not what REACHABLE and DELAY mean. From RFC2461: | REACHABLE Roughly speaking, the neighbor is known to have been | reachable recently (within tens of seconds ago). : | STALE The neighbor is no longer known to be reachable but | until traffic is sent to the neighbor, no attempt | should be made to verify its reachability. | DELAY The neighbor is no longer known to be reachable, and | traffic has recently been sent to the neighbor. | Rather than probe the neighbor immediately, however, | delay sending probes for a short while in order to | give upper layer protocols a chance to provide | reachability confirmation. -- Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com Technical Division, MIRACLE LINUX CORPORATION -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html