Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-22 Thread Ulf Samuelsson


On 04/21/2015 05:58 AM, YOSHIFUJI Hideaki wrote:

Ulf Samuelsson wrote:

How many neighbors do you want to maintain?
I guess you have to increase the number of gc_thresh1.

The current use cases have up to 2048 entries.
This is expected to grow in the future.
The 3.4 kernel used in the system today is limited to 1024,
but that has been raised to about 10k.

The gc_thresh1 test is not implemented in 3.4 but can be backported,
but still not convinced it is a good idea.

Why?


A good solution makes sure that:
* equipment which is connected NEVER IS garbage collected
* equipment which is disconnected IS garbage collected.

The threshold idea does not meet the criteria for a good solution.

With this solution you keep unnecessary entries in the table.
If you ever pass the limit, then equipment which should not
be garbage collected may be.
It relies on someone keeping track of traffic loss,
so needs more maintenance by the SysOp.

The ARP probes should be considered to be NECESSARY traffic
to maintain a quality link.
Obviously not everyone would want to make this trade-off.



To complicate things, one requirement is that for some interfaces
you always want to keep things alive, if connected, but
for other interfaces you want things to be removed
to conserve memory.
Actually you would want to do this selection on a subnet level.

If you want to introduce per-interface parameter, I am okay with it.


Internal discussions resulted in a proposal to change the patch,
so that you have a keepalive flag which is tested after
it has been decided to exit the REACHABLE state.

if the keepalive flag is set, you always go to DELAY state from REACHABLE.

No.


And why is it a bad idea to have a high quality connection?

Best Regards,
Ulf Samuelsson

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-22 Thread YOSHIFUJI Hideaki
Ulf Samuelsson wrote:
 
 On 04/21/2015 05:58 AM, YOSHIFUJI Hideaki wrote:
 Ulf Samuelsson wrote:
 How many neighbors do you want to maintain?
 I guess you have to increase the number of gc_thresh1.
 The current use cases have up to 2048 entries.
 This is expected to grow in the future.
 The 3.4 kernel used in the system today is limited to 1024,
 but that has been raised to about 10k.

 The gc_thresh1 test is not implemented in 3.4 but can be backported,
 but still not convinced it is a good idea.
 Why?

 A good solution makes sure that:
 * equipment which is connected NEVER IS garbage collected
 * equipment which is disconnected IS garbage collected.
 
 The threshold idea does not meet the criteria for a good solution.

We try providing good solution if you have less than gc_thresh1
entries only.  Otherwise, we try hard to protect ourselves.


 With this solution you keep unnecessary entries in the table.
 If you ever pass the limit, then equipment which should not
 be garbage collected may be.
 It relies on someone keeping track of traffic loss,
 so needs more maintenance by the SysOp.try pr
 
 The ARP probes should be considered to be NECESSARY traffic
 to maintain a quality link.
 Obviously not everyone would want to make this trade-off.
 
 
 To complicate things, one requirement is that for some interfaces
 you always want to keep things alive, if connected, but
 for other interfaces you want things to be removed
 to conserve memory.
 Actually you would want to do this selection on a subnet level.
 If you want to introduce per-interface parameter, I am okay with it.

 Internal discussions resulted in a proposal to change the patch,
 so that you have a keepalive flag which is tested after
 it has been decided to exit the REACHABLE state.

 if the keepalive flag is set, you always go to DELAY state from REACHABLE.
 No.

 And why is it a bad idea to have a high quality connection?

We reclaim neighbor entries as much as possible to protect
ourselves if the number is below gc_thresh1.  We could stop
purging entries, but the idea was rejected AFAIK.  That is
our design.

Again, you should increase gc_thresh1, first.

-- 
Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-22 Thread Ulf Samuelsson


On 04/22/2015 12:46 PM, YOSHIFUJI Hideaki wrote:

Ulf Samuelsson wrote:

On 04/21/2015 05:58 AM, YOSHIFUJI Hideaki wrote:

Ulf Samuelsson wrote:

How many neighbors do you want to maintain?
I guess you have to increase the number of gc_thresh1.

The current use cases have up to 2048 entries.
This is expected to grow in the future.
The 3.4 kernel used in the system today is limited to 1024,
but that has been raised to about 10k.

The gc_thresh1 test is not implemented in 3.4 but can be backported,
but still not convinced it is a good idea.

Why?


A good solution makes sure that:
* equipment which is connected NEVER IS garbage collected
* equipment which is disconnected IS garbage collected.

The threshold idea does not meet the criteria for a good solution.

We try providing good solution if you have less than gc_thresh1
entries only.  Otherwise, we try hard to protect ourselves.


Protect against what?





With this solution you keep unnecessary entries in the table.
If you ever pass the limit, then equipment which should not
be garbage collected may be.
It relies on someone keeping track of traffic loss,
so needs more maintenance by the SysOp.try pr

The ARP probes should be considered to be NECESSARY traffic
to maintain a quality link.
Obviously not everyone would want to make this trade-off.



To complicate things, one requirement is that for some interfaces
you always want to keep things alive, if connected, but
for other interfaces you want things to be removed
to conserve memory.
Actually you would want to do this selection on a subnet level.

If you want to introduce per-interface parameter, I am okay with it.


Internal discussions resulted in a proposal to change the patch,
so that you have a keepalive flag which is tested after
it has been decided to exit the REACHABLE state.

if the keepalive flag is set, you always go to DELAY state from REACHABLE.

No.


And why is it a bad idea to have a high quality connection?

We reclaim neighbor entries as much as possible to protect
ourselves if the number is below gc_thresh1.  We could stop
purging entries, but the idea was rejected AFAIK.  That is
our design.

No you keep disconnected entries which can be removed
in the table as long as you are below gc_tresh1, so you waste memory.

This is a trade-off to reduce communication over the wire
at the expense of quality of service.

If you can live with the drawback of extra bandwidth usage,
then doing probes before you remove an entry will reduce
memory usage and improve the quality of the communication.

Why do you want to decide which trade-off is best?
Why not let the system designer make that decision?

I am not asking to stop purging entries.
Entries which are not needed *should* be purged.
But entries which are needed should not be purged,
because that means traffic loss.
This design does not differentiate between the two
providing unnecessary risks and wastes memory.

We know beforehand that if a neighbour is connected
to a certain interface, it should always be present in the table.
You do not want us to use that knowledge when configuring the kernel.

I think that the idea that was rejected must have been something different.


Again, you should increase gc_thresh1, first.


To what number?
When equipment is delivered from a factory, it is not known
how many entries are needed, and if it becomes known, it can later grow.
Why add extra maintenance to the system?


Best Regards,
Ulf Samuelsson

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-20 Thread YOSHIFUJI Hideaki
Ulf Samuelsson wrote:
 How many neighbors do you want to maintain?
 I guess you have to increase the number of gc_thresh1.
 The current use cases have up to 2048 entries.
 This is expected to grow in the future.
 The 3.4 kernel used in the system today is limited to 1024,
 but that has been raised to about 10k.
 
 The gc_thresh1 test is not implemented in 3.4 but can be backported,
 but still not convinced it is a good idea.

Why?

 To complicate things, one requirement is that for some interfaces
 you always want to keep things alive, if connected, but
 for other interfaces you want things to be removed
 to conserve memory.
 Actually you would want to do this selection on a subnet level.

If you want to introduce per-interface parameter, I am okay with it.

 
 Internal discussions resulted in a proposal to change the patch,
 so that you have a keepalive flag which is tested after
 it has been decided to exit the REACHABLE state.
 
 if the keepalive flag is set, you always go to DELAY state from REACHABLE.

No.

-- 
Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-20 Thread Ulf Samuelsson


On 04/20/2015 04:33 AM, YOSHIFUJI Hideaki wrote:

Hi,

Ulf Samuelsson wrote:

  From RFC2461:

|  REACHABLE   Roughly speaking, the neighbor is known to have been
|  reachable recently (within tens of seconds ago).
:
|  STALE   The neighbor is no longer known to be reachable but
|  until traffic is sent to the neighbor, no attempt
|  should be made to verify its reachability.
|  DELAY   The neighbor is no longer known to be reachable, and
|  traffic has recently been sent to the neighbor.
|  Rather than probe the neighbor immediately, however,
|  delay sending probes for a short while in order to
|  give upper layer protocols a chance to provide
|  reachability confirmation.



It is all depending on the meaning of the word recently.
You imply, that if timeouts have been triggered, then it is no longer recent,
but that is not the only interpretation, it is up to the implementer to decide
what is recently.

That quoted text is just a brief description.  The document has detailed
state machine.



It is not *mandatory* to follow the state machine strictly, Page 85:

   This appendix contains a summary of the rules specified in Sections
   7.2 and 7.3.  This document does not mandate that implementations
   adhere to this model as long as their external behavior is consistent
   with that described in this document.

The kernel does not follow the state machine today.
The kernel already have a test which compares

neigh-used + timeout with current time,
and move the entry to DELAY.

This is not documented in the state machine so there is already
a precedent to compare

neigh-compared + timeout with current time
and move the entry into DELAY state.

Obviously, some people would not want you to send probes before going STALE,
so it needs to be configurable.

Therefore, if a timeout occurs due to no traffic, they must be probed before
they are garbage collected.

It is what we do in PROBE state.
Yes, but you have to start by moving it into DELAY state first, to init 
the probe counter.

If you move the entry from REACHABLE to DELAY, then the probe counter
may be any value.




If this is not acceptable, how do you propose to solve the problem that you 
cannot
make remote units inaccessible for more than a fraction of a second?

How many neighbors do you want to maintain?
I guess you have to increase the number of gc_thresh1.

The current use cases have up to 2048 entries.
This is expected to grow in the future.
The 3.4 kernel used in the system today is limited to 1024,
but that has been raised to about 10k.

The gc_thresh1 test is not implemented in 3.4 but can be backported,
but still not convinced it is a good idea.

To complicate things, one requirement is that for some interfaces
you always want to keep things alive, if connected, but
for other interfaces you want things to be removed
to conserve memory.
Actually you would want to do this selection on a subnet level.

Internal discussions resulted in a proposal to change the patch,
so that you have a keepalive flag which is tested after
it has been decided to exit the REACHABLE state.

if the keepalive flag is set, you always go to DELAY state from REACHABLE.


Best Regards,
Ulf Samuelsson




--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-19 Thread YOSHIFUJI Hideaki
Hi,

Ulf Samuelsson wrote:
  From RFC2461:

 |  REACHABLE   Roughly speaking, the neighbor is known to have been
 |  reachable recently (within tens of seconds ago).
 :
 |  STALE   The neighbor is no longer known to be reachable but
 |  until traffic is sent to the neighbor, no attempt
 |  should be made to verify its reachability.
 |  DELAY   The neighbor is no longer known to be reachable, and
 |  traffic has recently been sent to the neighbor.
 |  Rather than probe the neighbor immediately, however,
 |  delay sending probes for a short while in order to
 |  give upper layer protocols a chance to provide
 |  reachability confirmation.


 
 It is all depending on the meaning of the word recently.
 You imply, that if timeouts have been triggered, then it is no longer 
 recent,
 but that is not the only interpretation, it is up to the implementer to decide
 what is recently.

That quoted text is just a brief description.  The document has detailed
state machine.


 Therefore, if a timeout occurs due to no traffic, they must be probed before
 they are garbage collected.

It is what we do in PROBE state.


 If this is not acceptable, how do you propose to solve the problem that you 
 cannot
 make remote units inaccessible for more than a fraction of a second?

How many neighbors do you want to maintain?
I guess you have to increase the number of gc_thresh1.

-- 
Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-17 Thread Ulf Samuelsson


On 04/16/2015 07:16 AM, YOSHIFUJI Hideaki wrote:

Hi,

Ulf Samuelsson wrote:


The desired functionality is that if communication stops,
you want to send out ARP probes, before the entry is deleted.

The current (pseudo) code of the neigh timer is:

 if (state  NUD_REACHABLE) {
 if (now = confirmed + reachable_time)) {
 ... /* We are OK */
 } else if (now  used + DELAY_PROBE_TIME) {/* Never happens */
 state = NUD_DELAY;
 } else {
 state = NUD_STALE;
 notify = 1;
 }

We never see the state beeing changed from REACHABLE to DELAY,
so the probes are not beeing sent out, instead you always go
from REACHABLE to STALE.

That's right.

But not acceptable, in telecom.




DELAY_PROBE_TIME is set to (5 x HZ) and used
seems to be only set by the periodic_work routine
when the neigh entry is in STALE state, and then it is too late.
It is also set by arp_find which is used by broken devices.


In STALE state, neigh-used is set by neigh_event_send(), called
by neigh_resolve_output() via neigh-output().





In practice, the second condition: (now  used + DELAY_PROBE_TIME) is never 
used.
What is the intention of this test?

That's right.  It is NOT used in normal condition unless
reachable time is too short.



By adding a new test + parameter, we would get the desired functionality,
and no need to listen for notifications or doing ARP state updates from 
applications.

 if (now = confirmed + reachable_time)) {
 ... /* We are OK */
+else if (now = confirmed + reprobe_time)) {
+   state = NUD_DELAY;
 } else if (now  used + DELAY_PROBE_TIME))) {/* Never happens */
 state = NUD_DELAY;
 } else {
 state = NUD_STALE;
 notify = 1;
 }

This way the entry would remain in REACHABLE while normal communication occurs,
then it would enter DELAY state to probe, and if that fails, it goes to STALE 
state.

No, it is not what REACHABLE and DELAY mean.

 From RFC2461:

|  REACHABLE   Roughly speaking, the neighbor is known to have been
|  reachable recently (within tens of seconds ago).
:
|  STALE   The neighbor is no longer known to be reachable but
|  until traffic is sent to the neighbor, no attempt
|  should be made to verify its reachability.
|  DELAY   The neighbor is no longer known to be reachable, and
|  traffic has recently been sent to the neighbor.
|  Rather than probe the neighbor immediately, however,
|  delay sending probes for a short while in order to
|  give upper layer protocols a chance to provide
|  reachability confirmation.




It is all depending on the meaning of the word recently.
You imply, that if timeouts have been triggered, then it is no longer 
recent,
but that is not the only interpretation, it is up to the implementer to 
decide

what is recently.

You can argue, that for REACHABLE they define it as (within tens of 
seconds ago),

but in a standards document, that is not enough,
so the definition of STALE is perfectly OK due to this ambiguity.

We have the situation in that machines enter and exit the network, at 
unpredictable times,

and while traffic is sporadic, they still need to be reachable.
They should not enter FAILED state unless they leave the network.

I see also in the RFC2461:
To reduce unnecessary network traffic, probe messages are only sent to
neighbors to which the node is actively sending packets.

In telecom applications, as long as the neighbour is present on the network,
the node will be sending packets, even if it is not that frequent.

These probes are *neccessary* for the system to work properly,
due to the long time for garbage collection.

The PROBE state need to be entered once, and only when these probes get 
no answer,

the entry should move into STALE.
I think that is compliant with the statement above.

Since they leave at unpredictable times, it is not good to set them to 
PERMANENT.


Therefore, if a timeout occurs due to no traffic, they must be probed before
they are garbage collected.

If this is not acceptable, how do you propose to solve the problem that 
you cannot

make remote units inaccessible for more than a fraction of a second?


Best Regards,
Ulf Samuelsson

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-15 Thread Ulf Samuelsson

No reply on this

in net/core/neighbour.c: neigh_timer_handler I see:

if (state  NUD_REACHABLE) {
if (time_before_eq(now,
   neigh-confirmed + neigh-parms-reachable_time)) {
neigh_dbg(2, neigh %p is still alive\n, neigh);
next = neigh-confirmed + neigh-parms-reachable_time;
} else if (time_before_eq(now,
  neigh-used +
  NEIGH_VAR(neigh-parms, DELAY_PROBE_TIME))) {
neigh_dbg(2, neigh %p is delayed\n, neigh);
neigh-nud_state = NUD_DELAY;
neigh-updated = jiffies;
neigh_suspect(neigh);
next = now + NEIGH_VAR(neigh-parms, DELAY_PROBE_TIME);
} else {
neigh_dbg(2, neigh %p is suspected\n, neigh);
neigh-nud_state = NUD_STALE;
neigh-updated = jiffies;
neigh_suspect(neigh);
notify = 1;
}
} else ...

Why is the second test there in the first place?

In the normal case, neigh-used does not get updated until the entry 
is found STALE

in the periodic work.

Why not use neigh-confirmed and another sysctl variable?

if (state  NUD_REACHABLE) {
if (time_before_eq(now,
   neigh-confirmed + neigh-parms-reachable_time)) {
neigh_dbg(2, neigh %p is still alive\n, neigh);
next = neigh-confirmed + neigh-parms-reachable_time;
} else if (time_before_eq(now,
- neigh-used +
-  NEIGH_VAR(neigh-parms, DELAY_PROBE_TIME))) {
+neigh-confirmed +
+ NEIGH_VAR(neigh-parms, DELAY_REPROBE_TIME))) {
neigh_dbg(2, neigh %p is delayed\n, neigh);
neigh-nud_state = NUD_DELAY;
neigh-updated = jiffies;
neigh_suspect(neigh);
next = now + NEIGH_VAR(neigh-parms, DELAY_PROBE_TIME);
} else {
neigh_dbg(2, neigh %p is suspected\n, neigh);
neigh-nud_state = NUD_STALE;
neigh-updated = jiffies;
neigh_suspect(neigh);
notify = 1;
}
} else ...

if DELAY_REPROBE_TIME is larger than reachable_time, then
the kernel will send out ARP probes when it is about
to lose communication with a remote machine.
This is what we need.

If it is smaller, then
it will go from REACHABLE to STALE.

The initial value of DELAY_REPROBE_TIME needs to be settable in Kconfig
to allow the selection of functionality.

I am told that setting stuff using sysctl has a performance penalty, when
interfaces are dynamically created and deleted in hundreds.

Best Regards,
Ulf Samuelsson
 


On 04/10/2015 10:26 AM, Ulf Samuelsson wrote:


On 03/12/2015 07:26 PM, David Miller wrote:

I hate changes like this.

By making this a Kconfig options it cannot be dynamic, and every
distribution is going to have to scratch their head and decide
what to set this to.

That's seriously suboptimal.

If you want to change behavior based upon whether userspace is
managing the damn table, make it so the user doing so has to
ask for the new behavior at _RUN TIME_ via the netlink interface
or similar.

Picking the guard time itself at compile time is also undesirable.

And you don't even want a damn timer, what you want is for the
state of the entry to be frozen and for the user to release
it by either adjusting the state to something else or marking
in some other way to allow it to be unfrozen and released again.

Why put it to chance with some timeout?  Make things explicit.


The desired functionality is that if communication stops,
you want to send out ARP probes, before the entry is deleted.

The current (pseudo) code of the neigh timer is:

if (state  NUD_REACHABLE) {
if (now = confirmed + reachable_time)) {
... /* We are OK */
} else if (now  used + DELAY_PROBE_TIME) {/* Never 
happens */

state = NUD_DELAY;
} else {
state = NUD_STALE;
notify = 1;
}

We never see the state beeing changed from REACHABLE to DELAY,
so the probes are not beeing sent out, instead you always go
from REACHABLE to STALE.

DELAY_PROBE_TIME is set to (5 x HZ) and used
seems to be only set by the periodic_work routine
when the neigh entry is in STALE state, and then it is too late.
It is also set by arp_find which is used by broken devices.

In practice, the second condition: (now  used + DELAY_PROBE_TIME) 
is never used.

What is the intention of this test?

By adding a new test + parameter, we would get the desired functionality,
and no need to listen for notifications or doing ARP state updates 
from applications.


if (now = confirmed + reachable_time)) {
... /* We are OK */
+else if (now = confirmed + reprobe_time)) {
+   state = NUD_DELAY;
} else if (now  used + DELAY_PROBE_TIME))) {/* Never 
happens */

state = 

Re: [PATCH] neighbour.c: Avoid GC directly after state change

2015-04-15 Thread YOSHIFUJI Hideaki
Hi,

Ulf Samuelsson wrote:

 The desired functionality is that if communication stops,
 you want to send out ARP probes, before the entry is deleted.
 
 The current (pseudo) code of the neigh timer is:
 
 if (state  NUD_REACHABLE) {
 if (now = confirmed + reachable_time)) {
 ... /* We are OK */
 } else if (now  used + DELAY_PROBE_TIME) {/* Never happens */
 state = NUD_DELAY;
 } else {
 state = NUD_STALE;
 notify = 1;
 }
 
 We never see the state beeing changed from REACHABLE to DELAY,
 so the probes are not beeing sent out, instead you always go
 from REACHABLE to STALE.

That's right.


 DELAY_PROBE_TIME is set to (5 x HZ) and used
 seems to be only set by the periodic_work routine
 when the neigh entry is in STALE state, and then it is too late.
 It is also set by arp_find which is used by broken devices.
 

In STALE state, neigh-used is set by neigh_event_send(), called
by neigh_resolve_output() via neigh-output().


 In practice, the second condition: (now  used + DELAY_PROBE_TIME) is 
 never used.
 What is the intention of this test?

That's right.  It is NOT used in normal condition unless
reachable time is too short.


 
 By adding a new test + parameter, we would get the desired functionality,
 and no need to listen for notifications or doing ARP state updates from 
 applications.
 
 if (now = confirmed + reachable_time)) {
 ... /* We are OK */
 +else if (now = confirmed + reprobe_time)) {
 +   state = NUD_DELAY;
 } else if (now  used + DELAY_PROBE_TIME))) {/* Never happens */
 state = NUD_DELAY;
 } else {
 state = NUD_STALE;
 notify = 1;
 }
 
 This way the entry would remain in REACHABLE while normal communication 
 occurs,
 then it would enter DELAY state to probe, and if that fails, it goes to STALE 
 state.

No, it is not what REACHABLE and DELAY mean.

From RFC2461:

|  REACHABLE   Roughly speaking, the neighbor is known to have been
|  reachable recently (within tens of seconds ago).
:
|  STALE   The neighbor is no longer known to be reachable but
|  until traffic is sent to the neighbor, no attempt
|  should be made to verify its reachability.
|  DELAY   The neighbor is no longer known to be reachable, and
|  traffic has recently been sent to the neighbor.
|  Rather than probe the neighbor immediately, however,
|  delay sending probes for a short while in order to
|  give upper layer protocols a chance to provide
|  reachability confirmation.


-- 
Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com
Technical Division, MIRACLE LINUX CORPORATION
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html