The RELOAD routing is fundamentally flawed and for that reason I am tossing the Chord implementation that it defines and replacing "periodic stabilization" with "correction on use and correct on change" also I am ditching finger tables all together as they aren't needed whatsoever with these changes. Also this brings global broadcast and multicast that can be registered via the hash table which are very useful for ICE and any other signaling with little overhead.

I guess I will have to write my own RFC since RELOAD is IMHO using outdated overlay technology and since having said that shouldn't it be a flawless design and not broken right now? I think so.

@Narayan You could not be more wrong about "the number of fingers a node has should provide a rough estimate (with accuracy in the order of log N) of the size of the overlay". People use this formula in Kadmelia, Chord, Chimera, etc and it does not work whatsoever and your accuracy is even wrong too. I use the algorithm described here: http://dks.sics.se/pub/netsize.pdf

-g

On May 11, 2009, at 12:47 PM, Narayanan, Vidya wrote:

I've described a number of comments on section 9 below. But, if the responses can be separated out under each topic, it will help the discussion quite a bit, given the length of this email.

Routing (Section 9.3):
----------------------
The routing rule as specified does not work in all cases. If a node with ID N and immediate successor S receives a message for resource id k such that N < k < S, since k is neither a node directly attached to N and nor is there a node whose ID is between N and k, the message cannot be routed. The routing rule should instead read as follows:

"If a peer is not responsible for a Resource-ID k, but is directly connected to a node with Node-ID k, then it routes the message to that node. Otherwise, it routes the request to the peer in the routing table that has the largest Node-ID that is in the interval between the peer and k. **If no such node is found, it finds the smallest node id that is greater than k and routes the message to that node.** The routing table is the union of the neighbor table and the finger table."

The new text is as enclosed in "**  **" above.

Redundancy (Section 9.4):
-------------------------
On the open issue, there should be no need to wait for successful redundant copy storage to return a STORE response. It doesn't make sense to send a failure if one of the redundant storage attempts fails anyway - so, the wait becomes a pointless exercise.

  "Note that a malicious node can return a success response but not
  store the data locally or in the replica set.  Requesting peers that
  wish to ensure that the replication actually occurred SHOULD [[Open
  Issue:  SHOULD or MAY?]] contact each peer listed in the replicas
  field of the Store response and retrieve a copy of the data."

The above text is not just relevant for redundant copies of data stored - it is true for even the actual resource-id owner. Also, in light of not waiting for the replica responses, the STORE response will not contain a list of replicas - so, this needs to be corrected.

Joining (Section 9.5):
----------------------
It is not clear why the join procedure requires all neighbor and finger connections to be established prior to sending the JOIN message itself. To be functional in the overlay, only the immediate successor relationships must be correctly in place. The other successors, predecessors and fingers can be established subsequently. This also doesn't cause nodes to establish unnecessary connections unless the node has successfully joined. There is also the question of whether a JP is supposed to establish a TLS connection to the BP to send messages through it, which is presently not addressed in the draft. It seems to me that a more streamlined join process can be defined as follows:

1. JP connects to its chosen bootstrap node. JP MUST form a TLS session with the bootstrap node, using the certificate or PSK required for overlay membership. 2. JP sends a Join request message to its own Node-ID n, through the BP. This is routed to the admitting peer (AP). The AP sends a response to the Join. 3. JP also sends an Attach request message to n through the BP. This may be sent in parallel with the Join message. The AP sends an attach response, resulting in connection establishment between the JP and AP. 4. AP does a series of Store requests to JP to store the data that JP will be responsible for. 5. AP sends JP an Update explicitly labeling JP as its predecessor. AP also sends an Update to all its other neighbors with the new value of its neighbor set (including JP). At this point, JP is part of the ring and responsible for a section of the overlay. AP SHOULD now store any data that JP is responsible for as part of replicas it will store (since AP is still JP's successor, it will need to store the data as well anyway). 6. JP sends Attach requests to the neighbors and establishes connections with the required successors and predecessors. 7. JP establishes finger connections by sending Attach requests to peers (n+2^i). (Comments on finger selection below).

Updates (Section 9.7):
----------------------
Section 9.7.1. states that every time a connection is lost, the peer should remove it from its neighbor table and find a different peer. Section 9.7.3. talks about attempting to re-establish connections when a neighbor is detected to be invalid. These are at odds. It seems to make sense that the peer would always try to re-establish connections and only remove that neighbor if the connection cannot be re-established.

Further, having the enrollment server involved in specifying the frequency of updates and probes seems strange. How is the enrollment server supposed to know the magic values here? It seems like it should be enough to specify a recommended value and leave it at that. Since the finger stabilization is for optimization and the neighbor stabilization is reactive anyway in the present draft, having these parameters in the config document seem to be of little use.

Finger entries (Section 9.7.3 and elsewhere):
---------------------------------------------
The draft requires 16 finger table entries - at first, I took those to mean 16 different resource ids that will be probed per formula. After reading 9.7.3, it appears that it is talking about 16 different fingers that a node should maintain. In a small overlay, that's ridiculously large. The text is also unclear on what happens when the overlay shrinks. Further, the text states that a peer SHOULD consider the finger table entry valid if it is in the range [n + 2^(numBitsInNodeId-i), n + 1.5 x 2^(numBitsInNodeId-i)]. For small overlays, there may be no entries in this range for every i in [0, 127].

A node should only need log N fingers to route in OlogN hops. What we need to specify is an algorithm that uses the same pattern in resource ids to probe - the fingers corresponding to multiple of those will collapse to the same node in a small overlay. In our implementation, the algorithm works as follows:

A node looks at the list of resource-ids corresponding to f=(n+2^i), for all values of i=[0,127]. Let's say that it first probes n+2^0 and obtains a finger of f0. It checks f0 against the next set of resource-ids to probe and determines the next larger resource-id to probe from the list. It populates the finger table entries for all other fingers f for which n < f < f0, with f0 as the corresponding finger. Depending on the number of nodes in the overlay, this should result in approximately logN fingers in the finger table at any time. Periodic stabilization should adjust the number of fingers automatically. In fact, the number of fingers a node has should provide a rough estimate (with accuracy in the order of log N) of the size of the overlay. The current text suggests that the number of fingers should be decided based on the overlay size, which does not seem intuitive or necessary.

Leaving (Section 9.9):
----------------------
In the event of a graceful leave, a peer should be able to send updates to all members of the neighbor set. It seems redundant to have the peer send a Leave message, followed by every peer receiving the Leave to send updates. Is there a reason for this?

Thanks,
Vidya
_______________________________________________
P2PSIP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/p2psip

_______________________________________________
P2PSIP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/p2psip

Reply via email to